Naïve Bayes Classifier
Naïve Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $\mathbf{x} = (x_1, \ldots, x_n)$ representing some $n$ features (independent variables), it assigns to this instance probabilities

$$p(C_k \mid x_1, \ldots, x_n)$$

for each of $K$ possible outcomes or classes $C_k$.
The problem with the above formulation is that if the number of features $n$ is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. The model must therefore be reformulated to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}.$$
In plain English, using Bayesian probability terminology, the above equation can be written as

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}.$$
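As a quick illustration, the following sketch evaluates this decomposition for a single binary feature; the spam-filter numbers are hypothetical, chosen only to make the arithmetic visible.

```python
# Hypothetical spam-filter probabilities, purely illustrative.
prior_spam = 0.4   # p(C = spam)
prior_ham = 0.6    # p(C = ham)
lik_spam = 0.8     # p(x | spam): feature observed given spam
lik_ham = 0.1      # p(x | ham):  feature observed given ham

# Evidence p(x): total probability of observing the feature.
evidence = prior_spam * lik_spam + prior_ham * lik_ham  # 0.38

# posterior = prior * likelihood / evidence
posterior_spam = prior_spam * lik_spam / evidence
print(posterior_spam)  # 0.32 / 0.38 ≈ 0.842
```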
In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $x_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

$$p(C_k, x_1, \ldots, x_n),$$
which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$$p(C_k, x_1, \ldots, x_n) = p(x_1 \mid x_2, \ldots, x_n, C_k)\, p(x_2 \mid x_3, \ldots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k).$$
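For concreteness, with $n = 3$ features the chain rule unrolls to

$$p(C_k, x_1, x_2, x_3) = p(x_1 \mid x_2, x_3, C_k)\, p(x_2 \mid x_3, C_k)\, p(x_3 \mid C_k)\, p(C_k).$$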
Now the "Naïve" conditional independence assumptions come into play: assume that all features in $\mathbf{x}$ are mutually independent, conditional on the category $C_k$. Under this assumption,

$$p(x_i \mid x_{i+1}, \ldots, x_n, C_k) = p(x_i \mid C_k).$$
Thus, the joint model can be expressed as

$$p(C_k, x_1, \ldots, x_n) = p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k).$$
This means that under the above independence assumptions, the conditional distribution over the class variable $C$ is

$$p(C_k \mid x_1, \ldots, x_n) = \frac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k),$$

where the evidence $Z = p(\mathbf{x}) = \sum_k p(C_k)\, p(\mathbf{x} \mid C_k)$ is a scaling factor that depends only on $x_1, \ldots, x_n$, i.e. a constant once the feature values are known.
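A minimal sketch of this computation, assuming hypothetical priors and per-feature likelihood tables for two classes and two binary features:

```python
# Hypothetical model parameters, illustrative only.
priors = {"spam": 0.4, "ham": 0.6}
# likelihoods[c][i] = p(x_i = 1 | C = c) for binary features
likelihoods = {"spam": [0.8, 0.3], "ham": [0.1, 0.5]}

x = [1, 0]  # observed feature values

def joint(c):
    """Unnormalized p(C_k, x): prior times the product of per-feature likelihoods."""
    p = priors[c]
    for i, xi in enumerate(x):
        p1 = likelihoods[c][i]
        p *= p1 if xi == 1 else (1 - p1)
    return p

# The evidence Z is the joint summed over all classes.
Z = sum(joint(c) for c in priors)
posterior = {c: joint(c) / Z for c in priors}
print(posterior)  # {'spam': ~0.88, 'ham': ~0.12}
```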
Constructing a classifier from the probability model
The Naïve Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable, so as to minimize the probability of misclassification; this is known as the maximum a posteriori or MAP decision rule.
The corresponding classifier, a Bayes classifier, is the function that assigns a class label $\hat{y} = C_k$ for some $k$ as follows:

$$\hat{y} = \underset{k \in \{1, \ldots, K\}}{\operatorname{argmax}}\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k).$$
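A minimal sketch of such a classifier, assuming categorical features: the count-based estimates of $p(C_k)$ and $p(x_i \mid C_k)$, the Laplace smoothing, and the use of log-probabilities for numerical stability are conventional choices layered on top of the formula, not part of it.

```python
import math
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate class priors and per-feature value counts from training data."""
    class_counts = Counter(y)
    priors = {c: cnt / len(y) for c, cnt in class_counts.items()}
    # counts[c][i][v] = number of training rows of class c with feature i equal to v
    counts = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[c][i][v] += 1
    return priors, counts, class_counts

def predict(x, priors, counts, class_counts, n_values=2, alpha=1.0):
    """MAP rule: argmax over classes of log p(C_k) + sum_i log p(x_i | C_k).

    alpha is the Laplace smoothing strength; n_values is the number of
    values each feature can take (2 here, since the toy features are binary).
    """
    best, best_logp = None, -math.inf
    for c, prior in priors.items():
        logp = math.log(prior)
        for i, v in enumerate(x):
            logp += math.log((counts[c][i][v] + alpha) /
                             (class_counts[c] + n_values * alpha))
        if logp > best_logp:
            best, best_logp = c, logp
    return best

# Usage with toy binary features:
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["spam", "spam", "ham", "ham"]
priors, counts, class_counts = fit(X, y)
print(predict([1, 0], priors, counts, class_counts))  # "spam"
```

Working in log-space turns the product of likelihoods into a sum, which avoids floating-point underflow when the number of features is large.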