Naïve Bayes Classifier
Naïve Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $\mathbf{x} = (x_1, \ldots, x_n)$ representing some $n$ features (independent variables), it assigns to this instance probabilities

$$p(C_k \mid x_1, \ldots, x_n)$$

for each of $K$ possible outcomes or classes $C_k$.
The problem with the above formulation is that if the number of features $n$ is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. The model must therefore be reformulated to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}.$$
In plain English, using Bayesian probability terminology, the above equation can be written as

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}.$$
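As a quick illustration, the following sketch evaluates this decomposition for a single binary feature; the spam-filter numbers are hypothetical, chosen only to make the arithmetic visible.

```python
# Hypothetical spam-filter probabilities, purely illustrative.
prior_spam = 0.4   # p(C = spam)
prior_ham = 0.6    # p(C = ham)
lik_spam = 0.8     # p(x | spam): feature observed given spam
lik_ham = 0.1      # p(x | ham):  feature observed given ham

# Evidence p(x): total probability of observing the feature.
evidence = prior_spam * lik_spam + prior_ham * lik_ham  # 0.38

# posterior = prior * likelihood / evidence
posterior_spam = prior_spam * lik_spam / evidence
print(posterior_spam)  # 0.32 / 0.38 ≈ 0.842
```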
In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $x_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

$$p(C_k, x_1, \ldots, x_n),$$
which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$$p(C_k, x_1, \ldots, x_n) = p(x_1 \mid x_2, \ldots, x_n, C_k)\, p(x_2 \mid x_3, \ldots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k).$$
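For concreteness, with $n = 3$ features the chain rule unrolls to

$$p(C_k, x_1, x_2, x_3) = p(x_1 \mid x_2, x_3, C_k)\, p(x_2 \mid x_3, C_k)\, p(x_3 \mid C_k)\, p(C_k).$$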
Now the "Naïve" conditional independence assumptions come into play: assume that all features in $\mathbf{x}$ are mutually independent, conditional on the category $C_k$. Under this assumption,

$$p(x_i \mid x_{i+1}, \ldots, x_n, C_k) = p(x_i \mid C_k).$$
Thus, the joint model can be expressed as

$$p(C_k, x_1, \ldots, x_n) = p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k).$$
This means that under the above independence assumptions, the conditional distribution over the class variable $C$ is

$$p(C_k \mid x_1, \ldots, x_n) = \frac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k),$$

where the evidence $Z = p(\mathbf{x}) = \sum_k p(C_k)\, p(\mathbf{x} \mid C_k)$ is a scaling factor that depends only on $x_1, \ldots, x_n$, i.e. a constant once the feature values are known.
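A minimal sketch of this computation, assuming hypothetical priors and per-feature likelihood tables for two classes and two binary features:

```python
# Hypothetical model parameters, illustrative only.
priors = {"spam": 0.4, "ham": 0.6}
# likelihoods[c][i] = p(x_i = 1 | C = c) for binary features
likelihoods = {"spam": [0.8, 0.3], "ham": [0.1, 0.5]}

x = [1, 0]  # observed feature values

def joint(c):
    """Unnormalized p(C_k, x): prior times the product of per-feature likelihoods."""
    p = priors[c]
    for i, xi in enumerate(x):
        p1 = likelihoods[c][i]
        p *= p1 if xi == 1 else (1 - p1)
    return p

# The evidence Z is the joint summed over all classes.
Z = sum(joint(c) for c in priors)
posterior = {c: joint(c) / Z for c in priors}
print(posterior)  # {'spam': ~0.88, 'ham': ~0.12}
```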
Constructing a classifier from the probability model
The Naïve Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable, so as to minimize the probability of misclassification; this is known as the maximum a posteriori or MAP decision rule.
The corresponding classifier, a Bayes classifier, is the function that assigns a class label $\hat{y} = C_k$ for some $k$ as follows:

$$\hat{y} = \underset{k \in \{1, \ldots, K\}}{\operatorname{argmax}}\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k).$$
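A minimal sketch of such a classifier, assuming categorical features: the count-based estimates of $p(C_k)$ and $p(x_i \mid C_k)$, the Laplace smoothing, and the use of log-probabilities for numerical stability are conventional choices layered on top of the formula, not part of it.

```python
import math
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate class priors and per-feature value counts from training data."""
    class_counts = Counter(y)
    priors = {c: cnt / len(y) for c, cnt in class_counts.items()}
    # counts[c][i][v] = number of training rows of class c with feature i equal to v
    counts = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[c][i][v] += 1
    return priors, counts, class_counts

def predict(x, priors, counts, class_counts, n_values=2, alpha=1.0):
    """MAP rule: argmax over classes of log p(C_k) + sum_i log p(x_i | C_k).

    alpha is the Laplace smoothing strength; n_values is the number of
    values each feature can take (2 here, since the toy features are binary).
    """
    best, best_logp = None, -math.inf
    for c, prior in priors.items():
        logp = math.log(prior)
        for i, v in enumerate(x):
            logp += math.log((counts[c][i][v] + alpha) /
                             (class_counts[c] + n_values * alpha))
        if logp > best_logp:
            best, best_logp = c, logp
    return best

# Usage with toy binary features:
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["spam", "spam", "ham", "ham"]
priors, counts, class_counts = fit(X, y)
print(predict([1, 0], priors, counts, class_counts))  # "spam"
```

Working in log-space turns the product of likelihoods into a sum, which avoids floating-point underflow when the number of features is large.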