Discriminative and Generative Models

The classification problem can be broken down into two separate stages:

  • The inference stage: use the training data to learn a model for $p(C_k|x)$
  • The decision stage: use these posterior probabilities to make optimal class assignments
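
For example, if the aim in the decision stage is simply to minimize the probability of misclassification, the optimal rule is to assign each new $x$ to the class with the largest posterior:

$$\hat{k} = \arg\max_k \, p(C_k|x)$$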

There are, in fact, three distinct approaches to solving the classification problem, given here in decreasing order of complexity.

Generative Models

To solve the inference problem, first determine the class-conditional densities $p(x|C_k)$ for each class $C_k$ individually, and also infer the prior class probabilities $p(C_k)$. Then use Bayes’ theorem in the form

$$p(C_k|x) = \frac{p(x|C_k)\,p(C_k)}{p(x)}$$

to find the posterior class probabilities $p(C_k|x)$.

The denominator can be found in terms of the quantities appearing in the numerator:

$$p(x) = \sum_k p(x|C_k)\,p(C_k)$$

Equivalently, the joint distribution $p(x, C_k)$ can be modelled directly and then normalized to obtain the posterior probabilities.

Given the posterior probabilities, we use decision theory to determine the class membership of each new input $x$. Approaches that model the distribution of the inputs as well as the outputs in this way are known as generative models. The name “generative” reflects the fact that, by sampling from them, it is possible to generate synthetic data points in the input space.

Examples of generative models:

  • Naive Bayes, latent Dirichlet allocation, Gaussian mixture models…
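
A minimal sketch of this recipe in Python (the 1-D toy data and the Gaussian form of the class-conditionals are hypothetical choices for illustration): estimate $p(x|C_k)$ and $p(C_k)$ from the training set, then apply Bayes’ theorem to obtain the posteriors.

```python
import numpy as np

# Toy 1-D training data for two classes (hypothetical values)
x0 = np.array([1.0, 1.2, 0.8, 1.1])   # samples labelled C_0
x1 = np.array([3.0, 2.8, 3.2, 3.1])   # samples labelled C_1

# Inference stage: fit a Gaussian class-conditional p(x|C_k) per class
# and estimate the prior p(C_k) from the class frequencies.
n_total = len(x0) + len(x1)
params = [(xk.mean(), xk.std(), len(xk) / n_total) for xk in (x0, x1)]

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x):
    # Bayes' theorem: p(C_k|x) = p(x|C_k) p(C_k) / p(x),
    # where p(x) = sum_k p(x|C_k) p(C_k) is the normalizer.
    joint = np.array([gaussian_pdf(x, mu, s) * prior for mu, s, prior in params])
    return joint / joint.sum()

# Decision stage: assign x to the class with the largest posterior.
x_new = 2.6
print(posterior(x_new), posterior(x_new).argmax())
```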

Discriminative Models

Solve the inference problem of determining the posterior class probabilities $p(C_k|x)$ directly, and then use decision theory to make predictions.

Methods that model the posterior probabilities $p(C_k|x)$ directly in this way are called discriminative models.

Alternatively, find a function $f(x)$, called a discriminant function, which maps each input $x$ directly onto a class label; in this case probabilities play no role.

Examples of discriminative models:

  • kNN, perceptron, decision trees, linear regression, logistic regression, SVM, neural networks…
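
A sketch of the second, probabilistic discriminative route (again on hypothetical 1-D toy data): logistic regression models $p(C_1|x)$ directly, without ever estimating $p(x|C_k)$ or $p(C_k)$.

```python
import numpy as np

# Toy 1-D data with binary labels (hypothetical values)
x = np.array([1.0, 1.2, 0.8, 1.1, 3.0, 2.8, 3.2, 3.1])
t = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Inference stage: model p(C_1|x) = sigmoid(w*x + b) and fit w, b by
# gradient descent on the cross-entropy (negative log-likelihood).
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    y = sigmoid(w * x + b)           # current estimate of p(C_1|x)
    w -= lr * ((y - t) * x).mean()   # cross-entropy gradient w.r.t. w
    b -= lr * (y - t).mean()         # cross-entropy gradient w.r.t. b

# Decision stage: threshold the posterior at 0.5.
x_new = 2.6
p1 = sigmoid(w * x_new + b)
print(p1, int(p1 > 0.5))
```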

The Merits of Each Method

Generative models are the most demanding, since they involve finding the joint distribution over both $x$ and $C_k$. For many applications, $x$ has high dimensionality, and consequently we may need a large training set in order to determine the class-conditional densities (which, combined with the priors, yield the posteriors we ultimately want) to reasonable accuracy.
One distinctive use case of generative models is outlier detection. The marginal density of the data, $p(x)$, can be determined using the formula mentioned above. This is useful for detecting new data points that have low probability under the model, and for which the predictions may be of low accuracy; this is known as outlier detection or novelty detection.
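
A minimal sketch of this use, reusing Gaussian class-conditionals as fitted above (the numbers below are hypothetical stand-ins for the estimated parameters): compute the marginal $p(x) = \sum_k p(x|C_k)\,p(C_k)$ and flag points whose density falls below a threshold.

```python
import numpy as np

# Fitted generative model: (mean, std, prior) per class (hypothetical values)
params = [(1.0, 0.15, 0.5), (3.0, 0.15, 0.5)]

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def marginal_density(x):
    # p(x) = sum_k p(x|C_k) p(C_k): the denominator of Bayes' theorem
    return sum(prior * gaussian_pdf(x, mu, s) for mu, s, prior in params)

# Threshold chosen by eye here; in practice one might use a low
# quantile of the densities of the training points.
threshold = 1e-3
for x_new in [1.1, 2.0, 10.0]:
    p = marginal_density(x_new)
    print(x_new, p, "outlier/novelty" if p < threshold else "ok")
```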

Discriminative approaches are simpler. The second approach obtains the posterior probabilities $p(C_k|x)$ directly from the data. The third approach is simpler still: we use the training data to find a discriminant function $f(x)$ that maps each $x$ directly onto a class label, combining the inference and decision stages into a single learning problem. However, with the third approach we no longer have access to the posterior probabilities.
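
To make the contrast concrete, here is a sketch of the third approach as a hypothetical nearest-centroid rule on the same kind of toy data: the learned discriminant $f(x)$ returns a label directly, and no posterior probability is ever available.

```python
import numpy as np

# Toy 1-D training data per class (hypothetical values)
x0 = np.array([1.0, 1.2, 0.8, 1.1])
x1 = np.array([3.0, 2.8, 3.2, 3.1])

# Single learning problem: inference and decision collapse into
# fitting a discriminant function that maps x straight to a label.
centroids = np.array([x0.mean(), x1.mean()])

def f(x):
    # Nearest-centroid discriminant: returns a class label only;
    # p(C_k|x) is never computed and cannot be recovered from f.
    return int(np.argmin(np.abs(centroids - x)))

print(f(1.05), f(2.9))   # -> 0 1
```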

Reference

  1. Christopher M. Bishop, Pattern Recognition and Machine Learning (PRML), Section 1.5.4