Naive Bayes

What Are Generative Models

See the last article: discriminative models and generative models.

Naive Bayes is a generative model: it models the class-conditional distribution of the inputs together with the class prior (that is, the process that generates the inputs), and then obtains the posterior from these through Bayes' rule.

Assumption of Naive Bayes

The Naive Bayes assumption is that the data points are conditionally independent given the parameters $\boldsymbol{\theta}$, so if $\mathcal{D}=(d_i \mid i=1,\dots,n)$ then

$p(\mathcal{D} | \boldsymbol{\theta})=\prod_{i=1}^{n} p\left(d_{i} | \boldsymbol{\theta}\right)$

(This is also shown in PRML, p. 46.)
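
As a small illustration of the factorisation above, the sketch below evaluates the likelihood of a few independent observations as a product of per-observation terms. The Bernoulli (coin flip) model, the function names, and the toy data are assumptions made for this example only; they are not part of the original derivation.

```python
import math

def likelihood(data, theta):
    """p(D | theta) as a product over independent Bernoulli observations.

    Assumed model for illustration: each d_i is 1 with probability theta
    and 0 with probability 1 - theta.
    """
    result = 1.0
    for d in data:
        result *= theta if d == 1 else (1.0 - theta)
    return result

def log_likelihood(data, theta):
    """Same quantity in log space: the product becomes a sum of logs."""
    return sum(math.log(theta if d == 1 else 1.0 - theta) for d in data)

coin_flips = [1, 0, 1, 1]   # toy data, made up for the example
theta = 0.6

print(likelihood(coin_flips, theta))                 # 0.6 * 0.4 * 0.6 * 0.6
print(math.exp(log_likelihood(coin_flips, theta)))   # same value up to rounding
```

Working with the sum of logs is usually preferable in practice, because a product of many small probabilities quickly underflows.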

Example: implementing a spam filter

To implement a spam filter we can treat all the words in the email as conditionally independent of each other given the class (spam or not spam). Given an email $\mathcal{D}=\left\langle w_{1}, w_{2}, \dots, w_{n}\right\rangle$ we can compute the probability of it being spam as

$p(\operatorname{spam} | \mathcal{D})=\frac{\prod_{i=1}^{n} p\left(w_{i} | \operatorname{spam}\right) p(\operatorname{spam})}{p(\mathcal{D})}$

where $p(\operatorname{spam})$ is the empirically measured frequency of spam emails. To compute the likelihood we use a database of spam and non-spam emails:

$p\left(w_{i} | \operatorname{spam}\right)=\frac{\# \text { of occurrences of } w_{i} \text { in spam database }}{\# \text { of words in spam database }}$
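
The counting estimate above can be sketched in a few lines of Python. The function name `word_likelihoods`, the toy spam database, and the optional `pseudo_count` argument (additive smoothing over the vocabulary) are assumptions for illustration. The sketch also assumes that `vocabulary` contains every word that appears in `documents`.

```python
from collections import Counter

def word_likelihoods(documents, vocabulary, pseudo_count=0.0):
    """Estimate p(w | class) for every word in the vocabulary by counting.

    documents    : list of emails of one class, each a list of words
    vocabulary   : set of all known words (assumed to cover the documents)
    pseudo_count : value added to every word count; 0 gives the raw frequency
    """
    counts = Counter(word for doc in documents for word in doc)
    total = sum(counts.values()) + pseudo_count * len(vocabulary)
    return {w: (counts[w] + pseudo_count) / total for w in vocabulary}

# Toy spam database, made up for the example.
spam_db = [["win", "money", "now"], ["win", "prize"]]
vocabulary = {"win", "money", "now", "prize", "meeting"}

raw = word_likelihoods(spam_db, vocabulary)
smoothed = word_likelihoods(spam_db, vocabulary, pseudo_count=1.0)

print(raw["win"], raw["meeting"])            # 2/5 and 0/5
print(smoothed["win"], smoothed["meeting"])  # 3/10 and 1/10
```

With the raw frequencies, a word that never occurs in the spam database gets probability zero; the smoothed version keeps every probability strictly positive.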

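Here we use the assumption mentioned above: the likelihood $p(\mathcal{D}|\operatorname{spam})$ is the product of the individual $p(w_i|\operatorname{spam})$. (We might add pseudo-counts to make the estimate more robust, for example so that a word never seen in the spam database does not force the whole product to zero.) The probability of the data $\mathcal{D}$ is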

$p(\mathcal{D})=p(\mathcal{D} | \operatorname{spam}) p(\operatorname{spam})+p(\mathcal{D} | \neg \operatorname{spam}) p(\neg \operatorname{spam})$

We compute $p(\mathcal{D}|\neg \operatorname{spam})$ with exactly the same procedure as $p(\mathcal{D}|\operatorname{spam})$, using the non-spam database instead of the spam database.

By calculating the posterior probabilities, we get an approximate prediction of whether the email is spam or not: for example, we can classify it as spam when $p(\operatorname{spam} | \mathcal{D}) > 0.5$.
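
Putting the pieces together, here is a minimal end-to-end sketch of the spam filter described in this section. It is an illustration under stated assumptions, not a reference implementation: the names `train` and `posterior_spam`, the toy emails, the pseudo-count of 1, and the choice to skip words outside the vocabulary are all made up for the example. The computation is done in log space to avoid underflow when emails are long.

```python
import math
from collections import Counter

def train(spam_docs, ham_docs, pseudo_count=1.0):
    """Estimate p(spam) and the per-word likelihoods from two small databases."""
    vocabulary = {w for doc in spam_docs + ham_docs for w in doc}

    def likelihoods(docs):
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values()) + pseudo_count * len(vocabulary)
        return {w: (counts[w] + pseudo_count) / total for w in vocabulary}

    n_docs = len(spam_docs) + len(ham_docs)
    return {
        "p_spam": len(spam_docs) / n_docs,   # empirical frequency of spam
        "spam": likelihoods(spam_docs),      # p(w | spam)
        "ham": likelihoods(ham_docs),        # p(w | not spam)
    }

def posterior_spam(model, email):
    """Return p(spam | D) for an email given as a list of words."""
    # Assumption: words never seen in training are simply ignored.
    words = [w for w in email if w in model["spam"]]

    # log p(D | class) + log p(class), using the Naive Bayes product rule.
    log_spam = math.log(model["p_spam"]) + sum(math.log(model["spam"][w]) for w in words)
    log_ham = math.log(1.0 - model["p_spam"]) + sum(math.log(model["ham"][w]) for w in words)

    # p(D) = p(D|spam) p(spam) + p(D|not spam) p(not spam).
    # Subtracting the max before exponentiating keeps the numbers stable.
    m = max(log_spam, log_ham)
    joint_spam = math.exp(log_spam - m)
    joint_ham = math.exp(log_ham - m)
    return joint_spam / (joint_spam + joint_ham)

# Toy databases, made up for the example.
spam_docs = [["win", "money", "now"], ["win", "prize", "now"]]
ham_docs = [["meeting", "at", "noon"], ["project", "meeting", "now"]]

model = train(spam_docs, ham_docs)
print(posterior_spam(model, ["win", "money"]))     # well above 0.5: likely spam
print(posterior_spam(model, ["meeting", "noon"]))  # well below 0.5: likely not spam
```

Note that the sketch assumes both databases are non-empty, so that neither prior is zero and the logarithms are defined.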

Reference

Bishop, PRML, Section 1.5.4
Shuogh blog: Discriminative and generative models
AML(comp3008) 2012-2013 exam paper