Bayes' Rule, Prior and Posterior

Bayes’ Rule

Bayes' rule takes the form

$$\mathbb{P}\left(\mathcal{H}_{i} \mid \mathcal{D}\right) = \frac{\mathbb{P}\left(\mathcal{D} \mid \mathcal{H}_{i}\right)\,\mathbb{P}\left(\mathcal{H}_{i}\right)}{\mathbb{P}(\mathcal{D})}$$

where

  • $\mathbb{P}\left(\mathcal{H}_{i} | \mathcal{D}\right)$ is the posterior probability of a hypothesis $\mathcal{H}_{i}$ (i.e. the probability of $\mathcal{H}_{i}$ after we know the data)

  • $\mathbb{P}\left(\mathcal{D} | \mathcal{H}_{i}\right)$ is the likelihood of the data given the hypothesis. Note that this is computed from the forward problem

  • $\mathbb{P}\left(\mathcal{H}_{i}\right)$ is the prior probability (i.e. the probability of $\mathcal{H}_{i}$ before we know the data)

  • $\mathbb{P}(\mathcal{D})$ is the evidence. It is the normalising constant given by $\mathbb{P}(\mathcal{D}) = \sum_{i} \mathbb{P}\left(\mathcal{D} | \mathcal{H}_{i}\right)\mathbb{P}\left(\mathcal{H}_{i}\right)$
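The update above can be sketched numerically. This is a minimal illustration, not from the source: the hypotheses, coin biases, and observed counts below are all invented for the example.

```python
from math import comb

import numpy as np

# Hypothetical setup: three hypotheses H_i, each "the coin has this bias",
# and we observe D = 8 heads in 10 flips.
priors = np.array([1/3, 1/3, 1/3])   # P(H_i): uniform prior over hypotheses
biases = np.array([0.3, 0.5, 0.8])   # bias (probability of heads) under each H_i

heads, flips = 8, 10
# Likelihood P(D | H_i): binomial probability of the data (the forward problem)
likelihoods = np.array([comb(flips, heads) * b**heads * (1 - b)**(flips - heads)
                        for b in biases])

evidence = np.sum(likelihoods * priors)        # P(D): the normalising constant
posteriors = likelihoods * priors / evidence   # P(H_i | D) by Bayes' rule

print(posteriors)        # most of the posterior mass lands on the 0.8-bias coin
print(posteriors.sum())  # posteriors are normalised: they sum to 1
```

Note that the evidence plays no role in ranking hypotheses; it only rescales the numerators so the posteriors sum to one.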

In most of our tasks, what we want is the posterior probability. Bayes' rule converts this inverse problem into the forward problem: the likelihood, which we can usually compute directly, weighted by the prior.

Prior and Posterior

(to be continued)

When the posterior belongs to the same family of distributions as the prior, the likelihood and prior distributions are said to be conjugate. The prior is then called the conjugate prior for that likelihood.
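A standard instance of conjugacy is the Beta prior with a Bernoulli likelihood: the posterior is again a Beta, with the observed counts simply added to the prior's parameters. A minimal sketch (the function name and the numbers are illustrative, not from the source):

```python
def beta_bernoulli_update(alpha, beta, heads, tails):
    """Posterior parameters after observing coin flips.

    With prior Beta(alpha, beta) and a Bernoulli likelihood, the posterior
    is Beta(alpha + heads, beta + tails) -- the same family as the prior,
    which is exactly what makes the Beta prior conjugate here.
    """
    return alpha + heads, beta + tails

# Start from a Beta(2, 2) prior and observe 8 heads and 2 tails.
alpha_post, beta_post = beta_bernoulli_update(2, 2, heads=8, tails=2)
print(alpha_post, beta_post)  # Beta(10, 4): still a Beta distribution
```

The practical payoff is that the posterior never needs to be computed by integration; updating reduces to bookkeeping on the hyperparameters.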

In the xxxxxx, we want to maximise the posterior probability (MAP), which is different from fully Bayesian inference. (2017-2018 exam paper, AML)

MAP and MLE

MAP: maximum a posteriori estimation, which maximises the posterior distribution

MLE: maximum likelihood estimation, which maximises the likelihood alone

In MAP, we place a prior distribution on the parameters and, after collecting observations, maximise the posterior distribution $p(w|X)$ to obtain the parameter estimate.

Maximum likelihood estimation finds the parameter $\theta$ that maximises the likelihood $P(x_0|\theta)$. Maximum a posteriori estimation instead finds the $\theta$ that maximises $P(x_0|\theta)P(\theta)$: the resulting $\theta$ must not only make the likelihood large, but its own prior probability $P(\theta)$ (and hence the resulting posterior) must also be large.

Reference

  1. Bishop, Pattern Recognition and Machine Learning
  2. 详解最大似然估计(MLE)、最大后验概率估计(MAP)
  3. Adam's slides, bayes_prn, COMP6208