What is the difference between maximum a posteriori (MAP) estimation and Bayesian learning, given that both use a prior to obtain a posterior estimate?

asked Oct 02 '11 at 12:58

Lancelot

2 Answers:

Maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation each give you a single point estimate of your parameters. Bayesian methods, on the other hand, give you a probability distribution over all possible parametrizations.

So if you want to estimate one parameter a, MLE might tell you a = 2, and MAP might tell you a = 2.3. A Bayesian treatment will instead tell you that, say, a is normally distributed with mean 2.3 and standard deviation 0.3. To get predictions, you then integrate over all possible parametrizations, weighted by their posterior probability, and obtain what Bayesians call the predictive distribution.
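A minimal sketch of that contrast, assuming the simplest tractable case: the data are Gaussian with a known noise standard deviation and the prior on a is also Gaussian. The data values, `noise_std`, `prior_mean`, `prior_std` and the helper `gaussian_posterior` are all made up for illustration.

```python
import math

def gaussian_posterior(data, noise_std, prior_mean, prior_std):
    """Posterior over the mean `a` of Gaussian data with known noise_std,
    under a Gaussian prior N(prior_mean, prior_std**2).
    Returns (posterior mean, posterior std)."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = len(data) / noise_std ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + sum(data) / noise_std ** 2)
    return post_mean, math.sqrt(post_var)

# Hypothetical observations and hyperparameters (assumptions, not from the thread).
data = [1.2, 2.9, 1.6, 2.4, 1.9]
noise_std, prior_mean, prior_std = 1.0, 3.0, 1.0

# MLE: ignores the prior, here simply the sample mean.
mle = sum(data) / len(data)

# Bayesian: a full distribution over a.
post_mean, post_std = gaussian_posterior(data, noise_std, prior_mean, prior_std)

# MAP: the mode of the posterior, which equals its mean for a Gaussian.
map_estimate = post_mean

# Predictive distribution for a new observation: integrating the Gaussian
# likelihood over the Gaussian posterior gives another Gaussian whose
# variance is the sum of the posterior variance and the noise variance.
pred_mean = post_mean
pred_std = math.sqrt(post_std ** 2 + noise_std ** 2)

print(f"MLE: {mle:.2f}")
print(f"MAP: {map_estimate:.2f}")
print(f"posterior: mean {post_mean:.2f}, std {post_std:.2f}")
print(f"predictive: mean {pred_mean:.2f}, std {pred_std:.2f}")
```

With these numbers the MLE is about 2.00, the MAP estimate is pulled toward the prior at about 2.17, and the posterior additionally reports how uncertain that value is (std about 0.41). The predictive distribution is wider still (std about 1.08), because it accounts for both parameter uncertainty and observation noise.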

Why is this useful? For one, Bayesian methods are less prone to overfitting, and Bayesian fundamentalists :) even claim that overfitting does not happen. Furthermore, the Bayesian approach feels more 'right' to most people, since it is a proper probabilistic treatment of everything involved.

The downside is that most models are intractable, since you cannot integrate over their parameter space. For example, you cannot integrate over all possible weights of a neural network, although it happens that you can for linear regression (with a Gaussian prior and Gaussian noise). You cannot, however, for something as simple as logistic regression. This is why many people have come up with approximations (e.g. the Laplace approximation by MacKay, iirc, or the more recent stochastic gradient Langevin dynamics by Welling and Teh).
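To make the logistic regression case concrete, here is a rough sketch of a Laplace approximation for a one-weight logistic regression with no bias term. It finds the MAP weight with Newton's method, then approximates the posterior by a Gaussian centred there, with variance given by the inverse curvature of the negative log posterior. The toy data, `prior_std` and all function names are assumptions for illustration, not anything prescribed in this thread.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_and_hess(w, xs, ys, prior_std):
    """First and second derivative of the negative log posterior of a
    one-weight logistic regression with prior w ~ N(0, prior_std**2)."""
    grad = w / prior_std ** 2
    hess = 1.0 / prior_std ** 2
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        grad += (p - y) * x
        hess += p * (1.0 - p) * x * x
    return grad, hess

def laplace_approximation(xs, ys, prior_std, steps=25):
    """Return (w_map, std) of the Gaussian approximation to the posterior."""
    w = 0.0
    for _ in range(steps):  # Newton's method on a convex 1-D objective
        grad, hess = grad_and_hess(w, xs, ys, prior_std)
        w -= grad / hess
    _, hess = grad_and_hess(w, xs, ys, prior_std)
    return w, 1.0 / math.sqrt(hess)

def predictive(x, w_map, w_std, n_samples=5000, seed=0):
    """Monte Carlo estimate of p(y=1 | x), averaging over sampled weights."""
    rng = random.Random(seed)
    total = sum(sigmoid(rng.gauss(w_map, w_std) * x) for _ in range(n_samples))
    return total / n_samples

# Hypothetical, non-separable toy data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
prior_std = 2.0

w_map, w_std = laplace_approximation(xs, ys, prior_std)
p_plugin = sigmoid(w_map * 3.0)          # prediction from the MAP point estimate
p_bayes = predictive(3.0, w_map, w_std)  # prediction averaged over the posterior

print(f"approximate posterior: N({w_map:.3f}, {w_std:.3f}^2)")
print(f"p(y=1 | x=3): plug-in {p_plugin:.3f}, averaged {p_bayes:.3f}")
```

The averaged prediction comes out less extreme than the plug-in one, which is the practical effect of integrating over the parameter instead of committing to a single value. The Gaussian here is only an approximation to the true posterior, which has no closed form.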

answered Oct 02 '11 at 15:19

Justin Bayer

edited Oct 03 '11 at 03:24

How would overfitting even be defined in a fully Bayesian approach (for a model where this is computationally feasible)?

(Oct 02 '11 at 21:27) gdahl ♦

@gdahl: as long as test-set performance is worse than training-set performance (as evaluated by the true loss function), you can say that overfitting is happening. This is the weak form of overfitting. The strong form is that tweaking things to increase training-set performance harms test-set performance, which doesn't make sense in a Bayesian setting as long as you can afford to tweak absolutely nothing. In the weak sense, Bayesian methods can and do overfit, for example to a given domain.

(Oct 02 '11 at 21:30) Alexandre Passos ♦

If I am not mistaken, when you do MAP you get a point estimate, while full Bayesian modeling gives you the ability to sample the way you want (discriminative vs generative). Disclaimer: I have little experience with Bayesian stuff.

answered Oct 02 '11 at 13:17

ogrisel

Thanks for the info

(Oct 02 '11 at 13:35) Lancelot

Discriminative vs generative is actually a different distinction. You can use Bayesian methods for discriminative models (e.g. for logistic regression) or for models where the discriminative/generative distinction is not well defined (outside of supervised learning, for example).

(Oct 02 '11 at 16:46) Justin Bayer

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.