I was wondering how one decides which of these two approximations to use for inference in a hierarchical Bayesian model. By 'variational Bayes' I mean optimizing the parameters of a factorized approximation to the true posterior. The choice must depend on the quality of the approximating variational distribution, but I don't have much intuition for how to judge this based on, for example, the complexity of the model and the size of the dataset. I actually have more experience with variational methods than with empirical Bayes, and I find it more intuitive how the former approximates the true distribution than how the latter does. The variational approximation can do badly when the independence assumptions of the approximating distribution are strongly violated, and it can also be quite sensitive to the initial values of the variational parameters. Can similar things be said about empirical Bayes?
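To be explicit about what I mean by 'variational Bayes' (standard mean-field notation; this is my own summary rather than any particular paper's): a factorized distribution $q(z) = \prod_i q_i(z_i)$ is fitted to the true posterior by maximizing, over the parameters of each factor $q_i$, the lower bound

$$\log p(x) \;\geq\; \mathcal{L}(q) \;=\; \mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big].$$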
I think of empirical Bayes less as an inference method and more as a modeling method, but then again I know almost nothing about it. It is, however, quite similar to a variational algorithm in that you put priors on the nodes of your model and optimize the parameters of those priors to maximize something. In variational algorithms that something is a lower bound on the marginal likelihood of the data, and the parameters belong to an approximating distribution; in empirical Bayes you are maximizing a plug-in point estimate of the marginal likelihood (which can easily overshoot the fully marginalized likelihood, hence it is not a bound), and the parameters are the actual parameters of your prior (see the toy sketch at the end of this thread). Incidentally, it makes sense to use empirical Bayes together with variational methods if you are not that interested in a lower bound on the marginal likelihood and instead want a set of parameters that make sense for the model. For example, in the LDA paper Blei, Ng, and Jordan show both empirical Bayes estimates of the hyperparameters alpha and beta and variational inference approaches to integrating them out.

Thanks for the answer. I was mainly wondering about this because I read a paper in which the authors said they used empirical Bayes instead of a full variational Bayesian approach because the variational methods were very sensitive to initial parameter settings, and because empirical Bayes performs well when plenty of data is available. This seems similar to how maximum likelihood and Bayesian inference converge in the limit of infinitely many data points, where the influence of the priors vanishes. I guess I should see empirical Bayes as a Bayes/ML hybrid that sits somewhere between these extremes, but I still need to read more about the subject.
(Mar 13 '11 at 08:11)
Philemon Brakel
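To make the contrast in the answer concrete, here is a minimal empirical Bayes sketch on a toy Gaussian hierarchy. The model, the simulated data, and all names below are illustrative assumptions of mine, not taken from the thread or from the paper mentioned above:

    import numpy as np
    from scipy.optimize import minimize

    # Toy hierarchy (an illustrative assumption, not the thread's model):
    #   theta_i ~ N(m0, tau2)          prior with hyperparameters m0, tau2
    #   x_i | theta_i ~ N(theta_i, 1)  known unit observation noise
    # Marginally x_i ~ N(m0, 1 + tau2), so the marginal likelihood is exact here.
    rng = np.random.default_rng(0)
    m0_true, tau2_true, n = 2.0, 3.0, 500
    theta = rng.normal(m0_true, np.sqrt(tau2_true), size=n)
    x = rng.normal(theta, 1.0)

    def neg_log_marginal(eta):
        """Negative log marginal likelihood -log p(x | m0, tau2)."""
        m0, log_tau2 = eta                    # optimize log tau2 so it stays positive
        var = 1.0 + np.exp(log_tau2)          # marginal variance of each x_i
        return 0.5 * np.sum(np.log(2 * np.pi * var) + (x - m0) ** 2 / var)

    # Empirical Bayes: maximize the marginal likelihood over the prior's parameters.
    res = minimize(neg_log_marginal, x0=np.zeros(2))
    m0_hat, tau2_hat = res.x[0], np.exp(res.x[1])
    # Sanity check: here the optimum is m0_hat ~ x.mean() and tau2_hat ~ x.var() - 1.

    # The fitted prior is then plugged in as if known; each posterior mean
    # shrinks x_i toward m0_hat, more strongly when tau2_hat is small.
    theta_post_mean = m0_hat + tau2_hat / (tau2_hat + 1.0) * (x - m0_hat)
    print(f"EB: m0 = {m0_hat:.2f}, tau2 = {tau2_hat:.2f}")

The 'Bayes/ML hybrid' reading in the comment shows up in the last step: the hyperparameters are fitted by maximum likelihood and then treated as known, which is why empirical Bayes behaves well when there is enough data to pin them down.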