I have learned a bunch of inference methods from classes/books, but don't really understand when a particular method is appropriate.

I have seen various descriptions of the pros and cons of different approximate inference methods, such as this one, and it seems that people just use their favorite. As a beginner, I don't have a favorite; I'm roughly familiar with several, but have yet to use them for anything "in reality" beyond textbook examples. I was wondering whether there are some general heuristics (or a decision-tree sort of thing) for choosing among the various approximate inference methods (Laplace, Expectation Propagation, Sampling (Gibbs, Metropolis, Slice, ...), Variational, etc.) for practical applications, based on computation, ease of use/derivation/coding, popularity, etc. I don't really know whether there are ways to rule out a particular method for a given model, or to decide right away that a particular inference method is best.

Sorry for such a generic question (I would imagine the answer would be fairly application-independent)... I'm looking for a place to go beyond the books and maybe try to become an expert in 1 or 2 particular methods. Thanks... any help/suggestions/references much appreciated!

asked Apr 12 '12 at 18:38 by ccb


One Answer:

Unfortunately, as you expected, there is no one true answer to this problem. Indeed, even for specific models (such as latent Dirichlet allocation) different people or different applications use different inference methods and still argue about which are more appropriate.

As a rule of thumb, you should be minimally fluent in all of them: enough to know when you can (and want to) apply them, and what their disadvantages are.

  • Sampling-based inference methods tend to be the most dependable: it's always possible to build a sampler, and it's generally feasible to tune one until it converges. They are, however, often slower than the competing alternatives (when those apply), and unless you have a reason to expect them to be fast, they are used more for convenience than for speed. (A minimal sampler is sketched after this list.)
  • (Loopy) belief propagation (and expectation propagation) are also generally reliable, and will work well and converge quickly in most settings where (1) things are identifiable and (2) you can afford to store the messages between iterations. They can, however, be unstable, which is harmful if they are used to, say, compute gradients for an optimizer; convergent versions of BP exist, though, and are useful in that case. (A toy loopy BP example follows this list.)
  • Mean-field variational algorithms are only easy to derive in some specific circumstances, and in general they do worse than BP/EP at capturing the actual variance of the posterior distribution. When the posterior is multimodal, however, BP will tend to not converge or to give nonsensical answers, while mean-field algorithms will converge to one of the modes. It is also possible to use BP/EP with only a few factors behaving like a mean-field variational relaxation, though this is rarely done. When models are conjugate, variational algorithms are easy to derive, tend to run fast, and have low memory requirements (see the coordinate-ascent sketch after this list). There are also ways of running them online, which is easier to do than online sampling (although about as easy as assumed-density filtering, which is online BP/EP).
  • Finally, LP relaxations are something I still don't understand very well, but as far as I know they are useful for bridging the gap between BP and exact inference in both accuracy and computational time, and they are particularly useful when combining many different models, as in dual decomposition. They are also very useful for thinking about inference problems and understanding their mathematical properties, which might lead to new results and ideas that apply to the other methods above.
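
To make the sampling option concrete, here is a minimal random-walk Metropolis sampler in Python. The target density, step size, and number of iterations are illustrative assumptions, not recommendations; the point is just how little machinery a basic sampler needs.

```python
import numpy as np

# Minimal random-walk Metropolis sketch. The target (a standard normal),
# step size, and sample count are illustrative choices only.

def log_target(x):
    # Unnormalized log-density of the distribution we want to sample from.
    return -0.5 * x ** 2

def metropolis(log_p, x0=0.0, n_samples=5000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, logp_x = x0, log_p(x0)
    for i in range(n_samples):
        # Propose a symmetric random-walk move.
        x_new = x + step * rng.normal()
        logp_new = log_p(x_new)
        # Accept with probability min(1, p(x_new)/p(x)).
        if np.log(rng.uniform()) < logp_new - logp_x:
            x, logp_x = x_new, logp_new
        samples[i] = x
    return samples

samples = metropolis(log_target)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```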
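
Similarly, here is a toy sketch of loopy belief propagation (sum-product) on a three-node binary cycle. The potentials are made-up numbers; the message schedule and normalization are the standard ingredients, and the stored message dictionary illustrates the memory cost mentioned above.

```python
import numpy as np

# Loopy BP (sum-product) on a 3-node binary cycle with made-up potentials.

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]
unary = {i: np.array([1.0, 2.0]) for i in nodes}                    # phi_i(x_i)
pairwise = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}   # psi_ij(x_i, x_j)

def potential(i, j):
    # Pairwise potential oriented as [x_i, x_j].
    return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

neighbors = {i: [j for e in edges for j in e if i in e and j != i] for i in nodes}

# Initialize all directed messages to uniform.
messages = {(i, j): np.ones(2) for i in nodes for j in neighbors[i]}

for _ in range(50):
    new = {}
    for (i, j) in messages:
        # Product of the unary potential and incoming messages to i, excluding j's.
        incoming = unary[i].copy()
        for k in neighbors[i]:
            if k != j:
                incoming *= messages[(k, i)]
        m = potential(i, j).T @ incoming   # sum over x_i
        new[(i, j)] = m / m.sum()          # normalize for numerical stability
    messages = new

# Approximate marginals (beliefs).
for i in nodes:
    b = unary[i].copy()
    for k in neighbors[i]:
        b *= messages[(k, i)]
    print(i, b / b.sum())
```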
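
And here is a coordinate-ascent mean-field sketch for a conjugate model: the textbook normal-gamma example (Gaussian data with unknown mean and precision). The data and hyperparameters are synthetic and illustrative; the updates cycle through the factors q(mu) and q(tau) in closed form, which is what makes conjugate models so convenient for variational inference.

```python
import numpy as np

# Mean-field CAVI for x_n ~ N(mu, 1/tau) with a Normal-Gamma prior:
# mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).
# Data and hyperparameters below are made up for illustration.

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=200)   # synthetic data
N, xbar = len(x), x.mean()

# Prior hyperparameters (illustrative choices).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0
for _ in range(100):
    # Update q(mu) = N(mu_N, 1/lam_N), holding q(tau) fixed.
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N), holding q(mu) fixed.
    a_N = a0 + (N + 1) / 2
    E_sq = np.sum((x - mu_N) ** 2) + N / lam_N   # E_q[sum_n (x_n - mu)^2]
    b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
    E_tau = a_N / b_N

print("posterior mean of mu  ~", mu_N)
print("posterior mean of tau ~", E_tau, "(true precision is 0.25)")
```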

answered Apr 12 '12 at 20:10 by Alexandre Passos ♦
