To derive the Gibbs sampling algorithm for LDA (Latent Dirichlet Allocation), you need to be familiar with the conjugacy between the Dirichlet distribution and the multinomial distribution. I want to know how this conjugacy helps solve the problem.

asked Jul 23 '10 at 01:15


charlie


2 Answers:

The main reason is the computational simplicity of inference. A conjugate prior is convenient because the posterior distribution then has the same form as the prior, which makes inference tractable (e.g., if you are doing sampling, the posterior is easy to sample from).
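
As a concrete illustration of the "same form" claim, here is the standard Dirichlet-multinomial update for a single probability vector theta over K outcomes. The symbols alpha_k (prior pseudo-counts) and n_k (number of observations of outcome k) are notation assumed for this sketch, not taken from the post.

```latex
\begin{align*}
p(\theta \mid \alpha) &\propto \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad
p(x \mid \theta) \propto \prod_{k=1}^{K} \theta_k^{n_k}, \\
p(\theta \mid x, \alpha) &\propto \prod_{k=1}^{K} \theta_k^{\alpha_k + n_k - 1}
\;\Longrightarrow\; \theta \mid x \sim \mathrm{Dirichlet}(\alpha_1 + n_1, \ldots, \alpha_K + n_K).
\end{align*}
```

The posterior is again a Dirichlet; only the parameters change, from alpha_k to alpha_k + n_k.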

answered Jul 23 '10 at 01:19


spinxl39

edited Jul 23 '10 at 01:21

There are two ways in which conjugacy helps you in LDA's case. For the uncollapsed Gibbs sampler, you need to sample from the posteriors of the topic-word distributions and the document-topic distributions, and conjugacy guarantees that these posteriors have the same form as their priors, only with different parameters. More specifically, if you only have one document with one word, the posterior is P(theta|z) = P(z|theta)P(theta)/(int dtheta P(z|theta)P(theta)). Since this has an analytic formula of the same form as the prior (a Dirichlet again), you can do this for all words and documents and get a full posterior for these distributions, from which you can easily sample. This allows you to easily do Gibbs sampling (i.e., sample one variable at a time, conditioned on all others), since the sampling distributions will have an easy form.
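
A minimal sketch of that uncollapsed step for one document's topic proportions, using only the Python standard library. The function names, the prior values in `alpha`, and the assignments in `z` are all hypothetical and chosen just for illustration.

```python
import random
from collections import Counter


def dirichlet_posterior(alpha, assignments):
    """Conjugate update: posterior parameter k is alpha[k] + (count of topic k)."""
    counts = Counter(assignments)
    return [a + counts.get(k, 0) for k, a in enumerate(alpha)]


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(param, 1) draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


alpha = [0.5, 0.5, 0.5]  # assumed symmetric prior over 3 topics
z = [0, 2, 2, 1, 2]      # hypothetical current topic assignments in one document

posterior = dirichlet_posterior(alpha, z)               # Dirichlet parameters
theta = sample_dirichlet(posterior, random.Random(0))   # new document-topic draw
```

Because the posterior is a Dirichlet, resampling theta needs nothing beyond counting the assignments and drawing from a known distribution.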

The other way in which conjugacy helps you is that it allows you to define a collapsed Gibbs sampler that doesn't represent those distributions explicitly at all. Due to conjugacy, the integral P(z_i|Z) = int dtheta P(z_i|theta) P(theta|Z) has a closed form, where Z are the topic assignments of the other words and z_i is the topic assignment you're trying to sample.
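
A rough sketch of the resulting collapsed update for a single token, assuming symmetric hyperparameters `alpha` and `beta` and the commonly used closed form in which both the document-topic and topic-word distributions are integrated out. The function name and all counts below are hypothetical, and the counts are assumed to already exclude the token being resampled.

```python
import random


def collapsed_conditional(doc_topic, topic_word_w, topic_total,
                          alpha, beta, vocab_size):
    """P(z_i = k | other assignments, word w), normalized over topics k.

    doc_topic[k]    : tokens in this document assigned to topic k
    topic_word_w[k] : times word w is assigned to topic k in the corpus
    topic_total[k]  : total tokens assigned to topic k in the corpus
    """
    weights = [
        (doc_topic[k] + alpha)
        * (topic_word_w[k] + beta)
        / (topic_total[k] + vocab_size * beta)
        for k in range(len(doc_topic))
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical counts for 2 topics and a vocabulary of 4 words.
probs = collapsed_conditional(
    doc_topic=[3, 1], topic_word_w=[2, 0], topic_total=[10, 6],
    alpha=0.5, beta=0.1, vocab_size=4,
)
# Sample the new topic assignment from the discrete conditional.
new_topic = random.Random(0).choices(range(len(probs)), weights=probs)[0]
```

Only integer counts are stored; theta and the topic-word distributions never appear, which is what "collapsed" refers to.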

So, to summarize, conjugacy makes two hard things easier: sampling from the posterior for a parameter (since it is in the same family as the prior, just with updated parameters) and integrating out continuous parameters (which lets you deal with them exactly and makes sampling easier, since sampling from discrete distributions with few possible states is generally easier than sampling from continuous distributions).

answered Jul 23 '10 at 09:01


Alexandre Passos ♦

edited Jul 23 '10 at 09:07


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.