|
In the paper "finding scientific topics", the authors estimate the posterior for LDA by instead computing the full conditional probabilities. I understand how they come up with equation 5, but I have no idea why the problem of estimating P(z|w) suddenly becomes P(z_{i} | z_{-i}, w). |
|
They used a method called Gibbs Sampling, which is an instance of the more general Markov chain Monte Carlo method for computing the posterior distributions. The idea behind Gibbs sampling is that instead of trying to analytically compute the posterior distribution of lots of correlated variables (which is often very hard) you define a stochastic process so that, in the limit, samples from this process are samples from the true posterior distribution. The way this process is defined in gibbs sampling uses the fact that while sampling all those correlated variables at once can be very hard, computing the marginal posterior for a single variable is often easy. Then, if you can sample from the marginal distribution from each variable in turn, after many iterations, the expected fraction of time that variable has a given value is exactly the probability of that variable having that value, since you're effectively marginalizing the conditional distribution given many samples (I can point you to a proof that this converges if you want). So their method of inference involves sampling each z_i using information from all the other variables in the model, which is why they use the equation P(z_i | z_-1,w). At first I thought there is some basic probability rules that I miss. After all, according to your answer, it is just an approximate numerical solution, I guess. I would also be grateful if you can point to a proof of its convergence.
(Nov 23 '10 at 10:05)
Killua
In general any Markov chain book in the nearest library to you will prove that. I don't remember if Mackay's book ( http://www.inference.phy.cam.ac.uk/mackay/itila/ ) actually proves it or just gives the intuition. This persis diaconis paper http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.900&rep=rep1&type=pdf presents a proof in the first page, but it's rather dense.
(Nov 23 '10 at 10:18)
Alexandre Passos ♦
1
the intuition for how the proof of convergence works is pretty simple actually. you show that Gibbs sampling sets up a Markov chain whose stationary distribution is the posterior distribution of interest (as long as the Markov chain is "nice"). then there are a lot of different ways to show that in the limit, the states of a "nice" Markov chain are samples from its stationary distribution (e.g., simple analysis of the total variation distance of the two distributions or a coupling argument).
(Nov 23 '10 at 11:08)
Joseph Austerweil
Wow, you're up early Joe!
(Nov 23 '10 at 11:40)
Kevin Canini
on the east coast for thanksgiving :)
(Nov 23 '10 at 11:58)
Joseph Austerweil
|
|
You might want to take a pick to this document http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf It has a cool intro on MCMC, shows how and why Markov Chains Converge, and then analyzes Hastings and Gibbs Sampling, with proofs of them being a Markov Chain and thus converging. It also has cool graphical examples. |