|
Recently I've been trying to derive variational inference for LDA assuming that one wants to integrate over both the topic-word distributions and the document-topic distributions (the original paper integrated only over the document-topic distributions). There are four terms in E_q[log p(theta, phi, z, w)], and two of them are very easy to solve (E_q[log p(theta)] and E_q[log p(phi)]). However, the term E_q[log p(w|z, phi)] now has to average over q(phi) as well as q(z), and I'm not sure how to compute this expectation:
if there were no sum over z, this would be a standard expectation of log phi under a Dirichlet distribution over phi; if there were no integration over q(phi), this would be a sum of log-probabilities, which makes sense and is easy to compute and differentiate to get the algorithm. Differentiating the full term is easy enough, so maybe it's not an issue if one just wants to run the algorithm, but I find it odd that I don't know how to compute the exact value of the lower bound. The paper on the collapsed variational algorithm for LDA seems to consider this variant of VB, with all three parameters, and derives update equations that seem to have an extra digamma somewhere. I feel I'm missing something but I'm not sure what it is. |
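For concreteness, here is a minimal sketch of the term that averages over both q(z) and q(phi), written here as E_q[log p(w|z, phi)]. It assumes a fully factorized q in which q(z_n = k) = gamma[n][k] and q(phi_k) is Dirichlet with parameters lam[k]; the names gamma, lam and the helper functions are my own notation, not taken from any of the papers. Under that assumption, linearity of expectation would give sum_n sum_k gamma[n][k] * (digamma(lam[k][w_n]) - digamma(sum_v lam[k][v])), which may be where a digamma enters. The code compares this closed form against a Monte Carlo estimate, using only the standard library.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def expected_log_lik(words, gamma, lam):
    """Closed form under the assumed factorization q(z) q(phi).

    words[n]    : word id of token n
    gamma[n][k] : q(z_n = k)            (hypothetical name)
    lam[k][v]   : Dirichlet parameters of q(phi_k) (hypothetical name)
    """
    total = 0.0
    for n, w in enumerate(words):
        for k, lam_k in enumerate(lam):
            e_log_phi = digamma(lam_k[w]) - digamma(sum(lam_k))
            total += gamma[n][k] * e_log_phi
    return total


def monte_carlo(words, gamma, lam, samples=20000, seed=0):
    """Estimate the same expectation by sampling phi and z from q."""
    rng = random.Random(seed)
    topics = range(len(lam))
    total = 0.0
    for _ in range(samples):
        # Draw each phi_k ~ Dirichlet(lam_k) by normalizing gamma variates.
        phi = []
        for lam_k in lam:
            g = [rng.gammavariate(a, 1.0) for a in lam_k]
            s = sum(g)
            phi.append([x / s for x in g])
        # Draw each z_n ~ q(z_n) and accumulate log phi[z_n][w_n].
        for n, w in enumerate(words):
            k = rng.choices(topics, weights=gamma[n])[0]
            total += math.log(phi[k][w])
    return total / samples


words = [0, 2, 1]
gamma = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
lam = [[2.0, 3.0, 4.0], [5.0, 1.5, 2.5]]
exact = expected_log_lik(words, gamma, lam)
estimate = monte_carlo(words, gamma, lam)
print(exact, estimate)
```

If the two printed numbers agree up to sampling noise, the closed form is at least consistent with this factorized q. It says nothing about the collapsed variant, where z is not independent of phi under q.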
|
There is one detailed derivation here: http://www.whatdafact.com/eel6825sp2010/materials/LDA_derivation.pdf This is odd. The author integrates over the document-topic parameters and optimizes the topic-word parameters, while the Blei paper does precisely the opposite. My question was about what happens if you try to integrate over both sets of parameters, as in the Gibbs sampler.
(Dec 29 '10 at 19:00)
Alexandre Passos ♦
I followed both papers and can complete the derivation. If you don't mind, would you please tell me which equation (in Blei's paper) you are looking for?
(Dec 29 '10 at 21:43)
Liangjie Hong
I can follow the derivation as well; my question is how to compute that expectation if one integrates out both parameters.
(Dec 30 '10 at 03:36)
Alexandre Passos ♦
|