Recently I've tried to derive variational inference for LDA assuming that one wants to integrate over both the topic-word distributions and the document-topic distributions (the original paper only integrated over the document-topic distributions). There will be four terms in E_q[log p(theta, phi, z, w)], and two are very easy to solve (E_q[log p(theta)] and E_q[log p(phi)]). However, the term E_q[log p(z|phi)] now has to average over q(phi) as well as q(z), and I'm not sure how to compute this expectation:

int dphi q(phi) sum_j q(z=j) log phi_j

If there were no summing over z, this would be a standard expectation of log phi under a Dirichlet distribution over phi; if there were no integrating over q(phi), this would be a sum of log-probabilities, which makes sense and is easy to compute and differentiate to get the algorithm. Differentiating this is easy enough, so maybe it's not an issue if one just wants to derive the algorithm, but I find it odd that I don't know how to compute the exact value of the lower bound.
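For reference, here is a minimal sketch of how this expectation could be evaluated. It rests on assumptions not stated in the post: q factorizes as q(phi) q(z), each q(phi_j) is a Dirichlet with a hypothetical parameter vector lam[j], and "log phi_j" means log phi_{j,w} for the observed word w. Under those assumptions, linearity of expectation gives sum_j q(z=j) (psi(lam[j][w]) - psi(sum_v lam[j][v])), where psi is the digamma function. The function names, the hand-rolled digamma, and the Monte Carlo check are illustrative only.

```python
import math
import random


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ log(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def expected_log_phi(lam):
    """E[log phi_v] for phi ~ Dirichlet(lam), for every component v."""
    total = digamma(sum(lam))
    return [digamma(a) - total for a in lam]


def expected_term(q_z, lam, w):
    """sum_j q(z=j) * E_q[log phi_{j,w}], assuming q(phi, z) = q(phi) q(z)."""
    return sum(q_j * expected_log_phi(lam_j)[w] for q_j, lam_j in zip(q_z, lam))


def monte_carlo_term(q_z, lam, w, n_samples=20000, seed=0):
    """Sampling estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        for q_j, lam_j in zip(q_z, lam):
            # Normalized independent gamma draws form one Dirichlet(lam_j) sample.
            g = [rng.gammavariate(a, 1.0) for a in lam_j]
            total += q_j * math.log(g[w] / sum(g))
    return total / n_samples


if __name__ == "__main__":
    q_z = [0.3, 0.7]                            # hypothetical q(z=j), two topics
    lam = [[2.0, 1.0, 1.0], [1.0, 3.0, 2.0]]    # hypothetical Dirichlet parameters
    w = 0                                       # index of the observed word
    print("closed form:", expected_term(q_z, lam, w))
    print("monte carlo:", monte_carlo_term(q_z, lam, w))
```

If the two printed numbers agree up to sampling noise, the sum over z and the integral over phi decouple as described. This holds only for the factorized q assumed here, not for the collapsed variant.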

The paper on the collapsed variational algorithm for LDA seems to consider this variant of VB, with all three parameters, and derives update equations that seem to have an extra digamma somewhere.

I feel I'm missing something but I'm not sure what it is.

asked Dec 26 '10 at 05:20

Alexandre Passos ♦

One Answer:

There is one detailed derivation here: http://www.whatdafact.com/eel6825sp2010/materials/LDA_derivation.pdf

answered Dec 29 '10 at 16:21

Liangjie Hong

This is odd. The author integrates over the document-topic parameters and optimizes the topic-word parameters, while the Blei paper does precisely the opposite. My question was about what happens if you try to integrate over both parameters, as in the Gibbs sampler.

(Dec 29 '10 at 19:00) Alexandre Passos ♦

I followed both papers and could complete the derivation. If you don't mind, could you please tell me which equation (in Blei's paper) you are looking for?

(Dec 29 '10 at 21:43) Liangjie Hong

I can follow the derivation as well; my question is how to compute that expectation if one integrates out both parameters.

(Dec 30 '10 at 03:36) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.