Q1: In Gibbs sampling for LDA, we sample a topic index for every word occurrence in document, right? Then if a word w occurs 1000 times in a document, we need to sample topic index for different occurrence of w 1000 times, which is very time-consuming for long documents. Therefore, I wonder if it is legal to just do one sampling for each distinct word in a document, while count the topic index mutiple times( maybe equal the frequency of words in the document, or multiply by p(w|z)? ). Q2: Gibbs sampling draw samples from conditional distribution p(z_i|Z_{-i},W) for word w_i. However, since p(z_i|Z_{-i},W) has been figured out, why sampling are needed? Why not just use the exepectation of this conditional distribution, namely p(z_i|Z_{-i},W) to do a "soft" topic assignment as EM algorithms does? If it is allowed, we can complete the Gibbs sampling without sampling but with simple re-assigment operations, which also save cost. Any suggestions or theory judgements to these questions? Thanks! |
First, the algorithm you propose in Q2 is known as CVB0. It usually works about as well as Gibbs sampling, except that it needs K times as much memory. The algorithm you propose in Q1, on the other hand, is biased: actual samples from LDA can be likely to assign different occurrences of the same word to different topics. Even ignoring this, the sampling equation would have to account for the effect of adding the large counts, and you'd need to change it to something closer to the equation used for sampling in Naive Bayes. Note that doing this will make it harder for your sampler to move around the sample space, because you're effectively removing some degrees of freedom.

Great! CVB0 is what I was looking for. I have tested the variant in which each distinct word is re-assigned to all K topics with the "soft" count tf(w)*p(z|w,d), rather than sampling a single topic index. It is faster than Gibbs sampling, with comparable results :)
(Sep 17 '12 at 09:42)
l0he1g
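The soft re-assignment the commenter describes (one distribution per distinct word, weighted by its frequency) could be sketched roughly as follows. The expected-count arrays N_dk, N_kw, N_k and the tf-weighted form are assumptions matching the tf(w)*p(z|w,d) scheme above, not code from the original thread:

```python
import numpy as np

def soft_update_word(d, w, tf, gamma_dw, N_dk, N_kw, N_k, alpha, beta):
    """CVB0-style soft update for one distinct word w in document d.

    gamma_dw: current topic distribution for (d, w), shape (K,)
    tf:       frequency of word w in document d
    N_dk, N_kw, N_k: expected counts (float arrays) accumulated from all soft assignments
    """
    V = N_kw.shape[1]

    # Subtract this word's current expected contribution (tf * gamma).
    N_dk[d]    -= tf * gamma_dw
    N_kw[:, w] -= tf * gamma_dw
    N_k        -= tf * gamma_dw

    # Same form as the Gibbs conditional, but over expected counts,
    # and kept as a full distribution instead of a single sampled topic.
    g = (N_dk[d] + alpha) * (N_kw[:, w] + beta) / (N_k + V * beta)
    g /= g.sum()

    # Add the word back with its updated soft assignment.
    N_dk[d]    += tf * g
    N_kw[:, w] += tf * g
    N_k        += tf * g
    return g
```

This also illustrates the memory point in the answer: every (document, distinct word) pair now stores a full K-vector gamma instead of a single topic index, hence roughly K times the memory of the collapsed Gibbs sampler.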