|
I am trying to understand LDA with Gibbs sampling used to estimate the model's parameters. The paper "Finding scientific topics" by Griffiths and Steyvers states that "this [posterior] distribution cannot be computed directly, because the sum in the denominator does not factorize and involves T^n terms, where n is the total number of word instances in the corpus" (p. 2), where T is the number of topics. It is unclear to me why the denominator of the equation below involves T^n terms rather than T terms: P(z|w) = P(w,z) / sum_over_z(P(w,z)). Could you please point out why there are T^n terms? |
|
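Here z is not a single topic but the vector of topic assignments for the whole corpus, with one entry per word instance. The sum is taken over all possible assignments of one of the T topics to each of the n words, i.e. over every vector of length n whose entries each take one of T values. So the global topic assignment vector has T^n possible values, and the denominator has one term for each of them.
I really have to put more effort into studying LDA, as I had misunderstood z to be a single topic. Many thanks!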
(Nov 20 '10 at 12:46)
Killua
|
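For what it's worth, the T^n count from the answer above can be checked by brute force on a tiny case. The following is only a minimal sketch: the helper names `all_assignments` and `posterior_denominator` are made up for illustration, and the constant `joint` is a stand-in for P(w,z), not the actual LDA joint from the paper.

```python
from itertools import product


def all_assignments(T, n):
    """Yield every topic-assignment vector z of length n with entries in 0..T-1."""
    return product(range(T), repeat=n)


def posterior_denominator(joint, T, n):
    """Brute-force sum of joint(z) over all T**n assignment vectors z.

    `joint` is a hypothetical stand-in for P(w, z) with the words w held fixed.
    """
    return sum(joint(z) for z in all_assignments(T, n))


T, n = 3, 4  # 3 topics, 4 word instances (toy sizes)

# Count the terms in the sum: one per assignment vector.
num_terms = sum(1 for _ in all_assignments(T, n))
print(num_terms, T ** n)  # prints: 81 81

# With a constant joint of 1.0 the "denominator" simply counts the terms.
print(posterior_denominator(lambda z: 1.0, T, n))  # prints: 81.0
```

Even at these toy sizes the sum has 81 terms. For a real corpus, where n is the total number of word instances, enumerating T^n vectors is hopeless, which is why the paper resorts to Gibbs sampling instead of computing the denominator.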