I am trying to understand LDA when Gibbs sampling is used to estimate the parameters of the model. The paper "Finding scientific topics" by Griffiths and Steyvers states that "this [posterior] distribution cannot be computed directly, because the sum in the denominator does not factorize and involves T^n terms, where n is the total number of word instances in the corpus" (p. 2), where T is the number of topics.

It is unclear to me why the denominator of the equation below involves T^n terms rather than T terms: P(z|w) = P(w,z) / sum_over_z(P(w,z))

Could you please point out why there are T^n terms?

asked Nov 20 '10 at 12:16

Killua


One Answer:

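The sum is taken over every possible assignment of topics to the n word instances. Here z is not a single topic but a vector (z_1, ..., z_n) of length n, in which each entry z_i can take any one of T values. The global topic assignment vector therefore has T * T * ... * T = T^n possible values, and the denominator has one term for each of them.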
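To make the counting concrete, here is a minimal sketch in Python that enumerates every assignment vector for a toy case. The sizes T = 3 and n = 4 and the helper name enumerate_assignments are my own choices for illustration, not anything from the paper; topics are simply labelled 0 to T-1.

```python
from itertools import product


def enumerate_assignments(num_topics, num_words):
    """Return an iterator over every topic-assignment vector z.

    Each z is a tuple of length num_words whose entries are topic
    labels in 0..num_topics-1 (hypothetical labelling for this sketch).
    """
    return product(range(num_topics), repeat=num_words)


# Toy sizes, assumed purely for illustration.
T, n = 3, 4

assignments = list(enumerate_assignments(T, n))

# One term of the denominator sum per assignment vector.
print(len(assignments))            # 81
print(len(assignments) == T ** n)  # True
print(assignments[:3])             # the first few vectors z
```

Even in this tiny case the sum already has 81 terms. With a realistic corpus, n is in the thousands or millions, so brute-force enumeration like this is out of the question, which is the point the paper makes before turning to Gibbs sampling.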

answered Nov 20 '10 at 12:37

Kevin Canini

I really have to put more effort into studying LDA, as I had mistakenly thought that z was a single topic. Many thanks!

(Nov 20 '10 at 12:46) Killua

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.