Hello,

I feel like I understand Gibbs sampling for topic models such as LDA quite well, at least as far as learning the initial model goes. We maintain a set of counts, use them to build the distribution p(z_i=k | ...) for each word, and on each iteration sample from that distribution to get a new value for the current word's topic assignment. For example, say there are K=3 topics and we have p(z_i=1) = 0.3, p(z_i=2) = 0.5, p(z_i=3) = 0.2; we randomly sample from this distribution and this time happen to get z_i = 1, so we set that word's topic to 1. Over time things average out, and at the end we can compute our distributions theta and phi along the lines of phi_{w,k} = p(w | z=k) = (n_{w,k} + beta) / (n_{k} + beta*W), where n_{w,k} is the number of times word w is assigned to topic k, n_{k} is the total number of words assigned to topic k, and W is the vocabulary size. So far so good.
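
To make the phi formula above concrete, here is a minimal sketch in Python/NumPy. The function name `estimate_phi`, the toy count matrix, and the value of beta are all made up for illustration; the only thing taken from the post is the formula itself.

```python
import numpy as np

def estimate_phi(n_wk, beta):
    """Smoothed topic-word estimate from Gibbs counts.

    n_wk : (W, K) array, n_wk[w, k] = times word w was assigned to topic k.
    Returns a (W, K) array whose columns each sum to 1.
    """
    W = n_wk.shape[0]              # vocabulary size
    n_k = n_wk.sum(axis=0)         # total words assigned to each topic
    return (n_wk + beta) / (n_k + beta * W)

# Toy counts for W=4 words and K=3 topics (hypothetical numbers).
n_wk = np.array([[5, 0, 1],
                 [2, 3, 0],
                 [0, 4, 2],
                 [1, 1, 6]])
phi = estimate_phi(n_wk, beta=0.1)
```

Each column of `phi` is one topic's distribution over the vocabulary, so `phi[:, k]` sums to 1 for every k.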

Now, however, I have two questions about performing inference on a new, unseen document given the model that we've just learned.

  1. We should be able to learn a theta value for this new unseen document but as far as I understand it, phi should stay the same because we want to use an existing set of learned topics. If we do something like use the existing counts and just add to them for this document, won't that change the value of phi since that is learned from those counts as well? How can we rationalize this in a coherent way?

  2. This question goes for both new unseen documents and also for documents that we've just used to learn theta and phi: what is the best (or the most preferable or the "accepted") way to get the topic that each word is assigned to? What I mean by this is, if we compute the distribution p(z_i | ...) for some word w_i, we don't get a discrete topic assignment. Instead, we get the probability that the word was generated from each of the K topics. When learning the model and building the Gibbs chain, we randomly sample from this distribution to set the current assignment, but that can't make sense when we're trying to get a "final" assignment. Do people just use the highest value in p(z_i | ...)? Do we again do several iterations, average over them, and choose the highest? What's the best way?

Thanks so much and I hope I was clear with my questions.

asked Mar 28 '11 at 19:36

James Sterling


One Answer:

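For 1, just don't increment or decrement the phi counts; instead, treat them as fixed probabilities. Only the new document's own topic counts change as you sample, so you learn a theta for that document while phi stays exactly as it was.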
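
A rough sketch of what that could look like, assuming phi is stored as a (W, K) array and the new document is a list of word ids. The name `fold_in`, the symmetric prior `alpha`, and the iteration and burn-in settings are illustrative choices, not something given in this thread.

```python
import numpy as np

def fold_in(doc, phi, alpha, n_iters=200, burn_in=50, seed=0):
    """Gibbs-sample topic assignments for one unseen document with phi held fixed.

    doc : list of word ids; phi : (W, K) array of p(w | z=k).
    Returns (theta, z): averaged document-topic estimate and the last sample of z.
    """
    rng = np.random.default_rng(seed)
    K = phi.shape[1]
    z = rng.integers(K, size=len(doc))                # random initial assignments
    n_dk = np.bincount(z, minlength=K).astype(float)  # this document's topic counts
    theta_sum, kept = np.zeros(K), 0
    for it in range(n_iters):
        for i, w in enumerate(doc):
            n_dk[z[i]] -= 1                           # remove word i from its topic
            p = phi[w] * (n_dk + alpha)               # phi is only read, never updated
            p /= p.sum()
            z[i] = rng.choice(K, p=p)
            n_dk[z[i]] += 1
        if it >= burn_in:                             # average theta over later sweeps
            theta_sum += (n_dk + alpha) / (len(doc) + K * alpha)
            kept += 1
    return theta_sum / kept, z

# Hypothetical 2-topic model over a 4-word vocabulary.
phi = np.array([[0.45, 0.05], [0.45, 0.05], [0.05, 0.45], [0.05, 0.45]])
theta, z = fold_in([0, 1, 0, 1, 0, 0, 1], phi, alpha=0.5)
```

The only counts that move are the document-level `n_dk`; the word-topic term comes straight from the fixed `phi`, which is what keeps the learned topics unchanged.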

For 2, there is not necessarily a single topic each word is assigned to in the posterior distribution; most often the posterior over z for each word has almost all of its mass on a small number of topics, but rarely on just one. Hence, there is no one true topic for each word. Most people, however, if they need such a thing, just take the topic assigned to that word in the last model sample.

answered Mar 29 '11 at 06:58

Alexandre Passos ♦

Original poster should also see your more detailed answer to essentially the same question as #2 given here: http://metaoptimize.com/qa/questions/2960/how-can-i-get-topic-assignment-for-each-word-in-lda#2961

I think the last 2 approaches from that answer (an averaging approach) are much more sound than simply using the last topic assignment.
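
One possible reading of the averaging idea, sketched in Python: keep several post-burn-in samples of z, count how often each word position landed in each topic, and take the most frequent topic if a hard assignment is needed. This is not necessarily what the linked answer describes; the function name `average_assignments` and the toy samples are made up for illustration.

```python
import numpy as np

def average_assignments(z_samples, K):
    """Estimate p(z_i = k) per word position from several Gibbs samples.

    z_samples : (S, N) array-like, S samples of the N topic assignments.
    Returns (probs, hard): (N, K) empirical frequencies and their argmax.
    """
    z_samples = np.asarray(z_samples)
    S, N = z_samples.shape
    counts = np.zeros((N, K))
    for z in z_samples:
        counts[np.arange(N), z] += 1    # one vote per word position per sample
    probs = counts / S
    return probs, probs.argmax(axis=1)

# Four hypothetical samples for a 3-word document with K=3 topics.
z_samples = [[0, 1, 2],
             [0, 1, 1],
             [0, 2, 1],
             [1, 1, 1]]
probs, hard = average_assignments(z_samples, K=3)
```

Here `probs` keeps the soft assignment for each word, and `hard` collapses it to a single topic only at the very end.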

(Mar 29 '11 at 23:06) Will Darling

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.