It depends fundamentally on the inference method you're using.
- For variational inference (as described in the original Blei, Ng, and Jordan paper), each word in the corpus has a pseudo-distribution over topics, given by the mean-field variational parameters for that word. So you technically never get a single topic assignment for a word, but you can use the mean field to compute the most likely topic for that word and similar quantities.
- If you're using Gibbs sampling, as per the Griffiths and Steyvers paper (or any more recent, faster formulation, such as the implementation in MALLET), then at every iteration you have a topic assignment for every word in the corpus. You can then extract "the topic" for a word in a few different ways:
  - just use the last sample's topics
  - count, over all samples, the most frequent topic for that word
  - use the empirical distribution over topics for that word (by looking at all past samples) to compute a finer-grained representation, for example distinguishing words whose mass is mostly in a single topic from words whose mass is spread out over several topics
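The three Gibbs options above can be sketched roughly as follows. This is only an illustration under assumptions, not MALLET's API: `samples` is a hypothetical list of saved sweeps, each holding one topic id per word token, and `summarize_token` and `num_topics` are names made up for this example.

```python
from collections import Counter
import math


def summarize_token(samples, token, num_topics):
    """Summarize the sampled topics of one word token across saved Gibbs sweeps.

    `samples` is assumed to be a list of sweeps; each sweep is a list with
    one topic id per word token (a hypothetical layout for illustration).
    """
    draws = [sweep[token] for sweep in samples]

    # Option 1: just take the topic from the last sample.
    last = draws[-1]

    # Option 2: most frequent topic over all samples (ties go to the lowest id).
    counts = Counter(draws)
    mode = max(range(num_topics), key=lambda k: counts[k])

    # Option 3: empirical distribution over topics for this token, plus its
    # entropy as one way to tell concentrated mass from spread-out mass.
    dist = [counts[k] / len(draws) for k in range(num_topics)]
    entropy = -sum(p * math.log(p) for p in dist if p > 0)

    return last, mode, dist, entropy


# Toy data (made up): 4 saved sweeps, 3 word tokens, 3 topics.
samples = [
    [0, 1, 2],
    [0, 1, 0],
    [0, 2, 1],
    [0, 1, 1],
]

for token in range(3):
    print(token, summarize_token(samples, token, num_topics=3))
```

In practice you would probably discard burn-in sweeps and thin the chain before counting, since consecutive Gibbs samples are strongly correlated.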
Most deterministic inference methods (collapsed variational, EM, etc.) behave like the variational case above, and most stochastic ones (non-collapsed Gibbs sampling, sampling for HDP-LDA, etc.) behave like the Gibbs sampling case above.
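For the deterministic case, a minimal sketch of turning per-word mean-field parameters into a hard assignment might look like this. It assumes you can get at a hypothetical matrix `phi`, with one row per word token and one column per topic, each row summing to 1; the values below are invented for illustration, and how you obtain such a matrix depends entirely on your implementation.

```python
import math


def hard_assignment(phi_row):
    """Most likely topic under the token's variational distribution."""
    return max(range(len(phi_row)), key=lambda k: phi_row[k])


def entropy(phi_row):
    """Entropy of the distribution: low means the token is concentrated in one topic."""
    return -sum(p * math.log(p) for p in phi_row if p > 0)


# Hypothetical variational parameters for two word tokens and three topics.
phi = [
    [0.90, 0.05, 0.05],  # mass mostly in topic 0
    [0.40, 0.35, 0.25],  # mass spread over several topics
]

for n, row in enumerate(phi):
    print(n, hard_assignment(row), round(entropy(row), 3))
```

Both tokens get topic 0 as their hard assignment here, but the entropy shows that the second assignment is far less certain, which is the information a plain argmax throws away.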
answered Oct 16 '10 at 08:24 by Alexandre Passos