I am using gensim for LDA learning and inference. I noticed that for the same text it gives me non-deterministic topic-space representations (i.e. the gamma values), even when I call the inference() method several times in a row. Is there any LDA implementation that gives deterministic inferences?
To answer your question: I can't find any implementation of it, but the algorithm in the paper by Sato et al., "Deterministic single-pass algorithm for LDA", seems to do what you want. Variational algorithms behave the same way: if the initialization is fixed, the end result is always the same. However, this is probably not what you want. LDA models, just like clustering models, are inherently unidentifiable: you can swap topic indices and nothing in the model changes. Also, the optimization problem solved by LDA inference is highly nonconvex, which is why it pays off to use random initializations and other tricks to try to reach better optima. If you are basing an application on an LDA topic model, keep in mind that any information about the topic vector itself is meaningless and unlikely to be useful or to reflect anything true about your data; it's mainly the aggregate statistics computed from the posterior distribution that can give you reliable information.
I didn't understand this part: "it's mainly the aggregate statistics computed from the posterior distribution that can give you reliable information."
(Jun 12 '11 at 07:13)
Xolve
I think what Alexandre is trying to say is that you should focus on your posterior distribution rather than on the vectors you got, since every time you run LDA you'll get different topic vectors (check de Finetti's theorem). So you should instead look at the statistical properties of your prior to reach any sensible conclusion about your data.
(Jun 12 '11 at 08:07)
Leon Palafox
So how do I calculate those values?
(Jun 18 '11 at 06:59)
Xolve
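To make the point about fixed initialization concrete, here is a minimal sketch of the variational update for a single document with the topics held fixed. It is not gensim's code: the function names (digamma, infer_gamma), the toy topics, the toy document and the seeded random start are all assumptions made up for illustration. The only thing it is meant to show is that the same seed gives exactly the same gamma values, and that normalizing gamma gives the expected topic proportions under the variational Dirichlet, which is one simple statistic you can read off the posterior.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def infer_gamma(doc, topics, alpha=0.1, seed=0, iters=100):
    """Variational gamma for one document; `topics` stay fixed.

    doc    : dict mapping word id -> count (hypothetical toy format)
    topics : list of K lists, topics[k][w] = p(word w | topic k)
    seed   : fixes the random initialization, hence the result
    """
    rng = random.Random(seed)
    k_topics = len(topics)
    # Random starting point; this is the only source of randomness.
    gamma = [rng.gammavariate(100.0, 0.01) for _ in range(k_topics)]
    for _ in range(iters):
        # exp(E[log theta_k]) up to a constant that cancels on normalization
        weights = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * k_topics
        for word, count in doc.items():
            phi = [weights[k] * topics[k][word] for k in range(k_topics)]
            norm = sum(phi)
            for k in range(k_topics):
                new_gamma[k] += count * phi[k] / norm
        gamma = new_gamma
    return gamma


def expected_proportions(gamma):
    """Mean of Dirichlet(gamma): expected topic proportions."""
    total = sum(gamma)
    return [g / total for g in gamma]


# Toy model: 2 topics over a 3-word vocabulary, and one short document.
topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
doc = {0: 3, 1: 1, 2: 2}

gamma_a = infer_gamma(doc, topics, seed=0)
gamma_b = infer_gamma(doc, topics, seed=0)  # same seed, same start
gamma_c = infer_gamma(doc, topics, seed=1)  # different start
print(gamma_a == gamma_b)
print(expected_proportions(gamma_a))
print(expected_proportions(gamma_c))
```

Two calls with the same seed return identical values because nothing else in the loop is random. With a different seed the result can differ, which is the behaviour described in the question.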