(LDA = Latent Dirichlet allocation) It is straightforward to understand how to use SVD and random projection: both create a lower-dimensional representation of documents that approximately preserves the cosine distance between them, i.e. cos(Original[i], Original[j]) ~= cos(SVD_OR_RI_TRANSFORMED[i], SVD_OR_RI_TRANSFORMED[j]), and it is easy to understand why, since it is all linear algebra.
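For concreteness, here is a minimal sketch of what I mean, assuming scikit-learn's TruncatedSVD and GaussianRandomProjection; the toy corpus and component counts are arbitrary placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; in practice this would be the real document collection.
docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
        "investors sold stocks as markets fell"]

tfidf = TfidfVectorizer().fit_transform(docs)                 # original high-dimensional vectors
svd_vecs = TruncatedSVD(n_components=3).fit_transform(tfidf)  # linear dimensionality reduction
rp_vecs = GaussianRandomProjection(n_components=3).fit_transform(tfidf)

# Cosine similarities before and after the reduction; with enough components
# the reduced similarities approximately track the originals.
print(cosine_similarity(tfidf)[0])
print(cosine_similarity(svd_vecs)[0])
print(cosine_similarity(rp_vecs)[0])
```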

But how do I get the same kind of approximation for LDA, where the result is obtained by probabilistic inference? Is it possible to use LDA to approximate the cosine? If not, what distance metric does LDA approximate?

So the question is: what can I expect from cos(LDA_TOPICS[i], LDA_TOPICS[j]) relative to the cosine between the original TF-IDF vectors?

asked Jan 20 '12 at 19:02

yura

edited Jan 20 '12 at 19:14

LDA is not necessarily about linear algebra, nor does it try to directly approximate the cosine or anything like it. You can get a similarity function by multiplying the document-specific topic probability distributions for two documents as if they were vectors, and this is generally well-behaved.

(Jan 20 '12 at 19:08) Alexandre Passos ♦
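A minimal sketch of the similarity function described in the comment above, assuming scikit-learn's LatentDirichletAllocation and a toy corpus (neither is from the discussion): fit LDA on raw word counts, take each document's topic distribution, and use the dot product of two such distributions as the similarity score.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
        "investors sold stocks as markets fell"]

counts = CountVectorizer().fit_transform(docs)      # LDA expects raw counts, not TF-IDF
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)                   # rows are document-topic distributions

# Similarity between documents i and j as the dot product of their topic distributions.
similarity = theta @ theta.T
print(similarity)
```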

So there is no relation between cos(Original_TFIDF[i], Original_TFIDF[j]) and cos(LDA_TOPICS[i], LDA_TOPICS[j]), but both work well?

(Jan 20 '12 at 19:14) yura

There is no clear relation, as the LDA approximation tries to sparsely represent the words, so documents with high cosine can end up with a low LDA dot product as the same words can be explained by different topics. LDA is not directly concerned with multiplying these document-topic representations, hence the lack of guarantees. However, the feature encoding provided by the LDA representation often turns out to be useful.

(Jan 20 '12 at 20:02) Alexandre Passos ♦
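One hypothetical way to probe this empirically, continuing the scikit-learn sketches above (toy corpus, no claim about what the correlation will be on real data): rank-correlate the TF-IDF cosine similarities with the LDA dot products over all document pairs.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
        "investors sold stocks as markets fell"]

tfidf_sims = cosine_similarity(TfidfVectorizer().fit_transform(docs))
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(
    CountVectorizer().fit_transform(docs))
lda_sims = theta @ theta.T

# Rank-correlate the two similarity structures over the off-diagonal pairs;
# nothing guarantees this correlation is high.
iu = np.triu_indices_from(tfidf_sims, k=1)
rho, _ = spearmanr(tfidf_sims[iu], lda_sims[iu])
print("Spearman correlation between TF-IDF cosine and LDA dot product:", rho)
```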

I think there's still an interesting question here, specifically: What is an appropriate distance measure in the LDA space?

(Jan 26 '12 at 04:23) Joseph Turian ♦♦

Joseph Turian: I find that an interesting question for which I don't really have an answer. For example, if you have two similar topics (in the sense that they both assign high probability to some of the same words), the inference process will almost never assign those two topics to the same document (the neuroscience and neural network topics in the NIPS data are an example), as it can almost always find a sparser (if worse) solution by using only one of them. So what you would think are very similar topics (high dot product) turn out to be topics that never co-occur in the document-topic vectors, and vice versa.

(Jan 26 '12 at 07:02) Alexandre Passos ♦

Hanna Wallach (p.c.) recommended that I use the Hellinger distance to measure distance in topic space.

(Sep 15 '12 at 02:53) Joseph Turian ♦♦
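For reference, a small sketch of the Hellinger distance between two document-topic distributions; the distributions below are made-up examples, not from the discussion.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Toy document-topic distributions; in practice these come from LDA inference.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.3, 0.6]
print(hellinger(doc_a, doc_b))   # 0 for identical distributions, at most 1
```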