Howdy folks,

I have just begun looking at LDA and could use a lot of help. I have read that LDA can be used in search engines to improve search queries. However, LDA is predominantly used offline, on a fixed set of documents. Can someone tell me how LDA is implemented so that it addresses the online nature of such an application, with its growing number of documents and varying number of topics?

Thanks in advance.

asked May 04 '11 at 23:07

flynn

edited Aug 06 '12 at 07:35

Joseph Turian ♦♦


2 Answers:

If you have an LDA model fit to the documents in your corpus, you can run inference on the query to get its topic distribution, and then use that distribution the way you would use a word-based representation (e.g., tf-idf) in a regular search algorithm. Anecdotal evidence from Daniel Ramage suggests weighting the combined score so that 20% comes from the LDA dot product and 80% from the regular tf-idf dot product.
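
To make the 20/80 mix concrete, here is a minimal sketch in plain Python. It assumes you already have tf-idf vectors and LDA topic distributions for the query and for each document, stored as equal-length lists; the names `dot`, `combined_score`, and `lda_weight` are made up for illustration and do not come from any particular library.

```python
def dot(u, v):
    """Dot product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def combined_score(query_tfidf, doc_tfidf, query_topics, doc_topics,
                   lda_weight=0.2):
    """Mix the topic-space and tf-idf dot products.

    lda_weight=0.2 reflects the anecdotal 20%/80% split; treat it as a
    starting point to tune, not an established constant.
    """
    tfidf_score = dot(query_tfidf, doc_tfidf)
    lda_score = dot(query_topics, doc_topics)
    return lda_weight * lda_score + (1.0 - lda_weight) * tfidf_score


# Toy data (hypothetical): a 3-term vocabulary and 2 topics.
query_tfidf = [1.0, 0.0, 0.0]
query_topics = [0.9, 0.1]
docs = {
    "doc_a": ([0.5, 0.5, 0.0], [0.8, 0.2]),  # shares a term and the topic
    "doc_b": ([0.0, 0.0, 1.0], [0.9, 0.1]),  # shares only the topic
}

ranking = sorted(
    docs,
    key=lambda name: combined_score(query_tfidf, docs[name][0],
                                    query_topics, docs[name][1]),
    reverse=True,
)
```

Note that `doc_b` shares no terms with the query, so plain tf-idf would score it zero; the topic term still gives it a small non-zero score, which is the point of mixing the two.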

answered May 05 '11 at 02:24

Alexandre Passos ♦

Do you have a citation for the Daniel Ramage information?

(Aug 06 '12 at 07:37) Joseph Turian ♦♦

Vowpal Wabbit has an implementation of online LDA. It is based, I believe, on this paper.
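
As a rough illustration of what "online" means here (this is not Vowpal Wabbit's code, and not necessarily the exact procedure of the paper mentioned above): online variational approaches to LDA typically keep a topic-word weight matrix and, for each incoming mini-batch of documents, blend it with an estimate computed from that batch alone, using a step size that decays over time. The sketch below shows only that blending step. The names `step_size`, `blend_topics`, `tau0`, and `kappa` are illustrative assumptions, and the per-batch estimate is taken as given rather than computed.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying weight for mini-batch number t (t = 0, 1, 2, ...).

    tau0 and kappa are hypothetical defaults; larger tau0 down-weights
    early batches, and kappa controls how quickly the weight shrinks.
    """
    return (tau0 + t) ** (-kappa)


def blend_topics(current, batch_estimate, rho):
    """Move each topic-word weight a fraction rho toward the batch estimate.

    Both arguments are lists of rows, one row of word weights per topic.
    """
    return [
        [(1.0 - rho) * c + rho * b for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(current, batch_estimate)
    ]


# One topic over two words; pretend a new batch suggests different weights.
topics = [[1.0, 3.0]]
batch_estimate = [[3.0, 1.0]]
topics = blend_topics(topics, batch_estimate, rho=0.5)
```

Because each update only touches the current mini-batch, new documents can be folded in as they arrive without refitting on the whole corpus. Note that the number of topics stays fixed in this scheme.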

answered May 06 '11 at 07:25

Noel Welsh

I, as well as several people with whom I have talked, have had issues getting good results from Vowpal's LDA implementation.

(Sep 15 '12 at 02:55) Joseph Turian ♦♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.