|
Howdy folks, I have just begun looking at LDA and need a lot of help here. I have read that LDA can be used in search engines to improve search queries. However, given that LDA is generally used offline on a fixed set of documents, can someone tell me how LDA is implemented so that it addresses the online nature of such an application, i.e. a growing number of documents and a varying number of topics? Thanks in advance |
|
If you have an LDA model fit to the documents in your corpus, you can run inference on the query to get a topic distribution for the query and then use that distribution as you would use a word distribution (i.e., tf-idf) in a regular search algorithm. Anecdotal evidence from Daniel Ramage suggests that you should use a combined score in which 20% comes from the LDA dot product and 80% from the regular tf-idf dot product.
Do you have a citation for the Daniel Ramage information?
(Aug 06 '12 at 07:37)
Joseph Turian ♦♦
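A minimal sketch of the 20/80 blend described in the answer above, in plain Python. It assumes you already have tf-idf vectors and LDA topic distributions for the query and for each document (for example from running inference with a fitted model). The names `dot`, `blended_score`, `rank`, and the toy vectors are hypothetical and only there for illustration; the 0.2 weight is the anecdotal figure from the answer, not a tuned value.

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def blended_score(query_tfidf, doc_tfidf, query_topics, doc_topics, lda_weight=0.2):
    """Mix the topic-space and term-space dot products.

    query_topics and doc_topics are dense topic distributions (lists of equal
    length) assumed to come from LDA inference; lda_weight is an assumption
    taken from the anecdotal 20%/80% split.
    """
    tfidf_part = dot(query_tfidf, doc_tfidf)
    lda_part = sum(q * d for q, d in zip(query_topics, doc_topics))
    return lda_weight * lda_part + (1.0 - lda_weight) * tfidf_part


def rank(query_tfidf, query_topics, docs, lda_weight=0.2):
    """Return document ids sorted from best to worst blended score."""
    scored = [
        (blended_score(query_tfidf, d["tfidf"], query_topics, d["topics"], lda_weight), doc_id)
        for doc_id, d in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy data (made up): two documents, a two-topic model, a one-word query.
docs = {
    "d1": {"tfidf": {"topic": 0.5, "model": 0.5}, "topics": [0.9, 0.1]},
    "d2": {"tfidf": {"search": 0.7, "engine": 0.7}, "topics": [0.2, 0.8]},
}
query_tfidf = {"search": 1.0}
query_topics = [0.1, 0.9]

print(rank(query_tfidf, query_topics, docs))  # ['d2', 'd1']
```

Note that "d1" shares no terms with the query, so it is scored on the topic part alone; that is the case where the LDA component can help recall.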
|
|
Vowpal Wabbit has an implementation of online LDA. It is based, I believe, on this paper.
I, as well as several people with whom I have talked, have had issues getting good results from Vowpal's LDA implementation.
(Sep 15 '12 at 02:55)
Joseph Turian ♦♦
|