I have been looking at LDA from the perspective of fitting datasets other than text. When would the assumptions that allow a dataset to be modeled with LDA break down, for example if I replace each document with a person and the words with that person's scores/abilities in various subjects?

Assuming I have a trained LDA model, suppose we present it with a datapoint m that has values for only a few of the features (terms). If I run inference and obtain phi (p(term | topic = k)) and theta (p(topic | datapoint = m)), would I be correct in doing the following operation?

First, for each new datapoint, calculate:

p(term | document = m) = sum_k p(term | topic = k) * p(topic = k | document = m)

I believe this p(term | document = m) should then represent the predictive distribution over terms, i.e. it fills in probability values for the missing features (terms). What is wrong with such an assumption?
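
To make the operation concrete, here is a minimal numpy sketch of what I mean; the sizes and the random phi / theta_m values are just placeholders I made up, not the output of a real trained model:

    import numpy as np

    # Placeholder sizes: K topics, V terms in the vocabulary.
    K, V = 10, 5000

    # phi[k, v]  = p(term = v | topic = k); each row sums to 1.
    # theta_m[k] = p(topic = k | document = m); sums to 1.
    # Random Dirichlet draws stand in for the values a trained model would give.
    rng = np.random.default_rng(0)
    phi = rng.dirichlet(np.ones(V), size=K)   # shape (K, V)
    theta_m = rng.dirichlet(np.ones(K))       # shape (K,)

    # Marginalize out the topic:
    # p(term = v | document = m) = sum_k p(term = v | topic = k) * p(topic = k | document = m)
    p_terms = theta_m @ phi                   # shape (V,)

    # This is a proper distribution over the whole vocabulary, so it assigns a
    # probability to every term, including those missing from datapoint m.
    print(p_terms.sum())                      # ~1.0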

asked Jul 15 '11 at 05:05

kpx