As far as I know, topic models like LDA generally use only a single feature per word (i.e., the term's index in the vocabulary). Can multiple features of a word be modeled, and if so, how? Intuitively, if more features could be utilized, better performance would be achieved. I would much appreciate it if someone could explain this.

asked Nov 03 '12 at 14:12 by leeshenli

One Answer:

Hello,

There's no reason you can't use something LDA-like with features other than word-type indicators, but you'll need to change the model a bit. Word-type indicators are used because you can aggregate them all together across a single document to make a single sample from a Multinomial distribution (remember how a Binomial is just the sum of a bunch of Bernoullis? If you treat each word as a one-hot vector, with all entries zero except one, and sum them together, you have a Multinomial). If you'd like to include more complex features, you'll have to change this Multinomial into something else.
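
To make the one-hot argument concrete, here is a minimal NumPy sketch (the vocabulary and document are invented for illustration): summing the one-hot word vectors gives exactly the document's word-count vector, i.e. a single Multinomial draw.

    import numpy as np

    # Toy vocabulary and document (both made up for illustration).
    vocab = ["topic", "model", "word", "feature"]
    doc = ["word", "word", "topic", "feature"]

    # Represent each word as a one-hot vector over the vocabulary...
    one_hot = np.eye(len(vocab))[[vocab.index(w) for w in doc]]

    # ...then summing them gives the document's word-count vector,
    # a single Multinomial(n=4, p=theta) sample under the
    # bag-of-words assumption.
    counts = one_hot.sum(axis=0)
    print(counts)  # [1. 0. 2. 1.]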

For example, you might instead say that each document has a probability distribution over topics given the document ID, and that each topic is a multivariate normal distribution with its own topic-specific mean and covariance. Given the topic of a word, you sample its feature vector from that normal distribution. Alternatively, if you were given K non-negative integer features for each word, you might say that each topic has K different Multinomial parameters, one for each set of counts.
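
A rough generative sketch of that first alternative, assuming identity covariances to keep things simple; the topic count K, feature dimension D, and all parameter values are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sizes: K topics, D-dimensional word-feature vectors.
    K, D = 3, 5

    # Per-document distribution over topics, and topic-specific
    # Gaussian parameters (identity covariances just for simplicity).
    doc_topic_probs = rng.dirichlet(np.ones(K))
    topic_means = rng.normal(size=(K, D))
    topic_covs = np.stack([np.eye(D)] * K)

    # Generate one word: pick its topic, then draw its feature
    # vector from that topic's normal distribution.
    z = rng.choice(K, p=doc_topic_probs)
    feature_vector = rng.multivariate_normal(topic_means[z], topic_covs[z])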

The key idea of LDA isn't really the word representation, but rather that each document has a probability distribution over topics (rather than a single topic, which would just be plain old clustering!). Once you have that in hand, it's just a matter of choosing how to go from a mixture of topics to what you observe (words in the classic case, feature vectors in yours). People usually stick to the word-type == feature case because it's much easier to design MCMC or variational inference algorithms in that scenario.
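
A minimal sketch of that contrast (the topic count and Dirichlet concentration are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    K = 4  # hypothetical number of topics

    # Plain clustering: the whole document gets exactly one topic.
    hard_topic = rng.choice(K)

    # LDA: the document gets a whole distribution over topics (drawn
    # here from a symmetric Dirichlet), and each word can then come
    # from a different topic.
    theta = rng.dirichlet(np.full(K, 0.1))
    word_topics = rng.choice(K, size=10, p=theta)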

answered Nov 03 '12 at 16:09 by Daniel Duckwoth

edited Nov 03 '12 at 16:21

Thank you for your detailed reply!

(Nov 03 '12 at 21:40) leeshenli

Maybe you should also check out http://www.tiberiocaetano.com/papers/2010/PetSmoCaeBunetal10.pdf, as this paper finds a way to incorporate word features into LDA.

(Nov 04 '12 at 09:37) Alexandre Passos ♦