In "Finding scientific topics", model selection is used to choose an appropriate number of topics T. The harmonic mean of a set of values P(W|Z,T) is used to approximate P(W|T). For a model with K topics, is it 1/P(W|T=K) approx (1/M) * (sum_over_d 1/P_d(W|Z,T=K)), where M is a set of documents d? And is P_d(W|Z,T=K) the product of the phi values? Computed this way, logP(W|T) is a negative number in my experiments, which looks quite different from the values of magnitude 10^7 in Fig. 3 of "Finding scientific topics".

I'm confused and not sure whether the formula is right. Can anyone help me? Thanks a lot.

asked Mar 30 '11 at 06:13

lily

edited Apr 04 '11 at 23:31


3 Answers:

P(w|z) is a multinomial distribution or, as it is done in "Finding scientific topics", an integrated Dirichlet-multinomial pair, so it is either a probability from your topic-word distribution or smoothed normalized counts from that same distribution. Each w then has an independent probability, so P(W|Z) is the product, over all words, of the probability of that word being chosen from its topic.

answered Apr 02 '11 at 10:43

Alexandre Passos ♦
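
To make the product described in this answer concrete, here is a minimal Python sketch. It assumes a point estimate `phi` of the topic-word distributions (one list of word probabilities per topic) and one topic assignment per token. The names `phi`, `words`, `topics`, and `log_p_words_given_topics` are illustrative and do not come from the paper. It sums logs instead of multiplying probabilities, which gives the same quantity on a log scale.

```python
import math

def log_p_words_given_topics(words, topics, phi):
    """Return log P(W|Z) = sum_i log phi[z_i][w_i] for token-level assignments."""
    if len(words) != len(topics):
        raise ValueError("words and topics must have the same length")
    return sum(math.log(phi[z][w]) for w, z in zip(words, topics))

# Toy example with hypothetical numbers: 2 topics over a 3-word vocabulary.
phi = [
    [0.5, 0.25, 0.25],  # topic 0
    [0.1, 0.1, 0.8],    # topic 1
]
words = [0, 2, 2, 1]    # word id of each token
topics = [0, 1, 1, 0]   # topic assigned to each token

log_p = log_p_words_given_topics(words, topics, phi)
```

Since every factor in the product is a probability of at most 1, the log of the product cannot be positive, so a negative log P(W|Z) is what one should expect from this computation.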

Thanks, Alexandre.

Is it right that 1/P(W|T=K) approx (1/K) * (sum_over_j 1/P(W|Z=j)), where P(W|Z=j) is the product of P(w=t|Z=j) over the V terms? For a corpus with K=200 and V=38898, logP(W|T=K) is -Infinity. So I guess that there is still something wrong.

On the other hand, I tried formula (2) in "Finding scientific topics" on the same corpus, with the counts from Gibbs sampling. However, it is not applicable, since gamma(V*beta) is infinity with V=38898 and beta=0.1. What is the problem?

Could you please give me some hints? Many thanks.

answered Apr 05 '11 at 02:41

lily

You need to work in log space. So log 1/prod(p) is -sum(log p). This avoids underflow. Similarly, use gammaln instead of computing log gamma yourself.

(Apr 05 '11 at 03:39) Alexandre Passos ♦
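
The log-space advice in this comment can be sketched as follows. This is an illustration under stated assumptions, not the paper's code. It assumes formula (2) has the collapsed Dirichlet-multinomial form written in the docstring, and that the harmonic mean is taken over log-likelihoods from several Gibbs samples of the assignments Z. The function names and the toy numbers are hypothetical. `math.lgamma` is the standard-library scalar counterpart of `scipy.special.gammaln`.

```python
import math

def log_p_w_given_z(n_wt, beta):
    """Collapsed log P(W|Z) from topic-word counts, entirely in log space.

    n_wt[j][w] is the number of tokens of word w assigned to topic j.
    Assumed form (per topic j, with V words and n_j tokens in the topic):
        Gamma(V*beta) / Gamma(beta)^V
        * prod_w Gamma(n_wt[j][w] + beta) / Gamma(n_j + V*beta)
    """
    V = len(n_wt[0])
    total = 0.0
    for counts in n_wt:
        n_j = sum(counts)
        total += math.lgamma(V * beta) - V * math.lgamma(beta)
        total += sum(math.lgamma(c + beta) for c in counts)
        total -= math.lgamma(n_j + V * beta)
    return total

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of exp(l_s), without exponentiating l_s directly.

    log HM = log S - logsumexp(-l_s), with the max subtracted for stability.
    """
    neg = [-l for l in log_likelihoods]
    m = max(neg)
    lse = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(neg)) - lse

# lgamma stays finite where gamma itself overflows (V=38898, beta=0.1).
finite_term = math.lgamma(38898 * 0.1)

# Toy check with hypothetical counts: 1 topic, 2 words, beta = 1.
toy = log_p_w_given_z([[1, 1]], beta=1.0)

# Made-up log-likelihoods from three Gibbs samples, of a realistic magnitude.
samples = [-1.0e7 - 3.0, -1.0e7, -1.0e7 + 2.0]
approx = log_harmonic_mean(samples)
```

Subtracting the maximum before exponentiating keeps every `exp` argument at or below zero, so the sum cannot overflow even when the log-likelihoods are on the order of -10^7.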

Thanks. Does it make sense that the approximations of P(W|T) differ depending on whether I use the probability from the topic-word distribution or the smoothed normalized counts from that same distribution?

answered Apr 07 '11 at 09:31

lily


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.