In Griffiths and Steyvers (2004), "Finding scientific topics," they determine the number of topics T that maximizes the log-likelihood log P(w | T), where they define P(w | z) as

$$P(\mathbf{w} \mid \mathbf{z}) = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^T \prod_{j=1}^{T} \frac{\prod_{w} \Gamma\bigl(n_j^{(w)} + \beta\bigr)}{\Gamma\bigl(n_j^{(\cdot)} + W\beta\bigr)}$$

where w represents a word, T is the number of topics, W is the size of the vocabulary, n_j^(w) is the number of times word w is assigned to topic j, and n_j^(.) is the number of times any word occurrence is assigned to topic j. I always learned that to find the log-likelihood function you take the product of the joint density of the parameters over all of the data, and then take the log. But here I am stuck on how to proceed, since there is already a product over w inside P(w | z). How can I compute the likelihood function, and thus the log-likelihood function? NOTE: The above comes from the collapsed Gibbs sampling formulation.
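To make the product concrete, here is a minimal sketch of how log P(w | z) could be evaluated directly from the sufficient statistics. The names are my own: it assumes a hypothetical topic-word count matrix `n` of shape (T, W), where `n[j, w]` is the number of times word w is assigned to topic j, and a symmetric Dirichlet parameter `beta`. Working in log space with `gammaln` turns every product into a sum and avoids overflow.

```python
import numpy as np
from scipy.special import gammaln

def log_p_w_given_z(n, beta):
    """Log of P(w|z) computed in log space.

    n    : array of shape (T, W); n[j, w] counts how often word w is
           assigned to topic j (the n_j^(w) statistics).
    beta : symmetric Dirichlet smoothing parameter.
    """
    n = np.asarray(n, dtype=float)
    T, W = n.shape
    # Normalizing constant: T * log( Gamma(W*beta) / Gamma(beta)^W )
    log_norm = T * (gammaln(W * beta) - W * gammaln(beta))
    # Per-topic term: sum_w log Gamma(n_j^(w)+beta) - log Gamma(n_j^(.)+W*beta)
    per_topic = gammaln(n + beta).sum(axis=1) - gammaln(n.sum(axis=1) + W * beta)
    return log_norm + per_topic.sum()

# Toy example: 2 topics, 3 vocabulary words
counts = np.array([[5, 0, 2],
                   [1, 4, 3]])
print(log_p_w_given_z(counts, beta=0.1))
```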
Also note that P(w|z) is not the actual likelihood, because it conditions on the unobserved variables z. The joint likelihood would be P(w,z) = P(w|z) P(z) (where P(z) is equation (3) in Griffiths and Steyvers), and the marginal likelihood of w would be P(w) = sum_z P(w|z) P(z). In all of these cases, if you want to compute the log of it you would just do that, as David said.
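Continuing the sketch above, the joint log-likelihood log P(w,z) = log P(w|z) + log P(z) can be assembled by adding the analogous term for P(z). This is only a sketch under the assumption that P(z) has the symmetric-Dirichlet form of equation (3) in the paper, over a hypothetical document-topic count matrix `m` of shape (D, T), where `m[d, j]` counts the words in document d assigned to topic j, with hyperparameter `alpha`; it reuses `log_p_w_given_z` from the previous snippet.

```python
import numpy as np
from scipy.special import gammaln

def log_p_z(m, alpha):
    """Log of P(z) under a symmetric Dirichlet(alpha) over topics per document.

    m     : array of shape (D, T); m[d, j] counts how many words in
            document d are assigned to topic j.
    alpha : symmetric Dirichlet hyperparameter.
    """
    m = np.asarray(m, dtype=float)
    D, T = m.shape
    log_norm = D * (gammaln(T * alpha) - T * gammaln(alpha))
    per_doc = gammaln(m + alpha).sum(axis=1) - gammaln(m.sum(axis=1) + T * alpha)
    return log_norm + per_doc.sum()

def log_joint(n, m, alpha, beta):
    """log P(w, z) = log P(w|z) + log P(z), reusing log_p_w_given_z above."""
    return log_p_w_given_z(n, beta) + log_p_z(m, alpha)
```

The marginal P(w) = sum_z P(w|z) P(z) is a different matter: summing over all assignments z directly is infeasible beyond toy problems, which is what the evaluation-methods reference below is concerned with.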
Because the likelihood has a product of products, you have to apply the rule log(ab) = log(a) + log(b) recursively. That is, you will get a sum over topic terms, each containing a nested sum over vocabulary terms.

EDIT: I omitted the fact that you need to marginalize out z; see the reference below for details.

Evaluation Methods for Topic Models. Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. http://www.cs.princeton.edu/~mimno/papers/wallach09evaluation.pdf

So P(w | z) already is the likelihood function? So to get the log-likelihood I would just take the log? That is a bit different from what I learned way back when, so that is why I am confused (this is a conditional distribution, and is not a product over all the data, or at least I don't think it is?)
(Sep 25 '12 at 16:54)
Ryan Rosario
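To spell out the expansion the answer describes (a sketch that follows directly from the product form quoted in the question), taking the log turns the outer product over topics and the inner product over the vocabulary into nested sums:

$$\log P(\mathbf{w} \mid \mathbf{z}) = T\bigl[\log\Gamma(W\beta) - W\log\Gamma(\beta)\bigr] + \sum_{j=1}^{T}\Bigl[\sum_{w} \log\Gamma\bigl(n_j^{(w)} + \beta\bigr) - \log\Gamma\bigl(n_j^{(\cdot)} + W\beta\bigr)\Bigr]$$

Note that this is still the log of the conditional P(w | z), not of the marginal likelihood; marginalizing out z is the separate step mentioned in the EDIT above.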
