
I've started digging through the paper Online Inference of Topics with LDA, by Canini/Shi/Griffiths (for the umpteenth time; it keeps slipping down in my pile). I'm having trouble developing an intuition for the variable z_i^(p), for example in Eq. 6. There are a couple of other bumps, but I think the generation/sampling of z_i^(p) is the primary source of my confusion.

Any insight would be appreciated.

[added: perhaps to be more explicit, the confusion is focused on sampling from the posterior distribution P(z_i^(p) | z_{i-1}^(p), w_{i-1})]

asked Jun 20 '11 at 12:11


Aengus Robinson

edited Jun 20 '11 at 15:44


2 Answers:

The best way to think of Equation (6) is that it is where we calculate the probability that w_i has a given topic. z_i is the random variable associated with the topic label of word i. z_i^(p) is the topic label for z_i sampled by the pth particle. $\sigma_i$ is the weight assigned to each of the particles. You can also think of Equation (6) as a weighted count over the empirical distribution of the topic labels in the sampled particles.
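To make the "weighted count" reading concrete, here is a minimal sketch in Python. It is not taken from the paper: the function name, argument names, and toy numbers are all made up for illustration, and it assumes Equation (6) amounts to summing the normalized weights of the particles that assign a given topic to word i.

```python
def weighted_topic_distribution(particle_topics, weights, num_topics):
    """Estimate P(z_i = j) for one word from a set of weighted particles.

    particle_topics[p] is the topic label that particle p assigns to the word
    (hypothetical representation); weights[p] is that particle's weight.
    """
    if len(particle_topics) != len(weights):
        raise ValueError("need exactly one weight per particle")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    probs = [0.0] * num_topics
    for topic, weight in zip(particle_topics, weights):
        # Each particle votes for its sampled topic with its normalized weight.
        probs[topic] += weight / total
    return probs


# Toy example: 4 particles, 3 topics (values invented for illustration).
particle_topics = [0, 2, 2, 1]
weights = [0.1, 0.4, 0.3, 0.2]
print(weighted_topic_distribution(particle_topics, weights, num_topics=3))
```

With uniform weights this reduces to a plain histogram of the particles' topic labels, so the weights only matter once some particles explain the observed words better than others.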

answered Jun 23 '11 at 14:46


zaxtax

edited Jun 23 '11 at 14:53

Thanks, that gets me a little closer. However, from the notation (and from your comment) it seems that I need 'P' particles for each word 'i', where each particle is a possible topic for word 'i'. That seems to be an excessive number of particles ... ?

(Jun 23 '11 at 16:25) Aengus Robinson

Not really. You need to estimate the topic label reasonably for each word. Also, you only need to store in memory the recently generated particles.

(Jun 23 '11 at 17:07) zaxtax

Well, that helps. Still, if I have a vocabulary of V=5K and P=10 particles for each word, it seems that 50K total particles constitutes a chunk of memory. I'll have to code it up to get an appreciation for what's going on. (and I realize that I'll have to deal with hash tables at some time to improve performance.) Thanks for taking the time to provide the insight!

(Jun 23 '11 at 17:13) Aengus Robinson

I don't really get what you are asking for; that is basically a Gibbs sampler. Maybe if you read their earlier paper, Finding Scientific Topics, you'll get a bit more insight into what they are doing.

If, however, you are looking for an explanation of how to get that result, you might try this question (it was mine, actually), where Alexandre points to a great 50-step derivation from Carpenter to get that equation.

answered Jun 21 '11 at 02:43


Leon Palafox

He is talking about the particle filter for online LDA, not about the Gibbs sampler for batch LDA.

(Jun 21 '11 at 02:56) Alexandre Passos ♦

Ah, OK. Then he should look into importance sampling. It is pretty well explained in Bishop's book, as far as I remember.

Another good reference is "Probabilistic Robotics" by Thrun.

(Jun 21 '11 at 02:58) Leon Palafox

Leon, thanks for the suggestions, but the confusion is not with particle filtering per se, but with the online implementation within the algorithm proposed by the authors. I'm sure it's probably an easy concept, but I'm missing the connection right now.

(Jun 21 '11 at 13:51) Aengus Robinson

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.