From reading the paper, my impression was that the variational Bayesian algorithm for LDA proceeds by:
However, looking at the corresponding code, it seems to do no such thing. I'm sure I'm misunderstanding something, so:
Ok, let me take a stab at this:
So instead of going through steps 3 & 4 above sweeping across all documents, you can do them on a per-document basis, since the documents are independent conditional on the topics.
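Roughly, that per-document E-step looks like the following NumPy sketch (my illustration, not the actual code under discussion; `doc`, `beta`, and `alpha` are assumed names for the document's word ids, the topic-word matrix, and the Dirichlet prior):

```python
import numpy as np
from scipy.special import psi  # digamma

def e_step_document(doc, beta, alpha, n_iter=50, tol=1e-4):
    """Variational E-step for one document (Blei, Ng & Jordan 2003 style).

    doc   : word ids for this document, shape (N,)
    beta  : topic-word probabilities, shape (K, V)
    alpha : Dirichlet prior on topic proportions, shape (K,)
    Returns the per-document variational parameters (gamma, phi).
    """
    N, K = len(doc), beta.shape[0]
    gamma = alpha + float(N) / K              # standard initialization
    phi = np.full((N, K), 1.0 / K)
    for _ in range(n_iter):
        old_gamma = gamma.copy()
        # phi_{nk} is proportional to beta_{k, w_n} * exp(Psi(gamma_k)),
        # normalized over topics k
        phi = beta[:, doc].T * np.exp(psi(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma_k = alpha_k + sum_n phi_{nk}
        gamma = alpha + phi.sum(axis=0)
        if np.abs(gamma - old_gamma).sum() < tol:
            break
    return gamma, phi
```

Because nothing here touches any other document, these E-steps can be run document by document, in any order, before the topics are re-estimated.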
Yes, the last term is constant, so once phi is normalized it is irrelevant.
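Assuming "the last term" refers to the digamma-of-the-sum term in the standard phi update from Blei, Ng & Jordan (2003), the update in question is

```latex
\phi_{nk} \;\propto\; \beta_{k, w_n}\,
    \exp\!\Big(\Psi(\gamma_k) - \Psi\big(\textstyle\sum_{j=1}^{K} \gamma_j\big)\Big)
```

and since Psi(sum_j gamma_j) does not depend on k, the factor exp(-Psi(sum_j gamma_j)) multiplies every phi_{nk} by the same constant and drops out when phi_n is normalized to sum to one over topics.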
Again, the order in which you do steps 3 and 4 is just a heuristic. It's debatable which approach is actually better, but, as above, because of the independence this is totally valid.
This is coordinate ascent, so again the order is arbitrary. You can update gamma, then update all the phis, then update gamma again, then all the phis again, and so on; or you can update gamma, update phi_1, update gamma, update phi_2, and repeat (both schedules are sketched below). I think the approach used here gives empirically better convergence, but it's really a toss-up.

Thanks a lot :-).
(Aug 09 '10 at 19:20) Alexandre Passos ♦
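For concreteness, the two schedules mentioned above can be sketched like this (same illustrative `doc`/`beta`/`alpha` assumptions as the earlier snippet; the "batch" variant is the earlier `e_step_document`, while the fully interleaved variant refreshes gamma after every single phi_n):

```python
import numpy as np
from scipy.special import psi  # digamma

def e_step_document_interleaved(doc, beta, alpha, n_iter=50):
    """Per-document E-step that refreshes gamma after every phi_n update.

    Both this schedule and the batch one (all phis, then gamma) are valid
    coordinate ascent on the same variational objective; only the order in
    which the coordinates are visited differs.
    """
    N, K = len(doc), beta.shape[0]
    gamma = alpha + float(N) / K          # equals alpha + sum_n phi_n below
    phi = np.full((N, K), 1.0 / K)
    for _ in range(n_iter):
        for n in range(N):
            old_phi_n = phi[n].copy()
            # phi_{nk} proportional to beta_{k, w_n} * exp(Psi(gamma_k)),
            # normalized over topics k
            phi[n] = beta[:, doc[n]] * np.exp(psi(gamma))
            phi[n] /= phi[n].sum()
            # incremental gamma update: swap the old phi_n for the new one
            gamma += phi[n] - old_phi_n
    return gamma, phi
```

Either way the fixed point is the same; which schedule converges faster in practice is, as said above, mostly an empirical question.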
Yeah, what you said makes sense, and it isn't obvious to me either why it's done like this. I've only looked at the code superficially myself. :) Maybe you should try dropping a mail to the topic-models mailing list about it: https://lists.cs.princeton.edu/mailman/listinfo/topic-models
I'm guessing some people on that mailing list have first-hand experience with the code.
Good idea. I'll do that.
@Alexandre: Deleted the other comments to avoid any confusion for those reading them. :)