This is my first question on this forum, so hello everyone!

I'm interested in LDA extensions, and there are a lot of papers: LDA (2003), Supervised Topics (2007), Supervised Latent Dirichlet Allocation (2009) and so on. Each one has its own use case, its own graphical model, and some MCMC or variational method to approximate the posterior. Frequently there is also Python, MATLAB or C++ code.

However, I also see Hierarchical Bayes Compiler (HBC), PyMC, BUGS, JAGS and other tools that are meant to make life easier. To the point: if declaring LDA is this trivial (HBC example):

    alpha     ~ Gam(0.1,1)
    eta       ~ Gam(0.1,1)
    beta_{k}  ~ DirSym(eta, V)       , k in [1,K]
    theta_{d} ~ DirSym(alpha, K)     , d in [1,D]
    z_{d,n}   ~ Mult(theta_{d})      , d in [1,D] , n in [1,N_{d}]
    w_{d,n}   ~ Mult(beta_{z_{d,n}}) , d in [1,D] , n in [1,N_{d}]

    --# --define K 2
    --# --loadD testW w V D N ;
    --# --collapse beta
    --# --collapse theta

Why do researchers write their own code for LDA extensions, for example Supervised LDA:

[http://www.cs.cmu.edu/~chongw/slda/][1]
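For readers who have not used HBC: the declaration above simply spells out the LDA generative process. Here is a minimal NumPy sketch of that process, only to make the notation concrete. The function name `sample_lda_corpus` and its arguments are hypothetical, and as a simplifying assumption `alpha` and `eta` are fixed numbers here, whereas the HBC example puts `Gam(0.1,1)` priors on them.

```python
import numpy as np

def sample_lda_corpus(K=2, V=20, doc_lengths=(10, 10, 10), alpha=0.1, eta=0.1, seed=0):
    """Draw a synthetic corpus from the LDA generative process (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    D = len(doc_lengths)

    # beta_{k} ~ DirSym(eta, V): one word distribution per topic, shape (K, V)
    beta = rng.dirichlet(np.full(V, eta), size=K)
    # theta_{d} ~ DirSym(alpha, K): one topic distribution per document, shape (D, K)
    theta = rng.dirichlet(np.full(K, alpha), size=D)

    words, topics = [], []
    for d in range(D):
        # z_{d,n} ~ Mult(theta_{d}): a topic for every token of document d
        z_d = rng.choice(K, size=doc_lengths[d], p=theta[d])
        # w_{d,n} ~ Mult(beta_{z_{d,n}}): a word drawn from the chosen topic
        w_d = np.array([rng.choice(V, p=beta[k]) for k in z_d], dtype=int)
        topics.append(z_d)
        words.append(w_d)
    return words, topics, theta, beta
```

Sampling forward like this is the easy direction; the tools and hand-written samplers differ in how they go backwards, from the observed `w` to the posterior over `z`, `theta` and `beta`.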

Sorry if this question is a bit stupid, and thanks in advance for your replies :-)

asked Sep 24 '13 at 15:41

grzegorz_g


One Answer:

This question may be a bit subjective, but here is my attempt at an answer:

First of all, about half of the tools you mention were developed only recently, and some of them may still have the occasional bug. People have been modeling these kinds of structures for much longer than that.

Second, what happens if you want to use a different inference or optimization method from the one the tool provides? What if improving the optimization is the point of your work?

Writing your own machine learning software lets you understand more deeply how a method works. Knowing Gibbs sampling is half the problem; the other half is building the sampler and making sure it works. From an academic point of view, it is a great exercise just to reimplement an algorithm and understand what it is doing.
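To make "building the sampler" concrete, here is a minimal sketch of a collapsed Gibbs sampler for plain LDA in NumPy, with `beta` and `theta` integrated out (the same idea as the `--collapse` directives in the question). It is not the code of any of the tools or papers mentioned. The function name `lda_collapsed_gibbs`, its arguments, and the fixed hyperparameters `alpha` and `eta` are assumptions made for illustration.

```python
import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a sequence of word ids in [0, V).
    Returns point estimates of theta (D x K), beta (K x V) and the final z.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))  # tokens in document d assigned to topic k
    n_kw = np.zeros((K, V))  # times word w is assigned to topic k
    n_k = np.zeros(K)        # total tokens assigned to topic k

    # Random initialisation of the topic assignments and the count tables.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(K, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][n]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional p(z_{d,n} = k | all other assignments, words).
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][n] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior-mean estimates computed from the final sample's counts.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    beta = (n_kw + eta) / (n_k[:, None] + V * eta)
    return theta, beta, z
```

Even in this small version, most of the effort goes into keeping the three count tables consistent. That bookkeeping is exactly where hand-written samplers tend to go wrong, and writing it yourself is how you learn it.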

An example of this is how many people use Weka, scikit-learn or other machine learning toolboxes without understanding any of the parameters, or even the output they are getting.

That would be my case for someone implementing their own samplers.

answered Sep 24 '13 at 19:11

Leon Palafox ♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.