I'm interested in implementing Ruslan Salakhutdinov and Geoffrey Hinton's RBM-based topic modeling method, described here: http://www.mit.edu/~rsalakhu/papers/repsoft.pdf
Has anyone tried implementing this?
Is it possible to distribute the algorithm across processes or machines?
Thanks, Timmy Wilson
Excellent paper; I'm interested in this too. If only we could learn from millions of unlabeled documents (the internet is full of text) and then use that to improve classification, clustering, and other text tasks.
answered Nov 07 '11 at 13:36
I have started to implement it, but it will not be trivial to parallelize across many machines in a cluster setting. For large vocabularies, a small degree of parallelism is fairly easy to get on an SMP machine by splitting the weights across multiple processes. For small vocabularies (10,000 words or so), a straightforward GPU implementation can probably run quite quickly.
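To make the weight-splitting idea concrete, here is a minimal NumPy sketch of one CD-1 step for the Replicated Softmax model as I read it from the paper. It is not the authors' code. The function names, the weight layout (vocabulary x hidden), the learning rate, and the shard count are all my own assumptions for illustration. The sharded function only shows that the hidden pre-activation is a sum of per-shard partial products; it runs the shards in a loop rather than in real processes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, a):
    """p(h_j = 1 | v) for a batch of word-count vectors v (docs x vocab)."""
    D = v.sum(axis=1, keepdims=True)      # document lengths
    return sigmoid(v @ W + D * a)         # hidden bias is scaled by D

def hidden_probs_sharded(v, W, a, n_shards=4):
    """Same result, with the vocabulary axis of W split into shards.
    Each partial product could be computed by a separate process (assumption)."""
    shards = np.array_split(np.arange(W.shape[0]), n_shards)
    partial = [v[:, idx] @ W[idx, :] for idx in shards]
    D = v.sum(axis=1, keepdims=True)
    return sigmoid(sum(partial) + D * a)

def sample_visible(h, W, b, D, rng):
    """Draw D words per document from the softmax over the vocabulary."""
    logits = h @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return np.stack([rng.multinomial(int(d), pi) for d, pi in zip(D.ravel(), p)])

def cd1_step(v, W, a, b, rng, lr=0.01):
    """One contrastive-divergence (CD-1) update; modifies W, a, b in place."""
    D = v.sum(axis=1, keepdims=True)
    ph0 = hidden_probs(v, W, a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    v1 = sample_visible(h0, W, b, D, rng)              # reconstruction
    ph1 = hidden_probs(v1, W, a)
    n = v.shape[0]
    W += lr * (v.T @ ph0 - v1.T @ ph1) / n
    a += lr * (D * (ph0 - ph1)).mean(axis=0)
    b += lr * (v - v1).mean(axis=0)
    return v1
```

In an actual multi-process version, each worker would presumably hold only its own rows of W and the partial sums would be added together before the sigmoid; the reconstruction step is the harder part, since the softmax normalizer needs a reduction over all shards.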
I don't currently have finished code that I can release, but you might try contacting Dr. Salakhutdinov and asking whether he will make MATLAB code available at any point.
answered Nov 07 '11 at 17:47