I'm interested in implementing Ruslan Salakhutdinov and Geoffrey Hinton's RBM-based topic modeling method, described here: http://www.mit.edu/~rsalakhu/papers/repsoft.pdf

Has anyone tried implementing this?

Is it possible to distribute the algorithm across processes or machines?

Thanks, Timmy Wilson

asked Nov 06 '11 at 21:40

timmy wilson

edited Nov 07 '11 at 06:50

A student of mine has a Python implementation that more or less reproduces the paper's results. Drop me a line if you are interested and I'll arrange the rest.

(Nov 08 '11 at 12:01) osdf

Yes, definitely! Python is preferred. Thanks, osdf.

(Nov 09 '11 at 07:36) timmy wilson

3 Answers:

Excellent paper; I am interested in this too. If only we could learn from millions of unlabeled documents (the internet is full of text) and then use that to improve classification, clustering, and other text tasks.

answered Nov 07 '11 at 13:36

Visarga

I have started to implement it, but it will not be trivial to parallelize across many machines in a cluster setting. For large vocabularies, a small degree of parallelism is fairly easy to get on an SMP machine by splitting the weights across multiple processes. For small vocabularies (10,000 words or so), a straightforward GPU implementation can probably run quite quickly.
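As a rough illustration of what splitting the weights could look like, here is a minimal NumPy sketch. It is only one possible reading of the idea, not code from the paper or from any implementation mentioned in this thread. It computes the hidden-unit probabilities of the paper's Replicated Softmax model, where the hidden bias is scaled by the document length, and then repeats the computation with the vocabulary axis of the weight matrix cut into shards whose partial products are summed. The function names, the `n_shards` parameter, and the use of a thread pool instead of separate processes are all assumptions made for the example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def hidden_probs(counts, W, a):
    """p(h_j = 1 | v) for word-count input.

    counts: (n_docs, vocab) matrix of word counts
    W:      (vocab, n_hidden) weight matrix
    a:      (n_hidden,) hidden biases, scaled by document length D
    """
    D = counts.sum(axis=1, keepdims=True)  # number of words in each document
    return 1.0 / (1.0 + np.exp(-(counts @ W + D * a)))


def hidden_probs_sharded(counts, W, a, n_shards=4):
    """Same quantity, with W split along the vocabulary axis (hypothetical helper)."""
    bounds = np.linspace(0, W.shape[0], n_shards + 1).astype(int)
    spans = list(zip(bounds[:-1], bounds[1:]))

    def partial(span):
        lo, hi = span
        # Each worker only touches its own slice of the vocabulary.
        return counts[:, lo:hi] @ W[lo:hi]

    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        pre_activation = sum(pool.map(partial, spans))

    D = counts.sum(axis=1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-(pre_activation + D * a)))
```

Each shard only needs its own rows of W and the matching columns of the count matrix, so the per-shard work could in principle be moved to separate processes. The partial results still have to be summed in one place for every batch, which is part of why this gives only a small degree of parallelization.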

I don't currently have finished code that I can release, but you might try contacting Dr. Salakhutdinov and asking whether he will make MATLAB code available at some point.

answered Nov 07 '11 at 17:47

gdahl ♦

I am that student. Here it is: http://www.fylance.de/rsm/

answered Nov 10 '11 at 20:09

jola


edited Nov 10 '11 at 20:51

I am interested in this too. Thank you.

(Nov 11 '11 at 04:54) Visarga

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.