|
LDA is nice, but unpredictable: it does not always give me the topics I want. I am looking for something like LDA, but semi-supervised, in the sense that I can pick seed words for each topic, then run a system that figures out words related to those seed words, then words related to those words, and so on, and finally gives me topics that are meaningful for my task. That way I would know, without manually checking, which topic covers which set of ideas. Is there something like this out there?
|
David Blei's original LDA implementation (lda-c) has rough support for this: run it for one iteration, edit the output model files to give the seed words a higher probability in their intended topics, and then run it again.
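
A minimal sketch of that editing step, assuming the lda-c output format where final.beta holds one row of log p(word | topic) per topic and the column order matches the vocabulary file you built the corpus with. The file names, seed word lists, and boost factor below are placeholders, not anything lda-c provides.

    import numpy as np

    # Hypothetical seed words per topic (topic index -> word strings)
    seed_words = {0: ["election", "vote", "senate"],
                  1: ["goal", "match", "league"]}

    # Vocabulary in the same order used when preparing the lda-c corpus
    vocab = [w.strip() for w in open("vocab.txt")]
    word_id = {w: i for i, w in enumerate(vocab)}

    # final.beta: one row per topic, natural-log p(word | topic)
    log_beta = np.loadtxt("final.beta")

    boost = np.log(50.0)  # how strongly to favor seed words (tunable)
    for topic, words in seed_words.items():
        for w in words:
            if w in word_id:
                log_beta[topic, word_id[w]] += boost
        # renormalize the row so it remains a valid log distribution
        row = log_beta[topic]
        log_z = row.max() + np.log(np.exp(row - row.max()).sum())
        log_beta[topic] = row - log_z

    np.savetxt("final.beta", log_beta, fmt="%.10f")

You would then restart lda-c from this edited model (check its README for the exact option to initialize from an existing model); the boost factor controls how strongly the seeds are favored.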
|
One hacky solution is to add a fake document for each topic that contains only that topic's seed words. If possible, adjust the document-level parameters (e.g. a smaller alpha for those documents) to encourage a sparser topic distribution, so the posterior concentrates those words in a single topic.
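
A minimal sketch of that trick, assuming gensim's LdaModel; the documents and seed word lists are placeholders. Since gensim does not expose per-document alphas, repeating each seed word many times is used here as a stand-in for the tighter per-document topic distribution.

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Tiny placeholder corpus plus hypothetical seed word lists, one per topic
    docs = [["stocks", "market", "shares", "profit"],
            ["team", "coach", "season", "win"],
            ["election", "vote", "campaign", "senate"]]
    seed_words = [["election", "vote", "senate"],
                  ["goal", "match", "league"]]

    # One fake document per topic; repeating the seeds makes them dominate
    # that document's word counts.
    fake_docs = [words * 50 for words in seed_words]
    all_docs = docs + fake_docs

    dictionary = Dictionary(all_docs)
    corpus = [dictionary.doc2bow(d) for d in all_docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=4, alpha="asymmetric",
                   passes=10, random_state=0)

    # Check which topic each fake seed document landed in
    for i, fd in enumerate(fake_docs):
        print(i, lda.get_document_topics(dictionary.doc2bow(fd)))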
|
Two models I've worked on might be relevant:
If you want topics to be grounded in specific concepts, you might also be interested in the Concept-Topic Model work by Chemudugunta et al.