Question: What existing research, if any, relates to the strategy I'm working on? I'm hoping not to reinvent the wheel or repeat mistakes others have already made.
The strategy: I'm creating a topic analysis solution for customer support emails. The problem is that the topic analysis solution I've used so far (LDA using MALLET) is not great at picking out common topics: the results I get are mostly noise. The strategy I'm developing is to use website content as a 'seed'. For example, product names, features, etc. that appear in the emails customers write are also common on the website, so frequently occurring terms on the website could be a good source of 'training' or 'seed' data for any topic analysis solution I create.
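To make the idea concrete, here is a rough sketch of the seeding step as I currently picture it. It is not tied to MALLET: it just counts terms on the website pages, keeps the most frequent ones as seeds, and then counts seed occurrences in an email. The function names (`tokenize`, `seed_terms`, `seed_hits`), the tiny stop-word list, and the `top_n` cutoff are all placeholders I made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "for", "on", "my", "i", "with", "our", "your", "not"}


def tokenize(text):
    """Lowercase the text, split into alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def seed_terms(website_pages, top_n=5):
    """Return the top_n most frequent terms across all website pages."""
    counts = Counter()
    for page in website_pages:
        counts.update(tokenize(page))
    return [term for term, _ in counts.most_common(top_n)]


def seed_hits(email, seeds):
    """Count how often each seed term occurs in a single email."""
    seed_set = set(seeds)
    return Counter(t for t in tokenize(email) if t in seed_set)


if __name__ == "__main__":
    # Made-up example data, not real website or email content.
    pages = [
        "Acme Router setup guide. The Acme Router supports wifi.",
        "Acme billing: update your billing details.",
    ]
    seeds = seed_terms(pages, top_n=3)
    print(seeds)
    print(seed_hits("My router keeps dropping wifi and billing is wrong", seeds))
```

The per-email seed counts could then be used to filter or weight the vocabulary before running a topic model, though whether that actually reduces the noise is exactly what I'm unsure about.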
asked Sep 29 '11 at 13:14