|
Hi all, is there a simple, scalable way to automatically discover emerging topics in text such as customer contacts? Thanks! |
|
You are interested in Dynamic Topic Models. David Blei has two great papers (1, 2) on this topic. I'm not sure they qualify as simple, but they are elegant. |
|
Can you give some examples of emerging topics in "customer contacts"? I'm not sure exactly what you mean by that; do you mean written correspondence with customers? Have a look at Sematext's Key Phrase Extractor: http://sematext.com/products/key-phrase-extractor/index.html (the demo link is on the right side). This KPE can extract a few different types of terms and phrases from textual content, including SIPs (statistically improbable phrases), which could be used for detecting emerging topics or buzz. |
|
I guess it depends on two things: how you define a topic and what the volume of your customer contacts is. For example, if you have relatively low volume and want to treat unusual spikes in word usage as your topics, you can keep a moving average of the word counts (maybe stemmed or lemmatized) over the last n emails and display as "trending topics" the words whose averages have been increasing for a while. On the other hand, if you have thousands of messages a day and want to find topics as general clusters of documents, you can use a standard novelty detection algorithm: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.7746&rep=rep1&type=pdf . You can also use online algorithms for topic models if you want a fuzzier definition of topics. Some good examples are http://cocosci.berkeley.edu/tom/papers/topicpf.pdf and (less so) http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.3009&rep=rep1&type=pdf . |
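For the low-volume case, the moving-average idea above could be sketched roughly like this in Python. This is only one possible reading of it: the class name `TrendTracker`, the parameters `window` (the "last n emails") and `min_streak` (how many consecutive increases count as "for a while"), and the crude regex tokenizer are all my own assumptions, and there is no stemming or stop-word filtering here.

```python
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase and keep alphabetic tokens (assumption: no stemming or lemmatizing)."""
    return re.findall(r"[a-z]+", text.lower())


class TrendTracker:
    """Hypothetical helper: flags words whose moving-average count keeps rising."""

    def __init__(self, window=100, min_streak=3):
        self.window = window          # number of most recent emails to average over
        self.min_streak = min_streak  # consecutive increases needed to be "trending"
        self.recent = deque()         # per-email word counts currently in the window
        self.totals = Counter()       # summed word counts over the window
        self.prev_avg = {}            # word -> moving average after the previous email
        self.streak = {}              # word -> number of consecutive increases

    def add(self, text):
        counts = Counter(tokenize(text))
        self.recent.append(counts)
        self.totals.update(counts)

        # Slide the window: forget the oldest email once we hold more than `window`.
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            for word, c in old.items():
                self.totals[word] -= c
                if self.totals[word] <= 0:
                    del self.totals[word]

        n = len(self.recent)
        avg = {word: c / n for word, c in self.totals.items()}

        # A word extends its streak only if its average strictly increased.
        for word in set(avg) | set(self.prev_avg):
            if avg.get(word, 0.0) > self.prev_avg.get(word, 0.0):
                self.streak[word] = self.streak.get(word, 0) + 1
            else:
                self.streak.pop(word, None)
        self.prev_avg = avg

    def trending(self):
        """Words whose moving average rose on each of the last `min_streak` emails."""
        return sorted(w for w, s in self.streak.items() if s >= self.min_streak)


if __name__ == "__main__":
    tracker = TrendTracker(window=3, min_streak=2)
    for email in ["login works", "refund please", "refund refund delay"]:
        tracker.add(email)
    print(tracker.trending())
```

In practice you would probably want to drop stop words and require a minimum average before reporting a word, otherwise rare words that happen to appear in a few consecutive emails will show up as trends.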