Hi all, is there a simple, scalable way to automatically discover emerging topics in text such as customer contacts?

Thanks!

asked Jun 29 '10 at 13:54

Zuohua


3 Answers:

You are interested in dynamic topic models. David Blei has two great papers (1, 2) on this topic. I'm not sure they qualify as simple, but they are elegant.

answered Jun 30 '10 at 00:35

Tristan

edited Jun 30 '10 at 21:45

Can you give some examples of emerging topics in "customer contacts"? I'm not sure exactly what you mean by that. Do you mean written correspondence with customers?

Have a look at Sematext's Key Phrase Extractor: http://sematext.com/products/key-phrase-extractor/index.html (the demo link is on the right side)

This KPE can extract several types of terms and phrases from textual content, including SIPs (statistically improbable phrases), which could be used to detect emerging topics or buzz.

answered Jun 29 '10 at 23:51

Otis

edited Jun 29 '10 at 23:53

I guess it depends on two things: how you define a topic and the volume of your customer contacts.

For example, if you have relatively low volume and want to treat unusual spikes in word usage as your topics, you can keep a moving average of the word counts (maybe stemmed or lemmatized) over the last n emails and display as "trending topics" the words whose averages have been increasing for a while.
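A minimal sketch of that moving-average idea, under some assumptions of mine: the class name `TrendTracker`, the parameters `window` (the n above) and `min_streak` (how many consecutive increases count as "a while"), and the naive regex tokenizer are all hypothetical choices. Stemming and stop-word removal are left out, so in practice you would want to add them.

```python
import re
from collections import Counter, deque


def tokenize(text):
    # Naive lowercase word tokenizer; no stemming or stop-word filtering.
    return re.findall(r"[a-z']+", text.lower())


class TrendTracker:
    """Tracks a moving average of word counts over the last `window` messages."""

    def __init__(self, window=3, min_streak=2):
        self.window = window
        self.min_streak = min_streak
        self.recent = deque()    # per-message Counters currently in the window
        self.totals = Counter()  # summed word counts over the window
        self.prev_avg = {}       # word -> moving average after the previous message
        self.streak = {}         # word -> consecutive increases of the average

    def add(self, text):
        counts = Counter(tokenize(text))
        self.recent.append(counts)
        self.totals.update(counts)
        if len(self.recent) > self.window:
            # Slide the window: drop the oldest message's counts.
            self.totals.subtract(self.recent.popleft())
        for word in set(self.totals) | set(self.prev_avg):
            # Always divide by `window`, so averages ramp up during warm-up.
            avg = self.totals[word] / self.window
            rising = avg > self.prev_avg.get(word, 0.0)
            # A flat or falling average resets the streak.
            self.streak[word] = self.streak.get(word, 0) + 1 if rising else 0
            self.prev_avg[word] = avg

    def trending(self):
        # Words whose average has risen for at least `min_streak` messages in a row.
        return sorted(w for w, s in self.streak.items() if s >= self.min_streak)


tracker = TrendTracker(window=3, min_streak=2)
for message in ["login page is broken",
                "cannot login after update",
                "login fails again"]:
    tracker.add(message)
print(tracker.trending())
```

Requiring a strict increase at every step is a deliberately strict reading of "increasing for a while". A gentler variant might compare the current average against the one from several messages ago instead.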

On the other hand, if you have thousands of messages a day and want to find topics as general clusters of documents, you can use a standard novelty detection algorithm: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.7746&rep=rep1&type=pdf . You can also use online algorithms for topic models if you want a fuzzier definition of topics. Some good examples are http://cocosci.berkeley.edu/tom/papers/topicpf.pdf and (less so) http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.3009&rep=rep1&type=pdf .

answered Jun 29 '10 at 21:07

Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.