
Hi,

I have 1 million tweets that I would like to classify by category (e.g. music, movies, spam; there will be 5 categories in total). What's a good first-cut algorithm to use? I am looking for 80%+ accuracy, so something easy to implement and tweak would be preferable to a state-of-the-art black box.

I work in Python, so any good examples/tutorials or useful links are welcome. :)

Thanks!

asked Feb 14 '12 at 09:30


Vishal Goklani

edited Feb 14 '12 at 09:41


2 Answers:

Start with Naive Bayes and use it to set a baseline benchmark.
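For reference, a minimal from-scratch multinomial Naive Bayes baseline might look like the sketch below. It is not the answerer's code: it assumes a labelled subset of tweets is available, uses a naive lowercase/whitespace tokenizer, and the training tweets are made up. In practice, scikit-learn's `MultinomialNB` or NLTK's `NaiveBayesClassifier` provide the same model ready-made.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real tweets need a better tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_label_docs in self.class_counts.items():
            # log P(class) + sum over words of log P(word | class)
            score = math.log(n_label_docs / self.n_docs)
            denom = self.total_words[label] + v
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of examples whose predicted label matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data standing in for labelled tweets.
train_texts = [
    "new album drops friday", "loved the guitar solo",
    "the movie trailer looks great", "saw the film last night",
    "win a free iphone click here", "free followers click now",
]
train_labels = ["music", "music", "movies", "movies", "spam", "spam"]
model = NaiveBayes().fit(train_texts, train_labels)

# Evaluate on tweets held out from training.
test_texts = ["free iphone here", "great film"]
test_labels = ["spam", "movies"]
print(accuracy(model, test_texts, test_labels))  # prints 1.0
```

To check the 80% target meaningfully, compute `accuracy` on a held-out portion of the labelled tweets rather than on the training data; that number is the baseline any fancier model has to beat.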

answered Feb 14 '12 at 16:28


Joey Markowitz

Concerning features, this question may also be relevant:

http://metaoptimize.com/qa/questions/8614/tweet-classifier-features-in-nltk
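As an illustration of tweet-specific features (a hypothetical sketch, not taken from the linked thread), the function below turns a tweet into an NLTK-style featureset, i.e. a dict mapping feature names to values. The particular choices (flagging URLs and @mentions instead of keeping them as words, and giving hashtags their own `tag=` features) are assumptions meant as a starting point to tweak.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tweet_features(text):
    """Return a dict of boolean features for one tweet (hypothetical helper)."""
    text = text.lower()
    feats = {
        # Flag links and mentions rather than using them as words;
        # links in particular tend to be informative for spam.
        "has_url": bool(URL_RE.search(text)),
        "has_mention": bool(MENTION_RE.search(text)),
    }
    # Remove URLs and mentions so they don't leak into the word features.
    cleaned = MENTION_RE.sub(" ", URL_RE.sub(" ", text))
    # Hashtags become their own features, e.g. "#music" -> "tag=music".
    for tag in HASHTAG_RE.findall(cleaned):
        feats["tag=" + tag] = True
    # Remaining plain words become bag-of-words features.
    for word in TOKEN_RE.findall(HASHTAG_RE.sub(" ", cleaned)):
        feats["word=" + word] = True
    return feats


print(sorted(tweet_features("Check out #music by @dj http://t.co/abc NOW")))
```

Featuresets like these can be paired with labels and passed to NLTK as `nltk.NaiveBayesClassifier.train([(tweet_features(t), y) for t, y in labelled_tweets])`, where `labelled_tweets` is a hypothetical list of `(text, label)` pairs.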

answered Feb 15 '12 at 04:40


Vam


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.