
For a toy project that I want to pursue, I want to analyze a sample of tweets and compute a mood index (e.g., sad, happy, etc.). I don't intend to invest too much time in implementing a sentiment-analysis tool and would like to know if there are any easy-to-use (or easy-to-implement) libraries. Basically, I'm looking for something that's just a bit more complicated than the "count the number of :)s and :(s" strategy.

Any pointer is appreciated and any language is welcome (though Python is preferred).

Thanks.

asked Aug 15 '10 at 13:53


Amaç Herdağdelen


4 Answers:

NLTK provides support for sentiment analysis. If you want to use Java, there is LingPipe or Mallet. You should also check out the Twitter Sentiment API.

BTW, Stack Overflow threads suggest a number of other options for Twitter sentiment analysis in Python.

answered Aug 15 '10 at 14:27


spinxl39

edited Aug 15 '10 at 14:47

Answers all my questions. Thanks!

(Aug 15 '10 at 15:29) Amaç Herdağdelen

You seem to be confusing mood ("am I happy or sad") and sentiment ("what are my feelings about X?"). They are not the same thing.

Gilad Mishne did some work on mood analysis a while ago.

What you probably want is a simple, supervised text classifier. This entails collecting a data set of tweets that you know are happy or sad, and then training an off-the-shelf classifier (e.g. libsvm) on these examples. You could bootstrap your initial example set by looking for tweets with strong happy/sad "anchors" (e.g. :-) and :-( in your example) and removing the anchors during training (just keep in mind that while :-( could be a robust indicator for "sad", :-) is not necessarily a robust indicator for "happy").
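A minimal, dependency-free sketch of the bootstrapping idea above, assuming you already have a list of raw tweet strings. To keep it self-contained it swaps a hand-written multinomial Naive Bayes in for libsvm; the emoticon regexes and the names `auto_label`, `tokens`, `NaiveBayes`, and `train_from_anchors` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Anchor patterns used only to auto-label tweets (illustrative assumption,
# not a standard emoticon list): ":)", ":-)", ";)", ";-)" and ":(", ":-(".
HAPPY = re.compile(r"[:;]-?\)")
SAD = re.compile(r":-?\(")
TOKEN = re.compile(r"[a-z']+")


def auto_label(tweet):
    """Return 'happy', 'sad', or None based on emoticon anchors."""
    happy, sad = len(HAPPY.findall(tweet)), len(SAD.findall(tweet))
    if happy and not sad:
        return "happy"
    if sad and not happy:
        return "sad"
    return None  # no anchor, or conflicting anchors: skip the tweet


def tokens(tweet):
    """Strip the anchors before tokenizing so the model can't just memorize them."""
    no_anchors = SAD.sub(" ", HAPPY.sub(" ", tweet))
    return TOKEN.findall(no_anchors.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def log_score(self, doc, c):
        v = len(self.vocab)
        score = math.log(self.prior[c] / self.n)
        for w in doc:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return max(self.labels, key=lambda c: self.log_score(doc, c))


def train_from_anchors(raw_tweets):
    """Auto-label tweets via anchors, then train on the anchor-free tokens."""
    docs, labels = [], []
    for t in raw_tweets:
        label = auto_label(t)
        if label is not None:
            docs.append(tokens(t))
            labels.append(label)
    return NaiveBayes().fit(docs, labels)


# Tiny made-up corpus, only to show the flow end to end.
raw = [
    "just got the job, so excited :)",
    "love this sunny day :-)",
    "missed my train again :(",
    "so tired and sick today :-(",
    "what a great excited morning ;)",
    "feeling sick and lonely :(",
]
model = train_from_anchors(raw)
print(model.predict(tokens("so excited for the weekend")))  # → happy
```

Tweets with no anchor or with conflicting anchors are skipped rather than guessed, and the anchors are removed before tokenizing so the classifier has to learn from the surrounding words. The sketch treats `:)` and `:(` symmetrically; given the caveat about `:-)`, you may want to hand-check the auto-labeled "happy" examples before relying on them.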

answered Sep 23 '10 at 11:56


yoavg

You may try EmoLib.

answered Sep 13 '10 at 03:19


Alexandre Trilla

edited Sep 13 '10 at 03:44


Joseph Turian

I am rather surprised by your "leap of faith" in the sentiment analysis tools mentioned. Sentiment analysis and opinion mining have garnered a lot of research interest around the world, and it's still an "open" problem, even more so for noisy (SMS/chat-style) text. If your system relies heavily on such a tool, I would suggest going back to the drawing board and evaluating the "off-the-shelf" tool you are planning to use.
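One cheap way to do that evaluation is to hand-label a small sample of your own tweets and score the tool against it. The sketch below is a minimal version of that idea, assuming the tool can be wrapped as a Python function from text to label; `evaluate`, `emoticon_baseline`, and the tiny `gold` set are hypothetical and for illustration only.

```python
from collections import Counter


def evaluate(predict, gold):
    """Score a sentiment tool against hand-assigned labels.

    predict: callable mapping tweet text -> label (a hypothetical wrapper
             around whatever off-the-shelf tool is being evaluated).
    gold:    list of (tweet, label) pairs labeled by a human.
    Returns overall accuracy plus per-label precision and recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    correct = 0
    for text, truth in gold:
        guess = predict(text)
        if guess == truth:
            correct += 1
            tp[truth] += 1
        else:
            fp[guess] += 1   # the tool claimed `guess` wrongly
            fn[truth] += 1   # the tool missed the true label
    report = {"accuracy": correct / len(gold)}
    for label in sorted({lab for _, lab in gold}):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        report[label] = {
            "precision": tp[label] / p_den if p_den else 0.0,
            "recall": tp[label] / r_den if r_den else 0.0,
        }
    return report


def emoticon_baseline(text):
    # The "count the :)s and :(s" strategy from the question, used as a stand-in tool.
    return "happy" if text.count(":)") >= text.count(":(") else "sad"


gold = [
    ("great game tonight :)", "happy"),
    ("ugh, monday :(", "sad"),
    ("my flight got cancelled", "sad"),
    ("best coffee in town", "happy"),
]
print(evaluate(emoticon_baseline, gold))
```

On this toy set the baseline reaches 0.75 accuracy but only 0.5 recall on "sad", because the emoticon-free complaint defaults to "happy". Per-label numbers like these expose exactly the kind of weakness on noisy text that an overall score can hide.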

answered Sep 20 '10 at 07:18


Dexter

edited Sep 20 '10 at 07:19

Thanks for the comment and the warning. As I said in the question, it was for a "toy project" which needed "something that's just a bit more complicated than the 'count the number of :)s and :(s' strategy". The aforementioned libraries largely conform to that requirement.

(Sep 22 '10 at 19:00) Amaç Herdağdelen

Great! I did a similar "toy" project and found StreamHacker's 100-line Python code (using NLTK) very useful. Since your domain is tweets (informal, SMS/chat-style language), TweetFeel and the other links posted on Stack Overflow look well suited to the task at hand.

(Sep 23 '10 at 05:13) Dexter

I guess you refer to the NLTK demo, found here: http://text-processing.com/docs/sentiment.html

I wasn't aware of it and it really looks useful. There is even an API (http://text-processing.com/demo/sentiment/) which lets you make 100 requests per day. Thanks!

(Sep 26 '10 at 04:26) Amaç Herdağdelen

Amac, I was referring to this blog post: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/ which shows that NLTK "can" be used to build a simple sentiment analysis system. Anyway, it looks like StreamHacker is also involved in some way with the above API. However, since your domain is Twitter, which is a "noisy" domain, that adds a new dimension to your task.

I would suggest using the Twitter Sentiment API (as posted by spinxl39 above) rather than the text-processing API. Or you could try both and let us know which one gives more accurate results. :-)

(Sep 26 '10 at 06:41) Dexter

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.