|
For a toy project that I want to pursue, I want to analyze a sample of tweets and compute a mood index (e.g., sad, happy, etc.). I don't intend to invest too much time in implementing a sentiment-analysis tool, and would like to know whether there are any easy-to-use (or easy-to-implement) libraries. Basically, I'm looking for something that's just a bit more complicated than the "count the number of :)s and :(s" strategy. Any pointer is appreciated and any language is welcome (though Python is preferred). Thanks. |
|
NLTK provides support for sentiment analysis. If you want to use Java, there is LingPipe or MALLET. You should also check out the Twitter Sentiment API. BTW, Stack Overflow suggests a number of other options for Twitter sentiment analysis in Python.
Answers all my questions. Thanks!
(Aug 15 '10 at 15:29)
Amaç Herdağdelen
|
|
You seem to be confusing mood ("am I happy or sad?") and sentiment ("what are my feelings about X?"). They are not the same thing. Gilad Mishne did some work on mood analysis a while ago. What you probably want is a simple, supervised text classifier. This entails collecting a data set of tweets which you know are happy or sad, and then training some off-the-shelf classifier (e.g. libsvm) on these examples. You could bootstrap your initial example set by looking for tweets with strong happy/sad "anchors" (e.g. the :-) and :-( in your example) and removing the anchors during training. Just keep in mind that while :-( could be a robust indicator for "sad", :-) is not necessarily a robust indicator for "happy". |
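The bootstrapping idea above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe: a small pure-Python Naive Bayes classifier stands in for libsvm so that the example has no dependencies, and the anchor lists, the `label_by_anchor` and `tokenize` helpers, and the sample tweets are all made up for the example.

```python
import math
import re
from collections import Counter

# Emoticon "anchors" used only to assign noisy labels (assumed lists).
HAPPY_ANCHORS = (":-)", ":)")
SAD_ANCHORS = (":-(", ":(")


def label_by_anchor(tweet):
    """Return 'happy', 'sad', or None when anchors are missing or mixed."""
    has_happy = any(a in tweet for a in HAPPY_ANCHORS)
    has_sad = any(a in tweet for a in SAD_ANCHORS)
    if has_happy == has_sad:
        return None
    return "happy" if has_happy else "sad"


def tokenize(tweet):
    """Remove the anchors, lowercase, and split into word tokens."""
    for anchor in HAPPY_ANCHORS + SAD_ANCHORS:
        tweet = tweet.replace(anchor, " ")
    return re.findall(r"[a-z']+", tweet.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens
        self.doc_counts = Counter()  # label -> number of training tweets
        self.vocab = set()

    def train(self, tweets):
        # Labels come from the anchors; the anchors themselves are
        # stripped by tokenize() so the model cannot simply memorize them.
        for tweet in tweets:
            label = label_by_anchor(tweet)
            if label is None:
                continue
            tokens = tokenize(tweet)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def predict(self, tweet):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(tweet)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            # The extra +1 reserves one slot for words never seen in training.
            denom = sum(counts.values()) + len(self.vocab) + 1
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [
    "finally got the job :-)",
    "best concert ever :)",
    "love this sunny weekend :)",
    "missed my flight again :-(",
    "my phone broke today :(",
    "feeling lonely and tired :(",
]
clf = NaiveBayes()
clf.train(training)
print(clf.predict("love the concert"))
print(clf.predict("tired and lonely"))
```

With real data you would want far more than six tweets, and given the caveat about :-), it is worth hand-checking a sample of the "happy" labels before trusting them.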
|
I am rather surprised by your "leap of faith" in the sentiment analysis tools mentioned. Sentiment analysis and opinion mining have garnered a lot of research interest around the world, and they are still an "open" problem, even more so for noisy (SMS/chat-style) text. If your system relies heavily on such a tool, I would suggest going back to the drawing board and evaluating the "off-the-shelf" tool you are planning to use.
Thanks for the comment and the warning. As I said in the question, it was for a "toy project" which needed "something that's just a bit more complicated than the 'count the number of :)s and :(s' strategy". The aforementioned libraries largely meet that requirement.
(Sep 22 '10 at 19:00)
Amaç Herdağdelen
Great! I did a similar "toy" project and found the 100-line Python code (using NLTK) by StreamHacker very useful. Since your domain is tweets (informal, SMS/chat-style language), TweetFeel and the other links posted on Stack Overflow look well suited to your task.
(Sep 23 '10 at 05:13)
Dexter
I guess you are referring to the NLTK demo found here: http://text-processing.com/docs/sentiment.html I wasn't aware of it and it looks really useful. There is even an API (http://text-processing.com/demo/sentiment/) which allows 100 requests per day. Thanks!
(Sep 26 '10 at 04:26)
Amaç Herdağdelen
Amac, I was referring to this blog post: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/ which shows that NLTK "can" be used to build a simple sentiment analysis system. Anyway, it looks like StreamHacker is in some way involved with the above API too. However, since your domain is Twitter, which is a "noisy" domain, your task gains an extra dimension. I would suggest using the Twitter Sentiment API (as posted by spinlx39 above) rather than the text-processing API. Or you can try out both and let us know which API gives you more accurate results? :-)
(Sep 26 '10 at 06:41)
Dexter
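The comparison suggested in this thread (evaluate a tool before relying on it, or try two and see which is more accurate) could be sketched along these lines. Everything here is an assumption for illustration: the classifiers are plain Python callables rather than wrappers around either API, `keyword_classifier` is a made-up stand-in for a real tool, and the four hand-labeled tweets are invented.

```python
def accuracy(classify, labeled_tweets):
    """Fraction of tweets for which classify(text) matches the hand label."""
    if not labeled_tweets:
        raise ValueError("need at least one labeled tweet")
    correct = sum(1 for text, gold in labeled_tweets if classify(text) == gold)
    return correct / len(labeled_tweets)


def emoticon_baseline(text):
    """The 'count the :)s and :(s' strategy from the question."""
    happy = text.count(":)") + text.count(":-)")
    sad = text.count(":(") + text.count(":-(")
    if happy == sad:
        return "neutral"
    return "happy" if happy > sad else "sad"


def keyword_classifier(text):
    """Hypothetical stand-in for an off-the-shelf tool (not a real API)."""
    words = set(text.lower().split())
    if words & {"great", "lovely"}:
        return "happy"
    if words & {"sad", "stuck"}:
        return "sad"
    return "neutral"


# A tiny hand-labeled evaluation set (invented for this sketch).
gold = [
    ("great news today :)", "happy"),
    ("so sad about this", "sad"),
    ("stuck in traffic :(", "sad"),
    ("what a lovely morning", "happy"),
]

for name, clf in [("emoticons", emoticon_baseline), ("keywords", keyword_classifier)]:
    print(name, accuracy(clf, gold))
```

To compare real services, each one would be wrapped in a function with the same text-in, label-out shape, and the evaluation set would need to be much larger and labeled independently of either tool.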
|