I have been using a bag-of-words approach for sentiment analysis. Previously I trained a logistic regression model on these bags of words (essentially, a vector of word features with an associated label between 0 and 1 goes into the regression model, and after training I can feed in a new bag of words and get a numerical score). I now want to account for phrases like "not good" or "not good at all" in my predictor set, while keeping the regression model the same as much as possible; that is, I still want to feed vectors to the regression model. What is the best way to replace my former predictor set with a new representation that also captures more complicated n-gram features?

asked May 23 '11 at 04:18

Mark Alen
Adding n-gram features?

(May 23 '11 at 05:28) Alexandre Passos ♦

POS-tag features? Or a mixture of n-grams and POS tags?

(May 23 '11 at 07:49) Svetoslav Marinov

2 Answers:

I think you might be looking for something like this. This actually extracts n-gram features. See if it works for you, and share your findings. I am also interested.

answered May 23 '11 at 08:42

Oliver Mitevski
On the topic of sentiment analysis using richer (than bag-of-words) features, you may find this paper interesting (it uses a bag-of-opinions representation, e.g., "not bad", "not so good"): The bag-of-opinions method for review rating prediction from sparse text patterns, COLING, 2010.
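The full bag-of-opinions method in the paper is richer than this, but a much simpler negation-marking trick (well known from earlier sentiment-analysis work) illustrates the spirit: tag the tokens inside a negation scope so that "not good" contributes a distinct feature from "good". The negation words and punctuation set below are minimal assumptions for the sketch:

```python
# Simplified negation-aware tokenization: prefix tokens that follow a
# negation word with "NOT_" until the next punctuation mark, so that
# "not good" yields the feature NOT_good instead of plain "good".
NEGATORS = {"not", "no", "never"}
CLAUSE_BREAKS = {".", ",", "!", "?"}

def mark_negation(tokens):
    out, negated = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negated = True
            out.append(tok)
        elif tok in CLAUSE_BREAKS:
            negated = False  # negation scope ends at punctuation
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out

# mark_negation("not good at all".split())
# -> ["not", "NOT_good", "NOT_at", "NOT_all"]
```

The transformed tokens can then be fed to any bag-of-words (or n-gram) vectorizer unchanged, which again keeps the regression side of the pipeline intact.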

answered Jun 15 '11 at 04:46

Georgiana Ifrim