|
I have been using a bag-of-words approach for sentiment analysis. Previously I trained a logistic regression model on these bags of words: an array of word counts with an associated probability between 0 and 1 goes into the regression model, and after training I can feed in a bag of words and get a numerical score back. I now want my predictor set to capture things like "not good" or "not good at all", while keeping the regression model as close as possible to what it is now, meaning I still want to feed vectors to it. What is the best way to replace my former predictor set with one that also considers more complicated n-gram features?
|
I think you might be looking for something like this. It actually extracts n-gram features. See if it works for you, and share your findings; I am interested as well.
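To make the suggestion concrete, here is a minimal sketch of how n-gram features can replace a plain bag of words while keeping the same vector-in, score-out interface the question describes. It uses scikit-learn's `CountVectorizer` with `ngram_range=(1, 2)` and a `LogisticRegression` classifier; the library choice and the toy documents/labels are my assumptions, not from the original answer's link.

```python
# Sketch (assumed setup): unigram + bigram counts fed to logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data: 1 = positive, 0 = negative.
docs = ["not good at all", "very good movie", "not bad", "terrible plot"]
labels = [0, 1, 1, 0]

# ngram_range=(1, 2) keeps the original unigrams and adds bigrams
# such as "not good", so the model still receives count vectors.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)

clf = LogisticRegression()
clf.fit(X, labels)

# New text is vectorized the same way before scoring.
probs = clf.predict_proba(vectorizer.transform(["not good"]))
```

The key point is that only the featurization step changes; the regression model itself is untouched, which matches the constraint in the question.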
|
On the topic of sentiment analysis using richer (than bag-of-words) features, you may find this paper interesting (it uses a bag of opinions, i.e., "not bad", "not so good", etc.): The bag-of-opinions method for review rating prediction from sparse text patterns, COLING, 2010.
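A lightweight alternative in the same spirit as bag-of-opinions is negation marking: tag the few tokens after a negator so that "good" in "not good" becomes a distinct feature before counting. This sketch is my own illustration, not the paper's method; the negator list and scope window are assumptions.

```python
# Sketch (assumed technique): mark tokens in a negator's scope with _NEG
# so negated words become separate features in the count vector.
NEGATORS = {"not", "no", "never", "n't"}

def mark_negation(tokens, scope=3):
    """Append _NEG to up to `scope` tokens following a negator."""
    out, remaining = [], 0
    for tok in tokens:
        if tok in NEGATORS:
            out.append(tok)
            remaining = scope
        elif remaining > 0:
            out.append(tok + "_NEG")
            remaining -= 1
        else:
            out.append(tok)
    return out

print(mark_negation("not good at all".split()))
# ['not', 'good_NEG', 'at_NEG', 'all_NEG']
```

The transformed tokens can then be counted exactly like an ordinary bag of words, so the downstream regression model again stays unchanged.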
Adding n-gram features?
POS-tag features? Or a mixture of n-grams and POS tags?