|
I want to infer the rating that people have given to products on Amazon by looking at their comments only. I want to start with a simple model (say, an SVM on a featurized vector of the text). I am wondering what would be the best way to featurize the comment so that I capture negations properly (things like "I do not love this product", which is basically a negative sentence, but a plain bag of words would not capture that). |
|
One interesting thing to do is to use features from dependency parses. You can, for a small set of negation words, add features like WORD-is-headed-by-not or WORD-is-modified-by-ADVERB, and similar things. A simpler trick would be to take a known set of negation words and add interaction features between each of them and every other word in the sentences in which they occur. At the other extreme, there is the dependency tree-based sentiment paper from ACL 2010. |
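A minimal sketch of the simpler trick, in plain Python. The negation list, the regex tokenizer, the punctuation-based sentence splitter, and the `not*word` feature naming are all assumptions made for illustration; none of them come from a particular paper or library.

```python
import re

# Hypothetical, deliberately small negation list; extend it for real data.
NEGATION_WORDS = {"not", "no", "never", "cannot", "don't", "doesn't",
                  "didn't", "isn't", "can't", "won't"}


def tokenize(sentence):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, not a real parser)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def negation_interaction_features(text):
    """Return bag-of-words counts plus NEG*WORD interaction counts.

    For every sentence that contains a negation word, each other word in
    that sentence also fires a feature named "<negation>*<word>".
    """
    features = {}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        negations = {t for t in tokens if t in NEGATION_WORDS}
        for tok in tokens:
            # Plain bag-of-words count.
            features[tok] = features.get(tok, 0) + 1
            if tok in NEGATION_WORDS:
                continue
            # Interaction with every negation word in the same sentence.
            for neg in negations:
                key = f"{neg}*{tok}"
                features[key] = features.get(key, 0) + 1
    return features


feats = negation_interaction_features(
    "I do not love this product. The box is nice."
)
print(sorted(k for k in feats if "*" in k))
```

The resulting dictionaries could then be turned into sparse vectors (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM. Note that the interaction features stay within one sentence, so "nice" in the second sentence is not paired with "not".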
|
In addition to your bag-of-words baseline, I'd also try a baseline that uses word bigrams (pairs of consecutive words) as features. These features capture simple forms of negation. J. Dillard's thesis on valence shifters is also recommended reading [1]. [1] http://www.tacoma.washington.edu/tech/docs/research/gradresearch/ldillard.pdf |
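To make the bigram baseline concrete, here is a small sketch in plain Python that counts unigrams and bigrams for one comment. The tokenizer and the function name `unigram_bigram_counts` are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_bigram_counts(text):
    """Count single words and pairs of consecutive words in one comment."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    # zip(tokens, tokens[1:]) yields each pair of adjacent tokens.
    counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return counts


counts = unigram_bigram_counts("I do not love this product")
print(counts["not love"], counts["love"])
```

Here "not love" becomes a feature of its own, so a classifier can learn a negative weight for it while still learning a positive weight for "love". If you use scikit-learn, `CountVectorizer(ngram_range=(1, 2))` should produce the same kind of unigram-plus-bigram representation, although its default tokenizer differs from the one assumed here.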
|
I suggest you take a look at the work by Isaac Councill et al. on negation detection. However, I suspect that handling negations has a relatively strong positive effect when using lexicon-based methods, while the effect is somewhat smaller if you do supervised training on a BoW, or richer, representation. The reason is that sentences such as "I do not love this product" are not that common. Instead of negating a strictly positive word, reviewers often use another construction, which typically involves words that are not used to communicate positive sentiment. At least this is my experience from looking at review text.
This is supported by the dependency paper I cited in my answer: even though there is always a gain from using more semantic features, the bulk of the performance is usually attained by a simple bag-of-words model with a few extra clever features thrown into the mix.
(Oct 18 '11 at 22:45)
Alexandre Passos ♦
|