I want to infer the rating that people have given to products on Amazon by looking only at their comments.

I want to start with a simple model (say, an SVM on a featurized vector of the text). What would be the best way to featurize the comments so that negations are captured properly? For example, "I do not love this product" is basically a negative sentence, but a plain bag of words would not capture that.

asked Oct 17 '11 at 15:48


Mark Alen


3 Answers:

One interesting option is to use features from dependency parses. For a small set of negation words, you can add features like WORD-is-headed-by-not or WORD-is-modified-by-ADVERB, and similar things. A simpler trick is to take a known set of negation words and add interaction features between each of them and all other words in the sentences where they occur. At the other extreme, there is the dependency tree-based sentiment paper in ACL 2010.
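A minimal sketch of both ideas, assuming a hand-picked negation list (`NEGATION_WORDS` is hypothetical) and, for the dependency variant, a parse that has already been computed by some dependency parser and is represented here as hypothetical `(token, head_index)` pairs. The feature names like `not*love` and `love-modified-by-not` are illustrative, not a standard scheme.

```python
import re
from collections import Counter

# Hypothetical, hand-picked negation list; a real system would tune this.
NEGATION_WORDS = {"not", "no", "never", "n't", "nothing", "nobody", "hardly"}


def tokenize(sentence):
    """Lowercase and split into word tokens, splitting off the "n't" clitic."""
    sentence = sentence.lower().replace("n't", " n't")
    return re.findall(r"[a-z']+", sentence)


def interaction_features(sentence):
    """Bag of words plus NEG*WORD interaction features.

    For every negation word in the sentence, add one feature pairing it
    with each non-negation word in the same sentence.
    """
    tokens = tokenize(sentence)
    feats = Counter(tokens)  # the plain bag-of-words part
    negs = [t for t in tokens if t in NEGATION_WORDS]
    for neg in negs:
        for tok in tokens:
            if tok not in NEGATION_WORDS:
                feats[f"{neg}*{tok}"] += 1
    return feats


def dependency_negation_features(parse):
    """Features from a pre-computed dependency parse.

    `parse` is a list of (token, head_index) pairs; head_index is -1 for
    the root. Emits HEAD-modified-by-NEG whenever a negation word is a
    dependent of HEAD.
    """
    feats = Counter()
    for tok, head in parse:
        if tok.lower() in NEGATION_WORDS and head >= 0:
            head_word = parse[head][0].lower()
            feats[f"{head_word}-modified-by-{tok.lower()}"] += 1
    return feats


# Hand-written parse of "I do not love this product" (root = "love").
example_parse = [("I", 3), ("do", 3), ("not", 3), ("love", -1),
                 ("this", 5), ("product", 3)]
```

The resulting `Counter` objects can be vectorized (one dimension per feature name) and fed to a linear SVM. The interaction trick needs no parser but produces many noisy pairs; the dependency version is sparser and more targeted, at the cost of running a parser.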

answered Oct 17 '11 at 16:04


Alexandre Passos ♦

In addition to your bag-of-words baseline, I'd also try a baseline that uses word bigrams (pairs of consecutive words) as features. These features capture simple forms of negation. J. Dillard's thesis on valence shifters is also recommended reading [1].
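A minimal sketch of a unigram-plus-bigram featurizer in plain Python; the function name `ngram_features` and the simple regex tokenizer are assumptions for illustration.

```python
import re
from collections import Counter


def ngram_features(text, n_max=2):
    """Count word n-grams of length 1..n_max (unigrams + bigrams by default)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    feats = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens over the sentence.
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

For "I do not love this product" this yields the bigram `not love`, which a linear model can weight negatively. Note that bigrams only catch negations adjacent to the word: "I do not really love this product" produces `not really` and `really love`, but no `not love`. In scikit-learn, `CountVectorizer(ngram_range=(1, 2))` builds a similar feature set that can be passed to `LinearSVC`.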

[1] http://www.tacoma.washington.edu/tech/docs/research/gradresearch/ldillard.pdf

answered Oct 18 '11 at 04:37


Peter Prettenhofer

I suggest you take a look at the work by Isaac Councill et al. on negation detection. However, I suspect that handling negation has a relatively strong positive effect with lexicon-based methods, while the effect is somewhat smaller when you do supervised training on a BoW (or richer) representation. The reason is that sentences such as "I do not love this product" are not that common. Rather than negating a strictly positive word, reviewers often use a different construction, which typically involves words that are not otherwise used to express positive sentiment. At least, that is my experience from looking at review text.
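To make the lexicon-based case concrete, here is a minimal sketch of a lexicon scorer with a crude negation rule: polarities of words within a fixed window after a negation word are flipped. The toy `LEXICON`, the `window` size, and the windowing rule are all assumptions; this is a simple heuristic, not the learned negation-scope detection of Councill et al.

```python
import re

# Toy polarity lexicon, made up for illustration; real work would use a
# published sentiment lexicon.
LEXICON = {"love": 1.0, "great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}
NEGATION_WORDS = {"not", "no", "never", "n't"}


def lexicon_score(text, window=3, handle_negation=True):
    """Sum lexicon polarities, flipping words within `window` tokens after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower().replace("n't", " n't"))
    score = 0.0
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if handle_negation and tok in NEGATION_WORDS:
            remaining = window  # open a new negation scope
            continue
        polarity = LEXICON.get(tok, 0.0)
        score += -polarity if remaining > 0 else polarity
        if remaining > 0:
            remaining -= 1
    return score
```

Without negation handling, "I do not love this product" scores +1.0 here, so the mistake goes straight into the prediction. A supervised model has its own learned weights on all the other words to fall back on, which fits the intuition above that negation handling matters less there.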

answered Oct 18 '11 at 05:57


Oscar Täckström

edited Oct 18 '11 at 06:02

This is supported by the dependency paper I cited in my answer: even though there is always a gain from using more semantic features, the bulk of the performance is usually attained by a simple bag-of-words model with some extra clever features thrown into the mix.

(Oct 18 '11 at 22:45) Alexandre Passos ♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.