Hello,

I am training a Naive Bayes classifier for sentiment analysis. The inputs are bag-of-words representations of sentences, and the outputs are either positive or negative. My question is whether (and if so, why and under what assumptions) it would be permissible to state the following: if the posterior class probabilities are close to 0.5, the input sentence is "neutral". In other words, lack of confidence in either class is interpreted as the sentence lying somewhere between the two polarities.

Someone suggested this to me and my reflex was to think that the only proper way of taking into account all three sentiments would be to consider the multiclass problem where the training sentences are labeled positive, neutral, negative.

However, when I thought about it more, I couldn't find reasons to reject this idea, which in turn led me to post here. So what's the deal?

Also, let's say I really had 3 classes in the training data - what kind of ordinal regression method could I test?

Thank you in advance,

/David

asked Dec 27 '10 at 11:56


David the Dude

Just a small note: probabilities derived from Naive Bayes are usually very far from 0.5, because the independence assumption is grossly violated, so if you want to do some thresholding I really recommend you switch to something less biased, such as logistic regression.

For multiclass classification there are literally dozens of different approaches, and you will find them in any machine learning textbook. An easy to use implementation of multiclass classification is in libsvm.

(Dec 27 '10 at 22:14) Alexandre Passos ♦
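To make the point about extreme probabilities concrete, here is a minimal sketch (not from the thread) of how correlated evidence inflates a Naive Bayes posterior. The function name `nb_posterior` and the 0.6/0.4 word likelihoods are illustrative assumptions; the only mechanism shown is that Naive Bayes sums per-word log likelihood ratios as if the words were independent.

```python
import math

def nb_posterior(log_likelihood_ratios, prior_pos=0.5):
    """P(positive | words) under Naive Bayes.

    Each entry is log(P(word | pos) / P(word | neg)); the model simply
    adds them up, treating every word as independent evidence.
    """
    log_odds = math.log(prior_pos / (1 - prior_pos)) + sum(log_likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical, mildly positive word: P(word | pos) = 0.6, P(word | neg) = 0.4.
llr = math.log(0.6 / 0.4)

single = nb_posterior([llr])           # one piece of weak evidence
correlated = nb_posterior([llr] * 10)  # ten perfectly correlated copies of it

print(round(single, 3), round(correlated, 3))
```

The ten copies carry no more information than the single word, yet the posterior moves from 0.6 to above 0.98, which is why posteriors near 0.5 are rare in practice.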

Hi Alexandre,

Thanks for your reply. Sure, Naive-Bayes is more like the "If it works it works" approach. Good call on the probabilities rarely being close to 0.5, I never really checked but it's true. As for multi-class classification: I would like to take into account the ordering in the output labels or rather the assumed continuous relation that "more positive" in the inputs implies "more positive" in the output. I believe that this is a bit different from off-the-shelf multi-class classification schemes, if there is such a thing.

Thanks again,

/David

(Dec 28 '10 at 04:19) David the Dude

I'm not sure I understand what you mean by "more positive", but the simplest multiclass linear classifier (softmax logistic regression) will do two things that could be what you mean: (1) if there's a feature that strongly implies class C, a document with more of that feature will always be pushed harder toward class C; (2) it will give you a probability for each class, so you can order labels, sample labels, etc.
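As a rough illustration of both points, here is a minimal sketch of a softmax over linear class scores for a single feature. The weights in `WEIGHTS` and the helper `class_probs` are hypothetical values chosen for the example, not a trained model.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-class weights for one feature (say, the count of "great").
WEIGHTS = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}

def class_probs(count):
    """Class probabilities for a document in which the feature occurs `count` times."""
    labels = list(WEIGHTS)
    probs = softmax([WEIGHTS[label] * count for label in labels])
    return dict(zip(labels, probs))

for count in (0, 1, 2):
    print(count, class_probs(count))
```

With these weights, each additional occurrence of the feature raises the probability of "positive", and the per-class probabilities can be sorted to rank the labels.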

Another way you might go about trading off don't-know against positive or negative answers is using decision theory. If you have a minimally reliable probability of an example being positive, plus the costs of a false positive, false negative, false unknown, etc., you can compute thresholds that minimize the expected cost. Yet another approach, from this year's NIPS, is "Trading off mistakes and don't know predictions", http://books.nips.cc/papers/files/nips23/NIPS2010_1297.pdf , which does this online if you specify some threshold on the number of mistakes allowed.

(Dec 28 '10 at 07:37) Alexandre Passos ♦
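The decision-theoretic idea can be sketched as follows. This is only an illustration under assumed costs: the `COSTS` table (1.0 for a wrong polarity, 0.2 for answering "unknown") and the function names are hypothetical, and the probability passed in is assumed to be reasonably calibrated.

```python
# COSTS[action][true_label]: hypothetical cost of taking `action`
# when the sentence is really `true_label`.
COSTS = {
    "positive": {"positive": 0.0, "negative": 1.0},
    "negative": {"positive": 1.0, "negative": 0.0},
    "unknown":  {"positive": 0.2, "negative": 0.2},
}

def expected_cost(action, p_pos):
    """Expected cost of `action` given P(positive) = p_pos."""
    return p_pos * COSTS[action]["positive"] + (1 - p_pos) * COSTS[action]["negative"]

def decide(p_pos):
    """Pick the action with the smallest expected cost."""
    return min(COSTS, key=lambda action: expected_cost(action, p_pos))

for p in (0.1, 0.5, 0.9):
    print(p, decide(p))
```

With these particular costs the rule reduces to two thresholds: answer "negative" below 0.2, "positive" above 0.8, and "unknown" in between. Changing the costs moves the thresholds.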

Probably a poor choice of wording on my part. I was trying to point out that for the problem I am considering there is an ordering between outputs: positive > neutral > negative, so it bears some characteristics of a regression problem. Multiclass classification per se does not take this ordering into account.

(Dec 28 '10 at 08:08) David the Dude

I see. So you're probably better off either (a) training discriminatively on positive versus negative and thresholding according to the cost of mistakes (or doing what that NIPS paper does), or (b) if you have neutral examples, incorporating them in the loss function somehow. The easiest way is to insert each neutral example twice in the training set, once with each class, so that in practice you're regularizing their probabilities to be close to 0.5. Of course, you could also look into weighting the neutrals differently from the positive and negative examples if there are too many neutral examples.

(Dec 28 '10 at 08:11) Alexandre Passos ♦
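A minimal sketch of option (b), assuming training data stored as `(features, label)` pairs. The helper `expand_neutrals`, the `neutral_weight` parameter, and the toy data are hypothetical; the second half checks numerically that an example inserted once with each label has its log loss minimized at a predicted probability of 0.5.

```python
import math

def expand_neutrals(examples, neutral_weight=1.0):
    """Replace each neutral example with one positive and one negative copy.

    Returns (features, label, weight) triples; `neutral_weight` lets the
    duplicated neutrals count less (or more) than ordinary examples.
    """
    expanded = []
    for features, label in examples:
        if label == "neutral":
            expanded.append((features, "positive", neutral_weight))
            expanded.append((features, "negative", neutral_weight))
        else:
            expanded.append((features, label, 1.0))
    return expanded

def duplicated_loss(p):
    """Log loss of one example that appears once as positive and once as negative."""
    return -math.log(p) - math.log(1 - p)

# Toy data, purely illustrative.
train = [({"great": 1}, "positive"), ({"okay": 1}, "neutral"), ({"awful": 1}, "negative")]
expanded = expand_neutrals(train, neutral_weight=0.5)

# Grid search over p in {0.01, ..., 0.99} for the loss-minimizing probability.
best_p = min((i / 100 for i in range(1, 100)), key=duplicated_loss)
print(len(expanded), best_p)
```

The weighted triples can be passed to any learner that accepts per-example weights; lowering `neutral_weight` is one way to keep a large pool of neutral sentences from dominating the fit.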

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.