I have two classifiers that I train on the same training set and evaluate on the same test set, and I want to investigate why one performs better than the other. The main difference seems to be in the top predictions: in the precision-recall curve, over roughly the first 10% of recall, one classifier has much better precision. I would like to find either simple features or specific test samples where the bad classifier gives a low score to true positives that the good classifier puts at the top of its list. My first idea was to take, say, the true positives in the top 5% of the good classifier that do not show up in the top 10% of the bad classifier, but this seems really ad hoc. Is there a way to rank these "interesting" samples, or to rank the negatives that the bad classifier scores highly?

In short, I am looking for approaches for finding out on which types of test points one classifier performs significantly better or worse than the other.

asked Mar 13 '11 at 09:56

probreasoning

Here is an approach so simple that it doesn't even merit its own answer. For every feature (assuming there aren't tons of features), just plot the feature value against the score. You could color class-1 instances red and class-0 instances blue; a range of feature values where blue points get high scores would tell you that this particular feature is causing you trouble.
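As a rough, non-graphical variant of the plot described above, one could bin a single feature and compare the mean score of each class per bin. This is only a sketch: the function name `score_by_feature_bin`, the equal-width binning, and the assumption that labels are the integers 0 and 1 are all illustrative choices, not something stated in the thread.

```python
from statistics import mean

def score_by_feature_bin(feature, scores, labels, n_bins=5):
    """Per equal-width bin of one feature, return the mean score of each class.

    Returns a list of (bin_low, bin_high, mean_neg_score, mean_pos_score);
    a mean is None when the bin holds no sample of that class.
    Assumes labels are 0 (negative) or 1 (positive).
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    bins = [{0: [], 1: []} for _ in range(n_bins)]
    for x, s, y in zip(feature, scores, labels):
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        bins[idx][y].append(s)
    rows = []
    for i, b in enumerate(bins):
        neg = mean(b[0]) if b[0] else None
        pos = mean(b[1]) if b[1] else None
        rows.append((lo + i * width, lo + (i + 1) * width, neg, pos))
    return rows
```

A bin where the mean negative score is high (or above the mean positive score) corresponds to the region of the plot where blue points get high scores.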

(Mar 13 '11 at 16:57) Troy Raeder

One Answer:

An interesting approach is to learn a new classifier that predicts positive whenever a true element in your dataset is classified correctly by your good classifier and incorrectly by your bad classifier. You can then inspect the structure of this new classifier to see which features or feature combinations (depending on the structure of your classifier) are most predictive of mistakes, along the lines of this Liang and Klein paper on analysing the errors of unsupervised learning.
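A minimal sketch of this idea, assuming both classifiers give hard 0/1 predictions and that features are numeric. The "error classifier" here is deliberately tiny: one single-threshold decision stump per feature, scored by balanced accuracy. The stump, the balanced-accuracy criterion, and all function names are illustrative assumptions; any interpretable model (a decision tree, a sparse linear model) could take its place.

```python
def error_labels(y_true, good_pred, bad_pred):
    """1 where a true positive is caught by the good classifier and missed by the bad one."""
    return [int(y == 1 and g == 1 and b == 0)
            for y, g, b in zip(y_true, good_pred, bad_pred)]

def balanced_accuracy(preds, targets):
    """Mean of true-positive rate and true-negative rate (robust to rare error labels)."""
    pos = [p for p, z in zip(preds, targets) if z == 1]
    neg = [p for p, z in zip(preds, targets) if z == 0]
    tpr = sum(pos) / len(pos) if pos else 0.0
    tnr = sum(1 - p for p in neg) / len(neg) if neg else 0.0
    return (tpr + tnr) / 2

def best_stump(column, targets):
    """Best single-threshold rule on one feature: (balanced accuracy, threshold, sign).

    sign=+1 predicts 1 when value > threshold, sign=-1 when value <= threshold.
    """
    best = (0.0, None, 1)
    for t in sorted(set(column)):
        for sign in (1, -1):
            preds = [int(v > t) if sign == 1 else int(v <= t) for v in column]
            acc = balanced_accuracy(preds, targets)
            if acc > best[0]:
                best = (acc, t, sign)
    return best

def rank_features(X, targets, names):
    """Sort features by how well a one-feature stump predicts the error label."""
    stumps = [(best_stump([row[j] for row in X], targets), name)
              for j, name in enumerate(names)]
    return sorted(stumps, key=lambda item: item[0][0], reverse=True)
```

Features at the top of the ranking are the ones whose value alone best separates the samples the bad classifier misses. A stump cannot capture feature combinations, so a deeper model would be needed for those.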

answered Mar 13 '11 at 10:08

Alexandre Passos ♦

I should have mentioned that my classifiers only generate a score, and I am ranking the scores to get P-R and ROC curves. I'll look at the paper, though.

(Mar 13 '11 at 10:11) probreasoning

So a good strategy is to use, as the positive class of the error classifier, the true examples for which your good classifier produces a higher score than your bad classifier, assuming the two scores are comparable. If they're not, then fix a recall level and use the true/false predictions at that level (and maybe do this a few times).
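Both variants can be sketched as follows. The function names are hypothetical, and the thresholding rule (take the score of the k-th highest-scoring positive, so that at least the requested fraction of positives is recovered) is one reasonable reading of "fix a recall level", not necessarily the one intended. The code assumes labels are 0/1 and that at least one positive exists.

```python
import math

def score_gap_labels(good_scores, bad_scores, y_true):
    """Comparable scores: mark true examples that the good classifier scores higher."""
    return [int(y == 1 and g > b)
            for g, b, y in zip(good_scores, bad_scores, y_true)]

def predictions_at_recall(scores, y_true, recall):
    """Non-comparable scores: threshold one classifier's scores so that at
    least `recall` of the positives are predicted 1, and return 0/1 predictions.
    """
    pos_scores = sorted((s for s, y in zip(scores, y_true) if y == 1), reverse=True)
    k = max(1, math.ceil(recall * len(pos_scores)))  # number of positives to recover
    threshold = pos_scores[k - 1]
    return [int(s >= threshold) for s in scores]
```

Calling `predictions_at_recall` once per classifier at the same recall level gives hard predictions that can be compared directly, and repeating it at several recall levels (for example 0.05, 0.1, 0.2) shows whether the same samples keep turning up.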

(Mar 13 '11 at 10:13) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.