Hi,

let's say I have 2 datasets for binary classification. Dataset 1 is balanced (50% of samples in class -1 and 50% in class 1), while dataset 2 is unbalanced (40% of samples in class -1 and 60% in class 1). For dataset 1 chance level is at 50% accuracy, but for dataset 2 it is at 60% (always predicting class 1). Now let's assume my classifier gives, by cross-validation, 60% accuracy on dataset 1. What would be the equivalent accuracy for dataset 2? Of course getting 60% on dataset 2 would be worse, so would it correspond to 65%, 67%?

Is there a score (other than just the number of misclassified samples) that is immune to class imbalance?

thanks

asked Jan 08 '11 at 18:06


Alexandre Gramfort

edited Jan 08 '11 at 18:09

Just a quick question... why do you need to compare the results across two data sets? If the two data sets are from different domains, I'm not sure that it makes sense to compare the performance of one algorithm on both data sets as @Alexandre mentioned below. If they are from the same domain then the mere fact that one has 60% majority class and one has 50% should be irrelevant.

(Jan 08 '11 at 21:07) Troy Raeder

I am in a setting where the question is "how well can we predict a given target from such features?" and I'd like to have an idea if one target is more predictable than another one.

(Jan 08 '11 at 21:29) Alexandre Gramfort

That's an interesting question. A few points:

1) Even if we come up with a "fair" comparison, just because the model does better on Target A than on Target B doesn't necessarily mean that Target A is more predictable... it may just be that Target A is more suited to a different type of model. In other words, we might find that an SVM is better at Target A while a neural network is better at Target B.

2) If you're not concerned about that (i.e. if the model you're using is, for some reason, the Model That Needs To Be Used) then a permutation test might solve your problem. If you randomly permute the class labels while maintaining the level of class imbalance (i.e. 60% class-1) that will give you a baseline performance (by whatever metric) that is dependent only on the level of class imbalance and not the structure of the problem. A percentage improvement over this random baseline might give you what you're looking for. I'm not super-confident on that though... I'd be interested if anyone else has an opinion.

Some info on permutation tests for evaluation can be found in this paper:

http://jmlr.csail.mit.edu/papers/v11/ojala10a.html
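A minimal sketch of the permutation baseline described above, in plain Python. Everything named here is an assumption for illustration: `one_nn_holdout_accuracy` is a toy stand-in for whatever model and cross-validation you actually use, and the data are synthetic (40% class -1, 60% class 1). The only part that matters is `permutation_baseline`, which shuffles the labels (keeping the class proportions) and re-scores the model.

```python
import random


def one_nn_holdout_accuracy(features, labels):
    """Toy evaluator (hypothetical stand-in for your own model + CV):
    1-nearest-neighbour on 1-D features, even indices train, odd indices test."""
    train = range(0, len(labels), 2)
    test = range(1, len(labels), 2)
    correct = 0
    for i in test:
        nearest = min(train, key=lambda j: abs(features[j] - features[i]))
        correct += labels[nearest] == labels[i]
    return correct / len(test)


def permutation_baseline(evaluate, labels, n_permutations=100, seed=0):
    """Mean score of `evaluate` when the labels are randomly permuted.

    Shuffling keeps the class proportions intact but destroys any
    relationship between features and labels.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        total += evaluate(shuffled)
    return total / n_permutations


# Synthetic data: 80 samples of class -1, 120 of class 1, informative feature.
rng = random.Random(42)
labels = [-1] * 80 + [1] * 120
rng.shuffle(labels)
features = [rng.gauss(2.0 * y, 1.0) for y in labels]

observed = one_nn_holdout_accuracy(features, labels)
baseline = permutation_baseline(
    lambda y: one_nn_holdout_accuracy(features, y), labels
)
# Relative improvement over the permuted-label baseline.
improvement = (observed - baseline) / baseline
print(f"observed={observed:.3f} baseline={baseline:.3f} improvement={improvement:.1%}")
```

Note that the baseline depends on the model as well as on the imbalance: for this toy 1-NN it sits near 52% rather than 60%, because a nearest neighbour with a random label agrees with the test label only about 0.6² + 0.4² of the time.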

(Jan 08 '11 at 21:59) Troy Raeder

3 Answers:

F-score (the harmonic mean of precision and recall) is not immune to unbalanced data sets, but it helps a lot in avoiding this problem, as it accounts directly for false positives and false negatives. You can also try AUC (area under the ROC curve), which is perhaps more often recommended.

It feels odd to compare the performance on two datasets, however.
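For concreteness, here is a small sketch of both scores in plain Python, applied to a classifier that always predicts the majority class on a 60/40 dataset. The function names and the toy data are mine, not from any library. The example shows why F-score is "not immune": it depends on which class you call positive.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive sample gets a higher score
    than a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 60% class 1, 40% class -1; the "classifier" always answers 1.
y_true = [1] * 6 + [-1] * 4
y_pred = [1] * 10
scores = [0.5] * 10  # constant decision value

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_majority = f1_score(y_true, y_pred, positive=1)
f1_minority = f1_score(y_true, y_pred, positive=-1)
auc = roc_auc(y_true, scores)
print(accuracy, f1_majority, f1_minority, auc)
```

Here accuracy is 0.6, F-score is 0.75 with class 1 as the positive class but 0 with class -1 as the positive class, and the AUC of the constant score is 0.5.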

answered Jan 08 '11 at 19:55


Alexandre Passos ♦

ROC AUC is the first thing I tried, but it seems that it's easier to get a good AUC with very unbalanced classes than with balanced classes. I'll give F-score a try on these data.

(Jan 08 '11 at 21:32) Alexandre Gramfort

If you want to actually be able to compare them, I think you should at least zero-mean the features and rescale their variances (à la PCA preprocessing), so that the sets are at least measurably comparable. Otherwise you have different sets, different domains, and of course different (non-comparable) results.
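As a rough sketch of that preprocessing step (the `standardize` helper is a name I made up, and a constant feature is mapped to zeros by assumption), each feature column can be centered and scaled to unit variance like this:

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return the column shifted to zero mean and scaled to unit variance."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # Constant feature: nothing to scale, so just center it.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


z = standardize([1.0, 2.0, 3.0, 4.0])
print(z)
```

This only puts the features of the two datasets on the same scale; it does not by itself correct for the different class proportions.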

answered Jan 10 '11 at 20:18


Leon Palafox

Thinking of this purely as a classification problem, you might consider either the mutual information between the predicted and the actual labels, or the conditional entropy of the actual labels given the predicted ones. See pages 15-21 (Chapter 2) of "Elements of Information Theory", by Cover and Thomas.
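Both quantities can be estimated from the counts of (predicted, actual) pairs. The sketch below uses plain Python with function names of my own choosing, and plug-in (empirical frequency) estimates, which is an assumption that can be biased on small samples.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical entropy H(values) in bits."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def conditional_entropy(actual, predicted):
    """Empirical H(actual | predicted) in bits."""
    n = len(actual)
    joint = Counter(zip(predicted, actual))
    pred_counts = Counter(predicted)
    return -sum(c / n * log2(c / pred_counts[p]) for (p, _), c in joint.items())


def mutual_information(actual, predicted):
    """I(actual; predicted) = H(actual) - H(actual | predicted), in bits."""
    return entropy(actual) - conditional_entropy(actual, predicted)


# Perfect predictions on a balanced problem: 1 bit of information.
balanced = [1, 1, -1, -1]
mi_perfect = mutual_information(balanced, balanced)

# Always predicting the majority class on a 60/40 problem: 0 bits,
# even though the accuracy is 60%.
unbalanced = [1] * 6 + [-1] * 4
mi_constant = mutual_information(unbalanced, [1] * 10)
print(mi_perfect, mi_constant)
```

To compare datasets with different class balance, one option (my suggestion, not part of the answer above) is to divide the mutual information by `entropy(actual)`, which gives the fraction of the label uncertainty that the predictions remove.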

answered Jan 23 '11 at 05:04


Will Dwinnell


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.