Hi all, why is ROC a better evaluation method than simple misclassification error? Is it because ROC evaluates models at all cutoff points? Thanks in advance, Anton
Another important feature of ROC is that it's insensitive to class imbalance (e.g., datasets with many more negative than positive samples). It also evaluates the classifier over many possible tunings (cutoff points), which helps hide some effects you can get in the misclassification error just by calibrating the probabilities to one particular dataset. Also, according to the comment by Mikael Huss, on highly skewed datasets it can be better to use a precision-recall curve rather than a ROC curve. I've noticed similar things myself when running small experiments, but it's nice to know this is empirically and theoretically grounded.
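The class-imbalance point comes from how the two ROC coordinates are defined: the true positive rate is computed only over positives and the false positive rate only over negatives. Here is a minimal sketch in plain Python, assuming a made-up set of scores and a cutoff of 0.5 (the helper `rates_at_cutoff` is hypothetical, not a library function). Replicating every negative example changes the class ratio but leaves the ROC point unchanged, while accuracy at the same cutoff moves.

```python
# Sketch: TPR and FPR are each computed within one class, so replicating
# every negative example (changing the class ratio) leaves a ROC point
# unchanged, while accuracy at the same cutoff shifts.

def rates_at_cutoff(scores, labels, cutoff):
    """Return (tpr, fpr, accuracy) when predicting positive for score >= cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    tn = neg - fp
    tpr = tp / pos           # fraction of positives flagged
    fpr = fp / neg           # fraction of negatives flagged
    accuracy = (tp + tn) / len(labels)
    return tpr, fpr, accuracy

# Hypothetical toy scores from some classifier (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]

# Same data with every negative present 10 times (a much more skewed set).
neg_pairs = [(s, y) for s, y in zip(scores, labels) if y == 0]
skewed = list(zip(scores, labels)) + neg_pairs * 9
skewed_scores = [s for s, _ in skewed]
skewed_labels = [y for _, y in skewed]

balanced_point = rates_at_cutoff(scores, labels, 0.5)
skewed_point = rates_at_cutoff(skewed_scores, skewed_labels, 0.5)
print(balanced_point)  # TPR 0.75, FPR 0.5, accuracy 0.625
print(skewed_point)    # same TPR and FPR, accuracy 23/44
```

Because accuracy mixes both classes into one count, it depends on the class ratio of the test set; the ROC point does not.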
This answer is marked "community wiki".
Actually, it has been argued that ROC curves can be somewhat misleading when applied to highly skewed datasets (e.g. Davis and Goadrich 2006, http://portal.acm.org/citation.cfm?id=11438749), so precision-recall curves can be more informative in those cases. I am presently working on a problem like this, where the ROC curve makes the classifier(s) look very good but the P-R curve paints a different picture.
(Jul 07 '10 at 08:03)
Mikael Huss
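A small worked example of why ROC can look good while P-R does not, using hypothetical confusion counts (not data from the problem mentioned above). The same true and false positive rates give the same ROC point whether negatives are rare or abundant, but precision collapses when negatives vastly outnumber positives, because even a small FPR then means many false positives. `roc_and_pr_point` is an illustrative helper, not a library API.

```python
# Sketch: one ROC operating point (TPR, FPR) can correspond to very
# different precision values depending on how many negatives there are.

def roc_and_pr_point(tp, fp, fn, tn):
    """Return ROC coordinates and precision/recall from confusion counts."""
    tpr = tp / (tp + fn)        # also the recall
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "recall": tpr}

# Hypothetical skewed test set: 10 positives, 1000 negatives.
skewed = roc_and_pr_point(tp=9, fp=50, fn=1, tn=950)

# Hypothetical balanced test set with the same per-class rates:
# 1000 positives, 1000 negatives.
balanced = roc_and_pr_point(tp=900, fp=50, fn=100, tn=950)

print(skewed)    # TPR 0.9, FPR 0.05, precision 9/59 (about 0.15)
print(balanced)  # TPR 0.9, FPR 0.05, precision 900/950 (about 0.95)
```

In ROC space both cases sit at the same (0.05, 0.9) point, which looks excellent; the precision-recall view shows that on the skewed set roughly 85% of the flagged examples are false alarms.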
Thanks for the reference, I updated the answer to reflect this.
(Jul 07 '10 at 08:16)
Alexandre Passos ♦
I think there are two reasons why the ROC curve (or other similar measures) can give more insight than simple misclassification error:
1. Misclassification error (or accuracy) is a single number that summarizes the performance of your classifier at one given cutoff point/sensitivity, so it can only tell you so much. By looking at it, you can't tell what the trade-offs of varying the cutoff point are; to see those, you need something like the ROC curve. As you say, the ROC curve characterizes the performance of the classifier at different cutoff points, so it contains more information about how your classifier behaves under different sensitivity settings. I'd say more information is better. Two classifiers that respond very differently to varying the sensitivity may have the same misclassification error at a given cutoff, but their ROC curves will be different, and that's something you can (usually) measure.

2. You can also summarize a ROC curve as a single number; the area under the ROC curve (AUC) is one way of doing that. If you don't know at which sensitivity your classifier will (or should) operate, then AUC can be a more sensible way to summarize its performance.
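To make both points concrete, here is a minimal sketch in plain Python that traces a ROC curve by sweeping the cutoff over every distinct score and computes the AUC with the trapezoidal rule. The two score lists are made up for illustration, and `roc_curve`, `auc`, and `error_rate` are hypothetical helpers written for this example. Both classifiers make 2 mistakes out of 8 at a cutoff of 0.5, yet their AUCs differ.

```python
# Sketch: same misclassification error at one cutoff, different ROC/AUC.

def roc_curve(scores, labels):
    """ROC points (fpr, tpr), one per distinct cutoff, strictest first.

    Grouping tied scores into one cutoff makes ties between a positive and
    a negative contribute a diagonal segment (i.e. count as half a win).
    Quadratic time; fine for a toy example.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def error_rate(scores, labels, cutoff=0.5):
    """Fraction misclassified when predicting positive for score >= cutoff."""
    wrong = sum(1 for s, y in zip(scores, labels) if (s >= cutoff) != (y == 1))
    return wrong / len(labels)

labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Two hypothetical classifiers scoring the same eight examples.
clf_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
clf_b = [0.9, 0.8, 0.7, 0.1, 0.6, 0.45, 0.4, 0.35]

for name, scores in [("A", clf_a), ("B", clf_b)]:
    print(name, error_rate(scores, labels), auc(roc_curve(scores, labels)))
# A: error 0.25, AUC 0.875
# B: error 0.25, AUC 0.75
```

Classifier B ranks its missed positive below every negative, which the error at 0.5 cannot see but the ROC curve (and its AUC) does. In practice, scikit-learn's `sklearn.metrics.roc_curve` and `roc_auc_score` compute the same quantities more efficiently.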