Say I have clustered a training dataset of 1000 instances from 5 classes into 5 clusters (centers) using, for example, k-means. I then built a confusion matrix by validating on a test dataset. I now want to plot a ROC curve from it. How can I do that?

asked Mar 27 '12 at 08:35

shn

One Answer:

The traditional ROC curve is defined for binary classification. David Hand has an extension for multi-class area under the ROC curve, but personally I don't like it. (I think it is prone to give too much credit for discriminating between pairs of classes that are nothing alike.)
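Since the question starts from a multi-class confusion matrix, here is a minimal sketch of one common way to get back to the binary setting: treat each class in turn as "positive" and all others as "negative" (one-vs-rest). This is an assumption on my part, not Hand's extension, and the function name `one_vs_rest_points` and the 3-class matrix are made up for illustration.

```python
def one_vs_rest_points(cm):
    """Return one (fpr, tpr) pair per class from a square confusion matrix.

    Assumes cm[i][j] counts instances whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    points = []
    for k in range(n):
        tp = cm[k][k]                              # class k predicted as k
        fn = sum(cm[k]) - tp                       # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp                  # everything else
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points


# Made-up 3-class confusion matrix, for illustration only.
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
points = one_vs_rest_points(cm)
for k, (fpr, tpr) in enumerate(points):
    print(f"class {k}: FPR={fpr:.2f} TPR={tpr:.2f}")
```

Each class yields a single point in ROC space, not a full curve, because the confusion matrix only records crisp decisions.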

If you have a binary prediction task, it is best if you can rank the predictions by some score (e.g., distance to the closest centroid for class 1). The ROC curve is drawn by varying a threshold over that score. You can still draw a curve from crisp predictions, but it will consist of only the single point given by your confusion matrix, connected to the (0,0) and (1,1) corners of the plot with line segments.
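To make the threshold sweep concrete, here is a rough sketch in plain Python. It assumes binary labels (1 = positive, 0 = negative), that both classes are present, and that a higher score means "more likely positive", so a distance to the class-1 centroid is negated before use. The helper names `roc_points` and `auc` and the sample distances are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points obtained by lowering a threshold over the scores.

    Assumes labels are 1 (positive) or 0 (negative), both classes occur,
    and a higher score means 'more likely positive'.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Instances with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up distances to the class-1 centroid; smaller means "more like class 1".
distances = [0.2, 0.5, 0.9, 1.4, 2.0, 2.5]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points([-d for d in distances], labels)
ranked_auc = auc(curve)

# Crisp predictions give one point, joined to the two corners.
fpr, tpr = 0.1, 0.8  # illustrative values read off a confusion matrix
crisp_curve = [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]
crisp_auc = auc(crisp_curve)
```

With crisp predictions the area reduces to (1 + TPR - FPR) / 2, which shows how little of the curve a single confusion matrix pins down.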

There are multiple software packages that will generate a ROC curve, so you don't need to implement it yourself. I personally use PERF; it has the nice property of being a small standalone program.

UPDATE:

Just after posting this I found Peter Flach's tutorial slides on Multi-class ROC. Looks like more has been done in this area than I thought.

answered Mar 27 '12 at 13:21

Art Munson
edited Mar 27 '12 at 13:28


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.