What is the best method to evaluate a classifier that places instances into more than two classes? Am I correct in thinking that it makes a difference whether the classes are nominal, ordinal or cardinal? Thanks in advance.
Cardinal and nominal mean the same thing in this context. If the classes are well balanced, accuracy is fine. If they aren't, accuracy can mislead, because a classifier that mostly predicts the majority class can still score highly. In that case you can measure precision, recall and F-measure for each class and report the mean of those values over all classes, or report the per-class values themselves if you are able to.

For ordinal classes, it's the more complex problem of learning to rank, and the proper way to measure depends on what counts as a loss in the real-world application. Usually one evaluates a group of items at once: the ML system produces a ranked list, and a loss function is computed on that list. If you really want to evaluate each example on its own, the techniques from the previous paragraph are appropriate. If not, there are many common ranking measures, such as precision@K, NDCG and others that weight errors near the top of the list more heavily. If a misplaced element is equally bad wherever it appears in the list, Kendall's tau is a common statistic to report.
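As a rough illustration of the per-class approach, here is a minimal sketch in plain Python (no library assumed) that computes precision, recall and F1 for each class and their unweighted mean over classes, often called the macro average. The function name `per_class_prf` and the toy labels are made up for this example, and a class with a zero denominator is simply scored 0.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Per-class (precision, recall, F1) plus their macro (unweighted) mean."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)  # how often each class really occurs
    pred_count = Counter(y_pred)  # how often each class is predicted
    # True positives: positions where the prediction matches the true class.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for c in labels:
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)

    # Macro average: every class weighs the same, regardless of its size.
    macro = tuple(sum(s[i] for s in scores.values()) / len(labels) for i in range(3))
    return scores, macro


y_true = ["a", "a", "a", "b", "c", "c"]
y_pred = ["a", "a", "b", "b", "c", "a"]
scores, macro = per_class_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for c, (p, r, f) in scores.items():
    print(f"{c}: P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"macro: P={macro[0]:.3f} R={macro[1]:.3f} F1={macro[2]:.3f}")
print(f"accuracy: {accuracy:.3f}")
```

On this toy data every class ends up with F1 = 0.667, while the macro precision and recall are both 0.722 (13/18). Class "b" has perfect recall but only 0.5 precision, which a single accuracy number would hide.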
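For the ranking case, here is a minimal sketch of precision@K, NDCG and Kendall's tau, again in plain Python. It assumes the true ordinal classes serve as relevance grades (higher is better) and that the classifier outputs a score per item; the function names, the `threshold` parameter and the 2^g - 1 gain used in DCG are choices made for this sketch (some DCG variants use the raw grade as gain instead). The tau computed here is tau-a, which counts tied pairs as neither concordant nor discordant; with many tied ordinal labels, tau-b (the default in `scipy.stats.kendalltau`) is usually the better choice.

```python
import math
from itertools import combinations


def rank_by_score(true_grades, scores):
    """Return the true grades reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [true_grades[i] for i in order]


def precision_at_k(ranked_grades, k, threshold=1):
    """Share of the top-k items whose true grade is at least `threshold`."""
    return sum(1 for g in ranked_grades[:k] if g >= threshold) / k


def dcg(grades):
    # Gain 2^g - 1, discounted by log2(position + 1) with positions starting at 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(ranked_grades, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    k = len(ranked_grades) if k is None else k
    ideal = dcg(sorted(ranked_grades, reverse=True)[:k])
    return dcg(ranked_grades[:k]) / ideal if ideal > 0 else 0.0


def kendall_tau_a(x, y):
    """(concordant - discordant) / number of pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


true_grades = [2, 0, 1, 2, 0]           # ordinal classes: 0 < 1 < 2
scores = [0.9, 0.5, 0.4, 0.6, 0.1]      # hypothetical classifier scores
ranked = rank_by_score(true_grades, scores)

print(precision_at_k(ranked, 2, threshold=2))
print(round(ndcg(ranked), 3))
print(kendall_tau_a(true_grades, scores))
```

Here the ranking is [2, 2, 0, 1, 0], so the snippet prints 1.0, 0.987 and 0.6. Precision@2 ignores the swap lower down, NDCG penalizes it only slightly because it happens below the top, and Kendall's tau counts the one discordant pair the same as it would anywhere in the list.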
If you are talking about a multi-label classifier, you might want to look at set/subset metrics such as the symmetric set difference. The Mulan project has done some work on this.
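For the multi-label case, here is a small sketch of two set-based measures. Hamming loss is the size of the symmetric set difference between the true and predicted label sets, averaged over examples and normalized by the number of labels. Subset accuracy counts only exact matches. This is plain Python with made-up data and does not use Mulan's API; the function names are hypothetical.

```python
def hamming_loss(true_sets, pred_sets, n_labels):
    """Mean size of the symmetric set difference, divided by the number of labels."""
    # t ^ p is the symmetric difference: labels in exactly one of the two sets.
    total = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return total / (len(true_sets) * n_labels)


def subset_accuracy(true_sets, pred_sets):
    """Fraction of examples whose predicted label set matches the true set exactly."""
    return sum(t == p for t, p in zip(true_sets, pred_sets)) / len(true_sets)


labels = {"sports", "politics", "tech"}
true_sets = [{"sports"}, {"politics", "tech"}, {"tech"}]
pred_sets = [{"sports"}, {"politics"}, {"sports", "tech"}]

print(hamming_loss(true_sets, pred_sets, len(labels)))  # 2 wrong labels out of 9 slots
print(subset_accuracy(true_sets, pred_sets))            # 1 exact match out of 3
```

The two measures differ in strictness. Hamming loss gives partial credit to the second and third examples, each of which is off by one label, while subset accuracy treats them as complete misses.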