I am assessing a bunch of classification algorithms for a specific application with multiple classes. The classification algorithms that I am considering are:

  1. Multinomial Logistic Regression (Matlab's 'mnrfit')
  2. Multiclass SVM (K. Crammer and Y. Singer. On the Algorithmic Implementation of Multi-class SVMs, JMLR, 2001.). I will use the code provided by the authors since Matlab's 'svmtrain' only does binary classification.
  3. Neural Networks (Matlab's 'nprtool')
  4. Decision Trees (C4.5 and CART from Matlab's 'classregtree')
  5. k-Nearest Neighbors (Matlab's 'ClassificationKNN')
  6. Naive Bayes Classifier (Matlab's 'NaiveBayes')
  7. Discriminant Analysis (Matlab's 'ClassificationDiscriminant')
  8. Random Forests (Matlab's 'TreeBagger')

I have the following questions:

  1. Have I omitted any "obvious" multiclass classification algorithm that's a must-try? Also, are there any binary classifiers that can easily be extended to multiclass via the one-vs-all method?
  2. Which of these are linear classifiers and which are non-linear? I know 3, 4 and 5 are non-linear by nature, and 2 can be made non-linear with non-linear basis functions or the kernel trick.
  3. Which of these are discrete classifiers and which are probabilistic? For example, logistic regression gives a probability for each class, while decision trees give exactly one class. On a related note, if I want to use ROC curves for comparison, how do I compare a discrete classifier with a probabilistic one, as the former gives only a point on the ROC plot?
  4. Which of these are deterministic classifiers and which are stochastic? In other words, which classifiers will yield exactly the same results across multiple runs? This matters to me: logistic regression, for example, needs only one run, whereas neural networks must be trained multiple times (with different random initializations) to avoid biased results.

Finally, are there any good papers that show how to compare various classification algorithms? I found this one, which is quite good.

asked Nov 11 '13 at 17:47

Prometheus


One Answer:
  1. Any binary classifier can be used for multiclass via the one-vs-all reduction or the all-vs-all (one-vs-one) reduction. Your list seems to cover most of the common multiclass algorithms.

  2. Logistic regression and SVMs are linear (a kernel SVM is linear in the induced feature space, but non-linear in the original input space). Neural networks, decision trees, and kNN aren't linear. Naive Bayes and (linear) discriminant analysis are linear; quadratic discriminant analysis is not. Random forests aren't linear.

  3. Logistic regression can give you calibrated probabilities. So can many SVM implementations (though this requires slightly different training, e.g. Platt scaling). Neural networks can too, if trained with an appropriate loss (e.g. softmax cross-entropy). Decision trees and kNN can be probabilistic, though they are not particularly well calibrated. Naive Bayes does not produce well-calibrated probabilities, nor does discriminant analysis. I'm not sure about random forests; I think it depends on the implementation.

  4. All are deterministic except for neural networks (random weight initialization) and random forests (bootstrap resampling and random feature selection).
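The one-vs-all reduction mentioned in point 1 is mechanical enough to sketch. Below is a minimal pure-Python illustration (Python rather than the MATLAB tools discussed here); the binary "classifier" is a toy centroid-distance scorer standing in for a real classifier's decision function, and all the function names are made up for this sketch:

```python
# One-vs-rest sketch: turn any binary scorer into a multiclass
# classifier by training one scorer per class and predicting the
# class whose scorer is most confident. The toy scorer below is
# negative squared distance to the class centroid; in practice you
# would plug in an SVM or logistic regression decision score.

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(x[i] for x in points) / n for i in range(len(points[0]))]

def train_one_vs_rest(X, y):
    """Fit one binary scorer per class: here, just the class centroid."""
    classes = sorted(set(y))
    return {c: centroid([x for x, label in zip(X, y) if label == c])
            for c in classes}

def score(x, class_centroid):
    """Toy binary decision score: negative distance to the class centroid."""
    return -sum((xi - ci) ** 2 for xi, ci in zip(x, class_centroid))

def predict(model, x):
    """One-vs-rest decision rule: take the class with the highest score."""
    return max(model, key=lambda c: score(x, model[c]))

# Three well-separated 2-D classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["a", "a", "b", "b", "c", "c"]
model = train_one_vs_rest(X, y)
print(predict(model, [0.2, 0.5]))   # near class "a"
print(predict(model, [9.8, 0.4]))   # near class "c"
```

A one-vs-one (all-vs-all) reduction instead trains one scorer per pair of classes and takes a majority vote over all pairwise decisions.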

Why do you want to compare different classification algorithms? Are you trying to decide which one is the best in general, or just for one application?

If the former, it's not worth doing: most such claims are rather sketchy, and there is no method that can support that kind of conclusion. If the latter, it is well accepted that cross-validation, or comparing performance on a fixed test set, gives you unbiased results. For multiclass classification it is not always obvious which metric to use, but accuracy, per-class precision/recall/F1, per-class AUC, and the confusion matrix are commonly used.
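As a concrete sketch of those metrics (again in Python rather than MATLAB, with a made-up `per_class_metrics` helper): the confusion matrix and per-class precision/recall/F1 come straight from hard predictions, and the per-class (FPR, TPR) pair is exactly the single ROC point that a discrete classifier contributes, which touches on question 3.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, classes):
    """Per-class precision, recall (TPR), F1, and FPR from hard predictions.

    For a discrete classifier, (FPR, TPR) for a class is its single
    point on that class's one-vs-rest ROC plot.
    """
    cm = Counter(zip(y_true, y_pred))  # confusion matrix: (true, pred) -> count
    out = {}
    for c in classes:
        tp = cm[(c, c)]
        fp = sum(cm[(t, c)] for t in classes if t != c)
        fn = sum(cm[(c, p)] for p in classes if p != c)
        negatives = sum(1 for t in y_true if t != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        fpr = fp / negatives if negatives else 0.0
        out[c] = (precision, recall, f1, fpr)
    return out

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]
metrics = per_class_metrics(y_true, y_pred, ["a", "b", "c"])
print(metrics["a"])  # precision, recall (TPR), F1, FPR for class "a"
```

A probabilistic classifier gives a full ROC curve per class by sweeping a threshold over its scores; a discrete classifier gives only the one point above, so the two are usually compared by checking whether the point lies above or below the curve.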

answered Nov 12 '13 at 15:34

Alexandre Passos ♦

Thanks for your answer. I've been eagerly waiting for one.

I am now using three variants of multi-class SVM, namely one-vs-all (my own), one-vs-one (my own) and LibSVM (http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf).

I am testing them for a specific application. I am using the error rate on a test set for comparisons, as it's the easiest to compute, and because AUC is tricky to calculate for discrete classifiers.

(Nov 13 '13 at 11:01) Prometheus