
I have been working with SVMs for a while now, and while I do see their quality, I am wondering why the machine learning community is so focused on SVMs rather than, for example, (kernelized) logistic regression (KLR) or the import vector machine. AFAIK, their classification performance is comparable, but KLR provides probabilities and is easier to implement.

Is it due to the run-time complexity of the training algorithms? Or perhaps to simpler theoretical bounds?

asked Sep 22 '11 at 04:03

Bwaas

I wasn't aware of the import vector machine. Assuming it's as good as it claims, I'm going to say that the difference comes down to momentum/inertia (so much more has been invested in the SVM, and the IVM doesn't appear to offer a fundamental leap), the earlier introduction of the SVM, and of course the fantastic notion of the large margin in kernel space that Vapnik brought to the fore (with a soft margin, the VC bounds don't have the same oomph, but the hard margin is still theoretically fantastic, and the soft margin is a deliberate analogy).

(Mar 14 '12 at 10:03) Jacob Jensen

5 Answers:

SVMs vs Logistic Regression:

I think the story is completely different depending on whether you're working with a linear kernel or a non-linear kernel. With a non-linear kernel, Kernel Logistic Regression is not a viable solution (unless you sparsify it one way or another). With a linear kernel, SVMs and Logistic Regression should have similar performance. If you use SGD for optimization, one difference is that Logistic Regression updates the hypothesis at every iteration, while SVMs update only on examples whose prediction is not good enough (i.e., whose margin is violated). This gives Logistic Regression extra computational cost, on top of the fact that the log loss requires computing exponentials. By the way, if you just want probabilistic output, you can use an SVM and learn a Logistic Regression on the SVM's output.
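
To make the SGD point concrete, here is a minimal sketch (an illustration assuming numpy, not code from this thread) of one update step under each loss, for labels y in {-1, +1}:

```python
import numpy as np

def sgd_step(w, x, y, lr=0.1, loss="hinge"):
    """One (unregularized) SGD step; x is a feature vector, y in {-1, +1}."""
    margin = y * np.dot(w, x)
    if loss == "hinge":
        # SVM: update only when the margin is violated (margin < 1);
        # otherwise the example contributes nothing and w is untouched.
        if margin < 1:
            w = w + lr * y * x
    else:
        # Logistic regression: every example yields a nonzero gradient,
        # and each step requires an exponential.
        w = w + lr * y * x / (1.0 + np.exp(margin))
    return w
```

For the probabilistic-output trick at the end, scikit-learn's SVC(probability=True) does roughly this internally (Platt scaling on the SVM's decision values).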

SVMs vs Decision Trees or ANNs:

Compared to other classifiers such as Decision Trees or ANNs, SVMs are often described as a black box classifier: the user doesn't need to make many decisions (except for the regularization hyperparameter) and doesn't need to know how they work. Also, when combined with kernels, SVMs can learn a non-linear hypothesis while keeping the objective convex.
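
As a small illustration of the "few decisions" point, a sketch with scikit-learn on synthetic data (an illustration, not code from this answer): with an RBF kernel, the main knobs are the regularization strength C and the kernel width gamma, and the training objective stays convex.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Non-linear hypothesis via the RBF kernel; the objective remains convex.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print(cross_val_score(clf, X, y, cv=5).mean())
```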

answered Sep 22 '11 at 08:15

Mathieu Blondel

edited Sep 22 '11 at 08:25

I agree. Somehow I don't like the idea of training an LR on top of an SVM --- I think we should be able to solve the problem in a single step.

(Sep 22 '11 at 09:25) Bwaas

People use SVMs because of the high-quality free software that exists for training them and running them. Convenient and useful software is the number one way to get an algorithm adopted, as long as it has some inherent use.

(Mar 14 '12 at 01:07) gdahl ♦

The LR on top is nice, but a hack in the end: it is not the likelihood of the data that is maximized during the learning process.

(Aug 30 '12 at 15:50) Justin Bayer

I think that random forests and SVMs are about equivalent in their "blackboxiness". I might even argue that random forests are simpler to use out of the box, because you don't need to do any feature whitening.

(Sep 13 '12 at 04:42) Joseph Turian ♦♦

SVMs promote sparsity in the dual space. Thus, most of the training data points should not be support vectors. In other words, the decision boundary is determined only by a subset of the training data points. Kernelized logistic regression, on the other hand, does not promote sparsity in the dual space and thus the decision boundary is usually determined by all training data points.

This should affect test time significantly (training time too), since making predictions requires evaluating the kernel function against all training points that determine the decision boundary.
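
A quick way to see the dual sparsity, sketched with scikit-learn on synthetic data (an illustration, not part of the original answer):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Prediction cost scales with the number of support vectors,
# not with the size of the full training set.
print(len(clf.support_), "support vectors out of", len(X), "training points")
```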

answered Sep 22 '11 at 04:33

Peter Prettenhofer

edited Sep 22 '11 at 06:28

This was what I suspected. That is why I mentioned the import vector machine, which demonstrates that sparse kernel logistic regression is feasible. Still, it is not very popular.

(Sep 22 '11 at 09:21) Bwaas

I wasn't aware of that - thanks!

(Sep 22 '11 at 09:43) Peter Prettenhofer

SVMs also tend to work well in very high-dimensional feature spaces. For example, if you have 10K training examples embedded in a 50K-dimensional feature space, SVMs will tend to work better than methods like Logistic Regression. The theoretical bounds on SVMs depend only on the intrinsic dimensionality of the problem, and not on the dimensionality of the feature space the data points are embedded in.
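
A sketch of that regime with scikit-learn's LinearSVC on sparse synthetic data of roughly the shapes mentioned above (the labels here are random, purely to exercise the shapes):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# 10K examples embedded in a 50K-dimensional sparse feature space,
# e.g. bag-of-words text features.
X = sp.random(10_000, 50_000, density=0.001, format="csr", random_state=rng)
y = rng.randint(0, 2, size=10_000)  # random labels, illustration only

# Margin-based generalization bounds depend on the margin and the
# intrinsic dimensionality, not on the 50K ambient dimensions.
clf = LinearSVC(C=1.0).fit(X, y)
```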

answered Sep 22 '11 at 09:04

Yisong Yue

But isn't working in kernel space the reason why SVMs are able to deal with very high-dimensional data? I.e., KLR would be equally suited to these problems (although the lack of sparsity might become a problem).

(Sep 22 '11 at 09:24) Bwaas

The lack of sparsity, in addition to making prediction slow, can also hinder generalization performance: a dense solution can overfit.

(Sep 22 '11 at 11:48) Mathieu Blondel

That is a very interesting notion; I didn't think of that. Of course the (L2) regularizer should reduce overfitting, but indeed it would not promote sparse solutions in the dual --- which might have consequences for performance.

(Sep 22 '11 at 12:31) Bwaas

In the same vein, I would like to ask where SVMs are not so popular (or are even unpopular). Obviously SVMs are not a solution for all ML problems. Can people think of specific settings or applications where SVMs are not "the first thing to try that will probably work very well"? I can think of one off the top of my head -- structured prediction (correct me if I am wrong).

answered Apr 08 '12 at 04:12

IdleBrain

I strongly suggest you make this a separate question.

(Aug 20 '12 at 09:14) Lucian Sasu

I don't think SVMs can be used for unsupervised learning.

(Sep 16 '12 at 01:56) Comptrol

Depending on your exact definition of unsupervised learning, you can. It's called the one-class SVM and can be used for outlier detection, etc.

(Sep 16 '12 at 14:37) Justin Bayer
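
For reference, a minimal one-class SVM sketch with scikit-learn (synthetic data; just an illustration of the comment above):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.randn(200, 2)  # "normal" data

# Test set: a few more normal points plus some likely outliers.
X_test = np.vstack([rng.randn(10, 2), rng.uniform(-6, 6, size=(10, 2))])

clf = OneClassSVM(nu=0.05, kernel="rbf", gamma=0.5).fit(X_train)
print(clf.predict(X_test))  # +1 = inlier, -1 = outlier
```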

A reason that I haven't seen mentioned so far is that, in my experience, linear SVMs tend to be more robust than logistic regression, in particular with regard to the choice of the regularization parameter.
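
One rough way to probe this claim (a sketch with scikit-learn on synthetic data, not the author's experiment) is to sweep the regularization parameter for both models and compare how much cross-validated accuracy moves:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=50, random_state=0)

for name, make in [("LinearSVC", lambda C: LinearSVC(C=C)),
                   ("LogisticRegression", lambda C: LogisticRegression(C=C))]:
    scores = [cross_val_score(make(C), X, y, cv=5).mean()
              for C in np.logspace(-3, 3, 7)]
    # A smaller spread suggests less sensitivity to the choice of C.
    print(name, "accuracy spread over C:", max(scores) - min(scores))
```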

answered Aug 29 '12 at 03:01

Gael Varoquaux


Funny, Hal Daume III has the opposite experience: http://nlpers.blogspot.jp/2009/08/classifier-performance-alternative.html

(Aug 29 '12 at 09:09) Mathieu Blondel