|
I am using python's scikits.learn library with a linear svm (SVC, since LinearSVC is even slower). I've heard libsvm, which scikits.learn interfaces with, is supposed to be very fast, but I'm finding that my 10000 sample, 10000 feature (dense) vector is taking almost an hour to run! What can I do to speed this up, short of switching to a different language or (even worse) writing my own svm implementation? |
|
Originally libsvm and liblinear were designed to work with sparse representations. In scikit-learn, though, libsvm was patched to be able to work directly with dense representations too. This hasn't been the case for liblinear. So, internally, scikit-learn always converts dense matrices to a sparse representation when using LinearSVC. This may well be the reason you see such a difference in performance.
Sweet! This also means that if I take the trouble to re-implement their algorithm myself, things should work well, right?
(Jul 31 '11 at 02:19)
Jacob Jensen
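A minimal sketch of the two code paths described above, using current sklearn import paths (the 0.8-era package was scikits.learn) and toy data as a stand-in for the real matrix:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVC, LinearSVC

# toy dense data, standing in for the 10000 x 10000 matrix in the question
rng = np.random.RandomState(0)
X = rng.randn(1000, 100)
y = (X[:, 0] > 0).astype(int)

# SVC (libsvm) consumes the dense array directly
dense_clf = SVC(kernel="linear").fit(X, y)

# LinearSVC (liblinear) works on a sparse representation internally, so
# converting once up front avoids an implicit dense-to-sparse copy at fit time
X_sparse = sp.csr_matrix(X)
sparse_clf = LinearSVC().fit(X_sparse, y)
```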
Possibly yes, but give the SGD module in scikit-learn a shot too, as it can work with dense representations directly.
(Jul 31 '11 at 04:05)
Mathieu Blondel
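A hedged sketch of that suggestion for a recent scikit-learn; SGDClassifier accepts dense arrays directly, and the alpha/max_iter values here are illustrative, not tuned:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# dense toy data; no sparse conversion is needed for SGD
rng = np.random.RandomState(0)
X = rng.randn(5000, 500)
y = (X[:, 0] + 0.1 * rng.randn(5000) > 0).astype(int)

# loss="hinge" gives a linear SVM objective trained by stochastic gradient descent
clf = SGDClassifier(loss="hinge", alpha=1e-4, max_iter=20)
clf.fit(X, y)
print(clf.score(X, y))
```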
Regarding patching libsvm and not liblinear: there is a patch here: http://www.cs.berkeley.edu/~smaji/projects/digits/ but I have no suitable dataset to test it on. I used it on MNIST and saw no difference. Does someone know this patch?
(Sep 18 '11 at 11:40)
Andreas Mueller
If I'm not mistaken, MNIST is a sparse dataset (although not that sparse). It'd be better to try it on another dataset (denser and higher dimensional).
(Sep 18 '11 at 15:56)
Mathieu Blondel
That's what I figured... I just didn't know of any in SVMLight format that would be suitable, and thought maybe someone had some data lying around to try it out.
(Sep 18 '11 at 15:58)
Andreas Mueller
|
|
Not an answer, but fwiw a couple of runtimes for scikits.learn libsvm / liblinear, just to give an idea of how they vary with kernel. ("The most important parameter is the one you haven't thought of.")
The data is MNIST from http://deeplearning.net/data/mnist/mnist.pkl.gz; train on the first 5k, test on the last 5k of the 70k vectors. My versions: Python 2.6.4, NumPy 1.6.0, scikits.learn 0.8.1, on a Mac PPC. http://peekaboo-vision.blogspot.com/2010/09/mnist-for-ever.html gets to 98% with an RBF kernel, default tol .001 (LinearSVC .0001), runtimes unknown. To see how runtimes vary with the number of features, one could just use np.repeat(X, repeat, axis=1). Again fwiw, here's the code snippet. (It took me a while to realize that, while LinearSVC has a multi_class= parameter, SVC is automatically multiclass if y has more than 2 values; but it's not clear which multiclass algorithm it then uses.)
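A minimal sketch of such a benchmark (the kernels, the gamma value, and the pickle loading below are assumptions, written for a current Python/scikit-learn rather than the 2.6-era setup):

```python
import gzip
import pickle
import time

import numpy as np
from sklearn.svm import SVC, LinearSVC

# mnist.pkl.gz from http://deeplearning.net/data/mnist/mnist.pkl.gz holds
# (train, valid, test) pairs of (X, y); together they are the 70k vectors
with gzip.open("mnist.pkl.gz", "rb") as f:
    (Xtr, ytr), (Xva, yva), (Xte, yte) = pickle.load(f, encoding="latin1")

X = np.vstack([Xtr, Xva, Xte])
y = np.hstack([ytr, yva, yte])
X_train, y_train = X[:5000], y[:5000]      # train on the first 5k
X_test, y_test = X[-5000:], y[-5000:]      # test on the last 5k

classifiers = [
    ("SVC linear", SVC(kernel="linear")),
    ("SVC rbf", SVC(kernel="rbf", gamma=0.02)),  # gamma is an illustrative guess
    ("LinearSVC", LinearSVC()),
]

for name, clf in classifiers:
    t0 = time.time()
    clf.fit(X_train, y_train)
    print("%-12s  fit %6.1f s   test accuracy %.3f"
          % (name, time.time() - t0, clf.score(X_test, y_test)))

# to see how runtimes vary with the number of features, widen X as in the text:
# X_train_wide = np.repeat(X_train, 4, axis=1)
```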
About my blog post: I used an RBF kernel, which I didn't mention explicitly, but I gave the gamma ;) (fixed now). The results don't really compare, though, since I used all 60000 training images for training, not only the first 5000. The tolerance was the standard tolerance of the libsvm command line interface, which is 0.001. I don't know the runtime any more, but it was quite long as you can imagine ;) Btw, SVC uses 1 vs rest multi-class classification afaik. I find your results a little misleading as far as the accuracy is concerned (the runtime comparison seems pretty reasonable). To compare different kernels in terms of accuracy, I feel that you have to do grid searches to find optimal parameters.
(Aug 09 '11 at 11:01)
Andreas Mueller
Andreas, sure, any timing results are misleading, incomplete, ymmv; I just wanted to give a rough idea of the large range of runtimes and accuracies here. If some SGD SVM is 10x faster, then that's the way to go.
(Aug 09 '11 at 12:31)
denis
Yeah, I (somewhat) agree. liblinear should be way faster, which can be seen from your experiments... but the RBF SVM should be a lot better - and also a lot slower - than the linear one. And the "better" part is not so clear from your results ;)
(Aug 09 '11 at 12:45)
Andreas Mueller
These are in line with my runtimes. At least I'm not doing something super-wrong.
(Aug 09 '11 at 16:39)
Jacob Jensen
Jacob, clf = linear_model.SGDClassifier() looks waaay faster. If you were to ask about "experience with RFE", I'd put up some numbers on RFE with SGD too.
(Aug 12 '11 at 03:52)
denis
|
|
I totally agree with Lucas that the data set is pretty large. I strongly recommend drastically reducing the number of features. Liblinear is one of the best SVM solvers out there. Compared to SGD it is really user-friendly: it does not require careful hyper-parameter tuning, but you need to keep some things in mind (see the LIBSVM guide). Most importantly: center your data, since SVMs are not scale invariant. (If you haven't done it, this should be the first thing you try!) Furthermore, did you use the default parameters for SVC or did you use kernel="linear"? The default kernel of SVC is non-linear (it's a poly kernel). If your data is non-linear, it is plausible that SVC converges faster because LinearSVC has a hard time finding a reasonable linear fit (in this case use a smaller C parameter for LinearSVC). Good luck! best, Peter
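A sketch of the centering/scaling advice together with the two estimators mentioned; the placeholder data and the C values are assumptions, and scaling statistics are taken from the training set only:

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

# placeholder data standing in for the 10000 x 10000 matrix
rng = np.random.RandomState(0)
X_train = rng.rand(2000, 300)
X_test = rng.rand(500, 300)
y_train = (X_train[:, 0] > 0.5).astype(int)

# center and scale with training statistics only: SVMs are not scale invariant
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-12
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

# an explicitly linear kernel for SVC, and a smaller C for LinearSVC
svc_linear = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
linear_svc = LinearSVC(C=0.1).fit(X_train, y_train)
```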
The default kernel of SVC is actually RBF, not poly.
(Jul 19 '11 at 14:33)
ogrisel
I'm using libsvm kernel = linear. I have tried liblinear, and have not been satisfied with the results (worse and slower, for some reason). Currently I'm in a prototyping phase, so I'll suck it up and use libsvm, but I'll probably make my own svm implementation later.
(Jul 19 '11 at 15:36)
Jacob Jensen
Hey Peter, I have noticed too that liblinear is fairly robust in terms of setting the C parameter on many datasets. Is there a theoretical justification for this?
(Jul 30 '11 at 06:09)
Gael Varoquaux
Peter, Gael, what do we have for big dense testcases, binary / multiclass ?
(Jul 30 '11 at 10:19)
denis
I do not know, this is a bit far from what I personally work on. I'd be interested by the answer though.
(Jul 31 '11 at 04:50)
Gael Varoquaux
|
|
I'm not sure whether this switch is exposed in scikits, but liblinear is able to switch between optimising the primal and the dual. You should try that. Other than this, I don't think there is much that you can do, and I highly doubt that you can write an SVM that is faster than LibLinear. You might try stochastic gradient descent methods for linear classification. I haven't done much with them myself, but I think they should be faster. Cheers, Andy
Clearly, go for a library that supports stochastic gradient descent.
(Jun 15 '11 at 19:52)
levesque
Stochastic Gradient Descent is implemented in scikit-learn: http://scikit-learn.sourceforge.net/stable/modules/sgd.html It's probably a good idea to try it.
(Jul 30 '11 at 06:07)
Gael Varoquaux
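If the switch is exposed in your version, the primal/dual choice for liblinear looks roughly like this in scikit-learn (a sketch; whether it helps depends on the data shape):

```python
from sklearn.svm import LinearSVC

# dual=True (the default) solves the dual problem, usually preferred when
# n_features > n_samples; dual=False solves the primal, which is often the
# faster choice when n_samples >= n_features, as in the 10000 x 10000 case
primal_clf = LinearSVC(dual=False)
dual_clf = LinearSVC(dual=True)
```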
|
|
Have you considered using their logistic regression implementation instead? The method is not too different, but it's optimized with SGD.
It depends. They are different solvers for the various losses. Both the SVM problem and logistic regression can be solved using SGD: http://scikit-learn.sourceforge.net/modules/sgd.html It's a matter of changing the loss.
(Jun 18 '11 at 11:27)
Gael Varoquaux
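A sketch of that point: with SGDClassifier the solver stays the same and only the loss changes (the loss name "log_loss" is the recent spelling; older releases call it "log"):

```python
from sklearn.linear_model import SGDClassifier

# linear SVM objective
svm_sgd = SGDClassifier(loss="hinge")

# logistic regression objective; everything else about the solver is identical
logreg_sgd = SGDClassifier(loss="log_loss")
```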
|
|
Before switching to another library you should try to increase the tolerance of the convergence criterion of your model. For instance, most of the time in scikit-learn this is a parameter called tol. As Peter said: don't forget to scale your data; have a look at the chapter on data preprocessing. You can also try to reduce the dimensionality by transforming your data with a dimensionality reduction method.
Then you can train your model on the transformed data.
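A sketch combining both suggestions, a looser tol and a dimensionality reduction step first; PCA and the particular values here are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# toy data standing in for the dense 10000 x 10000 matrix
rng = np.random.RandomState(0)
X = rng.randn(2000, 1000)
y = (X[:, :10].sum(axis=1) > 0).astype(int)

# a looser stopping tolerance than the default (1e-3) can shorten training
clf = SVC(kernel="linear", tol=1e-2)

# reduce dimensionality before training the SVM
X_reduced = PCA(n_components=100).fit_transform(X)
clf.fit(X_reduced, y)
```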
I have scaled the data, and I have tried messing with the tolerance. The problem is, lower tolerance significantly lowers accuracy in my case.
(Jul 19 '11 at 15:37)
Jacob Jensen
Then dimensionality reduction is probably the way to go. You should also try SGDClassifier with
(Jul 19 '11 at 16:07)
ogrisel
|
|
I don't have much experience with using dense data like that, but I'm not surprised that it takes a long time to run on a 100,000,000-entry dense matrix! You might try some feature selection to reduce the dimensionality of the input. In my experience, the feature selection in scikits.learn can run out of memory on inputs that large, though you can easily alter the code used here to run in an online fashion to improve its memory efficiency.
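One possible way to do that feature selection in scikit-learn, assuming a univariate filter is acceptable (the scoring function and k are illustrative):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC

# toy data with many mostly uninformative features
rng = np.random.RandomState(0)
X = rng.randn(2000, 5000)
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# keep the 500 features with the highest univariate F-scores
selector = SelectKBest(f_classif, k=500)
X_small = selector.fit_transform(X, y)

clf = LinearSVC().fit(X_small, y)
```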