|
What SVM packages are capable of handling large datasets? In particular, I'd like to use the RBF kernel to perform nonlinear classification. My "large dataset" is on the order of hundreds of thousands of data points. I've found libsvm and svmlight, but I'm sure there are others. What would people recommend? |
|
For large data sets, I tend to default to LIBLINEAR, but it only trains linear models, so if you really need the properties of an RBF kernel, it is not going to work for you. A colleague of mine who was training discriminant models for speech recognition said that he could never get LibSVM to converge on his data, but switching to Core Vector Machines drastically reduced training time and gave him a large bump in performance. |
|
Third option: approximate the RBF kernel and use a linear learner such as LIBLINEAR. In my experience, 100k samples will kill LibSVM.
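
To make "approximate the RBF kernel" concrete, here is a minimal sketch using random Fourier features, which is one common approximation and not necessarily the one you would end up using. Everything in it (the `make_rff_map` helper, `gamma=0.5`, 2000 features, the two sample points) is an illustrative assumption. It only shows the feature map and checks it against the exact kernel; the mapped vectors are what you would then hand to a linear learner such as LIBLINEAR.

```python
import math
import random


def make_rff_map(n_dims, n_features, gamma, seed=0):
    """Return a function mapping a point to random Fourier features.

    The inner product of two mapped points approximates the RBF kernel
    exp(-gamma * ||x - y||^2). Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    # Projection directions are drawn from N(0, 2 * gamma) per dimension.
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_dims)]
               for _ in range(n_features)]
    # Random phase offsets, uniform on [0, 2*pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


transform = make_rff_map(n_dims=2, n_features=2000, gamma=0.5, seed=0)
x, y = [0.0, 1.0], [1.0, 0.5]
approx = dot(transform(x), transform(y))
exact = rbf_kernel(x, y, gamma=0.5)
print(f"approximate: {approx:.3f}, exact: {exact:.3f}")
```

The trade-off is the number of random features: more features give a closer approximation but a wider input for the linear learner, so it is worth tuning on a held-out set rather than taking 2000 as a recommendation.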