Background / Question

I am trying to create an SVM using scikit-learn. I have a training set (here is the link to it: https://dl.dropboxusercontent.com/u/9876125/training_patients.txt) which I load and then use to train the SVM. The training set is 3600 lines long. When I use all 3600 tuples, the SVM never finishes training, but when I use only the first 3594 tuples it finishes in under a minute. I have tried a variety of different training-set sizes and the same thing keeps happening: depending on how many tuples I use, the SVM either trains very quickly or never completes. This has led me to the conclusion that the SVM is having difficulty converging on an answer depending on the data.

Is my assumption about this being a convergence problem correct? If so, what is the solution? If not, what other problem could it be?
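For reference, a minimal sketch of the size experiment described above (the timing helper is illustrative only; the file name and SVM parameters are taken from the code below):

import time
from sklearn import svm
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("training_patients.txt")

# Fit on the first n tuples and report how long training took.
def time_fit(n):
    clf = svm.SVC(kernel='poly', cache_size=600, degree=40, C=1.0)
    start = time.time()
    clf.fit(X[:n], y[:n])
    print "n=%d: fit finished in %.1f seconds" % (n, time.time() - start)

time_fit(3594)  # finishes in under a minute
time_fit(3600)  # appears never to finish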

My Code

import numpy as np
import pylab as pl  # @UnresolvedImport
from sklearn import svm, datasets
from sklearn.datasets import load_svmlight_file

print(__doc__)

print "loading training set\n"
X_train, y_train = load_svmlight_file("training_patients.txt")

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

print "creating svm\n"
poly_svc = svm.SVC(kernel='poly', cache_size=600, degree=40, C=C).fit(X_train, y_train)

print "all done"

asked Oct 22 '13 at 15:16

conrad sykes


One Answer:

Update: Solved Partially

I have confirmed that yes, it was a convergence problem. I also discovered that a partial solution is to pass the parameter max_iter=N when creating the SVM. This limits the number of iterations the solver will run and thus can prevent an infinite-loop situation.
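For example, a sketch against the question's code (the value 400000 is taken from the warning quoted below):

poly_svc = svm.SVC(kernel='poly', cache_size=600, degree=40, C=C,
                   max_iter=400000)  # give up after 400000 solver iterations
poly_svc.fit(X_train, y_train)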

But when I do this, instead of looping forever, the SVM now prints the following warning:

/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py:206: ConvergenceWarning: Solver terminated early (max_iter=400000). Consider pre-processing your data with StandardScaler or MinMaxScaler.

While this is acceptable, I would like to avoid it. And yes, I have already standardized my data (using the Z-score method).
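For reference, a sketch of that kind of Z-score standardization using scikit-learn's StandardScaler (an assumption about how the scaling was done; with_mean=False is used because load_svmlight_file returns a sparse matrix, which cannot be mean-centered without densifying):

from sklearn.preprocessing import StandardScaler

# Scale each feature to unit variance; skip mean-centering because
# X_train from load_svmlight_file is sparse.
scaler = StandardScaler(with_mean=False)
X_train_scaled = scaler.fit_transform(X_train)
poly_svc = svm.SVC(kernel='poly', cache_size=600, degree=40, C=C,
                   max_iter=400000).fit(X_train_scaled, y_train)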

If anyone has any advice on what to do next, it would be appreciated. Right now I'm just ignoring the warning.
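If explicitly silencing the message is preferable to just letting it print, a sketch using the standard library's warnings filter (the message pattern is copied from the warning above):

import warnings

# Suppress only the early-termination warning raised when max_iter is hit.
warnings.filterwarnings('ignore', message='Solver terminated early.*')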

answered Oct 23 '13 at 21:55

conrad sykes
