I'd like to use scikit-learn's SVM classes with a custom string kernel. I have a string kernel function that computes a similarity score between two strings, like so:

>>> substring_kernel("this is a test string", "so is this")
0.6765380761985251

>>> substring_kernel("a different string", "and another")
0.2638510175945378

(Essentially, it's just a function that takes two strings as input and returns a float.) And I have some test data:

from numpy import array
X = array(["test sentence for SVM", "another example"])
Y = array([0, 1])

I want to train an SVM classifier on this data, like so:

from sklearn import svm
svm_model = svm.SVC(kernel=substring_kernel)
svm_model.fit(X, Y)

However, this returns the error:

>>> svm_model.fit(X, Y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/svm/base.py", line 139, in fit
    X = atleast2d_or_csr(X, dtype=np.float64, order='C')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 134, in atleast2d_or_csr
    "tocsr", force_all_finite)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 111, in _atleast2d_or_sparse
    force_all_finite=force_all_finite)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: test sentence for SVM

As far as I can tell, the error arises because scikit-learn's SVM implementation tries to coerce the input X to numpy.float64 during fit, before my kernel is ever called (but I'm not completely sure that this is the case). So, what exactly is going on, and can the problem be avoided?
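To check that suspicion outside scikit-learn, here is a minimal sketch that repeats the conversion shown in the last scikit-learn frame of the traceback (the np.asarray call with dtype=np.float64). It assumes that call is the one that fails; the names coerced and message are only illustrative.

```python
import numpy as np

X = np.array(["test sentence for SVM", "another example"])

try:
    # Same conversion as in the traceback: force the string array to float64.
    np.asarray(np.atleast_2d(X), dtype=np.float64)
    coerced = True
except ValueError as exc:
    # NumPy cannot parse the sentences as numbers, so the cast is rejected.
    coerced = False
    message = str(exc)

print(coerced)
```

If this raises the same ValueError, the failure happens during input validation and has nothing to do with the kernel function itself.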

asked Oct 22 '13 at 19:12

Bill M

One Answer:

I'm not sure I can answer your question, but I do have a couple of links worth checking out.

  1. http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction

  2. http://scikit-learn.org/stable/auto_examples/document_classification_20newsgroups.html
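Separately from those links, one possible way around the float conversion is to compute the kernel (Gram) matrix yourself and pass it to SVC with kernel="precomputed", so that scikit-learn only ever sees numbers. The sketch below is not a definitive solution: the substring_kernel here is a hypothetical stand-in (a normalized character-bigram overlap) because the original function was not posted, and gram_matrix and X_new are names made up for illustration.

```python
from collections import Counter

import numpy as np
from sklearn import svm


def substring_kernel(s, t, n=2):
    """Hypothetical stand-in for the asker's kernel: cosine of n-gram counts."""
    cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    dot = sum(cs[g] * ct[g] for g in cs)
    norm = np.sqrt(sum(v * v for v in cs.values()) *
                   sum(v * v for v in ct.values()))
    return dot / norm if norm else 0.0


def gram_matrix(A, B, kernel):
    """Evaluate kernel(a, b) for every a in A and b in B as a float matrix."""
    return np.array([[kernel(a, b) for b in B] for a in A], dtype=np.float64)


X = ["test sentence for SVM", "another example"]
Y = np.array([0, 1])

# Training: square matrix of shape (n_train, n_train).
K_train = gram_matrix(X, X, substring_kernel)
svm_model = svm.SVC(kernel="precomputed")
svm_model.fit(K_train, Y)

# Prediction: rows are new strings, columns are the training strings.
X_new = ["one more test sentence"]
K_new = gram_matrix(X_new, X, substring_kernel)
pred = svm_model.predict(K_new)
print(pred)
```

The trade-off is that the strings never reach scikit-learn, so you have to keep the training strings around and rebuild the kernel matrix against them for every prediction.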

Also, I'm currently using the scikit-learn library and am learning about it and SVMs for my thesis research. I could sure use a friend to bounce ideas off of and discuss scikit-learn with. If you would like to talk, you can add me on Google+, Facebook, or via email. My website is www.ConradSykes.com and my email is [email protected]

answered Oct 23 '13 at 19:00

conrad sykes
edited Oct 23 '13 at 19:02

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.