|
I am looking to use SVM + a string kernel (Lodhi 2002) for a classification task involving sequences of bytes from a hard drive. I would prefer to use an existing implementation as reference when I write my own. I don't anticipate there will be a tool that will fully perform my classification task using a string kernel, so I am mainly looking for a reference implementation of just the kernel. Does anyone know where I could find such a reference? |
|
The paper describes the subsequence kernel, which is basically a variation on the dynamic programming solution to the longest common subsequence of two strings. Pseudocode for the subsequence kernel is in the book Kernel Methods for Pattern Analysis (http://www.kernel-methods.net/). Also, if you need it, here is my Python implementation:
It's not commented, but xi and xj are strings, and lamb and p are the parameters described in the paper. It also caches already-seen pairs of xi and xj.
Could you comment on the applications of such a kernel?
(Jan 19 '11 at 17:02)
Alexandre Gramfort
this is just what i asked for, and now i remember why i asked for a reference implementation :) unfortunately this may be too slow. i have some preliminary experiments using an edit distance kernel (which should be faster than SSK), and it is REALLY slow.
(Jan 19 '11 at 17:19)
Travis Wolfe
What about using a k-mer kernel? It should be much faster than SSK and edit distance, since for a fixed k its cost is linear in the total length of the two strings.
(Jan 19 '11 at 18:18)
Rok Mocnik
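For reference, the k-mer idea suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `kmer_counts` and `kmer_kernel` are made up for this example, and the kernel is taken to be the inner product of k-mer count vectors (often called the spectrum kernel). It works on `bytes` as well as `str`, which may suit the hard-drive data in the question.

```python
from collections import Counter


def kmer_counts(s, k):
    """Count every contiguous substring of length k in s (str or bytes)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s, t, k):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Iterate over the smaller table; Counter returns 0 for missing keys.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(count * ct[kmer] for kmer, count in cs.items())


if __name__ == "__main__":
    print(kmer_kernel(b"abcd", b"bcde", 2))  # shared 2-mers: b"bc", b"cd"
```

Each string is scanned once to build its count table, so for a fixed k the work grows with the total length of the two strings rather than with their product.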
this code can be trivially rewritten with cython to get a huge speed up.
(Jan 20 '11 at 11:43)
Alexandre Gramfort
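For anyone landing on this thread without the code from the answer, here is a minimal sketch of the SSK recursion from Lodhi et al. (2002), written as plain dynamic programming. It is not the implementation posted above. The names `ssk` and `ssk_normalized` are made up for this example, `lam` stands for the decay factor and `n` for the subsequence length, and `functools.lru_cache` is an assumed stand-in for the pair caching the answer mentions.

```python
from functools import lru_cache
from math import sqrt


@lru_cache(maxsize=None)  # cache already-seen (s, t, n, lam) combinations
def ssk(s, t, n, lam):
    """Subsequence kernel K_n(s, t) with decay lam; O(n * len(s) * len(t))."""
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[i][a][b] holds K'_i(s[:a], t[:b]) for i = 0 .. n-1.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # K_n sums lam^2 * K'_{n-1} over every pair of matching end positions.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def ssk_normalized(s, t, n, lam):
    """K(s, t) / sqrt(K(s, s) * K(t, t)); returns 0.0 if either norm is zero."""
    denom = sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0
```

As a sanity check against the worked example in the paper, `ssk("cat", "car", 2, lam)` should equal `lam**4` and `ssk("cat", "cat", 2, lam)` should equal `2 * lam**4 + lam**6`. The triple loop is the part that is slow in pure Python and the natural target for a Cython rewrite.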
|
|
hi, it looks like libsvm has an implementation of string kernels: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#libsvm_for_string_data
hope this helps
Alex
Their package implements an edit distance kernel (which apparently doesn't always produce a proper kernel according to them). It should help nonetheless, thanks.
(Jan 09 '11 at 23:25)
Travis Wolfe
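To make the edit distance kernel mentioned in the comment above concrete, here is a rough Python sketch. The form `exp(-gamma * d(s, t))` and the names `edit_distance`, `edit_kernel`, and `gamma` are assumptions made for illustration, so check the libsvm tool's own documentation for what it actually computes. As the comment notes, a kernel built this way is not guaranteed to be positive semidefinite.

```python
import math


def edit_distance(s, t):
    """Levenshtein distance between s and t using two rolling rows."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_kernel(s, t, gamma=0.1):
    """Similarity exp(-gamma * edit_distance); gamma is a hypothetical width parameter."""
    return math.exp(-gamma * edit_distance(s, t))
```

The distance computation costs O(len(s) * len(t)) per pair, which is consistent with the slowness reported earlier in this thread for long byte sequences.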
if you come up with a good BSD-compatible implementation, I'm sure the scikit-learn folks [1] would be interested. [1] http://scikit-learn.sourceforge.net/
(Jan 10 '11 at 09:42)
Alexandre Gramfort
|
|
Someone (offline) gave me this solution as well. http://ace.cs.ohiou.edu/~razvan/code/ssk_core.tar.gz. It implements the SSK (subsequence string kernel) in Java to be integrated with libsvm. I haven't tried it yet, but it looks legit. |