Does anyone have experience with explicit feature maps for SVM training? There has been some recent progress that looks very promising, but I haven't seen the techniques used much. I am implementing and trying them out at the moment, and I was wondering whether anyone else has tried them yet. I am also interested in how they relate to "encoding" techniques such as locally linear embedding and Fisher vectors. Has anyone compared these different approaches to transforming the feature space? There is an interesting survey, but it does not compare against explicit feature maps.
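To make concrete what I mean by an explicit feature map, here is a rough sketch of the random Fourier feature idea (approximating an RBF kernel so that a plain linear SVM can be trained on the mapped data); all names and parameter values here are my own:

```python
# Rough sketch of one explicit feature map: random Fourier features
# (Rahimi & Recht) approximating the RBF kernel exp(-gamma * ||x - y||^2).
import numpy as np

def random_fourier_map(X, n_components=1000, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Frequencies sampled from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

# The linear kernel on the mapped data approximates the RBF kernel,
# so a linear SVM on Z behaves roughly like a kernel SVM on X.
X = np.random.randn(5, 3)
Z = random_fourier_map(X, gamma=0.5)
approx = Z @ Z.T                                   # approximate kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-0.5 * sq_dists)                    # exact RBF kernel, gamma=0.5
```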
Thanks for sharing this, I had never heard of the Fourier series approach to kernels before. That looks pretty cool!
If you are interested, there is MATLAB code by Fuxin, the author of the "skewed chi2" paper, here: http://sminchisescu.ins.uni-bonn.de/code/randfeat/ I am currently playing with it in Python and want to include it in scikit-learn. You can find my branch here: https://github.com/amueller/scikit-learn/tree/kernel_approximation and the code here: https://github.com/amueller/scikit-learn/blob/kernel_approximation/sklearn/feature_extraction/kernel_approximation.py It currently implements the RBF and skewed chi2 kernels.
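In case it helps, usage currently looks roughly like the sketch below; the module path and class names are assumptions on my part and may well change before anything gets merged:

```python
# Rough usage sketch, assuming the transformers end up importable as below
# (names and module path may still change).
import numpy as np
from sklearn.kernel_approximation import RBFSampler, SkewedChi2Sampler
from sklearn.svm import LinearSVC

X = np.abs(np.random.randn(200, 50))    # chi2-style features are non-negative
y = (X[:, 0] > X[:, 1]).astype(int)

# Monte Carlo approximation of the RBF kernel feature map.
rbf_map = RBFSampler(gamma=0.5, n_components=300, random_state=0)
Z = rbf_map.fit_transform(X)
clf = LinearSVC().fit(Z, y)             # linear SVM on the mapped features

# Same idea for the skewed chi2 kernel from Fuxin's paper.
chi2_map = SkewedChi2Sampler(skewedness=1.0, n_components=300, random_state=0)
Z2 = chi2_map.fit_transform(X)
```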
Check out Liefeng Bo's page: http://www.cs.washington.edu/homes/lfb/
He has a similar Fourier approximation scheme, and it's meant for use with vision applications, where large feature vectors tend to limit us to linear SVMs.
Thanks for the link. Which paper are you referring to?
Have you looked at random kitchen sinks (http://berkeley.intel-research.net/arahimi/papers/basis2.pdf) or at hash kernels for structured learning (http://hunch.net/~jl/projects/hash_reps/hash_kernels/hashkernel.pdf)?
Both of these are very similar to explicit feature maps and perform reasonably well.
There are also more efficient ways of training SVMs if one is allowed to use these maps. See for example http://arxiv.org/abs/1111.0432 , which does not sound like it is about explicit feature maps but actually uses them in its experiments.
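To illustrate the hash kernel idea mentioned above, here is a rough toy sketch of the hashing trick; this is my own illustration, not the code from the paper:

```python
# Toy version of the hashing trick behind hash kernels. Python's built-in
# hash() is only stable within one interpreter run, which is fine for a sketch.
import numpy as np
from collections import Counter

def hashed_features(tokens, n_bins=2**16):
    vec = np.zeros(n_bins)
    for token, count in Counter(tokens).items():
        idx = hash(token) % n_bins                                # bucket for this feature
        sign = 1.0 if hash("sign_" + token) % 2 == 0 else -1.0    # second hash gives the sign
        vec[idx] += sign * count
    return vec

# Documents sharing vocabulary end up with overlapping hashed features,
# so inner products between hashed vectors approximate the original kernel.
doc_a = hashed_features("the quick brown fox".split())
doc_b = hashed_features("the lazy brown dog".split())
similarity = doc_a @ doc_b
```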
I know the random kitchen sinks but not the other two. Did you use any of these techniques yourself?
No, not so far. I'd be interested in seeing these things applied, however.
In http://arxiv.org/abs/1111.0432 it seems to me the time comparisons are unfair, as they use a different stopping criterion for their own method than for the others. What do you think about that?
I don't have much experience tuning stopping criteria for the solvers compared there.
In section 4.2 they ran their solvers until they achieved accuracy comparable to the other solvers. If an order of magnitude more runtime on the other solvers did not improve test error, I'd say their default stopping criteria are pretty bad. In section 4.3 they only compared against LASVM, and only with a single pass.
I'm not terribly bothered by the results because the margin is huge, so it does not seem like they won on a technicality.