
I am not an expert on SVMs and kernels. My question is: how do I select a suitable kernel function for a problem? The LibSVM guide suggests cross-validation for this purpose, starting with the RBF kernel. I have also read some papers that consider incorporating prior knowledge about the problem at hand into kernel selection, particularly the class-invariance property and knowledge about the data. I am curious to know how exactly information about the data set can be extracted and used for kernel selection. Or is there another way to guide the kernel selection process?

So far, after reading related articles, I have some ideas about selecting a suitable kernel using prior knowledge (though I am not sure whether this is meaningful):

  1. I need to check whether my problem is class-invariant or not, and choose a suitable kernel based on the result.

  2. Generally the training data live in a high-dimensional space. Multidimensional scaling (MDS) can be used to get a visual impression of the data points based on their similarity/dissimilarity, and from that plot we can judge which kind of function might separate the data points and select the corresponding kernel (Gaussian/polynomial); see the sketch after this list.
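
For example, a rough sketch of idea 2 (assuming a recent scikit-learn; the synthetic data here just stands in for my real training set):

    # Sketch of idea 2: embed the data in 2-D with MDS and look at the classes.
    # The data set here is synthetic; in practice X and y would be my samples.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.manifold import MDS
    from sklearn.metrics import pairwise_distances

    X, y = make_classification(n_samples=100, n_features=40, random_state=0)

    # Pairwise dissimilarities between the training samples.
    D = pairwise_distances(X, metric="euclidean")

    # Embed in 2-D while approximately preserving the dissimilarities.
    X_2d = MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(D)

    # Color points by class; the shape of the boundary in this plot may hint
    # at which kernel (linear, polynomial, Gaussian) could separate them.
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
    plt.show()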

Any help would be appreciated, and I am very sorry if I am bothering you. Thanks in advance.

asked May 27 '12 at 10:28 by Raihana, edited May 27 '12 at 10:47

3 Answers:

To follow up on Leon's and Andreas's answers, you might want to look at "Kernel Methods in Computer Vision" by Christoph Lampert. There he explains that, since a kernel can be viewed as a sort of similarity measure, the following heuristic can help in choosing one:

A good kernel should have high values when applied to two similar objects, and it should have low values when applied to two dissimilar objects

Chapter 3 of the same article discusses incorporating invariance into the kernel instead of into the feature extraction procedure, along with various kinds of kernels, which is what I think you were looking for.
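
As a minimal illustration of that heuristic (made-up points and an arbitrary gamma, using scikit-learn), an RBF kernel behaves exactly this way:

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    a = np.array([[1.0, 2.0]])
    b = np.array([[1.1, 2.1]])   # similar to a
    c = np.array([[8.0, -5.0]])  # dissimilar to a

    print(rbf_kernel(a, b, gamma=1.0))  # ~0.98: high value for similar objects
    print(rbf_kernel(a, c, gamma=1.0))  # ~0.0:  low value for dissimilar objects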

answered May 28 '12 at 14:19 by Pardis, edited May 28 '12 at 14:21

Yeah, this is a good book and definitely a good place to get a feeling for these methods.

Though AFAIK Christoph usually uses chi2 and RBF himself ;)

(May 28 '12 at 15:18) Andreas Mueller

@Pardis - Thank you very much for the help.

(May 28 '12 at 16:02) Raihana

Hi Raihana. I'm not sure what the class-invariance property is, so I cannot comment on that. In general, the kind of kernel you use is often dictated by your representation of the data. For example, there is a lot of work on kernels for sequences, trees, general graphs, and sets. There are also domain-specific kernels developed in certain communities.

If your data representation is just a real vector, most people use linear or RBF kernels. Linear is good since it is fast and needs barely any tuning; RBF is usually the best-performing non-linear one. I haven't really seen other kernels being used in many applications.

One exception is computer vision. Here the data are often represented as histograms (so they are non-negative and sum to one), where chi2 and intersection kernels have proven to be more effective than RBF.
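
For illustration, a minimal sketch of an SVM with a precomputed chi2 kernel on synthetic histogram data (assuming a recent scikit-learn; the data and gamma are placeholders):

    import numpy as np
    from sklearn.metrics.pairwise import chi2_kernel
    from sklearn.svm import SVC

    # Synthetic histogram features: non-negative rows that sum to one.
    rng = np.random.RandomState(0)
    X = rng.rand(60, 10)
    X /= X.sum(axis=1, keepdims=True)
    y = rng.randint(0, 2, size=60)

    # Precompute the chi2 kernel matrix and hand it to the SVM.
    K = chi2_kernel(X, gamma=1.0)
    clf = SVC(kernel="precomputed").fit(K, y)

    # Predicting requires the kernel between test and training samples.
    print(clf.predict(chi2_kernel(X, X, gamma=1.0))[:5])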

This answer is marked "community wiki".

answered May 28 '12 at 08:58 by Andreas Mueller

@Andreas Mueller - My data representation is just a real vector. Each sample is represented by 40 features (30 of them binary, the remaining taking values between 1 and 20). I learned that RBF is usually the first choice, so I started with linear and then RBF. But in the end I got the best result using a polynomial kernel. Can you give me any idea/explanation of why polynomial works better than RBF here? Thanks in advance.

(May 29 '12 at 04:29) Raihana

No. ;)

It might depend on the data normalization. For RBF, scaling to zero mean and unit variance is a good idea, but the result is usually still quite sensitive to the kernel width gamma.

Doing some handwaving, the polynomial kernel might work better if the true function is easily expressed as a polynomial.
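
A sketch of that setup (assuming a recent scikit-learn; the data set and grid values are placeholders): standardize the features, then search over gamma and C.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=40, random_state=0)

    # Zero mean, unit variance, then an RBF-kernel SVM.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    # Gamma (and C) still need tuning; grid-search them with cross-validation.
    grid = GridSearchCV(pipe, {"svc__gamma": [1e-3, 1e-2, 1e-1, 1],
                               "svc__C": [0.1, 1, 10]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_)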

(May 29 '12 at 04:33) Andreas Mueller

Usually you use cross-validation and test the error of the chosen kernel on a held-out data set.

For example:

You divide your data set into three parts (training, CV, and test):

First you train, let's say, 5 SVMs, each with a different kernel (or with different parameters for the same kernel).

Then you evaluate each trained model on the CV set.

After this, you choose the one with the best performance on the CV set and test it on the final test set.

Usually when you use cross-validation, you rotate the CV set over all the possibilities, so you end up with an average solution.
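
For concreteness, a sketch of that procedure (assuming a recent scikit-learn; the data set, split sizes, and kernel list are placeholders):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    # Split into training (60%), CV (20%), and test (20%) sets.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_cv, X_test, y_cv, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)

    # Train one SVM per candidate kernel and score each on the CV set.
    kernels = ["linear", "poly", "rbf", "sigmoid"]
    scores = {k: SVC(kernel=k).fit(X_train, y_train).score(X_cv, y_cv)
              for k in kernels}

    # Pick the kernel with the best CV score; evaluate it once on the test set.
    best = max(scores, key=scores.get)
    final = SVC(kernel=best).fit(X_train, y_train)
    print(best, final.score(X_test, y_test))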

Check this lecture by Andrew Ng; he discusses it for classification, but the overall principle also applies to SVMs.

answered May 28 '12 at 03:08 by Leon Palafox ♦

@Leon Palafox - Thanks for your answer. Besides the cross-validation approach, can you give me any idea (or explanation or reference) about incorporating prior knowledge about the learning problem into kernel selection? Particularly, how and what information can be extracted from the training samples to influence kernel selection. Thanks again.

(May 28 '12 at 09:32) Raihana