It might be a naive question, but the problem is that I didn't study SVMs and kernels in college. I have to use them in my graduation project and I have no time to study them thoroughly.

Well, I know that there are several types of kernels: the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel.

My question is:

Do I try several kernels and choose the one that minimizes the error?

In other words, on what basis do I choose a specific kernel?

Another question:

If I am applying the polynomial kernel to a data matrix D (an n-by-p matrix), is all I do

(1 + (x·x'))^m, where m = 2, 3, ...,

and is the 1 here a vector, or do I just add it to (x·x')?

Thank you all.

asked May 04 '11 at 13:17

Omar Osama


4 Answers:

Adding to Alexandre's answer: you can sometimes guide your choice based on your data set, if you know it well enough.

Each of the various kernel functions is a way of measuring the similarity of two samples to each other. I'll illustrate a few examples:

  • The Gaussian radial basis kernel ignores absolute location. Take three samples X1, X2, and X3. If you compute X1−X2 and X1−X3, they may be completely different vectors pointing in wildly different directions. However, the RBF kernel computes exp(−g·||Xj − Xk||²), so the location in sample space is completely irrelevant (assuming the distance metric is Euclidean); only the magnitude of the difference matters. I would roughly compare this to clustering: samples receive similar classifications based on how close they are (under your distance metric) to a given support vector, not where they sit relative to each other.
  • The polynomial kernel is more or less a straight-up dot product. Sample similarity is based on whether the samples are parallel or orthogonal to each other: parallel samples produce a large similarity value, orthogonal samples produce a minimal one.
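These two behaviors can be demonstrated in a few lines of numpy (a toy sketch; the sample vectors, gamma, and degree values below are made up for illustration):

```python
# Toy comparison of the RBF and polynomial kernels (numpy only).
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian RBF: depends only on the distance between x and y
    return np.exp(-gamma * np.sum((x - y) ** 2))

def poly_kernel(x, y, degree=2):
    # Polynomial: depends on the dot product (alignment) of x and y
    return (1.0 + np.dot(x, y)) ** degree

x1 = np.array([1.0, 0.0])
x2 = np.array([2.0, 0.0])   # parallel to x1
x3 = np.array([0.0, 1.0])   # orthogonal to x1

# The polynomial kernel rewards alignment: the parallel pair scores high,
# the orthogonal pair scores the minimum value (1 + 0)**2 = 1.
print(poly_kernel(x1, x2))  # 9.0
print(poly_kernel(x1, x3))  # 1.0

# The RBF kernel is translation-invariant: shifting every sample by the
# same offset leaves it unchanged, while the polynomial kernel changes.
shift = np.array([5.0, 5.0])
print(np.isclose(rbf_kernel(x1, x2), rbf_kernel(x1 + shift, x2 + shift)))  # True
print(np.isclose(poly_kernel(x1, x2), poly_kernel(x1 + shift, x2 + shift)))  # False
```
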

You can apply similar logic to the various kernels out there to figure out where each will shine, then apply that logic to your data and try to determine (whether empirically, logically, or a combination thereof) what would be most appropriate.

As an example, with hand-written digit classification I'd be inclined to go with some type of convolutional kernel that measures the total energy of the correlation of two samples.

This answer is marked "community wiki".

answered May 05 '11 at 12:55

Brian Vandenberg

edited May 05 '11 at 12:59

I can recommend reading a brief guide called "A Practical Guide to SVM Classification", available from the LibSVM website.

answered May 04 '11 at 14:26

amair

Seems to be a nice guide.

Thank you.

(May 04 '11 at 15:03) Omar Osama

I'd use the kernel which works best on a hold-out set. Training set error is not very useful for model selection.
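A minimal sketch of that hold-out procedure (assuming scikit-learn is available; the synthetic data set and its nonlinear decision boundary are made up for illustration):

```python
# Hold-out kernel selection sketch (hypothetical synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2.0).astype(int)  # nonlinear boundary

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    scores[kernel] = clf.score(X_val, y_val)  # accuracy on the hold-out set

best = max(scores, key=scores.get)
print(scores)
print("selected kernel:", best)
```

Training-set accuracy would favor the most flexible kernel regardless of generalization; scoring on the held-out split is what makes the comparison meaningful.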

Re: the polynomial kernel: the 1 should be a scalar, not a vector (note that the result of x·x' is a scalar). See Wikipedia's article on the polynomial kernel for a reference.
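A quick numpy check of that point (the explicit degree-2 feature map below is the standard expansion for 2-dimensional inputs, written out only to verify the identity; the sample vectors are made up):

```python
# Verify that (1 + x·z)^2 is a scalar and matches an explicit feature map.
import numpy as np

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k = (1.0 + np.dot(x, z)) ** 2   # np.dot(x, z) is a scalar, so the "+ 1" adds a scalar
print(k)  # 144.0

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2 dimensions
    return np.array([1.0,
                     np.sqrt(2) * v[0], np.sqrt(2) * v[1],
                     v[0] ** 2, v[1] ** 2,
                     np.sqrt(2) * v[0] * v[1]])

# The kernel value equals the dot product in the expanded feature space
print(np.isclose(k, np.dot(phi(x), phi(z))))  # True
```
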

answered May 04 '11 at 14:01

jrennie

There are many ways of picking kernels; usually you pick the one that leads to the smallest test error (or cross-validation error). The 1 in the formula for the polynomial kernel is a scalar, not a vector, since x · x' is also a scalar.

answered May 04 '11 at 13:58

Alexandre Passos ♦

Thank you. Since x·x' is a scalar, x here is an observation vector.

Now, if I have the dataset D (an n-by-p matrix), what is the result of applying the polynomial kernel with degree 2 to D? And what is the dimensionality of the resulting matrix?

(May 04 '11 at 14:33) Omar Osama

@omar - The training kernel (Gram) matrix over all n samples is n x n. The algorithm then goes through a pruning stage to select support vectors; assuming it keeps m < n support vectors, only those m rows of kernel evaluations are needed at prediction time.
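To make the dimensionality concrete: evaluating a degree-2 polynomial kernel on every pair of rows of an n-by-p matrix D produces an n-by-n Gram matrix. A small numpy sketch with made-up sizes:

```python
# Degree-2 polynomial kernel applied to an n-by-p data matrix D:
# K has one entry per pair of rows of D, so it is n-by-n.
import numpy as np

n, p = 6, 3
rng = np.random.RandomState(0)
D = rng.randn(n, p)

K = (1.0 + D @ D.T) ** 2   # K[i, j] = (1 + D[i]·D[j]) ** 2

print(K.shape)  # (6, 6)
print(np.allclose(K, K.T))  # True: kernel matrices are symmetric
```
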

(May 05 '11 at 12:57) Brian Vandenberg

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.