In most machine learning papers on kernel-based methods, the underlying RKHS is used only to define a Mercer kernel between instances, and every other property of it is promptly set aside. Are there any algorithms that depend on some property of an RKHS deeper than just the existence of a projected inner product?

Also, does the RKHS formalism allow for something more complex, such as somehow connecting features in the original space to quantities in the projected space? For example, is it possible to use kernels in an SVM that is regularized with the l1 norm of the weight vector, or is the l2 norm really necessary for a proper kernelization of the algorithm?
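
To make the second question concrete, here is my rough understanding of why the l2 case is easy. With the squared l2 penalty, the problem

$$\min_{w} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_i \max\bigl(0,\, 1 - y_i \langle w, \phi(x_i) \rangle\bigr)$$

has a solution of the form $w = \sum_i \alpha_i \phi(x_i)$, so everything can be written in terms of inner products $\langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)$ and the feature map $\phi$ never has to be touched explicitly. With an l1 penalty $\|w\|_1$, the regularizer depends on the individual coordinates of $w$ in the projected space rather than only on inner products, so it is not clear to me how (or whether) the same trick applies.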

asked Jul 03 '10 at 15:29

Alexandre Passos ♦

An l1-regularized SVM could be interesting in situations in which you expect the projected feature vector to be sparse. For example, with a quadratic kernel, if you think just a few multiplicative interactions of pairs of features should pretty much determine the classifier.
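
Roughly what I have in mind, sketched with an explicit degree-2 feature map instead of the kernel trick (the toy dataset and the specific library calls are only for illustration):

    # Explicit quadratic feature map plus an l1-penalized linear SVM:
    # the learned weight vector, and hence the set of pairwise
    # interactions actually used, should come out sparse.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.svm import LinearSVC

    rng = np.random.RandomState(0)
    X = rng.randn(200, 10)
    # toy labels that depend on only two pairwise interactions
    y = np.sign(X[:, 0] * X[:, 1] - X[:, 2] * X[:, 3])

    Phi = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=1.0)
    clf.fit(Phi, y)

    # most coordinates of the weight vector should be (near) zero
    print((np.abs(clf.coef_) > 1e-4).sum(), 'of', clf.coef_.size, 'weights are nonzero')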

(Jul 04 '10 at 07:50) Alexandre Passos ♦

2 Answers:

For the first part of your question, I think you are interested in sections 13.3 and 13.4 (and the theorem within) of this note.

answered Jul 04 '10 at 08:32

osdf

I don't see how this answers my question. You pointed out the reproducing property (that evaluating any function in an RKHS amounts to taking an inner product with the kernel) and the representer theorem (that every solution to a square-norm-regularized loss minimization problem can be expressed as a linear combination of kernel functions centered at the training points).
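
Spelled out, as I understand them: the reproducing property says that for any $f$ in an RKHS $\mathcal{H}$ with kernel $k$ we have $f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}$, and the representer theorem says that any minimizer of $\sum_i L(y_i, f(x_i)) + \Omega(\|f\|_{\mathcal{H}})$ over $f \in \mathcal{H}$, with $\Omega$ strictly increasing (e.g. $\Omega(t) = t^2$), has the form $f = \sum_i \alpha_i k(x_i, \cdot)$.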

I originally meant to ask if there was anything more than these two properties that motivated the study of RKHS for machine learning.

http://users.rsise.anu.edu.au/~williams/papers/P139.pdf seems to answer the second part of my question with "yes", but it's too vague for me to understand in a cursory reading.

(Jul 04 '10 at 09:24) Alexandre Passos ♦

I see. I didn't know that the representer theorem falls under 'the existence of a projected inner product' ;).

(Jul 04 '10 at 09:35) osdf

RKHSs are also nice because, if you combine spaces that cover different sets of functions, the inner products in the combined spaces are still expressible in terms of fairly well-behaved functions of the reproducing kernels of the original spaces.
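
For instance, if $\mathcal{H}_1$ and $\mathcal{H}_2$ are RKHSs on the same domain with kernels $k_1$ and $k_2$, then the space of sums $\{ f_1 + f_2 : f_i \in \mathcal{H}_i \}$ is again an RKHS whose kernel is simply $k_1 + k_2$, and the pointwise product $k_1 k_2$ is also a valid reproducing kernel (it corresponds to a tensor-product construction), so the inner products in the combined spaces come directly from the original kernels.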

The first example that comes to mind where this is useful is smoothing splines, where you split the RKHS containing your candidate solutions into two orthogonal subspaces: one containing the 'smooth enough' functions, which are penalized only through lack of fit, and one carrying all of the roughness information, which is penalized through the norm of the fitted function projected into that subspace. Because the whole thing is done in an RKHS, you can use the representer theorem again to solve the minimization problem (it ends up being a bit harder, due to some numerical issues, but the basic steps are pretty much the same from that point on).
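
Schematically, the problem is

$$\min_{f \in \mathcal{H}} \; \sum_{i=1}^n \bigl(y_i - f(x_i)\bigr)^2 + \lambda \, \|P_1 f\|_{\mathcal{H}}^2,$$

where $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$, $\mathcal{H}_0$ is the finite-dimensional subspace of unpenalized ('smooth enough') functions, and $P_1$ is the projection onto $\mathcal{H}_1$. The minimizer then has the form

$$\hat{f}(x) = \sum_j d_j \phi_j(x) + \sum_{i=1}^n c_i \, k_1(x_i, x),$$

with $\{\phi_j\}$ a basis of $\mathcal{H}_0$ and $k_1$ the reproducing kernel of $\mathcal{H}_1$, which reduces everything to a finite-dimensional penalized least squares problem in $c$ and $d$.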

answered Jul 16 '10 at 12:55

Rich

That's interesting; I didn't know that.

(Jul 16 '10 at 12:58) Alexandre Passos ♦