I've found that a lot of papers opt to use a chi-square kernel for classification and simply state that it was found empirically to give better results, especially when a bag-of-features approach is used. Is there a mathematical argument for why this works? Thanks in advance for any help.
|
I think the idea is that the chi-square kernel is a more natural distance measure between histograms than the Euclidean distance. You might find these papers helpful:

http://sminchisescu.ins.uni-bonn.de/papers/lis_dagm10.pdf
http://www.cc.gatech.edu/~fli/chebyshev-longver.pdf
http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf

PS: I think it is good to differentiate between the exponential and the additive chi2 kernels (a small sketch contrasting the two follows this comment).

Hi, thanks a lot for the links. I am specifically talking about the exponential chi2 kernel; sorry for forgetting to mention that earlier. I am currently going through the papers you referred to, and I have another question: is the choice of the kernel influenced by the fact that the histograms in question are generally sparse? Thanks,
(May 10 '12 at 06:38)
chinmayduvedi
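
To make the additive/exponential distinction concrete, here is a minimal sketch using scikit-learn's pairwise kernel functions; the two toy histograms are made up for illustration:

    import numpy as np
    from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel

    # Two toy L1-normalized histograms, as in a bag-of-features setup.
    x = np.array([[0.5, 0.3, 0.2, 0.0]])
    y = np.array([[0.4, 0.1, 0.1, 0.4]])

    # Additive chi2: k(x, y) = -sum_i (x_i - y_i)^2 / (x_i + y_i)
    print(additive_chi2_kernel(x, y))

    # Exponential chi2: k(x, y) = exp(gamma * additive_chi2(x, y))
    print(chi2_kernel(x, y, gamma=1.0))

The exponential variant is just the additive one passed through an exponential, analogous to how the RBF kernel exponentiates the negative squared Euclidean distance.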
I don't think the sparsity plays a role. Also, in general, exponentiated kernels seem to work better than additive ones. By the way, there are implementations of the approximation methods described in the paper here (a usage sketch follows below): http://scikit-learn.org/dev/modules/kernel_approximation.html
(May 10 '12 at 07:26)
Andreas Mueller
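
For reference, a minimal sketch of how those approximations are used; the random histograms here are made up. The pipeline maps each histogram into an explicit approximate chi2 feature space so a fast linear SVM can stand in for the exact kernel:

    import numpy as np
    from sklearn.kernel_approximation import AdditiveChi2Sampler
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Made-up bag-of-features data: nonnegative, L1-normalized histograms.
    rng = np.random.RandomState(0)
    X = rng.rand(100, 50)
    X /= X.sum(axis=1, keepdims=True)
    labels = rng.randint(0, 2, size=100)

    # Approximate the additive chi2 feature map explicitly, then train a
    # linear SVM on the mapped features.
    clf = make_pipeline(AdditiveChi2Sampler(sample_steps=2), LinearSVC())
    clf.fit(X, labels)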
Hi, yes, I checked that paper on kernel approximations. Here is a thought I had while studying the exponential chi2 kernel versus the RBF kernel: the kernel functions look more or less similar, except that in chi2 the squared difference in each dimension is divided by its own (x_i + y_i) term, instead of by the single 2*sigma^2 used in the case of RBF. So my understanding is that this gives a more "local" taste to the kernel measure, instead of the general, global bandwidth parameter sigma (sketched below). Does this make any sense? Thanks
(May 14 '12 at 01:40)
chinmayduvedi
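
A small NumPy sketch of that reading (the eps guard against empty bins is a hypothetical addition): the RBF kernel divides every squared difference by the same global 2*sigma^2, while the exponential chi2 kernel divides each dimension's squared difference by its own (x_i + y_i):

    import numpy as np

    def rbf_kernel(x, y, sigma):
        # One global bandwidth: 2*sigma^2 scales every dimension alike.
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    def exp_chi2_kernel(x, y, gamma=1.0):
        # Per-dimension scaling: each squared difference is divided by
        # the local bin mass (x_i + y_i) rather than a global constant.
        eps = 1e-10  # hypothetical guard for bins where x_i + y_i == 0
        return np.exp(-gamma * np.sum((x - y) ** 2 / (x + y + eps)))

    x = np.array([0.5, 0.3, 0.2])
    y = np.array([0.4, 0.1, 0.5])
    print(rbf_kernel(x, y, sigma=1.0), exp_chi2_kernel(x, y))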