|
Hi All, I have consistently observed gains from supervised shearing-type feature normalization (no dimensionality reduction, only a transform in feature space) prior to kernel SVM classification. I am confident that these gains are not due to cross-validation artifacts, data pollution, or a fluke. I have observed this on benchmark web image and video datasets. Can anyone give some intuition for why this happens and why RBF SVMs are not able to learn such a transform on their own?
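
For concreteness, here is a minimal sketch of the kind of pipeline I mean, assuming the "shearing-type normalization" is a learned full-rank linear transform A applied to the features before the RBF SVM (the matrix A and the data below are just placeholders; in practice A is learned in a supervised way on the training split only):

```python
import numpy as np
from sklearn.svm import SVC

def shear_transform(X, A):
    """Apply a full-rank linear (shear-like) transform A to the features."""
    return X @ A.T

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_test = rng.normal(size=(50, 5))
A = np.eye(5) + 0.3 * np.triu(np.ones((5, 5)), k=1)  # placeholder shear matrix

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(shear_transform(X_train, A), y_train)
pred = clf.predict(shear_transform(X_test, A))
```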
|
I don't know what you mean by shearing normalization either, but in general the answer to a question like "why does preprocessing method X change the results when using kernels? Shouldn't a kernel be able to learn it?" is that while a kernel method can indeed approximate any function, including the functions it learns from your preprocessed data, its sample complexity is often high enough that you don't have enough data for it to learn these transformations. An RBF kernel, as I'm sure you know, builds the decision function from exponentials of the (negative, squared) Euclidean distances between the test point and the training points. When you normalize, you change the distance metric, so you change which points are the closest points. If you had sufficiently many points, your kernel method could reweight the training data to compensate for the distorted metric, but in practice you can almost always improve its performance by fiddling with the metric. This is the reason people investigate, for example, multiple kernel learning methods.
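
To make that concrete: applying a fixed linear transform A before an RBF kernel is exactly equivalent to replacing the plain Euclidean metric inside the kernel with a Mahalanobis-type metric (x − z)ᵀAᵀA(x − z), which a single isotropic γ cannot express. A quick numerical sketch of that equivalence (the values of A and γ here are arbitrary, chosen only for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
A = np.eye(4) + 0.5 * np.triu(np.ones((4, 4)), k=1)  # shear-like transform
gamma = 0.7

# RBF kernel computed on the transformed features
K_transformed = rbf_kernel(X @ A.T, gamma=gamma)

# RBF kernel computed on the raw features under the induced metric A^T A
M = A.T @ A
diff = X[:, None, :] - X[None, :, :]                   # pairwise differences
sq_mahal = np.einsum("ijk,kl,ijl->ij", diff, M, diff)  # (x - z)^T M (x - z)
K_metric = np.exp(-gamma * sq_mahal)

assert np.allclose(K_transformed, K_metric)
```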
|
Could you be a bit more specific about what you mean by shearing-type feature normalization?