I want to implement an algorithm from a paper that uses kernel SVD to decompose a data matrix, so I have been reading material about kernel methods, kernel PCA, and so on. It is still very obscure to me, especially the mathematical (linear algebra) proofs and formulae. I have a few very basic questions about kernel-based methods and kernel PCA, since I am a beginner.

Like you say, kernel methods transform the data to a high-dimensional space, non-linearly, in the hope that in that space the data become "linear" (lie close to a linear manifold, or become linearly separable in the case of classification). The kernel trick is a separate issue: it lets you compute the inner products in that high-dimensional feature space cheaply, via the kernel function, without ever constructing the feature mapping explicitly. You still get the same inner products, just computed differently.

The point of kernel PCA is that once you have mapped the data to high dimensions, it might be explained well by just a few directions (principal components). So it is not weird to combine the two; you are simply looking for a simple explanation in the non-linear feature space instead of in the original input space. That said, I haven't seen an example in practice where kernel PCA gives you an advantage over plain linear PCA, but there is no harm in trying it: an RBF kernel with decent hyperparameters can capture anything a linear kernel can.
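For concreteness, here is a minimal NumPy sketch of kernel PCA in the standard formulation (Schölkopf et al.): build the kernel matrix with the kernel trick (an RBF kernel here), center it in feature space, and take the top eigenvectors as the principal components. The function names, the `gamma` value, and the concentric-circles toy data are my own assumptions for illustration, not from any particular paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Kernel trick: inner products in the RBF feature space, computed from
    # pairwise squared distances alone -- the feature map is never formed.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)

    # Center the kernel matrix, i.e. center the data in feature space.
    one_n = np.ones((n, n)) / n
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Eigenvectors of the centered kernel matrix give the principal
    # directions in feature space (largest eigenvalues first).
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Projections of the training points onto the top components.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Toy data (an illustrative assumption): two concentric circles, a classic
# case where a non-linear (RBF) feature space helps and a linear one does not.
# The gamma below is a guess and would need tuning on real data.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 200)
radii = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.normal(size=X.shape)

Z = kernel_pca(X, n_components=2, gamma=2.0)
print(Z[:5])  # first few projected points
```

In practice you would more likely use `sklearn.decomposition.KernelPCA`, which implements the same idea; the hand-rolled version above is only meant to make the linear algebra visible.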