Typically, kernel SVMs are penalized with an L2 norm (I'm thinking of the primal formulation here). Is there any point in using an L1 penalty instead? Unlike in non-kernelized methods such as the lasso, an L1 penalty here would not really act on the feature weights; because of the kernel, it would act on the weights associated with the training samples.
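
To make concrete what I mean by penalizing the sample weights, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not how any particular SVM library works: the decision function is written as f(x) = sum_j alpha_j k(x, x_j) with no bias term, the loss is the squared hinge, the kernel is an RBF on 1-D inputs, and the L1 penalty on alpha is handled by proximal gradient steps (soft-thresholding). All function names and the toy data are hypothetical.

```python
import math

def rbf(a, b, gamma=1.0):
    # RBF kernel between two scalar inputs.
    return math.exp(-gamma * (a - b) ** 2)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def decision(K, alpha):
    # f(x_i) = sum_j K[i][j] * alpha[j] for every training point i.
    return [sum(K[i][j] * alpha[j] for j in range(len(alpha)))
            for i in range(len(K))]

def objective(K, y, alpha, lam):
    # Mean squared hinge loss plus an L1 penalty on the per-sample weights.
    f = decision(K, alpha)
    n = len(y)
    loss = sum(max(0.0, 1.0 - y[i] * f[i]) ** 2 for i in range(n)) / n
    return loss + lam * sum(abs(a) for a in alpha)

def fit_l1_kernel(X, y, lam=0.05, gamma=1.0, iters=500):
    n = len(X)
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    # Upper bound on the Lipschitz constant of the loss gradient,
    # using the Frobenius norm of K; gives a safe step size.
    lipschitz = 2.0 / n * sum(v * v for row in K for v in row)
    step = 1.0 / lipschitz
    alpha = [0.0] * n
    for _ in range(iters):
        f = decision(K, alpha)
        margin = [max(0.0, 1.0 - y[i] * f[i]) for i in range(n)]
        # Gradient of the mean squared hinge loss with respect to alpha_j.
        grad = [-2.0 / n * sum(K[i][j] * y[i] * margin[i] for i in range(n))
                for j in range(n)]
        # Gradient step on the loss, then shrinkage for the L1 term.
        alpha = [soft_threshold(alpha[j] - step * grad[j], step * lam)
                 for j in range(n)]
    return K, alpha

# Toy 1-D data: three negative and three positive points.
X = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
y = [-1, -1, -1, 1, 1, 1]
K, alpha = fit_l1_kernel(X, y)
print("per-sample weights:", alpha)
print("nonzero weights:", sum(1 for a in alpha if a != 0.0))
```

Each entry of `alpha` belongs to one training sample, so whatever sparsity the penalty induces is sparsity over samples rather than over input features, which is exactly the situation I am asking about.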