|
I understand that in the dual form of the support vector machine model, the feature vectors appear only through dot products. Mapping the feature vectors to a higher-dimensional space can accommodate classes that are not linearly separable in the original feature space, but computing this mapping and working with the higher-dimensional feature vectors explicitly is computationally prohibitive. Instead, kernels can be used to efficiently compute the same value as the dot product of the mapped vectors. How do support vector machines avoid overfitting? Is maximizing the margin of the decision boundary the only trick they use, or am I missing something?
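(For concreteness, here is a minimal sketch of the kernel trick described above, using numpy and a degree-2 polynomial kernel; the vectors and the explicit feature map `phi` are my own illustrative choices, not part of the question. Evaluating the kernel on the original vectors gives the same number as mapping them and taking the dot product.)

```python
import numpy as np

# Arbitrary 2-D inputs, used only for illustration.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Explicit feature map for the homogeneous degree-2 polynomial kernel:
# phi(a) = (a1^2, a2^2, sqrt(2)*a1*a2)
def phi(a):
    return np.array([a[0] ** 2, a[1] ** 2, np.sqrt(2) * a[0] * a[1]])

mapped_dot = np.dot(phi(x), phi(z))   # dot product in the mapped space
kernel_val = np.dot(x, z) ** 2        # kernel evaluated in the original space

print(mapped_dot, kernel_val)         # both print 121.0
```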
|
Overfitting is controlled by the "soft margin" concept and its hyperparameter C (or nu in the nu-SVM formulation): the idea that some data points are allowed to fall inside the margin (and even on the wrong side of the hyperplane), at a cost governed by C. This has nothing to do with the kernel, which is a separate trick for not being limited to a linear separator in the original feature space.
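As a rough illustration of the effect of C, here is a small scikit-learn sketch (the dataset and the specific C values are arbitrary choices made only to show the trend): a small C tolerates more margin violations and acts as stronger regularization, while a very large C tries to fit the training data closely and is more prone to overfitting.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy, non-linearly-separable data; all parameters are illustrative.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    print(f"C={C:>6}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")

# Small C -> wide soft margin, many violations allowed (more regularization);
# large C -> narrow margin, few violations allowed, higher risk of overfitting.
```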