|
My friend mentioned he heard a machine learning expert claim in a talk that kNNs have similar "behavior" to SVMs. This blew my mind. To me, they seem very different, and the only thing I can think of that they have in common is that they're both supervised classification algorithms. Anybody know what this might be referring to? Sorry the question is so vague and open-ended. |
|
If you use SVM with gaussian kernels and almost no regularization (that is, very high C) you're essentially classifying new points by the average class of training points weighted by their distance to the new point (handwaving a bit). When you add more regularization you're still doing something similar, the main difference being that you start choosing which training points to consider. This, however, is only true of SVMs with the gaussian kernel. |
|
Suppose you have a training dataset T = {(x_i,y_i) for i = 1...N}, where each x is the multi-dimensional input, and each y is a binary output (+1 or -1). If you're using Gaussian kernels, then the K-NN classifier is h(x) = sum( y_i * k(x,x_i) for (x_i,y_i) in N(x)), where N(x) is the K closest training points to x and k(.,.) is the Gaussian kernel. The predicted output is the sign of h(x). The SVM classifier is h(x) = sum( y_i * a_i * k(x,x_i) forall (x_i,y_i) in T), where the vector of (a_i) variables is the model learned by the SVM training algorithm (also known as the dual variables). The predicted output is again the sign of h(x). So the two classifiers are structurally similar. The K-NN classifier only looks at the K closest training points to the input point x, and weights them by their kernel similarity to x. The SVM classifier globally weights all the training points using the learned model (a_i), and multiplies that to the kernel similarity of each training point to x. In other words, if you want to phrase the K-NN classifier in the exact form of the SVM classifier, then the K-NN classifer sets a_i = 1 for every x_i in N(x) and 0 otherwise (this is done uniquely for every input x, whereas the a_i model parameters learned by the SVM are global and fixed after training). |
|
In a vaguely general sense, I can see this being true. kNN is a clustering algorithm, so you're classifying (or whatever) based on how close something falls to the centroid of a cluster. SVMs, on the other hand, use 'clusters' that are chosen not by the centroid of a particular region of input space but rather a hyperplane that segregates one region from another. At the heart of it, though, both systems employ a mechanism whereby you're comparing one sample to another to measure how similar they are to each-other. In kNN it's [usually] some sort of distance metric; one of the simplest examples is the L2 norm, which uses a simple inner product to induce a metric on the vector space of inputs. SVMs do the same thing, but the inner product used can vary quite a bit. For example, for inputs that have temporal behaviors, you can use a convolution of one sample with another, then square & sum the results. The resulting value is a symmetric semi-definite inner product -- but the same inner product could be used for kNN, if appropriately adapted. The main difference -- and you'll hopefully forgive my lack of experience with kNN here, I'm sure I'll botch something as I have only studied that algorithm, not implemented it -- between them is the 'kernel trick', and the fact that kNN works off clustering not partitioning regions of input space. However, I seem to recall seeing some clustering techniques that make use of the kernel trick. -Brian
This answer is marked "community wiki".
1
I think you might be mixing up kNN with k-means clustering here. kNN doesn't cluster your data but classifies new datapoints based on the labels of the k nearest datapoints from the train set.
(Jan 22 '11 at 09:27)
Philemon Brakel
Yep, you're right. Most of what I said still applies to kNN, just without the stuff about clustering.
(Jan 22 '11 at 12:22)
Brian Vandenberg
Brian, I think that kNN also partitions the input space like SVM or any other classification scheme.kNN map the input to some reference point(s) based on some distance function. So because of this mapping, the input space will be partitioned around the the referece point(s) according to the distance function used in the kNN.
(Dec 02 '11 at 16:15)
Irteza khan
|