How does one prove that, for a linearly separable set of patterns, there exists a separating hyperplane that can be represented as a linear combination of the training samples? Although it looks fairly obvious geometrically, I would like to know whether it can be proved mathematically.
I think there is a simple direct proof for the stated problem. The training samples form a (possibly over-complete) spanning set for the subspace they span. The intersection of the separating hyperplane with that subspace is a (possibly lower-dimensional) separating hyperplane within the subspace, and as such it can be represented in terms of the spanning set (the training samples). Note, however, that if the samples only span a proper subspace, then the separator in the subspace can be extended to the full space in infinitely many ways, but none of those extensions are representable by the samples, since they extend into dimensions the samples do not span. However, if the space also has a dot product, you can instead represent a hyperplane by its normal vector; there is then a unique extension of the hyperplane to the original full space that is represented by the same normal vector.

Err... so how could the rest of us overlook something that obvious? I think I read too much about SVMs in the last couple of days...
(Aug 22 '11 at 04:16)
Andreas Mueller
Daniel, thanks for the answer. I am new to pattern classification and I am not sure I follow your answer completely. Could you please point me to some text that describes your argument in detail?
(Aug 22 '11 at 05:41)
aseembehl
This is basic linear algebra and doesn't really have anything to do with pattern classification. Look up "vector space" and "basis of a vector space" on Wikipedia. That might help.
(Aug 22 '11 at 05:46)
Andreas Mueller
I have gone through the Wikipedia pages. One thing that is still not clear to me: by the above explanation, any hyperplane in the subspace spanned by the training samples would be a linear combination of the training samples, even if it is not a separating hyperplane.
(Aug 22 '11 at 13:25)
aseembehl
Moreover, even if the samples are not linearly separable, by the above explanation a hyperplane in the subspace of the samples would still be a linear combination of them. Am I missing something here?
(Aug 22 '11 at 13:26)
aseembehl
Yes, any vector in the span of a set of vectors is a linear combination of those vectors; that is the definition of span. If a hyperplane is given by a normal vector (as is usually the case), any hyperplane restricted to the span can be expressed as a linear combination. So the conclusion is really trivial in a way: you assume that there is a separating hyperplane, and since any hyperplane restricted to the span of the training examples can be expressed in terms of those examples, the separating hyperplane in particular can. I don't really think that adds to the understanding of the algorithms, though ;) You might want to rethink your question to get a more enlightening answer...
(Aug 22 '11 at 13:56)
Andreas Mueller
The "separatingness" of the hyperplane ensures that there is a nondegenrate intersection between the hyperlane and the spanned subset. Non separating hyperplanes have a null intersection with the subspace, or they may cover it, in which case there is no real normal vector lying in the subspace.
(Aug 22 '11 at 14:07)
Daniel Mahler
You're right, I meant to say "hyperplane in the span of the training examples".
(Aug 22 '11 at 14:22)
Andreas Mueller
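To spell out the projection argument from this answer and comment thread (a sketch, assuming some separating hyperplane with normal $w$ and bias $b$ exists): decompose the normal as

    w = \sum_i \alpha_i x_i + v, \qquad v \perp \operatorname{span}\{x_1, \dots, x_n\}.

For every training sample $x_j$ the orthogonal component contributes nothing, since $\langle v, x_j \rangle = 0$:

    \langle w, x_j \rangle + b = \Big\langle \sum_i \alpha_i x_i, x_j \Big\rangle + b.

So the hyperplane with normal $\sum_i \alpha_i x_i$ and the same bias assigns exactly the same values, and hence the same signs, to all training samples. Whenever any separating hyperplane exists, one whose normal is a linear combination of the training samples exists as well.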
The simplest way to prove that such a hyperplane exists is to construct one. There is no analytic way to write down such a hyperplane, since there is no easy way to guarantee that the points are indeed linearly separable. You can, however, construct one by solving the SVM optimization problem or by following the perceptron rule, which is guaranteed to find a separating hyperplane if one exists (though if no such plane exists it will loop forever). There is actually a paper comparing all the algorithms for testing linear separability, and there is one that is faster than both the SVM and the perceptron. I forgot the name, though :(
(Aug 21 '11 at 15:35)
Andreas Mueller
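To illustrate the constructive route, here is a minimal sketch of the perceptron rule mentioned above (illustrative code, not taken from any referenced paper; the bias term is omitted and can be handled by appending a constant feature to each sample). Every update adds a multiple of a training sample to the weight vector, so the separating hyperplane it finds is by construction a linear combination of the samples:

    import numpy as np

    def perceptron(X, y, max_passes=1000):
        """Find w with y[i] * <w, X[i]> > 0 for all i, for labels y in {-1, +1}."""
        n_samples, n_features = X.shape
        alpha = np.zeros(n_samples)   # how often each sample has been added to w
        w = np.zeros(n_features)      # invariant: w == sum_i alpha[i] * y[i] * X[i]
        for _ in range(max_passes):
            mistakes = 0
            for i in range(n_samples):
                if y[i] * np.dot(w, X[i]) <= 0:   # sample i is on the wrong side
                    alpha[i] += 1.0
                    w += y[i] * X[i]              # update by a multiple of a sample
                    mistakes += 1
            if mistakes == 0:                     # every sample classified correctly
                return w, alpha
        raise RuntimeError("no separating hyperplane found; data may not be separable")

The returned w equals sum_i alpha[i] * y[i] * X[i], which is exactly the kind of linear combination the question asks about; if the data are not linearly separable the loop simply runs out of passes, matching the caveat above.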
I think what you are looking for is the "Representer Theorem". It says that a solution to the SVM optimisation problem is a linear combination of kernel functions centred at the training samples (the support vectors). That actually holds for general kernels, not only linear ones. I am not sure which paper is best for looking up the proof, but if you google "Representer Theorem" and SVM, I'm sure you'll find it.
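In its standard form the Representer Theorem says that a minimiser of the regularised SVM objective can be written as

    f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) + b,

with nonzero coefficients only at the support vectors. For the linear kernel $k(x_i, x) = \langle x_i, x \rangle$ this is a hyperplane whose normal $\sum_i \alpha_i x_i$ is a linear combination of the training samples, which is the representation asked about in the question.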
Hm, this is a possibility. If you just care about any separating hyperplane, however, I find the proof in Freund and Schapire's paper on the averaged perceptron ("Large margin classification using the perceptron algorithm", http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.8200&rep=rep1&type=pdf ) very simple and easy to understand. It's Theorem 1 in the paper, and it proves that the perceptron algorithm converges to a separating hyperplane in fewer than (1/margin)² iterations (assuming everything has unit norm).
(Aug 21 '11 at 15:38)
Alexandre Passos ♦
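For reference, the classical perceptron convergence bound behind that statement: if every sample satisfies $\|x_i\| \le R$ and there is a unit-norm vector $u$ with $y_i \langle u, x_i \rangle \ge \gamma$ for all $i$, then the online perceptron makes at most

    (R / \gamma)^2

mistakes, which becomes $(1/\mathrm{margin})^2$ when the samples have unit norm.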
I don't know the proof by heart, so I am not sure whether it answers the second part of the question: that the solution is a linear combination of the input patterns. Thinking about it now, it probably does ;)
(Aug 21 '11 at 15:41)
Andreas Mueller