In a support vector machine classifier, the final output is the weight vector w, and we classify unseen data using this vector. If I have this vector, what information can I estimate from w about the data? Thank you.
SVMs are not a probabilistic approach, so you cannot actually estimate anything from your w. Try looking at it as simple linear classification: the w vector holds the parameters that define a normal vector perpendicular to the hyperplane separating the data. The norm of w is minimized subject to the constraint that the hyperplane separates the data. You get very little information about the data if you are only given this vector. Since SVMs are not a generative algorithm, you cannot properly estimate anything about the data from w alone. There are probabilistic approaches to SVMs where you could probably do something with p(w|x), but that would be beside the point in a standard SVM approach.
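To make that concrete, here is a minimal sketch using scikit-learn's LinearSVC; the toy data and the C value are made up for illustration. It only shows that, once training is done, w (plus the bias b) is all you need to classify a new point via the sign of w.x + b.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy two-class data: two Gaussian blobs in 2D (made up for illustration).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([1] * 50 + [0] * 50)

clf = LinearSVC(C=1.0).fit(X, y)
w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# Classifying an unseen point uses nothing but w and b:
x_new = np.array([1.5, 1.0])
predicted = 1 if w @ x_new + b > 0 else 0
print("w =", w, "b =", b, "prediction:", predicted)
```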
I frequently use such parameter vectors to get a sense of which features are most important to the predictive model. This is more of a common-sense sanity check than a rigorous statistical study. If the most indicative feature is obviously some noise, or perhaps some hidden "tell" that came from a problem artifact, then I know immediately that I need to revise things. This is often much faster than actually deploying such a w and discovering the funny business later. I notice that this is fairly common practice for people who use SVMs. Are there any more formal methods that exploit this characteristic? It seems like it would make SVMs a potential preprocessing method.
(May 16 '11 at 10:30)
crdrn
Another indicator is the sign of a weight. In a two-class setting with a linear kernel, the sign of a weight w_i says which class the corresponding feature contributes to. With a polynomial or RBF kernel you do not get such a weight vector, and a similar statement is not possible. Features with low-magnitude weights may be noisy, and generalization performance might improve if you omit such features. But this is a "hands-on" approach, not sound theory.
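A minimal sketch of that kind of inspection, assuming a linear SVM fitted with scikit-learn's LinearSVC; the feature names, synthetic data, and magnitude cutoff are invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical feature names and synthetic data, only to make the sketch runnable.
feature_names = ["word_count", "has_link", "all_caps_ratio", "noise_col"]
rng = np.random.RandomState(1)
X = rng.randn(200, len(feature_names))
y = (X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.1 * rng.randn(200) > 0).astype(int)

clf = LinearSVC(C=1.0).fit(X, y)
w = clf.coef_[0]

# The sign says which class a feature pushes toward (positive -> class 1),
# the magnitude gives a rough sense of how much it matters.
for name, weight in sorted(zip(feature_names, w), key=lambda t: -abs(t[1])):
    side = "class 1" if weight > 0 else "class 0"
    print(f"{name:15s} w_i = {weight:+.3f}  (pushes toward {side})")

# Crude pruning heuristic from the answer above: drop low-magnitude features
# and re-validate; the cutoff here is arbitrary.
keep = np.abs(w) > 0.05
X_pruned = X[:, keep]
```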
I want to take Uwe's and my own comments a bit further. There are two additional actions I often take when building a predictive system with a linear model. 1) Think beforehand about which features are likely to indicate the classes, and investigate their respective coefficients. This is often merely interesting, but it frequently tells you when revision is necessary. 2) It's often much easier to do a post-hoc revision of the coefficients (the w_i's) than it is to set priors. This is particularly true in high-dimensional problems, where it's unlikely you can estimate a prior value for everything in a reasonable amount of time. Rather, it becomes much easier to look at what the model thinks the most positive and most negative features should be and "get rid" of the mistakes, for instance by setting their coefficients to 0. The same can be done for the kind of analysis used in step 1), that is, looking at the features YOU think are most interesting and predictive but the model disagrees on. I haven't read any of this in the research literature, but these are among the tricks I've learned when actually deploying ML systems.
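Here is a rough sketch of both steps, again with scikit-learn's LinearSVC. The feature names, data, and the list of "mistakes" are hypothetical, and overwriting entries of coef_ in place is a hands-on hack that happens to work because the fitted coefficients are stored as a plain NumPy array that predict() reads from.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical feature names and synthetic data, just so the sketch runs;
# in practice you would use your own fitted model and feature list.
feature_names = ["word_count", "has_link", "all_caps_ratio", "noise_col"]
rng = np.random.RandomState(2)
X = rng.randn(300, len(feature_names))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.randn(300) > 0).astype(int)

clf = LinearSVC(C=1.0).fit(X, y)
order = np.argsort(clf.coef_[0])

# Step 1): compare the extremes against your prior expectations.
print("Most indicative of class 0:", [feature_names[i] for i in order[:2]])
print("Most indicative of class 1:", [feature_names[i] for i in order[-2:]])

# Step 2): post-hoc revision: zero out coefficients that look like artifacts
# or noise, then keep using the edited model for prediction.
for name in ["noise_col"]:             # hypothetical "mistakes" spotted by eye
    clf.coef_[0, feature_names.index(name)] = 0.0
```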