I use LIBSVM for text classification. I often find that some "positive" files, which are pre-labelled for performance testing, get very low scores, sometimes even lower than 0.1. This suggests that there are strong similarities between the negative training files and these positive test files. What are the common approaches to solving this problem, or at least to increasing the scores of these positive test files without hurting classification performance as a whole?

Did you try switching the kernel you are using? The kernel defines the similarity metric used during the optimization, and it sounds to me as if the decision boundary is doing a poor job of finding the correct support vectors. Which parameters are you using?
I tried a linear kernel, and an RBF kernel with C=0 and gamma=0.
(Oct 13 '13 at 22:22)
huaiyanggongzi
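
To make the point about kernel parameters concrete, here is a minimal sketch of the RBF kernel formula, exp(-gamma * ||x - y||^2), in plain Python. It is not LIBSVM's own code, and the vectors are made-up toy values. It shows that if gamma is literally zero, every pair of documents gets a similarity of 1, so the kernel cannot tell near documents from far ones. A given tool may interpret a setting of 0 as "use the default" instead, so it is worth checking which value is actually in effect.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical toy feature vectors (e.g. term counts), not from the original post.
doc_a = [1.0, 0.0, 2.0]
doc_b = [1.0, 0.0, 2.5]  # close to doc_a
doc_c = [0.0, 3.0, 0.0]  # far from doc_a

# Compare the similarity of a near pair and a far pair for several gamma values.
for gamma in (0.0, 0.01, 0.1, 1.0):
    near = rbf_kernel(doc_a, doc_b, gamma)
    far = rbf_kernel(doc_a, doc_c, gamma)
    print(f"gamma={gamma}: k(a, b)={near:.4f}  k(a, c)={far:.4f}")
```

In practice, gamma and C are usually chosen by trying a range of positive values with cross-validation rather than fixing them by hand.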
Also, an SVM sometimes performs really poorly when there are only a few features (e.g. your small example set). This is a problem I have encountered when using SVMs for text classification. Listen to Leon, but also consider testing on the rest of the data to see if the issue goes away.
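
A quick way to check whether the "few features" situation applies is to count how many distinct tokens each document contributes. The sketch below assumes a simple whitespace-tokenized bag-of-words representation; the sample texts and the `feature_counts` helper are hypothetical and stand in for whatever feature extraction you actually use.

```python
def feature_counts(documents):
    """Return the number of distinct tokens (nonzero bag-of-words features) per document."""
    return [len(set(doc.lower().split())) for doc in documents]


# Hypothetical sample texts, not from the original post.
docs = [
    "cheap pills buy now",
    "meeting moved to friday please confirm",
    "ok",
]

counts = feature_counts(docs)
# Overall vocabulary size across the sample.
vocabulary = {token for doc in docs for token in doc.lower().split()}

print("distinct features per document:", counts)
print("vocabulary size:", len(vocabulary))
```

If the low-scoring positive files turn out to have very few nonzero features, that would be consistent with the problem described above.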