I use LIBSVM for text classification work. I often find that some "positive" files, which are pre-labelled for performance-testing purposes, receive very low scores, sometimes even below 0.1. This suggests there are strong similarities between the negative training files and these positive test files. Generally, what are the common approaches to this problem, or at least to increasing the scores of these positive test files without hurting the classification performance as a whole?

asked Oct 09 '13 at 14:24

huaiyanggongzi


2 Answers:

Did you try switching the kernel that you are using?

The kernel is what defines the metric used during the optimization; it sounds to me as if the decision boundary between the classes is doing a crappy job, i.e. the optimization is not finding the correct support vectors.

Which parameters are you using?
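For reference, the usual way to answer the "which kernel, which parameters" question is a cross-validated grid search over C (and gamma for RBF). A minimal sketch using scikit-learn's SVC (which wraps LIBSVM); the toy data here just stands in for whatever TF-IDF-style feature matrix and labels you actually have:

```python
# Grid search over SVM kernel and parameters with 5-fold cross-validation.
# The synthetic data below is a placeholder for real text features.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for TF-IDF text features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10, 100],
     "gamma": [1e-3, 1e-2, 1e-1, 1]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)          # best kernel/C/gamma combination found
print(round(search.best_score_, 3)) # mean cross-validated accuracy
```

Note that LIBSVM requires C > 0, and in LIBSVM's command-line tools gamma defaults to 1/num_features if you don't set it; picking both by cross-validation, as above, is the standard practice.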

answered Oct 10 '13 at 16:05

Leon Palafox ♦

I tried a linear kernel, and an RBF kernel with C=0 and gamma=0.

(Oct 13 '13 at 22:22) huaiyanggongzi

Also, an SVM sometimes performs really poorly when there are only a few features (e.g. your small example set). This is a problem I have encountered when using SVMs for text classification. Listen to Leon, but also consider testing on the rest of the data to see whether the issue goes away.
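As a sketch of that diagnostic: split off a held-out set and look at the decision-function scores of the known-positive examples, so you can see how many land near or below the boundary. This is a hypothetical setup with synthetic data standing in for your files:

```python
# Inspect SVM decision scores for held-out positive examples.
# Synthetic data is a stand-in for real labelled text features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=30, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=1)

clf = SVC(kernel="linear", C=1).fit(X_tr, y_tr)
scores = clf.decision_function(X_te)

# Positives scoring below ~0.1 sit on or near the wrong side of the margin,
# which is the symptom described in the question.
low = [(s, yi) for s, yi in zip(scores, y_te) if yi == 1 and s < 0.1]
print(len(low), "held-out positives with decision score < 0.1")
```

If the fraction of low-scoring positives shrinks as you train on more data, the problem is likely the small training set rather than the kernel.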

answered Oct 14 '13 at 12:33

rakirk

