|
I am working on a large multiclass SVM classification system using the one-against-all approach. It seems to be working fine according to 5-fold cross-validation and independent-set testing. However, in these tests I have simply been taking the highest decision value for each input as the prediction. I worry there could be cases where the second-highest value is close to the highest. For example, if the highest value is 1.6515952 and the next highest is -0.99935411, the classification is clearly separated; but what if the first and second highest values look like 0.59976528 and -0.09927958, or even 0.59976528 and 0.09927958? At some point the two values are too close for the winner to be reliable. So far my plan is to normalize all the decision values for an input and only accept the prediction if the difference between the top two normalized values is greater than some threshold. But I have no idea what that threshold should be, or whether this is even a reasonable problem to worry about/solve.
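The normalize-and-threshold idea you describe could be sketched roughly like this (a hypothetical illustration, not a recommendation): softmax-normalize the per-class decision values for one input so the gap is comparable across inputs, then reject when the top two are too close. The `min_gap` value here is made up and would need tuning on held-out data.

```python
import numpy as np

def predict_with_reject(scores, min_gap=0.3):
    """Reject a prediction when the top-two normalized scores are too close.

    scores: 1-D array of one-against-all decision values for a single input.
    min_gap: hypothetical rejection threshold; must be tuned on held-out data.
    Returns the winning class index, or None when the margin is too small.
    """
    # Softmax-normalize the raw decision values (subtract the max for
    # numerical stability); this maps them onto a common [0, 1] scale.
    z = np.exp(scores - scores.max())
    p = z / z.sum()
    # Indices of classes sorted from highest to lowest normalized score.
    order = np.argsort(p)[::-1]
    if p[order[0]] - p[order[1]] < min_gap:
        return None  # too close to call
    return int(order[0])

# The well-separated example from the question is accepted:
print(predict_with_reject(np.array([1.6515952, -0.99935411, -1.2])))   # 0
# The close example is rejected:
print(predict_with_reject(np.array([0.59976528, 0.09927958, -1.0])))   # None
```

Note that a softmax of raw SVM decision values is not a calibrated probability; it only gives you a consistent scale for the gap heuristic.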
|
For multiclass SVM I don't know of any such probabilistic guarantees, but if you had something like multiclass logistic regression (that is, something that predicts calibrated probabilities), then you could set a threshold and only accept answers that are correct at least X% of the time by simply thresholding the probabilities. There are techniques (Platt scaling, for instance) for producing calibrated probabilities from support vector machines, which would then let you do the same thing; libsvm implements one.
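As one concrete way to try this: scikit-learn's `SVC` wraps libsvm, and passing `probability=True` enables its internal Platt-style calibration, after which you can threshold `predict_proba`. A minimal sketch (the `0.9` acceptance threshold and the iris data are just placeholders):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# probability=True turns on libsvm's internal probability calibration
# (fit via cross-validation, so it adds training cost).
clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])  # rows sum to 1 across the classes

# Accept a prediction only when the calibrated probability of the
# winning class clears a threshold; -1 marks "rejected" here.
threshold = 0.9  # placeholder value, not a recommendation
labels = np.where(proba.max(axis=1) >= threshold, proba.argmax(axis=1), -1)
```

In practice you would pick the threshold by looking at accuracy among accepted predictions on a validation set, trading coverage against reliability.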