High-level thought-experiment...!

I have a Support Vector Machine (SVM) that is already trained. I measure its performance and determine that I want to build a better SVM. I still have all the training data, so I want to add more training data to optimally improve performance.

Obviously, more training data is better, and I can manually select examples and manually label them.

Given that I only want to create N new samples to add to the training data, should I choose samples...

  • (1) so that I try to equalize the total number of positive and negative samples (get close to 50/50 ratio)
  • (2) so that I try to make the positive/negative sample ratio close to what it would be in real data
    • i.e. if normally 10% of samples are positive, I should try to train with a 10/90 positive/negative ratio in training samples
  • (3) that are close to the decision boundary
    • The SVM I currently have can return a 'confidence' measure indicating how close the sample is to the boundary
  • (4) that have a range of confidences
    • e.g.
    • c < -4 (SVM very confident it is negative)
    • c < -1 (SVM confident it is negative)
    • |c| < 1 (SVM is uncertain)
    • c > 1 (SVM confident it is positive)
    • c > 4 (SVM very confident it is positive)
  • (5) using some other criteria...

Note: the labeling is the expensive part. I can automatically generate samples and get predictions/confidence values from the current SVM. If I can only do N labelings, I want to know how best to select the samples to label.
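
To make option (4) concrete, here is a minimal sketch of scoring auto-generated candidates and binning them by those confidence ranges. It assumes a scikit-learn-style classifier whose decision_function returns the signed distance c; the trained SVM and the candidate generator are toy stand-ins:

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins: in the real setting `svm` is the already-trained SVM and
# `candidates` are the automatically generated, unlabeled samples.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] > 0).astype(int)
svm = SVC(kernel="linear").fit(X, y)
candidates = 4.0 * rng.randn(1000, 5)

# Signed distance to the hyperplane; small |c| means the SVM is uncertain.
c = svm.decision_function(candidates)

# Bin candidates by the confidence ranges from option (4).
bins = {
    "very confident negative": candidates[c < -4],
    "confident negative":      candidates[(c >= -4) & (c < -1)],
    "uncertain":               candidates[np.abs(c) <= 1],
    "confident positive":      candidates[(c > 1) & (c <= 4)],
    "very confident positive": candidates[c > 4],
}
for name, members in bins.items():
    print(name, len(members))
```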

Thanks!

asked Apr 05 '11 at 13:45

Ciarán

edited Apr 11 '11 at 04:46

Joseph Turian ♦♦


2 Answers:

There is a large subfield of machine learning called active learning that studies the best ways to choose samples to improve a given classifier. An easy-to-implement solution, and one that is theoretically justified in some ways, is Léon Bottou's suggestion in the LASVM paper: select examples where the SVM is uncertain (small |c|).

The justification is that examples with small |c| are guaranteed to change the classifier when added to the training set, while the same cannot be said of examples with large |c|, which may well already be correctly classified. Also, hopefully, adding these small-|c| examples will steer the hyperplane in a direction that helps it recognize gross misclassifications by assigning them a small |c| in later iterations.
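
As an illustration, here is a minimal sketch of that small-|c| selection rule, assuming a scikit-learn-style decision_function rather than LASVM itself (the pool and the labeling budget N are stand-ins):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.randn(100, 5)
y_train = (X_train[:, 0] > 0).astype(int)
svm = SVC(kernel="linear").fit(X_train, y_train)

pool = rng.randn(5000, 5)   # unlabeled, auto-generated samples
N = 20                      # labeling budget

# Uncertainty sampling: send the N pool points with the smallest |c|,
# i.e. the points closest to the current decision boundary, for labeling.
c = svm.decision_function(pool)
to_label_idx = np.argsort(np.abs(c))[:N]
to_label = pool[to_label_idx]
```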

answered Apr 05 '11 at 14:15

Alexandre Passos ♦

Thank you Alexandre, that was my intuition about the problem (option 3). Much appreciated.

(Apr 06 '11 at 10:52) Ciarán

As Alexandre says, select an example that is close to the decision boundary. If you are selecting a batch of examples at this step in active learning, for labeling, you also want examples that are diverse. For example in "Incorporating Diversity in Active Learning with Support Vector Machines" (Brinker, 2003), about 200 examples close to the decision boundary are found, and then an incremental strategy is used to find a small number of examples that maximize the angle diversity.
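
For illustration, a rough sketch of that greedy step. It is a simplification, not Brinker's exact method: raw feature vectors stand in for the kernel-induced images, and lam is a hypothetical trade-off weight between uncertainty (small |c|) and angle diversity:

```python
import numpy as np

def angle_diverse_batch(cands, c, k, lam=0.5):
    # cands: (n, d) feature vectors of candidates near the boundary
    # c:     (n,) decision values from the current SVM
    unit = cands / (np.linalg.norm(cands, axis=1, keepdims=True) + 1e-12)
    chosen = [int(np.argmin(np.abs(c)))]        # most uncertain point first
    while len(chosen) < k:
        # |cosine| to the nearest already-chosen example, per candidate;
        # small values mean the candidate points in a fresh direction
        sim = np.abs(unit @ unit[chosen].T).max(axis=1)
        crit = lam * np.abs(c) + (1.0 - lam) * sim   # smaller is better
        crit[chosen] = np.inf                        # never re-pick
        chosen.append(int(np.argmin(crit)))
    return chosen
```

Applied to, say, the ~200 boundary-near candidates mentioned above, angle_diverse_batch would return the indices of a small, diverse batch to label.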

(Apr 11 '11 at 04:52) Joseph Turian ♦♦

Selecting near the decision boundary is sensible, but you should importance-weight the data to remain asymptotically consistent. Check out http://hunch.net/?cat=22
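
As a hedged sketch of that idea (the particular query-probability function below is my own illustrative choice, not something from the link): query each point with a probability that grows as |c| shrinks, keep that probability bounded away from zero, and train on the queried points with importance weights 1/p:

```python
import numpy as np

rng = np.random.RandomState(0)

def query_probability(c, p_min=0.1):
    # Illustrative choice: near-certain queries close to the boundary
    # (small |c|), but never below p_min so the 1/p weights stay bounded.
    return np.clip(1.0 / (1.0 + np.abs(c)), p_min, 1.0)

c = 3.0 * rng.randn(1000)          # stand-in decision values for a pool
p = query_probability(c)
queried = rng.rand(len(c)) < p     # biased coin flip per point
weights = 1.0 / p[queried]         # importance weights for training

# Downstream, train only on the queried points, weighting each example by
# `weights` so the biased sampling stays asymptotically consistent with
# the underlying data distribution.
```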

answered Apr 23 '11 at 01:08

Paul Mineiro

I don't really understand the Importance Weighted Active Learning algorithm by Alina Beygelzimer et al.:

1) I don't understand how they train a supervised classifier on the queried data where each example is weighted by 1/p_t. Does this require a special supervised classifier that takes these weights into account during training, or can any existing one (SVM, Naive Bayes, KNN, or whatever) be used? If so, how?

2) It is not clear how they compute the probability p_t in their algorithm. It is defined as a function of some value DELTA_t, which in turn is defined as the increase in training error rate if the learner is forced to change its prediction on the new unlabeled point x_t. What does this mean in practice?
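
On point 1, one common reading (an assumption on my part, not a statement of the paper's requirements) is that no special learner is needed: any classifier that accepts per-example weights can consume the 1/p_t values directly. A minimal sketch using scikit-learn's sample_weight argument, with stand-in data and weights:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(50, 5)                        # stand-in queried points
y = (X[:, 0] > 0).astype(int)               # stand-in labels
w = 1.0 / np.clip(rng.rand(50), 0.1, 1.0)   # stand-in 1/p_t weights

# scikit-learn estimators expose per-example weights via fit(...,
# sample_weight=...); learners without native weight support can
# approximate the same effect by resampling examples in proportion
# to their weights.
clf = SVC(kernel="linear").fit(X, y, sample_weight=w)
```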

(Mar 06 '13 at 12:59) shn
