|
A lot of the simpler active learning algorithms revolve around the same general steps: get an initial batch of labeled examples from the user, train a classifier on them, use it to select the most informative unlabeled instances, have the user label those, retrain, and repeat.
If the classifier is an SVM, then you have at least one - if not 2-3 - hyperparameters to tweak. How would you go about finding the right hyperparameters in a practical setting? I can see a few alternatives:
1. Fix the hyperparameters up front and never touch them.
2. Re-tune the hyperparameters (e.g., with a grid search and cross-validation) every time a new batch of labels comes in.
3. Train a pool of classifiers with different hyperparameter settings and somehow combine their query decisions.
Option 1 is hardly realistic given how fiddly SVMs can be. Option 2 is likely to be slow, and I also have concerns about overfitting, especially with such a small number of examples (k-fold cross-validation over a handful of examples?). I'm not really sure how option 3 would work, since many of the hyperparameter settings may just be plain wrong, unless you introduce some sort of weighting scheme based on each classifier's performance. Any ideas?

Edit: to give a bit more context: there will be a single user, the number of labeled examples added per iteration will be on the order of 50 (maybe fewer, depending on how long training takes), and you can't expect the user to sit through more than about 100 iterations. The data is fairly low dimensional (15 to 200 features).
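For concreteness, here is a minimal sketch of what option 2 could look like, assuming scikit-learn, an RBF SVM, margin-based sampling, and a binary problem; `X_pool`, `oracle`, and the batch size are illustrative placeholders rather than anything from the question.

```python
# Rough sketch of option 2: re-run a cross-validated grid search over the SVM
# hyperparameters every time a new batch of labels arrives (binary problem assumed).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

PARAM_GRID = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1]}

def active_learning_loop(X_labeled, y_labeled, X_pool, oracle,
                         n_iterations=100, batch_size=50):
    for _ in range(n_iterations):
        # Re-fit the hyperparameters on all labels collected so far.
        search = GridSearchCV(SVC(kernel="rbf"), PARAM_GRID, cv=3)
        search.fit(X_labeled, y_labeled)
        clf = search.best_estimator_

        # Margin-based sampling: query the pool points closest to the boundary.
        margins = np.abs(clf.decision_function(X_pool))
        query = np.argsort(margins)[:batch_size]

        # The user labels the queried batch; grow the labeled set, shrink the pool.
        X_labeled = np.vstack([X_labeled, X_pool[query]])
        y_labeled = np.concatenate([y_labeled, oracle(X_pool[query])])
        X_pool = np.delete(X_pool, query, axis=0)
    return clf
```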
|
Your problem is considered almost exactly in this paper: "A large-scale active learning system for topical categorization on the web".
|
While your options seem sensible, do keep in mind that the data obtained from active learning is biased, so cross-validation error on it is not an accurate estimate of held-out (test) error. Hence I'd keep a small set of IID labeled points aside and use the validation error on these points to do model selection or, preferably, model averaging.
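To make this concrete, here is one way such a held-out IID set could be used, sketched with scikit-learn; the candidate settings, the `X_iid` split, and the accuracy-weighted averaging are illustrative assumptions, not something specified in the answer.

```python
# Sketch: evaluate candidate SVMs on a small IID-labeled holdout instead of
# cross-validating on the (biased) actively collected data.
import numpy as np
from sklearn.svm import SVC

def select_or_average(X_active, y_active, X_iid, y_iid, X_new, candidates):
    # candidates is e.g. [{"C": 0.1}, {"C": 1}, {"C": 10}]
    models = [SVC(kernel="rbf", probability=True, **p).fit(X_active, y_active)
              for p in candidates]
    scores = np.array([m.score(X_iid, y_iid) for m in models])  # unbiased estimates

    # Model selection: pick the single best model on the IID holdout.
    best_model = models[int(np.argmax(scores))]

    # Model averaging: weight each model's predicted probabilities by its
    # holdout accuracy (one simple choice of weighting).
    weights = scores / scores.sum()
    averaged_proba = sum(w * m.predict_proba(X_new) for w, m in zip(weights, models))
    return best_model, averaged_proba
```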
|
Interesting question. This is a very practical problem that is probably often omitted in papers. If you receive enough labeled data in your step 1, I'd say you can get away with estimating the hyperparameters via cross-validation on that set. I like your 3rd idea too. You could use something simpler than a combined metric, for example:

1. Choose a classifier with uniform probability.
2. Select an instance with your usual criterion (e.g., the margin) under the chosen classifier.
3. Receive the label.
4. Update all classifiers with the new instance.

This idea probably works best if you retrain the classifiers after each received label (or use an online algorithm). Another idea is to use logistic regression instead of SVM, as it is supposedly less sensitive to the hyperparameter choice.
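A rough sketch of one step of that procedure, assuming scikit-learn and batch retraining rather than an online algorithm; the hyperparameter settings, the pool, and the oracle are placeholders:

```python
# One step of the procedure above: pick a classifier at random, query by
# margin under it, then let every classifier see the new label at the next fit.
import numpy as np
from sklearn.svm import SVC

def query_one_label(X_labeled, y_labeled, X_pool, oracle, settings, rng):
    # rng: a numpy Generator, e.g. np.random.default_rng()
    # One SVM per hyperparameter setting, retrained on the current labels.
    classifiers = [SVC(kernel="rbf", **p).fit(X_labeled, y_labeled)
                   for p in settings]

    # 1) Choose a classifier with uniform probability.
    clf = classifiers[rng.integers(len(classifiers))]

    # 2) Select the pool instance with the smallest margin under that classifier.
    idx = int(np.argmin(np.abs(clf.decision_function(X_pool))))

    # 3) Receive the label from the user.
    y_new = oracle(X_pool[idx:idx + 1])

    # 4) The new instance joins the labeled set, so all classifiers get
    #    updated the next time this function is called.
    X_labeled = np.vstack([X_labeled, X_pool[idx:idx + 1]])
    y_labeled = np.concatenate([y_labeled, y_new])
    X_pool = np.delete(X_pool, idx, axis=0)
    return X_labeled, y_labeled, X_pool
```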
If moving from SVM towards an explicitly probabilistic classifier, I would argue for using confidence weighted learning (http://www.aclweb.org/anthology-new/P/P08/P08-2059.pdf).
(Oct 31 '11 at 12:22)
Oscar Täckström
Oops, I didn't read the link, which turned out to mention confidence weighted learning :)
(Oct 31 '11 at 12:32)
Oscar Täckström
Indeed, CW can be very useful in an active learning setting.
(Oct 31 '11 at 14:27)
Mathieu Blondel
|
|
I would probably do a combination of 1 and 2. Since you're doing active learning, the number of instances should be reasonably low, so I don't see how a simple grid search could be a problem (it's also trivial to parallelize). I would also consider starting with a heavily regularized model and then gradually relaxing the regularization as more data arrives.
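One possible reading of that, sketched with scikit-learn; the regularization schedule (capping C at roughly n/50) and the grids are arbitrary illustrative choices, not prescribed by the answer.

```python
# Sketch: parallel grid search whose C grid is capped early on (strong
# regularization) and widens as more labeled data arrives.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_current_model(X_labeled, y_labeled):
    n = len(y_labeled)
    # Small C values only while n is small; larger C allowed as n grows.
    c_grid = [c for c in (0.01, 0.1, 1, 10, 100) if c <= max(1.0, n / 50)]
    search = GridSearchCV(
        SVC(kernel="rbf"),
        {"C": c_grid, "gamma": ["scale", 0.1, 1]},
        cv=3,        # few folds, since labels are scarce early on
        n_jobs=-1,   # the grid search is trivially parallel
    )
    return search.fit(X_labeled, y_labeled).best_estimator_
```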