From the discussion of the Importance Weighted Online Active Learning algorithm (by Beygelzimer, Dasgupta, and Langford) at http://hunch.net/?cat=22 , I looked at the algorithm on slides 47 to 49 of https://github.com/JohnLangford/vowpal_wabbit/wiki/v5.1_tutorial.pdf , but I don't understand two points:

1) How do they train a supervised classifier on the queried data, where each example is weighted by 1/p_t? Does the supervised classifier need to be a special one that takes these weights into account during training, or can we use any existing one (SVM, Naive Bayes, KNN, or whatever)? How?

2) It is not clear how they compute the query probability p_t in their algorithm. It is defined as a function of some value DELTA_t, which is defined as the increase in training error rate if the learner is forced to change its prediction on the new unlabeled point x_t. What does this mean in practice?

Could someone who already understands how this algorithm works answer these two questions?
1) They use stochastic gradient descent-based classifiers that are adjusted to handle importance weights. This paper describes how such algorithms work.

2) The answer is in the last slide you referred to, slide 49. It essentially computes what importance weight would be required for a single gradient update to change the prediction on that label. To compute that you seem to need the equations for the importance-aware learning algorithms from the paper I linked to in point 1.

In slide 49 (the equation for DELTA_t), is h(x_t) the predicted label for data point x_t? How can we take the max of h(x_t) and 1-h(x_t)? Is this just for binary classification, where the labels are 0 and 1? Or is h(x_t) the probability of the most probable label? Also, what does the eta_t symbol in the denominator mean?
(Mar 07 '13 at 05:47)
shn
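
To make point 1 of the answer above concrete, here is a minimal sketch of an importance-weighted SGD update for logistic regression, assuming Python and made-up names. The naive version below simply scales the gradient step by the importance weight w_t = 1/p_t; the importance-aware updates in the linked paper instead use a closed-form step that behaves like many infinitesimal updates, so it stays stable even when w_t is large.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def importance_weighted_sgd_step(theta, x_t, y_t, w_t, eta_t):
        # One log-loss gradient step on example (x_t, y_t in {0, 1});
        # the importance weight w_t = 1/p_t simply scales the step.
        h = sigmoid(theta @ x_t)   # predicted probability that y_t = 1
        grad = (h - y_t) * x_t     # gradient of log loss w.r.t. theta
        return theta - eta_t * w_t * grad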
h(x_t) is the predicted label, yes. As vw is binary, h is just a number, so subtracting it from 1 is easy. Eta_t is probably a learning rate. I recommend that you read the active learning papers, not just the slides, for more information on the equations and how to interpret them; the papers are linked from the blog post.
(Mar 07 '13 at 08:35)
Alexandre Passos ♦
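
For what it's worth, here is one plausible reading of the slide-49 quantities, sketched with made-up names; it illustrates the mechanics being described, not the exact formula from the slides. Since h(x_t) is a probability in [0, 1], max{h(x_t), 1-h(x_t)} - 0.5 measures how far the prediction sits from the decision boundary, and dividing by the learning rate eta_t turns that margin into (roughly) the importance weight a single update would need to flip the prediction. The query probability p_t should then shrink as DELTA_t grows; the mapping and constant c below are placeholders.

    import random

    def maybe_query(h, eta_t, c=1.0):
        # DELTA_t: rough importance weight needed for one gradient update
        # to push the prediction across 0.5 (illustrative reading only).
        delta_t = (max(h, 1.0 - h) - 0.5) / eta_t
        # Uncertain points (delta_t near 0) are queried almost surely,
        # confident points rarely; this mapping is a placeholder.
        p_t = min(1.0, c / (c + delta_t))
        return random.random() < p_t, p_t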
So if the labels are 0 or 1, then max{h(x_t), 1-h(x_t)} in the equation for DELTA_t will always return 0!
(Mar 07 '13 at 10:48)
shn
Sorry, the h(x_t) are predicted probabilities. Also, the max of (x, 1-x) for binary x is always 1, not always zero (max(0, 1-0) = max(1, 1-1) = 1).
(Mar 07 '13 at 12:12)
Alexandre Passos ♦
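
Closing the loop on question 1: you don't strictly need a special classifier, just one whose training procedure accepts per-example weights, so that queried points can be fit with weight 1/p_t. In scikit-learn, for instance, SVC, MultinomialNB, and SGDClassifier all accept a sample_weight argument in fit, while KNeighborsClassifier does not. A minimal sketch with made-up data:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # Points the active learner chose to query, with the query
    # probability p_t recorded for each one.
    X = np.array([[0.2, 1.3], [1.1, -0.4], [0.9, 0.7], [-0.5, 0.1]])
    y = np.array([0, 1, 1, 0])
    p = np.array([0.9, 0.25, 0.5, 1.0])

    clf = SGDClassifier()
    clf.fit(X, y, sample_weight=1.0 / p)  # each example weighted by 1/p_t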