In an online unsupervised learning algorithm, data-points are learned sequentially. Performance may improve if, in addition to the unlabelled data, we have some labelled data-points (i.e. semi-supervised learning with a small amount of labelled data). In this situation, it may be attractive to let the algorithm decide which data-points to label: when the algorithm gets a new data-point, it may actively decide to request that data-point's label from the user because it judges it to be an "important" example. As far as I know, this is called "active learning". My question is: how, or in which situations, can a learning algorithm decide that the current example (data-point) is important and therefore request its label? That is, which measures or criteria tell us whether or not we should request the label of a given data-point?
As far as I know, this blog post by John Langford accurately describes the state of the art in theoretically well-founded online active learning, and links to the relevant papers. The main idea seems to be to define a probability distribution over which points to label, choose points randomly according to this distribution, and scale the updates by the inverse of this probability so that the final classifier remains unbiased.
Could you please look at this question, which is about the algorithm you pointed to: http://metaoptimize.com/qa/questions/12294/importance-weighted-online-active-leaning
(Mar 06 '13 at 13:14)
shn
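To make the idea in the answer above concrete, here is a minimal sketch of importance-weighted label querying. It is not the algorithm from the linked blog post or its papers. The query rule `b / (b + |margin|)`, the floor `p_min`, the logistic-loss linear model, and all names (`query_probability`, `ImportanceWeightedLearner`, `run_demo`) are assumptions chosen for illustration. What it shows is only the mechanism described: request a label with some probability `p`, and when a label is requested, scale the update by `1 / p`.

```python
import math
import random


def query_probability(margin, b=1.0, p_min=0.05):
    """Hypothetical query rule: points near the decision boundary
    (small |margin|) are queried with probability close to 1.
    The floor p_min keeps the 1/p importance weight bounded."""
    return max(p_min, b / (b + abs(margin)))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ImportanceWeightedLearner:
    """Online linear classifier (labels in {-1, +1}) that decides
    at random, per example, whether to ask for the label."""

    def __init__(self, dim, lr=0.1, seed=0):
        self.w = [0.0] * dim
        self.lr = lr
        self.rng = random.Random(seed)
        self.labels_requested = 0

    def margin(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1

    def observe(self, x, oracle):
        """Process one unlabelled point; returns True if a label was requested."""
        p = query_probability(self.margin(x))
        if self.rng.random() >= p:
            return False  # skip: no label requested, no update
        y = oracle(x)  # ask the user (here: a function) for the label
        self.labels_requested += 1
        # Gradient of the logistic loss log(1 + exp(-y * margin)) w.r.t. the margin.
        g = -y * _sigmoid(-y * self.margin(x))
        # Importance weight 1/p: in expectation over the coin flip, this
        # step equals the step that would be taken if every point were labelled.
        step = self.lr / p
        self.w = [wi - step * g * xi for wi, xi in zip(self.w, x)]
        return True


def run_demo(n=2000, seed=1):
    """Toy stream: 2-D points labelled by a fixed (assumed) linear rule."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0]

    def oracle(x):
        return 1 if sum(a * b for a, b in zip(true_w, x)) >= 0 else -1

    learner = ImportanceWeightedLearner(dim=2)
    for _ in range(n):
        learner.observe([rng.uniform(-1, 1), rng.uniform(-1, 1)], oracle)
    test = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
    accuracy = sum(learner.predict(x) == oracle(x) for x in test) / len(test)
    return learner.labels_requested, accuracy
```

Calling `run_demo()` returns the number of labels requested and the accuracy on a held-out toy sample, so the label cost can be compared against the stream length. The query rule is the part to swap out: any rule that yields a strictly positive probability for every point keeps the `1 / p` correction well defined.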
Also asked here: http://stats.stackexchange.com/questions/41765/choosing-which-data-point-to-label-active-learning
@RobRenaud Yes, it is. Is there any problem with asking a question on two different websites, where we can get different constructive answers?
No, nothing wrong, it's just good for potential answerers to have the context of other answers, in case they'd rather not duplicate a good answer.