An online unsupervised learning algorithm processes data points sequentially. Its performance may improve if, in addition to the unlabelled data, we have some labelled data points (i.e. semi-supervised learning with a small amount of labelled data). In this situation, it may be attractive to let the algorithm decide which data points to label: when the algorithm gets a new data point, it may actively decide to request its label from the user because it judges the example to be "important". As far as I know, this is called "active learning".

My question is: how, or in which situations, can a learning algorithm decide that the current example (data point) is important and therefore request its label? That is, which measures or criteria tell us whether or not to request the label of a given data point?

asked Nov 02 '12 at 12:32

shn

edited Nov 02 '12 at 12:34

Also asked here: http://stats.stackexchange.com/questions/41765/choosing-which-data-point-to-label-active-learning

(Nov 02 '12 at 16:05) Rob Renaud

@RobRenaud Yes, it is. Is there any problem with asking a question on two different websites, where we can get different constructive answers?

(Nov 02 '12 at 16:37) shn

No, nothing wrong, it's just good for potential answerers to have the context of other answers, in case they'd rather not duplicate a good answer.

(Nov 02 '12 at 17:26) Rob Renaud

One Answer:

As far as I know, this blog post by John Langford accurately describes the state of the art in theoretically well-founded online active learning, and links to the relevant papers. The main idea seems to be to construct a probability distribution over which points to label, choose points at random according to it, and scale each update by the inverse of the point's labelling probability, so that the final classifier remains unbiased.
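To make the idea concrete, here is a minimal sketch of that loop for a linear classifier with labels in {+1, -1}. It is an illustration under stated assumptions, not the algorithm from the linked papers: the margin-based rule in `query_probability`, the hinge-style update, and the names `iw_active_learn`, `b`, `p_min` and `oracle` are all hypothetical choices made for this example. Only the two ingredients described above are taken from the answer: a random decision to query with probability `p`, and an update scaled by `1 / p`.

```python
import random


def query_probability(margin, b=1.0, p_min=0.05):
    """Hypothetical rule: a small |margin| (an uncertain point) gives a
    probability near 1; confident points are queried less often, but never
    with probability below p_min, so the weight 1/p stays bounded."""
    return max(p_min, b / (b + abs(margin)))


def iw_active_learn(stream, oracle, dim, lr=0.1, b=1.0, p_min=0.05, seed=0):
    """Importance-weighted online active learning sketch.

    stream: iterable of feature vectors (sequences of floats of length dim)
    oracle: function returning the true label (+1 or -1) when queried
    Returns the learned weights and the number of labels requested.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    n_queries = 0
    for x in stream:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        p = query_probability(margin, b, p_min)
        if rng.random() < p:  # flip a biased coin to decide whether to ask
            y = oracle(x)
            n_queries += 1
            if y * margin < 1.0:  # hinge-style update on a margin violation
                step = lr / p  # importance weight 1/p corrects the sampling bias
                w = [wi + step * y * xi for wi, xi in zip(w, x)]
    return w, n_queries
```

With an untrained model every margin is 0, so early points are queried with probability 1; as the weights grow, points far from the decision boundary are queried less often. The floor `p_min` is a design choice that caps the importance weight at `1 / p_min` and so limits the variance of the updates.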

answered Nov 02 '12 at 19:00

Alexandre Passos ♦

Could you please look at this question, which is about the algorithm you pointed to: http://metaoptimize.com/qa/questions/12294/importance-weighted-online-active-leaning

(Mar 06 '13 at 13:14) shn

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.