We have a data point x and many classes. Let P(c|x) be the probability that x belongs to class c. Denote by c1 the most probable class for x (P1 = P(c1|x) is the highest probability) and by c2 the second most probable class (P2 = P(c2|x) is the second highest probability, so P1 > P2).

We define u to be a threshold on the difference between the highest and the second highest probability (a threshold on P1 - P2): if P1 - P2 < u then we can (at some cost) ask for the true class of x (denote it cx).

c1 (the predicted class for x) is usually equal to cx, but not always. Given this setup, I want to learn a good value for the parameter u. Currently, I just set u to an initial value (e.g. u = 0.2) and then adjust it according to whether or not c1 equals cx:

if c1 = cx then we become more confident and decrease u (e.g. u = u - epsilon); otherwise (when c1 != cx) we become less confident and increase u (e.g. u = u + epsilon), where e.g. epsilon = 0.01.
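The update rule above can be sketched as follows (a minimal sketch; the function name, the dict-of-probabilities interface, and the clipping of u to [0, 1] are my assumptions, not from the question):

```python
# Sketch of the online update rule described above. The interface
# (probs as a dict class -> P(class | x)) is an assumption for illustration.

def update_threshold(u, probs, true_class, epsilon=0.01):
    """One online update of the query threshold u.

    probs: dict mapping each class to P(class | x).
    true_class: the true label cx (only obtainable when we queried).
    Returns the updated u and whether a query was triggered.
    """
    ranked = sorted(probs.values(), reverse=True)
    p1, p2 = ranked[0], ranked[1]          # top two probabilities
    c1 = max(probs, key=probs.get)         # predicted class
    queried = (p1 - p2) < u                # uncertain -> pay the cost, ask for cx
    if queried:
        if c1 == true_class:
            u = max(0.0, u - epsilon)      # prediction was right: query less
        else:
            u = min(1.0, u + epsilon)      # prediction was wrong: query more
    return u, queried
```

Note that u only changes on examples where a query was actually made, since cx is unknown otherwise.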

Question:

Is there any better way to "learn" a value for u? (Assume either that we can start with a high initial value of u in order to get labelled data at the beginning, or that I have a subset of labelled data that I can use to learn the value of u.)
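One concrete version of the labelled-subset alternative mentioned above (a sketch under my own assumptions: the per-query and per-error costs, the function name, and the candidate grid are all illustrative, not from the thread) is to sweep candidate thresholds and pick the one minimizing a combined cost on the labelled set:

```python
# Hypothetical offline alternative: choose u on a labelled subset by
# minimizing (query_cost per query) + (error_cost per kept wrong prediction).
# The cost weights are assumptions for illustration.

def best_threshold(margins, correct, candidates, query_cost=1.0, error_cost=5.0):
    """margins[i] = P1 - P2 for labelled example i.
    correct[i]    = True iff c1 == cx for example i.
    Returns the candidate u with the lowest total cost.
    """
    def cost(u):
        queries = sum(1 for m in margins if m < u)                       # we paid to ask
        errors = sum(1 for m, ok in zip(margins, correct)
                     if m >= u and not ok)                               # kept a wrong c1
        return query_cost * queries + error_cost * errors
    return min(candidates, key=cost)
```

This makes the query/error trade-off explicit: a higher error_cost pushes the chosen u upward (query more often), a higher query_cost pushes it downward.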

asked Jan 23 '13 at 09:52

shn

edited Jan 24 '13 at 08:46


One Answer:

Have you seen TrueSkill? I'm not sure I understand what you are talking about or whether it matches, but there are some similarities.

Or do you mean active learning?

answered Jan 23 '13 at 21:12

Alexandre Passos ♦

@AlexandrePassos It is about learning a threshold value (u) that we use to decide whether to reject a given data point instead of trusting the prediction made by our classifier. We can, however, view this as a threshold (u) for selective (i.e. online) active learning, used to decide whether to query the true class label of a new data point instead of trusting the classifier's prediction. So the value u that I want to learn is a sort of confidence value. I would prefer to optimize it after processing each new data point, or alternatively to learn a good value for it using a labelled data set.

(Jan 24 '13 at 08:41) shn

Is the goal to learn a model using fewer labels (not querying the labels of examples on which you're confident), or is it to make fewer errors as you predict?

The first is called active learning, and has been actively researched.

The second I don't know a name for, but you can find papers about it, like http://www.machinelearning.org/proceedings/icml2005/papers/084_Abstaining_Pietraszek.pdf (the keyword I used was "abstaining classifier", though I'm sure there are others; you can find more by looking at the citations of this paper and the papers that cite it).

(Jan 24 '13 at 11:35) Alexandre Passos ♦

@AlexandrePassos (1) Many active learning methods use uncertainty sampling to query only the labels of data points on which we are not confident (uncertain). (2) Many classification methods use an uncertainty measure to reject uncertain data points in order to make fewer errors. In this sense, the measure used in (1) to decide whether to query the label of a data point is the same as the measure used in (2) to reject an uncertain data point. So even though my question concerns point (2) more, it applies equally to active learning with uncertainty sampling. Thank you for the linked paper. However, my question is more about how to learn the real value that constitutes a threshold for the uncertainty measure defined in my question.

(Jan 24 '13 at 14:53) shn

Do you really think there is any difference between the measure used for (1) "active learning with uncertainty sampling" in order to use fewer labels, and (2) rejecting uncertain predictions in order to make fewer errors?

(Jan 25 '13 at 04:59) shn