|
Minimizing the number of labeled samples reduces cost, but many different experimental setups can impose that requirement. For instance, we could have a "budget" that allows a total of k labels to be queried from a pool. Alternatively, each label could have a different cost, and the sum of those costs must not exceed k. This gets more interesting if the cost is unknown before the label is queried and must therefore be estimated. The queries could also come from a stream rather than a pool, where we may or may not know the total number of instances that will be presented and must decide whether to query in real time. Another interesting setup would be one where each label query removes unlabeled samples from the pool (randomly or not), or alternatively adds noise to the remaining samples, so that the cost is implicit rather than explicit. One could also query multiple experts, each with a different cost. Noise could be added to their responses to mimic "attention slips" in annotators, in which case there may even be value in re-querying the same cheap annotator multiple times. What might be some other interesting setups? Note that this is not about approaches to solving these problems (though those would be welcome), but about brainstorming interesting setups that may or may not have been considered before. |
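
To make the second setup concrete (a pool where each label has its own known cost and the total must stay within a budget k), here is a minimal Python sketch. It is only meant to pin down the setup, not to propose a method: the name `query_with_budget`, the `score` and `oracle` callables, and the greedy "score per unit cost" rule are all assumptions introduced for illustration, and costs are assumed to be positive and known in advance.

```python
def query_with_budget(pool, costs, budget, score, oracle):
    """Query labels from `pool` until no remaining label is affordable.

    pool   : list of unlabeled instances
    costs  : list of positive label costs, one per instance (assumed known)
    budget : total cost allowed (the "k" in the setup)
    score  : hypothetical informativeness function, instance -> float
    oracle : hypothetical annotator, instance -> label
    """
    labeled = {}
    remaining = budget
    candidates = set(range(len(pool)))
    while candidates:
        # Only instances whose label we can still pay for are eligible.
        affordable = [i for i in candidates if costs[i] <= remaining]
        if not affordable:
            break
        # Illustrative greedy rule: highest informativeness per unit cost.
        best = max(affordable, key=lambda i: score(pool[i]) / costs[i])
        labeled[best] = oracle(pool[best])
        remaining -= costs[best]
        candidates.remove(best)
    return labeled, remaining


# Toy usage: instances are model confidences, labels are thresholded values.
pool = [0.1, 0.5, 0.8, 0.45]
costs = [1, 2, 1, 3]
uncertainty = lambda p: 1 - abs(p - 0.5) * 2   # peaks at p = 0.5
annotator = lambda p: int(p > 0.5)

labeled, remaining = query_with_budget(pool, costs, 3, uncertainty, annotator)
print(labeled, remaining)
```

The other setups mentioned above would change different parts of this loop: unknown costs would replace `costs[i]` with an estimate, a stream would replace the `max` over the pool with a per-instance accept/reject decision, and noisy or multiple annotators would replace the single `oracle` call.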