I have been working on an "active learning" approach for jointly reducing the cost and number of labeled samples (instances) in "instance-based" methods and ensembles (specifically General Regression Neural Networks, Probabilistic Neural Networks, k-Nearest Neighbor, and ensembles built on these methods). The basic idea is that active learning is used first to select a sample from a pool of unlabeled samples and then, once it is labeled, to decide whether that sample should be "stored" as an instance. Alternatively, this could be done for a stream of samples, where each new sample arrives at random and requires a decision to label it, and then a decision to keep it or not. Does this seem like a sensible approach, or would it be better just to apply a direct sample reduction technique (à la "Reduction Techniques for Instance-Based Learning Algorithms" by Wilson and Martinez, 2000)? Has something like this been attempted, or would it not really provide any value anyway? I am concerned that the reduction will largely eliminate one of the main advantages of these methods, namely that they handle higher-complexity problems well.
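To make the pool-based variant concrete, here is a minimal Python sketch of what I mean. Everything in it is a placeholder: uncertainty, oracle, and keep_test are hypothetical hooks standing in for whatever query strategy, labeler, and storage criterion would actually be used.

    def active_learn(pool, budget, classifier, uncertainty, oracle, keep_test):
        """Pool-based active learning with a separate keep-or-discard step.

        uncertainty(classifier, x) -> float          query strategy (placeholder)
        oracle(x) -> label                           the costly labeling step
        keep_test(classifier, stored, x, y) -> bool  storage criterion
        """
        stored = []   # instances actually kept inside the classifier
        labeled = []  # every labeled sample (still usable for CV/tuning)
        while budget > 0 and pool:
            # first decision: which sample is worth paying to label?
            x = max(pool, key=lambda s: uncertainty(classifier, s))
            pool.remove(x)
            y = oracle(x)
            budget -= 1
            labeled.append((x, y))
            # second decision: is it worth storing as an instance?
            if keep_test(classifier, stored, x, y):
                stored.append((x, y))
        return stored, labeled

The stream variant would replace the max over the pool with a per-sample threshold test: label the incoming sample only if its uncertainty clears the bar, then apply the same keep_test.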
I can only see value for active learning in your scenario if there is a huge class imbalance, e.g. if you expect the number of instances of one class to be less than 0.0001 of the number of instances of the other class. Then it would definitely make sense to use active learning to direct your sample-choosing machinery towards the decision boundary and pick samples from there. However, you would not need a second decision about whether to keep a sample, since you are already in an interesting region of the sample space and every instance will be worth keeping.

Well, certainly you would keep each labeled sample for use in active learning (not to mention cross-validation), but part of the goal is to minimize the number of instances kept in the classifier itself, since each one adds another similarity evaluation every time the classifier makes a prediction. In other words, if I can reduce the number of instances kept for the classifier from 10,000 to 5,000, I cut the prediction time for a new sample significantly. Also, wouldn't there naturally be cases that are only interesting if their label turns out to be unusual? If we used disagreement among a set of hypotheses to determine which samples should be labeled, we would cast a very broad net for labeling but find that many of the labeled samples barely adjust the decision boundary.
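For concreteness, the disagreement measure I have in mind is something like vote entropy over a committee of hypotheses. This is just a sketch, not a specific implementation; committee is any list of callables returning a class label:

    import math
    from collections import Counter

    def vote_entropy(committee, x):
        """Entropy of the committee's vote distribution on x. High
        entropy means the hypotheses disagree, so labeling x looks
        informative; but, as noted above, a net this broad still
        labels many samples that barely move the decision boundary."""
        votes = Counter(h(x) for h in committee)
        n = sum(votes.values())
        return -sum((c / n) * math.log(c / n) for c in votes.values())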
(Dec 27 '12 at 12:03)
Daniel E Margolis
Is it a prediction time constraint rather than a memory constraint? If so, you could consider keeping all of the instances but stopping the similarity evaluations once you are reasonably confident the prediction won't change. I can dig up some papers on this approach in the context of using only a subset of ensemble learners to do classification. This gives a fluid tradeoff between classification time and prediction accuracy.
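Roughly, I mean something like the following sketch; it assumes similarities are bounded in [0, 1], and it illustrates the idea rather than any particular paper's algorithm:

    def early_exit_vote(instances, x, similarity, margin=0.0):
        """Evaluate stored instances one by one and stop as soon as the
        leading class can no longer be overtaken by what is left (each
        remaining instance adds at most 1.0 to any class's score).
        A larger margin makes stopping more conservative."""
        scores = {}
        remaining = len(instances)
        for xi, yi in instances:
            scores[yi] = scores.get(yi, 0.0) + similarity(x, xi)
            remaining -= 1
            ranked = sorted(scores.values(), reverse=True)
            lead = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
            if lead - margin > remaining:
                break
        return max(scores, key=scores.get)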
(Dec 27 '12 at 17:38)
Rob Renaud
The two primary constraints are actually prediction time and label cost, and meeting them would likely reduce memory usage as well. I would like to see those papers, though the cost constraint would work against labeling all instances. Actually, this leads me to a question about label-cost minimization that I think is worth asking in a separate Q&A.
(Dec 28 '12 at 14:13)
Daniel E Margolis
Daniel, could you add some calculations (big O)? I would have thought that halving the number of instances stored would not actually speed up prediction significantly; in other words, what is the order of the search? A quick peek at Wikipedia on k-d trees says nearest-neighbour search is O(log N) for randomly distributed points and O(k N^(1-1/k)) in the worst case.
(Dec 28 '12 at 15:26)
SeanV
Sean, while the efficient kNN case is generally O(log n), that does not hold for GRNN and PNN (which under my current setup are O(n)), and especially not for my ensembles, which are all at least O(n).
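For reference, the GRNN prediction I use is the standard Nadaraya-Watson form, which makes the O(n)-per-query cost explicit (a sketch; distance is whatever metric the network uses):

    import math

    def grnn_predict(instances, x, sigma, distance):
        """GRNN / Nadaraya-Watson regression: one Gaussian kernel
        evaluation per stored instance, so each prediction is O(n)
        and halving the stored instances roughly halves query time."""
        num = den = 0.0
        for xi, yi in instances:
            w = math.exp(-distance(x, xi) ** 2 / (2.0 * sigma ** 2))
            num += w * yi
            den += w
        return num / den if den else 0.0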
(Dec 28 '12 at 16:25)
Daniel E Margolis