How does an ensemble classifier merge the predictions of its constituent classifiers? I'm having difficulty finding a clear description. In some code examples I've found, the ensemble just averages the predictions, but I don't see how this could possibly make a "better" overall accuracy.
Consider the following case. An ensemble classifier is composed of 10 classifiers. One classifier has an accuracy of 100% in data subset X, and 0% in all other data sets. All other classifiers have an accuracy of 0% in data subset X, and 100% in all other data sets.
Using an averaging formula, where classifier accuracy is ignored, the ensemble classifier would have, at best, 50% accuracy. Is this correct, or am I missing something? How can taking the average prediction from N potentially clueless classifiers possibly create a better prediction than a single classifier that's an expert in a specific domain?
First, remember that if you assume the different classifiers' predictions are equivalently accurate (a priori) independent estimates of the same random variable, and that your loss function is convex, then you can't lose: by Jensen's inequality, the loss of the averaged prediction is at most the average loss of the individual predictions, and averaging independent estimates also shrinks their variance.
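This averaging argument is easy to check numerically. The sketch below is not from the answer; the Gaussian noise model and the specific constants are illustrative assumptions. Ten a-priori equivalent, independent noisy estimates of the same quantity are averaged, and under squared (convex) loss the average does markedly better than any single estimate:

```python
import random

random.seed(0)

TRUE_VALUE = 1.0
N_CLASSIFIERS = 10
N_TRIALS = 2000

single_losses, ensemble_losses = [], []
for _ in range(N_TRIALS):
    # Ten equally accurate, independent noisy estimates of the same
    # quantity (unit-variance Gaussian noise is an assumption here).
    preds = [TRUE_VALUE + random.gauss(0, 1) for _ in range(N_CLASSIFIERS)]
    single_losses.append((preds[0] - TRUE_VALUE) ** 2)   # one classifier alone
    avg = sum(preds) / N_CLASSIFIERS
    ensemble_losses.append((avg - TRUE_VALUE) ** 2)      # averaged ensemble

mean_single = sum(single_losses) / N_TRIALS
mean_ensemble = sum(ensemble_losses) / N_TRIALS
# Averaging n independent estimates divides the noise variance by n,
# so the ensemble's mean squared loss should be roughly a tenth of
# the single classifier's.
```

The key assumption is independence: if the ten classifiers made identical errors, averaging would buy nothing.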
Of course, as you pointed out, sometimes they are not equivalently accurate. What most ensemble methods do, then, is use a weighted average of the predictions of the base classifiers, which is really the same thing as learning a linear classifier whose features are the responses of the other classifiers. Preferably you should calibrate these weights on held-out data, as then you're optimizing an estimate of the test loss. There are methods that can compete with (that is, never be much worse than) the best single classifier, or even the best linear classifier, and that work online with only one pass through the validation set. See for example the Cesa-Bianchi et al. paper on online gradient descent.
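To make the "calibrate the weights on held-out data" idea concrete, here is a hypothetical sketch; the three-classifier setup, the noise levels, and the learning rate are all invented for the example. A single online gradient-descent pass over a validation set learns one weight per base classifier, and the weighted average beats the uniform average because it shifts weight onto the least noisy classifier:

```python
import random

random.seed(1)

NOISE = [0.1, 0.5, 1.0]  # hypothetical per-classifier noise levels

def example():
    """One (base predictions, label) pair: each base classifier
    reports the 0/1 label plus its own Gaussian noise."""
    y = random.randint(0, 1)
    preds = [y + random.gauss(0, s) for s in NOISE]
    return preds, y

# One online gradient-descent pass over held-out data, learning one
# weight per base classifier. With squared loss the update is just
# the residual times each base prediction.
w = [1 / 3] * 3          # start at the plain (uniform) average
lr = 0.02
for _ in range(5000):
    preds, y = example()
    err = sum(wi * p for wi, p in zip(w, preds)) - y
    w = [wi - lr * err * p for wi, p in zip(w, preds)]

# Compare squared losses of the uniform average vs the learned weights
# on fresh data.
test_set = [example() for _ in range(2000)]
avg_loss = sum((sum(p) / 3 - y) ** 2 for p, y in test_set) / len(test_set)
wtd_loss = sum((sum(wi * pi for wi, pi in zip(w, p)) - y) ** 2
               for p, y in test_set) / len(test_set)
```

Since the learned combination starts at the uniform average and only moves to reduce validation loss, it ends up no worse than the plain average here, which is the "never much worse than the best linear classifier" flavor of guarantee the online methods formalize.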
answered Jan 22 '12 at 15:41
Alexandre Passos ♦