I recently implemented AdaBoost using decision stumps as weak learners and found that the training error sometimes increases slightly when a new classifier is added, even though it goes down eventually. Initially I thought it might just be a mistake in my implementation, but then I saw that the plots in this paper by Schapire et al. also display such increases in classification error on the training set.

As far as I know, AdaBoost should directly minimize the exponential classification error, so you'd expect the actual classification error to always go down. Could it be that AdaBoost actually minimizes some upper bound on this error instead?

I'm also quite sure the weak learners always get less than 50% error individually.

Thanks in advance,

Philemon

asked Oct 15 '10 at 06:06


Philemon Brakel


One Answer:

AdaBoost minimizes the exponential loss, not the 0-1 classification error (minimizing the 0-1 error directly is a hard combinatorial problem). The exponential loss, exp(-y*f(x)), upper bounds the training error, but it gives a much higher weight to strongly misclassified examples, so if you have any of these it will trade off correctly classifying them against misclassifying other examples. Thankfully, the base learners are usually flexible enough that this is not an issue, since it is usually possible to learn a model that correctly classifies all the training data with a high margin.
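To make this concrete, here is a minimal sketch (my own illustration, not code from either post) of AdaBoost with decision stumps; the toy dataset, the exhaustive stump search, and the 50-round loop are all assumptions made just for the example. Each round it prints the 0-1 training error next to the quantity boosting actually drives down, the average exponential loss:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: labels in {-1, +1}, with some label noise so the stumps stay "weak".
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 0, 1, -1)

    def fit_stump(X, y, w):
        # Exhaustive search for the axis-aligned stump with the lowest weighted error.
        best_err, best_stump = np.inf, None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = polarity * np.where(X[:, j] > thr, 1, -1)
                    err = np.sum(w[pred != y])
                    if err < best_err:
                        best_err, best_stump = err, (j, thr, polarity)
        return best_err, best_stump

    def stump_predict(X, stump):
        j, thr, polarity = stump
        return polarity * np.where(X[:, j] > thr, 1, -1)

    w = np.full(n, 1.0 / n)   # example weights
    F = np.zeros(n)           # running ensemble score on the training set
    for t in range(50):
        err, stump = fit_stump(X, y, w)
        if err <= 0 or err >= 0.5:
            break  # perfect stump or no better than chance: stop boosting
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, stump)
        w *= np.exp(-alpha * y * pred)   # upweight misclassified points
        w /= w.sum()
        F += alpha * pred
        zero_one = np.mean(np.sign(F) != y)
        exp_loss = np.mean(np.exp(-y * F))
        print(f"round {t+1:2d}  0-1 error {zero_one:.3f}  exp loss {exp_loss:.3f}")

The mean exponential loss after T rounds equals the product of the per-round normalization constants Z_t = 2*sqrt(err_t*(1 - err_t)), so that column drops every round as long as each stump beats 50% weighted error, while the 0-1 error column is free to tick up occasionally on its way down.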

answered Oct 15 '10 at 06:35


Alexandre Passos ♦

Thanks for the fast reply! This actually makes perfect sense and is of course what can happen with any approximation to the 0-1 error. Funny how it has always been obvious to me for cross-entropy and MSE, but I forgot about it for the exponential loss.

(Oct 15 '10 at 07:08) Philemon Brakel