Hi, I am currently coding AdaBoost.M1 by myself and I ran into a problem. In each round of iteration, the algorithm updates the weights of the training set: misclassified examples are weighted higher, correctly classified ones lower. My question is: after one iteration, what kind of data should I use for training in the next iteration? I am confused between the following two scenarios:
Thank you.
I think the training set is always the same set; the only thing that changes is the weights of the examples. Does this answer your question? The only issue I can see with this approach is that it will necessarily be slow, but that is somewhat unavoidable — otherwise you're not really optimizing the exponential loss.

Then in this way, how can the weighted examples influence the classifier? The training set is basically the same; the misclassified examples merely have a higher weight attached.
(Sep 23 '10 at 03:02)
Zhibo Xiao
The classifier has to be able to incorporate the weight information in its decision (for example, by preferring to get a single high-weight example right rather than lots of low-weight examples). Many classifiers do have this property, including the decision stumps that are commonly used with AdaBoost.
(Sep 23 '10 at 03:04)
Alexandre Passos ♦
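To make this concrete, here is a rough sketch of such a weight-aware decision stump in Python/NumPy. The function names (`fit_stump`, `stump_predict`) are made up for illustration; the point is that the split is chosen to minimize the *weighted* error, so one high-weight example can outweigh many low-weight ones.

```python
import numpy as np

def fit_stump(X, y, w):
    """X: (n, d) features, y: labels in {-1, +1}, w: nonnegative weights.
    Returns ((feature, threshold, polarity), weighted_error)."""
    n, d = X.shape
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()  # weighted misclassification rate
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err

def stump_predict(X, stump):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
```

This brute-force search over all observed thresholds is O(n·d·n), which is fine for small data but would be replaced by a sort-and-scan in a serious implementation.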
Ok, I get it now. After one iteration the weight distribution changes, and then I can choose the most highly weighted examples to keep the training set size stable; this ordering and picking process goes on until the iterations stop. Thank you for your explanation.
(Sep 23 '10 at 03:08)
Zhibo Xiao
Specifically, you want to iterate over ALL the examples. It's just that some examples have a higher weight multiple on the training loss.
(Sep 23 '10 at 16:52)
Joseph Turian ♦♦
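A minimal sketch of that idea: the full training set is used every round, and only the weight vector changes. This assumes binary ±1 labels and a hypothetical `fit_weak(X, y, w)` that returns a weight-aware classifier; names are illustrative.

```python
import numpy as np

def adaboost(X, y, fit_weak, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)            # start uniform
    ensemble = []
    for _ in range(rounds):
        h = fit_weak(X, y, w)          # trained on ALL examples, weighted
        pred = h(X)
        err = w[pred != y].sum()
        if err >= 0.5:                 # M1 stops if no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        # misclassified examples are up-weighted, correct ones down-weighted
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                   # renormalize to a distribution
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, X):
    score = sum(a * h(X) for a, h in ensemble)
    return np.sign(score)
```

(The original AdaBoost.M1 paper phrases the update via beta = err/(1-err); the exponential form above is the equivalent binary ±1 formulation.)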
Like Alexandre said, you should give the weighted training set to the classification algorithm. But if your classification algorithm cannot handle weighted examples, then in each iteration you can simply sample a new training set according to the weight distribution. That also works.

Thank you, I think I know how to do it the right way now.
(Sep 23 '10 at 04:17)
Zhibo Xiao
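For the case where the base learner cannot take weights, the resampling step might look like this sketch (`resample` is an illustrative name; it draws, with replacement, a same-size training set where each example's inclusion probability is its weight):

```python
import numpy as np

def resample(X, y, w, rng=None):
    """Draw a bootstrap sample of size n from the weight distribution w
    (w must be nonnegative and sum to 1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    idx = rng.choice(n, size=n, replace=True, p=w)  # weight-proportional
    return X[idx], y[idx]
```

The unweighted base learner then trains on `(X[idx], y[idx])` as usual; high-weight examples simply appear more often in the sample.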
Is it ok if, in every iteration, I draw several random resamples of the same size as the actual input data, and then choose the resample that yields the lowest error rate for that iteration?