Hi, I am currently implementing AdaBoost.M1 myself and have run into a problem. In each round of iteration, the algorithm updates the weights of the training set: misclassified examples are weighted higher, correctly classified ones lower.

My question is: after one iteration, what data should I use to train on in the next iteration? I am confused between the following two scenarios:

  1. Only the misclassified examples from the training set. In this case the misclassified examples would get fewer and fewer each round, so I am not sure whether this is right.
  2. Discard the correctly classified data and use the examples misclassified in all previous iterations, plus some other data, to keep the size of the training set constant.

Thank you.

asked Sep 23 '10 at 02:49

Zhibo Xiao

edited Sep 23 '10 at 23:56

Joseph Turian ♦♦


3 Answers:

I think the training set is always the same set, and the only thing that changes is the weight of the examples. Does this answer your question?

The only issue I can think of with this approach is that it will necessarily be slow, but that is somewhat unavoidable: otherwise you're not really optimizing the exponential loss.
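For concreteness, here is a minimal sketch of that loop in Python, assuming binary labels in {-1, +1} and using sklearn's depth-1 decision tree as a stand-in base learner (any learner accepting `sample_weight` would do). Note that X and y are passed unchanged to every round; only the weight vector w is updated:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, n_rounds=50):
    """Boosting loop sketch: (X, y) never change, only the weights w do."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start from uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # same data every round
        pred = stump.predict(X)
        miss = pred != y
        err = np.sum(w[miss])                # weighted, not counted, error
        if err >= 0.5:                       # M1 stops if no better than chance
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(np.where(miss, alpha, -alpha))  # up-weight the mistakes
        w /= w.sum()                         # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    """Weighted majority vote of the learned stumps (labels in {-1, +1})."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```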

answered Sep 23 '10 at 02:57

Alexandre Passos ♦

Then how can the weighted examples influence the classifier? The training set is basically the same; the misclassified examples just have a weight attached?

(Sep 23 '10 at 03:02) Zhibo Xiao

The classifier has to be able to incorporate the weight information in its decision (for example, by preferring to get a single high-weight example right rather than many low-weight examples). Many classifiers do have this property, including the decision stumps that are commonly used with AdaBoost.
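To make that concrete, here is a hypothetical weighted decision stump on a single feature: it picks the threshold with the lowest *weighted* error, so one heavy example can matter more than many light ones. (This is an illustrative sketch, assuming labels in {-1, +1} and a normalized weight vector, not the only way to do it.)

```python
import numpy as np

def fit_weighted_stump(x, y, w):
    """Return (weighted_error, threshold, polarity) minimizing the
    total *weight* of misclassified examples, not their count."""
    best = (np.inf, None, None)
    for t in np.unique(x):
        for polarity in (+1, -1):
            pred = polarity * np.where(x <= t, -1, 1)
            err = np.sum(w[pred != y])   # one heavy example can dominate
            if err < best[0]:
                best = (err, t, polarity)
    return best
```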

(Sep 23 '10 at 03:04) Alexandre Passos ♦

OK, I get it now. After one iteration the weight distribution changes, and then I can choose the most highly weighted examples to keep the training set size stable. This ordering and picking process goes on until the iterations stop. Thank you for your explanation.

(Sep 23 '10 at 03:08) Zhibo Xiao

Specifically, you want to iterate over ALL the examples. It's just that some examples get a higher weight multiplier on the training loss.

(Sep 23 '10 at 16:52) Joseph Turian ♦♦

Like Alexandre said, you should give the weighted training set to the classification algorithm. But if your classification algorithm cannot handle weighted examples, then in each iteration you can just sample a new training set according to the weight distribution. That also works.
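A minimal sketch of that resampling trick, assuming the weight vector w is normalized to sum to 1 (numpy's Generator.choice then draws each index with probability equal to its boosting weight):

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_by_weight(X, y, w):
    """Draw a training set of the original size, sampling each example
    with probability equal to its boosting weight. An unweighted learner
    trained on this sample sees high-weight examples more often."""
    idx = rng.choice(len(y), size=len(y), replace=True, p=w)
    return X[idx], y[idx]
```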

answered Sep 23 '10 at 03:56

Umar

Thank you, I think I know how to do it the right way now.

(Sep 23 '10 at 04:17) Zhibo Xiao

Is it OK if in every iteration I draw several random resamples of the same size as the actual input data, and then for each iteration choose the best resample, the one with the lowest error rate?

answered May 16 '12 at 03:25

tiopramayudi
