In 'What does classifying more than 10,000 image categories tell us?', the authors use the ImageNet dataset, which consists of more than 10,000 classes with a total of 9 million images. Training is done with 200 to 1500 images per category, with an average of 450.

My question is: how do you train on such data correctly? I learned that a training set needs to be unbiased, meaning there should be the same number of images for each class; if that is not the case, one needs to use weighting. But the authors don't mention any special procedure, so I wondered whether it is perhaps not necessary after all.

[update]

I just saw this MO post, Handling data imbalance in classification, but I doubt that they generated more images to fill up the small classes.

[update 2] Okay, I searched further and found a similar question on Quora: In classification, how do you handle an unbalanced training set?

My short summary would be:

  • resample the unbalanced training set into several balanced sets. "Running an ensemble of classifiers on these sets could produce a much better result than one classifier alone." via What's a good approach to binary classification when the target rate is minimal? (see the first sketch after this list)
  • "Do not use accuracy alone as a metric. That way, we would get 99% accuracy with everything classified as the majority class, which would not mean anything. Precision & Recall, with respect to each class might be a better one." (see the second sketch after this list)
  • try a cost-sensitive classifier (http://weka.wikispaces.com/CostS...)
  • create more examples artificially (see the third sketch after this list)
  • resample (if possible)
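
A minimal sketch of the first point, assuming scikit-learn and binary labels in {0, 1}; the helper names (train_balanced_ensemble, predict_ensemble) are mine, not from the linked threads:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_balanced_ensemble(X, y, n_members=10, seed=0):
        # Each member sees every minority example plus an equally sized
        # random draw (without replacement) from the majority class.
        rng = np.random.RandomState(seed)
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        members = []
        for _ in range(n_members):
            maj_sample = rng.choice(majority, size=len(minority), replace=False)
            idx = np.concatenate([minority, maj_sample])
            members.append(LogisticRegression().fit(X[idx], y[idx]))
        return members

    def predict_ensemble(members, X):
        # Average the members' probability estimates and threshold at 0.5.
        probs = np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)
        return (probs >= 0.5).astype(int)

Each member is trained on a balanced subset, so no single classifier is dominated by the majority class, and averaging their probabilities recovers some of the information a single balanced subsample would throw away.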
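
For the second point, scikit-learn's classification_report prints precision and recall per class; a toy 99:1 example with a "classifier" that always predicts the majority class shows why accuracy alone is misleading:

    import numpy as np
    from sklearn.metrics import accuracy_score, classification_report

    # 99 negatives, 1 positive; the "model" predicts the majority class only.
    y_true = np.array([0] * 99 + [1])
    y_pred = np.zeros_like(y_true)

    print("accuracy:", accuracy_score(y_true, y_pred))  # 0.99, yet useless
    print(classification_report(y_true, y_pred, zero_division=0))  # recall for class 1 is 0.0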
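
And for "create more examples artificially", a crude sketch of oversampling the minority class with Gaussian jitter (a stand-in for proper schemes such as SMOTE; the noise scale is an arbitrary choice of mine):

    import numpy as np

    def oversample_with_noise(X_min, n_new, scale=0.01, seed=0):
        # Draw existing minority rows with replacement and perturb them
        # slightly; 'scale' controls the jitter and is purely illustrative.
        rng = np.random.RandomState(seed)
        idx = rng.randint(0, len(X_min), size=n_new)
        return X_min[idx] + rng.normal(0.0, scale, size=(n_new, X_min.shape[1]))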

asked Sep 17 '12 at 08:57 by Nick R

edited Sep 18 '12 at 13:11

2 Answers:

The points in your summary are essentially correct.

Which approach will work best is an empirical question.

Before you do anything, take a step back and think about the right evaluation measure.

For example, if a false negative is much worse than a false positive, then you should reweight the examples.

If a false negative is just as bad as a false positive, then you might not want to do anything. This is because, on the original training set, the classifier will predict MANY negatives and thus have many true negatives, at the expense of a few false negatives.
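
To make the reweighting case concrete (a sketch of one way to do it, not necessarily what the answer has in mind): many learners accept per-class misclassification weights, for example scikit-learn's class_weight parameter.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy imbalanced data: 95 negatives around 0, 5 positives around 2.
    rng = np.random.RandomState(0)
    X = np.concatenate([rng.normal(0.0, 1.0, 95),
                        rng.normal(2.0, 1.0, 5)]).reshape(-1, 1)
    y = np.array([0] * 95 + [1] * 5)

    # If a false negative is ~10x worse than a false positive, penalize
    # errors on class 1 ten times more heavily (the ratio is illustrative).
    clf = LogisticRegression(class_weight={0: 1.0, 1: 10.0}).fit(X, y)
    print(clf.predict(np.array([[1.0]])))  # the weighted model flags borderline points sooner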

answered Sep 18 '12 at 20:00 by Joseph Turian