
I have a multiclass classification problem with 5 classes. However, the dataset is quite unbalanced. Now I am wondering what would be the best approach:

1) Use a multiclass learner => Is there a learner that automatically adjusts the weights to deal with the unbalancedness?

2) Treat the problem as multiple binary classification problems (a manual one-vs-rest), where I do the weighting manually?

3) Option 2, but with a technique that automatically weights the examples to deal with the unbalancedness, if any?

asked Dec 08 '13 at 13:38


Vam

I think "active learning" is what deals with this kind of problem. Have you looked at papers like this one? http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf

(Dec 13 '13 at 11:11) eder

2 Answers:

For 3), some classifiers in scikit-learn let you set class_weight to 'auto', e.g. SVC: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
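A minimal sketch of that option, assuming a recent scikit-learn release, where the automatic setting is spelled class_weight='balanced' (the older 'auto' value was deprecated and later removed). The synthetic data and its class proportions are made up for illustration and are not the asker's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 5-class data with skewed class proportions (illustrative only).
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=10,
    n_classes=5,
    weights=[0.6, 0.2, 0.1, 0.07, 0.03],
    random_state=0,
)

# 'balanced' makes errors on rare classes cost more than errors on common ones.
clf = SVC(class_weight="balanced")
clf.fit(X, y)

# class_weight_ holds the per-class multipliers applied to the penalty C.
for label, weight in zip(clf.classes_, clf.class_weight_):
    print(label, round(float(weight), 2))
```

SVC handles the 5 classes itself, so no manual one-vs-rest split is needed here. Whether the reweighting actually helps is best checked with a per-class metric on held-out data rather than overall accuracy.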

answered Dec 14 '13 at 08:48


tpeng

How unbalanced? Have you already tried repeating the full set of data for the less populated labels?
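One way to read "repeating the data" is random oversampling: keep every original row and draw extra rows, with replacement, from the smaller classes until all classes are the same size. The sketch below uses only NumPy; the class counts and the names X_bal and y_bal are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels: 5 classes with very different counts (illustrative only).
y = np.repeat([0, 1, 2, 3, 4], [500, 200, 100, 50, 20])
X = rng.normal(size=(y.size, 3))

target = np.bincount(y).max()  # grow every class to the size of the largest one

keep = []
for label in np.unique(y):
    idx = np.flatnonzero(y == label)
    # Draw the missing rows with replacement from this class's own rows.
    extra = idx[rng.integers(0, idx.size, size=target - idx.size)]
    keep.append(np.concatenate([idx, extra]))

keep = np.concatenate(keep)
X_bal, y_bal = X[keep], y[keep]

print(np.bincount(y_bal))  # every class now has `target` rows
```

If you try this, oversample only the training split; duplicated rows that leak into the validation or test split make the scores look better than they are.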

answered Dec 11 '13 at 13:09


Paul Denya

Repeating the data may only bias the classifier. It should be equivalent to weighting the samples, but I would guess there are more elaborate strategies for properly weighting the observations.
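To make the weighting side of that comparison concrete, here is a small sketch using scikit-learn's helpers for inverse-frequency weights. The class counts are made up; "balanced" is scikit-learn's name for the heuristic n_samples / (n_classes * class_count), which is only one possible weighting scheme.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Toy labels: 5 classes with very different counts (illustrative only).
y = np.repeat([0, 1, 2, 3, 4], [500, 200, 100, 50, 20])
classes = np.unique(y)

# One weight per class, from scikit-learn's 'balanced' heuristic.
class_w = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same numbers computed by hand: n_samples / (n_classes * class_count).
manual = y.size / (classes.size * np.bincount(y))

# One weight per row; each class ends up with the same total weight.
sample_w = compute_sample_weight(class_weight="balanced", y=y)
totals = np.array([sample_w[y == c].sum() for c in classes])

print(class_w)
print(totals)
```

Many scikit-learn estimators accept such per-row weights through fit(X, y, sample_weight=sample_w), which gives a rare row more influence without physically duplicating it.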

(Dec 13 '13 at 11:13) eder

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.