|
I have a multiclass classification problem with 5 classes. However, the dataset is quite unbalanced. Now I am wondering what the best approach would be: 1) Use a multiclass learner => Is there a learner that automatically adjusts the weights to deal with the imbalance? 2) Treat the problem as multiple binary classification problems (a manual one-vs-rest), where I do the weighting manually? 3) Option 2, but with a technique that automatically weights the examples to deal with the imbalance, if there is one? |
|
For 3), some classifiers in scikit-learn let you set class_weight to 'auto' (called 'balanced' in newer releases), e.g. http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html |
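To make that concrete, here is a minimal sketch, assuming a recent scikit-learn where the option is spelled `class_weight="balanced"`. The synthetic 5-class dataset and its class proportions are made up purely for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical imbalanced 5-class dataset (proportions chosen for illustration).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1,
    weights=[0.6, 0.2, 0.1, 0.07, 0.03], random_state=0,
)

# "balanced" rescales the penalty C per class, inversely to class frequency.
clf = SVC(class_weight="balanced")
clf.fit(X, y)

# The weights actually used: n_samples / (n_classes * count_of_class).
counts = np.bincount(y)
expected = len(y) / (len(counts) * counts)
print(counts)
print(clf.class_weight_)
```

The rarest class ends up with the largest weight, so mistakes on it cost more during training. Whether this is enough for a given dataset is something to check on a held-out set with a per-class metric.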
|
How unbalanced? Have you already tried repeating the full set of data for the less populated labels? Repeating the data may only bias the classifier, though... It should be equivalent to weighting the samples, but I guess there are more elaborate strategies for properly weighting the observations.
(Dec 13 '13 at 11:13)
eder
|
I think "active learning" is one approach that deals with this kind of problem. Have you looked at papers like this one: http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf