Is there a specific classifier that can handle an unbalanced data set? I have a data set in which 80% of the instances belong to one class and the rest to another. When I train classifiers such as SVM or MaxEnt, they predict the majority class for every instance. Could someone please suggest how to improve the prediction accuracy?

asked Jan 03 '13 at 11:08

Kuri_kuri


3 Answers:

Class and instance weighting is a great way to go, as is under-sampling or over-sampling, though in some situations these are equivalent.

Decision trees and their variants (AdaBoost, Random Forests, etc.) also perform fairly well on imbalanced classes. Information gain and other entropy-based splitting criteria are more sensitive to the relative distribution of minority classes than ML- or MCE-like loss functions.

answered Jan 03 '13 at 19:25

Andrew Rosenberg

One comment about random forests and class imbalance: I've found that the rank ordering is just fine, and on several data sets it was as good as the bag balancing Daniel mentions. The catch is that you need to adjust the decision threshold between the classes to account for the class imbalance.
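To illustrate the threshold adjustment, here is a minimal sketch in plain Python. It assumes you already have scores for the minority class on a held-out validation set; the names `balanced_accuracy`, `best_threshold`, `val_scores` and `val_labels` are made up for this example, and balanced accuracy is only one of several criteria you could tune the threshold on.

```python
def balanced_accuracy(scores, labels, threshold):
    """Mean of the recall on the positive (1) and negative (0) class."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the best one."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: balanced_accuracy(scores, labels, t))


# Hypothetical validation data: 8 majority (0) and 2 minority (1) items.
# The minority items rank near the top, yet none of them reaches 0.5.
val_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.30, 0.40]
val_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

default_quality = balanced_accuracy(val_scores, val_labels, 0.5)
tuned = best_threshold(val_scores, val_labels)
tuned_quality = balanced_accuracy(val_scores, val_labels, tuned)
```

With the default cut-off of 0.5 every item is assigned to the majority class, while the tuned cut-off recovers both minority items at the cost of a few false alarms. Pick the threshold on validation data, not on the test set.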

(Jan 08 '13 at 14:36) Art Munson

Many of the available implementations support either class or instance weights. Weighting the minority class 4 times as heavily as the majority class should work for your data with those classifiers. Many of the classifiers in scikit-learn accept the string 'auto' for the weights parameter, which sets the weights so that the training data is rebalanced.
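As a rough sketch of where the factor of 4 comes from, the snippet below computes inverse-frequency class weights in plain Python. The function name `inverse_frequency_weights` and the toy labels are made up for illustration. The formula is the usual "balanced" heuristic; in scikit-learn the parameter is called `class_weight`, and to my knowledge recent releases spell the automatic option 'balanced' rather than 'auto', so check the documentation for your version.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    weight(c) = n_samples / (n_classes * count(c)), so every class
    contributes the same total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels mirroring the 80% / 20% split from the question.
labels = ["majority"] * 80 + ["minority"] * 20
weights = inverse_frequency_weights(labels)
ratio = weights["minority"] / weights["majority"]
```

For the 80/20 split this gives the minority class four times the weight of the majority class. The resulting dictionary can be passed to any classifier that accepts per-class weights, or expanded into per-instance weights.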

If your classifier does not support weights, you can resample your training data: either oversample the minority class by drawing its items repeatedly, or undersample the majority class.
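Both resampling options can be sketched in a few lines of plain Python. The helper names `oversample_minority` and `undersample_majority` and the toy data are made up for this example, and it assumes the data fits in memory as parallel lists of rows and labels.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Draw extra items (with replacement) until every class is as large
    as the biggest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        extra = [rng.choice(pool) for _ in range(target - cnt)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def undersample_majority(rows, labels, seed=0):
    """Keep a random subset (without replacement) of every class, sized
    to match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy training set with the 80/20 split from the question.
rows = list(range(100))
labels = [0] * 80 + [1] * 20

over_rows, over_labels = oversample_minority(rows, labels)
under_rows, under_labels = undersample_majority(rows, labels)
```

Resample only the training portion; the validation and test sets should keep the original class distribution so that the reported numbers reflect real-world performance.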

You can also try a classifier that optimizes a loss function that does not have this problem. AUC is one such measure, and Philip Kegelmeyer recommends Hellinger distance. Unfortunately, not many freely available libraries offer these, so you may need to implement them yourself, but I believe Vowpal Wabbit, Sofia-ML, and the mboost library for R offer AUC optimization.
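To see why AUC does not reward always predicting the majority class, it helps to compute it directly as the fraction of (minority, majority) pairs that the scores order correctly. This is a small sketch for evaluation only, not an AUC-optimizing learner; the function name `pairwise_auc` and the toy scores are made up for illustration.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the
    positive item gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]

# A model that gives every item the same score (always "majority").
constant_auc = pairwise_auc([0.2, 0.2, 0.2, 0.2], labels)

# A model that ranks three of the four pairs correctly.
ranking_auc = pairwise_auc([0.1, 0.4, 0.35, 0.8], labels)
```

A constant predictor scores 0.5 regardless of how skewed the classes are, whereas plain accuracy would credit it with the majority-class share. The double loop is quadratic in the data size, so it is meant for understanding the measure rather than for large data sets.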

answered Jan 03 '13 at 16:05

Daniel Mahler

edited Jan 06 '13 at 02:35

One way to approach the class imbalance problem is to transform the classification problem into a ranking problem. I recently read this excellent blog post, which discusses the practical machine learning tricks from the KDD 2011 best industry paper published by several Googlers. Handling class imbalance is one of them. I suggest you take a look at the blog post as well as the original paper. I haven't had a chance to try it yet, so if you decide to, please share your experience.

answered Jan 05 '13 at 22:36

Martin SAVESKI

edited Jan 05 '13 at 22:37


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.