I am working on an imbalanced dataset in which only 20% of the samples belong to the positive class and the rest to the negative class. I have tried the typical techniques for dealing with skewed datasets: undersampling, oversampling, ensemble methods such as Random Forests, boosting algorithms, and so on.
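
For concreteness, this is roughly the kind of baseline I have been running (a minimal sketch, assuming a recent scikit-learn and a synthetic stand-in for my data, which I cannot share):

    # Sketch of the resampling / ensemble baselines described above.
    # The synthetic dataset is only a stand-in for the real one.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # ~20% positive class, feature/sample counts in the range I described
    X, y = make_classification(n_samples=5000, n_features=40, n_informative=20,
                               weights=[0.8, 0.2], random_state=0)

    models = {
        # class_weight="balanced" reweights classes instead of resampling them
        "random_forest": RandomForestClassifier(n_estimators=500,
                                                class_weight="balanced",
                                                random_state=0),
        "boosting": GradientBoostingClassifier(random_state=0),
    }

    for name, model in models.items():
        auc = cross_val_score(model, X, y, scoring="roc_auc", cv=5)
        print(name, auc.mean())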

However, my results have plateaued at 0.8 AUC. I'm hoping for feature extraction methods, or some kind of mapping of the dataset, that could improve the classification results, given that I only have between 31 and 60 features and between 3,000 and 10,000 samples.
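
To make the "mapping" idea concrete, here is a rough sketch of the sort of thing I have in mind, again assuming scikit-learn; the Nystroem RBF feature map is just one arbitrary example, not something I am committed to:

    # Sketch: map the original 31-60 features into a new space before
    # classification. Nystroem approximates an RBF kernel feature map.
    from sklearn.kernel_approximation import Nystroem
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    mapped_clf = make_pipeline(
        StandardScaler(),                      # scale features before the kernel map
        Nystroem(kernel="rbf", n_components=300, random_state=0),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )

    # X, y as in the baseline sketch above
    auc = cross_val_score(mapped_clf, X, y, scoring="roc_auc", cv=5)
    print(auc.mean())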

So what do you suggest? Thanks in advance!

asked Apr 10 '13 at 02:41

Issam Laradji

edited Apr 10 '13 at 04:49


2 Answers:

I presume you have already tried choosing the positive and negative examples in equal proportion. Here is a write-up on the Naive Bayes approach, which is slightly related to your question.
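
For example, something along these lines (a rough NumPy sketch; it assumes your labels are in a NumPy array y with positives coded as 1, and your features in X):

    # Sketch: randomly undersample the majority (negative) class so that the
    # two classes appear in equal proportion.
    import numpy as np

    rng = np.random.default_rng(0)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)

    # keep all positives, and an equally sized random subset of negatives
    neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    balanced_idx = np.concatenate([pos_idx, neg_sample])

    X_balanced, y_balanced = X[balanced_idx], y[balanced_idx]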

answered Apr 12 '13 at 10:41

Broccoli

You may find the paper "Learning from Imbalanced Data" (He & Garcia, 2009) of interest.

answered Apr 10 '13 at 06:45

amair
