I have a large dataset with numerical, categorical, and ordinal variables. The response variable has four classes, and the majority class accounts for ~70% of the data. The difficulties are: (1) most of the variables have no clear intuitive relationship with the response or with each other that would guide experimentation, and (2) there are too many missing values.

I found that the correlations between the variables are weak. Initial attempts such as naive Bayes also performed very poorly. Any suggestions on how to proceed? Also, what should I consider when binning the data?

asked Sep 02 '12 at 23:18


Ankur Pandey

edited Sep 02 '12 at 23:21


One Answer:

Naive Bayes might not give good results, especially if you have to model interactions between the features, since it assumes the features are conditionally independent given the class.
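To make the point about interactions concrete, here is a minimal sketch using only the standard library. The XOR data and the helper names (`fit_naive_bayes`, `predict_proba`) are made up for illustration and have nothing to do with the dataset in the question. The label depends only on the interaction between the two features, so a categorical naive Bayes model learns nothing from it.

```python
from collections import Counter, defaultdict
import math


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, class_label)][value] = count
    feature_counts = defaultdict(Counter)
    # values[feature_index] = set of distinct values seen for that feature
    values = defaultdict(set)
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict_proba(model, row):
    """Return {class_label: posterior probability} for a single row."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for j, v in enumerate(row):
            # Smoothed per-feature likelihood, multiplied in independently:
            # this is the "naive" conditional independence assumption.
            num = feature_counts[(j, y)][v] + alpha
            den = n_y + alpha * len(values[j])
            score += math.log(num / den)
        log_scores[y] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {y: e / z for y, e in exp_scores.items()}


# Hypothetical toy data: label = a XOR b, each combination repeated 25 times.
rows = [(a, b) for a in (0, 1) for b in (0, 1)] * 25
labels = [a ^ b for a, b in rows]

model = fit_naive_bayes(rows, labels)
xor_probs = {row: predict_proba(model, row)[1]
             for row in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(xor_probs)
```

Each feature on its own is uninformative about the label here, so the model assigns the same posterior to every input even though the label is a deterministic function of the pair. Real data is rarely this extreme, but the failure mode is the same in kind.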

A decent baseline would be a random forest. It can model interactions between features, and its feature importances will indicate which features are useful.

What do you mean by "binning the data"?

answered Sep 08 '12 at 05:32


Joseph Turian ♦♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.