|
I have a large dataset with numerical, categorical, and ordinal variables. The response variable has four classes. The problems are: (1) most of the variables have no clear intuitive relationship with the response or with each other, so it is hard to know what to experiment with, and (2) there are too many missing values. The majority class accounts for ~70% of the data. I found that the correlations between the variables are weak, and initial attempts such as naive Bayes performed very poorly. Any suggestions on how to proceed? Also, what should I consider when binning the data? |
|
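Naive Bayes may well perform poorly here, especially if the signal depends on interactions between the features, because it assumes the features are conditionally independent given the class. A decent baseline would be a random forest: it can model interactions, and its feature importances give a rough indication of which variables are useful. What do you mean by "binning the data"? |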
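
To make the random forest suggestion concrete, here is a minimal sketch using scikit-learn. Everything in it is an assumption for illustration: the data is synthetic (four classes driven by an interaction between two features, with about 20% of entries blanked out), and the median imputation, missing-value indicators, `class_weight="balanced"`, and balanced accuracy are my own choices for handling the missing values and the class imbalance, not something taken from your dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the real data (assumption): 6 numeric features,
# 4 classes determined by the interaction of features 0 and 1.
rng = np.random.default_rng(0)
n_samples, n_features = 2000, 6
X = rng.normal(size=(n_samples, n_features))
interaction = X[:, 0] * X[:, 1]
y = np.digitize(interaction, bins=[0.3, 0.8, 1.5])  # labels 0..3, class 0 is the majority

# Blank out roughly 20% of the entries to mimic missing values.
X[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Median imputation plus "was missing" indicator columns, then a forest
# that reweights classes to counter the imbalance.
model = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
)
model.fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so always predicting the
# majority class does not look good.
bal_acc = balanced_accuracy_score(y_test, model.predict(X_test))
print(f"balanced accuracy: {bal_acc:.3f}")

# Importances cover the original features followed by the indicator columns.
importances = model.named_steps["randomforestclassifier"].feature_importances_
for idx in np.argsort(importances)[::-1][:5]:
    print(f"column {idx}: importance {importances[idx]:.3f}")
```

With a ~70% majority class, plain accuracy is misleading (predicting the majority class everywhere already scores about 0.7), which is why the sketch reports balanced accuracy. Treat the importances as a rough guide only; impurity-based importances can be biased toward high-cardinality features.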