Was hoping someone here might have some pointers for me, as I'm clearly doing something wrong. I have a dataset with few samples (fewer than 4,000) and a large number of features (over 2,000), many of which have missing data. The features are a mix of binary and real-valued numbers, some falling between 0 and 1 and others between 0 and 10. The goal is binary classification.

My best result so far (which has still been rather poor) came from feeding the data into the DBN code provided in the deep learning tutorials without any modifications. Since then, I have been unable to improve performance. In general, I've been trying to use a stack of contractive autoencoders to initialize the weights of stacked RBMs, then fine-tune with backpropagation. Essentially, the autoencoders are trained one at a time, each feeding its output (through a sigmoid activation function) into the next layer. Once they are all trained, I create a series of RBMs, initialize them with the weights learned by the autoencoders, and add one more RBM layer on top with randomly initialized weights, which seemed to be the method suggested in http://www.icml-2011.org/papers/455_icmlpaper.pdf.

I have tried various learning rates, numbers of layers, and numbers of hidden units. I have tried normalizing the data to between 0 and 1 first. I have tried PCA to reduce the features first (although I think that's the autoencoder's job). I have tried converting the data to binary values (adding a large number of new features in the process). I have also tried sorting the features by how few null values they have and keeping varying numbers of the most complete ones, which actually seemed to work reasonably well, better than the other methods. I'm pretty sure I'm doing something wrong, since none of this has worked, so I wonder if anyone could shed some light on what I might try instead. Many thanks.
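In case it's useful, here is a rough sketch of that last feature-filtering step, not my exact code; the placeholder data, the cutoff k, and the mean-filling of the leftover gaps are just stand-ins for illustration:

import numpy as np

# Placeholder data: ~4000 samples, ~2000 features, NaN marks missing values.
rng = np.random.default_rng(0)
X = rng.random((4000, 2000)) * 10.0
X[rng.random(X.shape) < 0.3] = np.nan

k = 200  # arbitrary cutoff: keep the k most complete features

# Count missing entries per feature and keep the k columns with the fewest.
null_counts = np.isnan(X).sum(axis=0)
keep = np.argsort(null_counts)[:k]
X_kept = X[:, keep]

# Rescale each kept feature to [0, 1] (ignoring NaNs), then fill the
# remaining gaps with the column mean (just one possible choice).
col_min = np.nanmin(X_kept, axis=0)
col_max = np.nanmax(X_kept, axis=0)
X_scaled = (X_kept - col_min) / (col_max - col_min + 1e-12)
X_filled = np.where(np.isnan(X_scaled), np.nanmean(X_scaled, axis=0), X_scaled)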
It sounds like you don't have enough data to learn a deep model. In general, there are many possible deep models, and you need enough data to tell them apart, or you won't generalize. Are you sure you can't get equivalent results with much simpler models, like clustering with k-means or classification with linear/Gaussian SVMs?
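For example, a quick baseline along those lines with scikit-learn could look like the sketch below. X and y are toy placeholders for your data and labels, and mean imputation of the missing values is just one simple choice:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Toy placeholder data (swap in your real X and y); NaN marks missing values.
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
X[rng.random(X.shape) < 0.3] = np.nan
y = rng.integers(0, 2, size=1000)

# Impute missing values, standardize, then fit a linear and an RBF (Gaussian) SVM.
linear_svm = make_pipeline(SimpleImputer(strategy="mean"),
                           StandardScaler(),
                           LinearSVC(C=1.0, max_iter=10000))
rbf_svm = make_pipeline(SimpleImputer(strategy="mean"),
                        StandardScaler(),
                        SVC(kernel="rbf", C=1.0, gamma="scale"))

print("linear SVM accuracy:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF SVM accuracy:", cross_val_score(rbf_svm, X, y, cv=5).mean())

If the linear SVM already matches the RBF one, that's a hint that extra model capacity may not buy you much on this dataset.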
I think you should try simple models first as well. Since you have so little data, a well-engineered generative approach (model p(x|class) and p(class), then use Bayes' rule to get p(class|x)) might work very well.
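The simplest version of that idea is Gaussian naive Bayes: fit p(x|class) as independent Gaussians per feature, estimate p(class) from class frequencies, and apply Bayes' rule for p(class|x). A minimal sketch, where the independence assumption and the mean imputation are simplifications and X and y are toy placeholders:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Toy placeholder data (swap in your real X and y); NaN marks missing values.
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
X[rng.random(X.shape) < 0.3] = np.nan
y = rng.integers(0, 2, size=1000)

# GaussianNB models p(x|class) as independent Gaussians per feature and p(class)
# from class frequencies, then applies Bayes' rule to get p(class|x).
generative = make_pipeline(SimpleImputer(strategy="mean"), GaussianNB())
print("Gaussian naive Bayes accuracy:", cross_val_score(generative, X, y, cv=5).mean())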
(Jun 06 '12 at 05:46)
Justin Bayer
Start with simple models first before trying deep nets. Maybe a deep neural net of some sort will help, but you won't have a good baseline to compare it to without trying some simpler things first (and it is quite possible your problem isn't well suited to a massive DBN). Here are two straightforward things I think you should try before you do anything elaborate:
There are a million other things you can try after these two, but keep things simple.