|
I'm looking for a paper that gives guidelines on how to choose the hyperparameters of a deep architecture, such as stacked auto-encoders or deep belief networks. There are a lot of hyperparameters and I'm very confused about how to choose them. Also, using cross-validation is not an option, since training takes a really long time!
|
James Bergstra's work is a good place to start on this topic:

http://www.eng.uwaterloo.ca/~jbergstr/research.html#modelsearch
http://jaberg.github.io/hyperopt/

If you have too much training data to explore many different hyper-parameter configurations, try running the hyper-parameter search on a random sample of your training data. Once you have found a good set of hyper-parameters, use them to train on the full training set. Hopefully, the hyper-parameters that gave the best results on the random sample will also give good results on the full training set.
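
Here is a minimal, self-contained sketch of that workflow. It uses scikit-learn's `RandomizedSearchCV` with an `MLPClassifier` as a stand-in for a deep architecture (the links above point to hyperopt, but the subsample-then-retrain idea is the same). The synthetic dataset, subsample size, and search ranges are all illustrative assumptions, not something from the papers:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a large training set.
X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

# Step 1: run the hyper-parameter search on a small random subsample
# so that each trial is cheap.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=2000, replace=False)
X_small, y_small = X[idx], y[idx]

# Illustrative search ranges; tune these to your own problem.
param_distributions = {
    "hidden_layer_sizes": [(64,), (128,), (256,), (128, 64)],
    "alpha": loguniform(1e-6, 1e-2),             # L2 penalty
    "learning_rate_init": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_distributions,
    n_iter=20,       # number of random configurations to try
    cv=3,
    random_state=0,
    n_jobs=-1,
)
search.fit(X_small, y_small)
print("best hyper-parameters:", search.best_params_)

# Step 2: retrain on the full training set with the best
# configuration found on the subsample.
final_model = MLPClassifier(max_iter=200, random_state=0,
                            **search.best_params_)
final_model.fit(X, y)
```

Random search (rather than a grid) is the key ingredient here: Bergstra and Bengio's "Random Search for Hyper-Parameter Optimization" (2012) showed that it finds configurations as good as or better than grid search with far fewer trials, because only a few hyper-parameters usually matter for a given problem.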