This applies to any kind of deep network where you're doing layer-by-layer pretraining for some kind of MLP. Each layer has multiple hyperparameters, but measures like validation set classification performance are only available once you've constructed the whole model. If you want to try k different hyperparameter settings per layer and the network has depth D, you'd end up having to train k^D networks, which is far too expensive.

I can think of a few ways around this, but I'm curious which ones other people are using in practice:

- Randomly sample k sets of hyperparameters for the whole network, and train those k networks.
- Randomly sample k sets of hyperparameters for layer N, train layer N, and before building layer N+1, pick the single best set of hyperparameters for layer N. The criterion for "best" could be the validation set error of an MLP trained on only the layers built so far, or some measure of the invariance properties of the features learned by that layer.
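The second strategy can be sketched in a few lines. This is a toy, runnable illustration only: `score_layer` stands in for actually training layer N and evaluating whatever proxy criterion you choose (shallow-MLP validation error, an invariance measure, etc.), and the hyperparameter names in `layer_grid` are made up for the example. The point is the cost structure: k*D layer trainings instead of k^D full networks.

```python
import random

random.seed(0)

# Hypothetical per-layer hyperparameter grid (names are illustrative).
layer_grid = {
    "hidden_units": [64, 128, 256],
    "learning_rate": [0.1, 0.01],
}

def sample_hparams(grid):
    """Randomly sample one hyperparameter setting from the grid."""
    return {name: random.choice(values) for name, values in grid.items()}

def score_layer(hparams, depth):
    """Stand-in for: train layer `depth` with these hyperparameters on top
    of the layers chosen so far, then evaluate a proxy criterion.
    Here it is a deterministic toy function so the example runs."""
    return -abs(hparams["hidden_units"] - 128) / 128 - hparams["learning_rate"]

def greedy_layerwise_search(depth, k, grid):
    """Greedy per-layer search: try k sampled settings for each layer,
    commit to the best before building the next layer."""
    chosen = []
    for d in range(depth):
        candidates = [sample_hparams(grid) for _ in range(k)]
        best = max(candidates, key=lambda hp: score_layer(hp, d))
        chosen.append(best)
    return chosen

config = greedy_layerwise_search(depth=3, k=5, grid=layer_grid)
print(config)
```

The first strategy (sample k full configurations) is the same loop with the sampling moved outside: draw k lists of D settings each, train the k complete networks, and compare them on the real validation metric.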