Momentum and weight decay seem to be helpful techniques for stabilizing and speeding up the optimization of parameters when training DBNs composed of layered RBMs (even more so with large and complex convolutional networks). I'd like to use these techniques for a stacked de-noising autoencoder (SdA) model, but it is unclear to me exactly how to apply them to this model. An SdA has two aspects: one is a stack of de-noising autoencoders, each with its own weight matrix and pair of biases (hidden and visible); the other is an MLP in which each layer has a weight matrix and a bias. The weight matrices and hidden biases are shared between the two aspects.
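For concreteness, here is a minimal sketch (NumPy; the layer sizes and helper name are hypothetical, not from my actual code) of the parameter sharing: each dA and its corresponding MLP layer reference the same weight matrix `W` and hidden bias `b`, while the visible bias `b_prime` belongs to the dA aspect alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    # W and b are shared with the corresponding MLP layer; b_prime is not.
    return {
        "W": rng.normal(scale=0.01, size=(n_in, n_out)),  # shared weights
        "b": np.zeros(n_out),        # hidden bias (shared with the MLP)
        "b_prime": np.zeros(n_in),   # visible bias (dA aspect only)
    }

layers = [make_layer(784, 500), make_layer(500, 250)]

# The MLP aspect holds the same arrays by reference, so any update made
# during pre-training is immediately seen by the MLP aspect as well.
mlp_params = [(layer["W"], layer["b"]) for layer in layers]
```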

I'm pre-training an SdA on unlabeled data, which means I'm doing stochastic gradient descent on mini-batches drawn from a large data set. Currently, each step (one mini-batch) applies a single gradient-times-learning-rate update to each of the de-noising autoencoder parameters.
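As a point of reference, a minimal sketch of the plain update I describe (the dictionary keys follow the layer structure above and are my own naming):

```python
def sgd_step(params, grads, lr=0.01):
    """One plain SGD step: theta <- theta - lr * dL/dtheta per parameter."""
    for name in params:                  # 'W', 'b', 'b_prime'
        params[name] -= lr * grads[name]
```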

Should I be using momentum smoothing and weight decay updates for all of my dA parameters, or only for the shared ones? Will my objective function diverge if the shared parameters (the weight matrices and hidden biases) are updated with momentum but the others are not?
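To make the question concrete, here is a hedged sketch of what I mean by momentum and weight decay (classical momentum with an L2 penalty folded into the gradient; the hyperparameter values are only illustrative):

```python
def momentum_step(params, grads, velocity, lr=0.01, mu=0.9, wd=1e-4):
    """Classical momentum plus L2 weight decay.

    velocity holds one zero-initialized buffer per parameter; weight
    decay is commonly applied to the weight matrices only, not biases.
    """
    for name in params:
        g = grads[name]
        if name == "W":                       # decay weights, not biases
            g = g + wd * params[name]
        velocity[name] = mu * velocity[name] - lr * g  # v <- mu*v - lr*g
        params[name] += velocity[name]                 # theta <- theta + v
```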

asked Jun 06 '13 at 16:28 by LeeZamparo


One Answer:

An update: I've done some experiments and can report that varying the weight decay, momentum, and corruption values had surprisingly little effect on reconstruction error when pre-training with a fixed model architecture on the data set I used.

What did have a noticeable effect were the value of the learning rate (this confirmed that smaller is better for Gaussian-distributed input data / visible units, with 0.01 being a good initial value) and the number of units in each layer.

I suppose the take-home message is to worry first about choosing the right model architecture for your problem, and then about setting the learning rate. Only after these steps do you need to consider more elaborate regularization and optimization techniques.
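To illustrate that order of tuning, here is a small self-contained sketch (a tiny linear autoencoder with untied weights and no corruption, standing in for one dA layer; everything here is illustrative, not the setup from my experiments): fix the architecture first, then sweep the learning rate and compare reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))  # toy Gaussian-distributed visible units

def recon_error(X, n_hidden, lr, epochs=200):
    """Train a tiny linear autoencoder by full-batch gradient descent."""
    n, n_vis = X.shape
    W1 = rng.normal(scale=0.01, size=(n_vis, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.01, size=(n_hidden, n_vis)); b2 = np.zeros(n_vis)
    for _ in range(epochs):
        H = X @ W1 + b1                  # encode
        R = H @ W2 + b2                  # decode
        E = (R - X) * (2.0 / X.size)     # d(mean squared error)/dR
        dH = E @ W2.T                    # backprop through the decoder
        W2 -= lr * (H.T @ E); b2 -= lr * E.sum(axis=0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    return np.mean(((X @ W1 + b1) @ W2 + b2 - X) ** 2)

# Architecture fixed; now sweep the learning rate.
for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: reconstruction error {recon_error(X, 10, lr):.4f}")
```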

answered Jun 11 '13 at 19:00 by LeeZamparo

