Hi, I am currently training a higher-order auto-encoder (HAE). I verified that the HAE works correctly by checking that it learns the expected (transformation) filters. Now I am trying to train it on a more interesting dataset, such as face data. However, I am running into a problem: the cost decreases gradually until around epoch 50, and then the cost and the weights suddenly blow up. The dataset is real-valued, and I am using a sigmoid on the hidden (mapping) layer and no activation function on the reconstruction layer. During training I use weight decay and 30% zero-masking noise corruption. I am not sure why this happens or how to fix it.
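For reference, here is a rough sketch of the kind of setup I mean (a plain tied-weight denoising auto-encoder rather than the full gated model; the sizes and hyper-parameter values are illustrative, not my actual code):

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative sizes; x below is a minibatch of real-valued inputs.
    n_vis, n_hid = 784, 256
    W = rng.normal(scale=0.01, size=(n_vis, n_hid))
    b_hid = np.zeros(n_hid)
    b_vis = np.zeros(n_vis)

    def cost(x, corruption=0.3, weight_decay=1e-4):
        # 30% zero-masking noise: randomly drop input dimensions.
        mask = rng.binomial(n=1, p=1.0 - corruption, size=x.shape)
        x_tilde = x * mask
        # Sigmoid on the hidden (mapping) layer.
        h = sigmoid(x_tilde.dot(W) + b_hid)
        # Linear reconstruction layer (no activation) for real-valued data.
        x_hat = h.dot(W.T) + b_vis
        # Squared-error reconstruction cost plus L2 weight decay.
        return np.mean(np.sum((x_hat - x) ** 2, axis=1)) + weight_decay * np.sum(W ** 2)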
Try reducing the learning rate, the sparsity parameter (if any), and the weight decay parameters. Usually the problem is a learning rate that is too high. I got it to work this way, but my observation is that the learning rate has to be very small for the model to remain stable when working with a gated auto-encoder (or any model with multiplicative interactions). So learning takes a very long time, especially with real-valued data, and normalizing the data makes learning even slower...
(Apr 02 '14 at 11:53)
Dannnnn
Using optimization techniques other than stochastic gradient descent with a constant learning rate (which I assume is what you are using) may help speed up learning. For instance, a learning rate decaying as 1 / (1 + nb_updates) and momentum are simple techniques that can work well. Adagrad and adadelta are other methods you could consider implementing and trying out.
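To make that concrete, here is a minimal numpy sketch of SGD with classical momentum and a 1 / (1 + nb_updates) learning-rate decay (the hyper-parameter values and the grad_fn callback are placeholders, not recommendations):

    import numpy as np

    def sgd_momentum_decay(params, grad_fn, base_lr=0.01, momentum=0.9, n_updates=1000):
        # Classical momentum combined with a 1 / (1 + t) learning-rate decay.
        # grad_fn(params) should return one gradient array per parameter.
        velocity = [np.zeros_like(p) for p in params]
        for t in range(n_updates):
            lr = base_lr / (1.0 + t)
            for p, v, g in zip(params, velocity, grad_fn(params)):
                v[...] = momentum * v - lr * g
                p += v
        return params

    # Toy usage: minimize ||w||^2, whose gradient is 2 * w.
    w = np.ones(5)
    sgd_momentum_decay([w], lambda ps: [2.0 * ps[0]])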
(Apr 07 '14 at 05:58)
Pascal Lamblin
RMSProp (being closely related to adadelta) sounds reasonable as well. So does RPROP.
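For concreteness, a single RMSProp parameter update amounts to something like the following (the decay and eps values are common defaults, not tuned for this model):

    import numpy as np

    def rmsprop_step(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
        # Keep a running average of the squared gradient...
        cache[...] = decay * cache + (1.0 - decay) * grad ** 2
        # ...and divide by its root, so the step size adapts per parameter.
        param -= lr * grad / (np.sqrt(cache) + eps)
        return param, cache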
(Apr 07 '14 at 15:19)
Justin Bayer
A question: I thought RMSProp is good when we are on a plateau rather than near a local minimum (and with a sigmoid activation function we are likely to end up on a plateau), but does it also work well when the activation function is ReLU?
(Apr 08 '14 at 11:13)
Dannnnn