Hi there,

Rectified linear units (ReLUs) converge like gangbusters and have lovely sparsity properties. However, I cannot stop them from diverging to infinity or NaNity unless I sandwich them between two saturating layers (e.g., sigmoid or tanh). Has anyone gotten these things to be stable? If so, how? I've tried L1 and L2 penalties on the activations and weights, but without success. My next attempt will be local contrast normalization. I'm currently implementing with Theano.

Thanks,
Bo
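For concreteness, here is a minimal Theano sketch of the kind of setup described above (a ReLU layer with an L1 penalty on the activations and an L2 penalty on the weights); the layer sizes, penalty coefficients, and variable names are made up for illustration and are not Bo's actual code:

    import numpy as np
    import theano
    import theano.tensor as T

    rng = np.random.RandomState(0)
    n_in, n_hidden = 784, 500            # made-up sizes

    x = T.matrix('x')
    W = theano.shared(np.asarray(rng.uniform(-0.01, 0.01, (n_in, n_hidden)),
                                 dtype=theano.config.floatX), name='W')
    b = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b')

    # Rectified linear activation.
    h = T.maximum(0., T.dot(x, W) + b)

    # Penalties of the sort mentioned above: L1 on activations, L2 on weights.
    l1_activations = abs(h).mean()
    l2_weights = (W ** 2).sum()

    # cost = task_cost + 1e-3 * l1_activations + 1e-4 * l2_weights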
What's the cost function you're using for training? Since ReLUs don't saturate, you need to choose your cost function carefully or it will sail off into NaNs.
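A sketch of what that can look like in practice, assuming a classification setup (the names and sizes here are illustrative): keep a bounded output layer such as a softmax between the ReLU stack and any log-based cost, rather than feeding the unbounded activations into the cost directly.

    import numpy as np
    import theano
    import theano.tensor as T

    # Hypothetical names: h is the output of the last (unbounded) ReLU layer,
    # y holds integer class labels, W_out/b_out are output-layer parameters.
    n_hidden, n_classes = 500, 10
    h = T.matrix('h')
    y = T.ivector('y')
    W_out = theano.shared(np.zeros((n_hidden, n_classes),
                                   dtype=theano.config.floatX), name='W_out')
    b_out = theano.shared(np.zeros(n_classes, dtype=theano.config.floatX),
                          name='b_out')

    # The softmax output is bounded, which keeps the log-likelihood below well
    # behaved; feeding raw ReLU outputs into a log/exp-based cost is the usual
    # source of inf/NaN once activations grow large.
    p_y_given_x = T.nnet.softmax(T.dot(h, W_out) + b_out)
    nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])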
I have the same problem. How do you sample the input layer in a pure ReLU network?
Try a hyper-parameter search to find a learning rate that works for you. I've spent a fair amount of time training auto-encoders in Theano, and found that with ReLU layers the learning rate needs to be at least an order of magnitude smaller than for saturating layers.
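In case it helps, a minimal sketch of that kind of sweep over log-spaced learning rates; train_and_score is a placeholder for whatever routine trains your model for a few epochs and returns a validation cost (the dummy body is only there so the snippet runs on its own):

    import numpy as np

    def train_and_score(learning_rate):
        # Placeholder: substitute a routine that trains the ReLU network
        # briefly at this learning rate and returns a validation cost.
        return np.random.rand() / learning_rate

    candidate_rates = 10.0 ** np.arange(-5.0, 0.0)   # 1e-5 ... 1e-1

    results = {}
    for lr in candidate_rates:
        cost = train_and_score(learning_rate=lr)
        # Count a diverged run (NaN/inf cost) as a failure, not a score.
        results[lr] = cost if np.isfinite(cost) else np.inf

    best_lr = min(results, key=results.get)
    print('best learning rate: %g' % best_lr)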