Hi there,

Rectified linear units (ReLU) converge like gangbusters and have lovely sparsity properties. However, I cannot stop them from diverging to infinity or NaNity unless I sandwich them between two saturating layers (e.g., sigmoid or tanh).

Has anyone gotten these things to be stable? If so, how? I've tried L1 and L2 penalties on the activations and weights, but without success. My next attempt will be to try local contrast normalization. I'm currently implementing in Theano.
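To make this concrete, here is a stripped-down sketch of the kind of setup I mean (not my actual model; the layer sizes, penalty weights, and squared-error cost are just placeholders):

    # Simplified sketch (placeholder sizes/penalties): one ReLU hidden layer
    # with an L1 penalty on its activations and an L2 penalty on the weights.
    import numpy as np
    import theano
    import theano.tensor as T

    floatX = theano.config.floatX
    rng = np.random.RandomState(0)
    n_in, n_hid, n_out = 100, 50, 10

    X = T.matrix('X')                      # minibatch of inputs
    y = T.matrix('y')                      # targets

    W1 = theano.shared(rng.normal(0, 0.01, (n_in, n_hid)).astype(floatX), name='W1')
    b1 = theano.shared(np.zeros(n_hid, dtype=floatX), name='b1')
    W2 = theano.shared(rng.normal(0, 0.01, (n_hid, n_out)).astype(floatX), name='W2')
    b2 = theano.shared(np.zeros(n_out, dtype=floatX), name='b2')

    h = T.maximum(0., T.dot(X, W1) + b1)   # ReLU hidden layer
    out = T.dot(h, W2) + b2

    l1_act = abs(h).mean()                         # L1 on the ReLU activations
    l2_wts = (W1 ** 2).sum() + (W2 ** 2).sum()     # L2 on the weights
    cost = ((out - y) ** 2).mean() + 1e-3 * l1_act + 1e-4 * l2_wts

    params = [W1, b1, W2, b2]
    grads = T.grad(cost, params)
    lr = 0.01
    updates = [(p, p - lr * g) for p, g in zip(params, grads)]
    train = theano.function([X, y], cost, updates=updates)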

Thanks, Bo

asked Jun 24 '14 at 21:19

Bo Anderson

edited Jun 24 '14 at 21:23


3 Answers:

What's the cost function you're using for training? Since ReLUs don't saturate, you need to choose your cost function carefully or it will sail off into NaNs.
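For example (just a sketch of one common arrangement, not necessarily the fix for your particular model): keep the output layer bounded, e.g. a softmax trained with cross-entropy, and clip the probabilities before the log so a probability of exactly zero can't blow up into inf/NaN:

    # Sketch: ReLU hidden activations feed a softmax output; the probabilities
    # are clipped before the cross-entropy so log(0) can never produce inf/NaN.
    import theano.tensor as T

    h = T.matrix('h')            # last ReLU hidden layer (unbounded above)
    W_out = T.matrix('W_out')    # output weights
    b_out = T.vector('b_out')
    y = T.ivector('y')           # integer class labels

    p = T.nnet.softmax(T.dot(h, W_out) + b_out)   # bounded to (0, 1)
    p = T.clip(p, 1e-7, 1.0 - 1e-7)               # guard against log(0)
    cost = T.nnet.categorical_crossentropy(p, y).mean()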

answered Jul 21 '14 at 19:09

cwb

I have the same problem.

How do you sample the input layer in a pure ReLU network?

answered Jul 21 '14 at 11:59

drgs

Try a hyper-parameter search to find a learning rate that works for you. I've spent a fair amount of time training auto-encoders in Theano, and found that when using ReLU layers your learning rate must be at least an order of magnitude smaller than for saturating layers.
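Something like the sweep below is usually enough to start with (a rough sketch; train_autoencoder is a placeholder for whatever training routine you already have, and the grid is just an example):

    # Rough learning-rate sweep: try a log-spaced grid, skip runs that diverge
    # to NaN/inf, and keep the rate with the lowest validation cost.
    # train_autoencoder is a placeholder for your own Theano training routine.
    import numpy as np

    candidate_lrs = 10.0 ** np.arange(-5.0, -1.0)    # 1e-5, 1e-4, 1e-3, 1e-2

    best_lr, best_cost = None, np.inf
    for lr in candidate_lrs:
        val_cost = train_autoencoder(learning_rate=lr, n_epochs=5)  # placeholder
        if np.isfinite(val_cost) and val_cost < best_cost:          # drop diverged runs
            best_lr, best_cost = lr, val_cost

    print('best learning rate:', best_lr)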

answered Jul 07 '14 at 16:22

LeeZamparo

