Hi there, I am currently working on an autoencoder using the dropout technique and pretraining. For the theory I refer to Srivastava's 2013 MSc thesis on dropout. On page 3, at the bottom, Srivastava writes that dropout is applied to every layer. Does that mean I drop out units in the input layer and in the output layer too, or only in the hidden layers? In the case of pretraining, as described on page 4, I will train every layer in turn with a 3-layer autoencoder (1 hidden layer). There might be 4 cases: will I have to apply dropout during pretraining too, and if so, in which of these layers?
The original dropout paper uses a dropout rate of 0.2 for the input layer and 0.5 for the hidden layers. This would mean I only have to apply dropout to the input layer and the hidden layer, and leave the output layer alone.
(Sep 16 '13 at 12:54)
gerard
In general, dropout only happens in layers that serve as input to some later layer. Also, it often seems to work better to use a lower dropout rate on the original input layer.
(Sep 16 '13 at 15:25)
gdahl ♦