Hi there,

I am currently working on an autoencoder using the dropout technique and pretraining. For the theory I refer to Srivastava's 2013 MSc thesis on dropout (Improving Neural Networks with Dropout).

On page 3, at the bottom, Srivastava writes that dropout is applied to every layer. Does that mean I drop out units in the input layer and in the output layer too, or only in the hidden layers? In the case of pretraining, as described on page 4, I will train each layer in turn as a 3-layer autoencoder (one hidden layer). That leaves four cases (a minimal sketch follows the list): will I have to apply dropout

  1. only in the hidden layer,
  2. in the hidden layer and the output layer,
  3. in the input layer and the hidden layer, or
  4. in all three layers?
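For concreteness, here is a minimal numpy sketch of the 3-layer autoencoder I mean, with one dropout knob per layer so the four cases correspond to parameter settings (all names, sizes, and rates here are hypothetical, not taken from the thesis):

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(x, p, train=True):
        # Inverted dropout: zero each unit with probability p and rescale
        # the survivors by 1/(1-p), so the test-time pass needs no change.
        if not train or p == 0.0:
            return x
        mask = rng.random(x.shape) >= p
        return x * mask / (1.0 - p)

    # Hypothetical weights for one pretraining stage:
    # input -> hidden -> reconstruction.
    n_in, n_hid = 784, 256
    W1 = rng.normal(0.0, 0.01, (n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0.0, 0.01, (n_hid, n_in)); b2 = np.zeros(n_in)

    def forward(x, p_in=0.0, p_hid=0.0, p_out=0.0, train=True):
        x = dropout(x, p_in, train)          # cases 3 and 4: drop input units
        h = np.tanh(x @ W1 + b1)
        h = dropout(h, p_hid, train)         # cases 1-4: drop hidden units
        x_hat = h @ W2 + b2                  # linear reconstruction layer
        return dropout(x_hat, p_out, train)  # cases 2 and 4: drop output units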

asked Sep 16 '13 at 07:35

gerard


One Answer:

The original dropout paper uses a dropout rate of 0.2 for the input layer and 0.5 for the hidden layers.
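Applied to the hypothetical forward() sketch in the question, that recommendation would look something like this (again just a sketch, not code from the paper):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.random((32, 784))  # a hypothetical minibatch

    # Rates from the paper: 0.2 on the input layer, 0.5 on the hidden
    # layer, and no dropout on the reconstruction (output) layer.
    x_hat = forward(x, p_in=0.2, p_hid=0.5, p_out=0.0)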

answered Sep 16 '13 at 08:42

Justin Bayer

This would mean I only have to apply dropout in the input layer and the hidden layer, and leave the output layer alone.

(Sep 16 '13 at 12:54) gerard

In general, dropout only happens in layers that serve as input to some later layer. Also it often seems to work better to use a lower dropout rate in the original input layer.

(Sep 16 '13 at 15:25) gdahl ♦
