I wanted to get something cleared up regarding the training of neural networks with dropout using minibatch gradient descent.

The original paper by Hinton et al. says:

On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5 [...]

This implies that a dropout mask is sampled for every training example. Since an update is computed based on a minibatch of K training examples, that means K different masks are sampled for this update.

In a more recent paper by Goodfellow et al. on Maxout networks:

In this regime, each update can be seen as making a significant update to a different model on a different subset of the training set.

Based on this, it would make more sense to sample only a single dropout mask for the given minibatch, and use the same one for all examples in the minibatch (since this update is then effectively operating on a single model, and not K different models).
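To make the two options concrete, here is a toy NumPy sketch (sizes and names invented) of the difference: sampling one mask per update versus one mask per example in a minibatch of K examples.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_hidden = 4, 6                                   # toy minibatch size and hidden layer width
h = rng.standard_normal((K, n_hidden))               # hidden activations for the minibatch

# Option A: one mask shared by all K examples (sampled per update)
shared_mask = rng.random(n_hidden) < 0.5             # shape (n_hidden,), p = 0.5 keep probability
h_shared = h * shared_mask                           # same units dropped for every example

# Option B: a fresh mask for every example (sampled per example)
per_example_masks = rng.random((K, n_hidden)) < 0.5  # shape (K, n_hidden)
h_per_example = h * per_example_masks                # each row gets its own sub-model
```

Option A effectively trains one thinned sub-network on the whole minibatch; Option B trains K different sub-networks, one per example.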

But on the next page of this paper:

On each presentation of a training example, we train a different sub-model [...]

So then it sounds like K different dropout masks are used, and thus K different models are updated.

I guess I'm just a bit confused by the wording and terminology, so I'd like to know which method is 'correct' (i.e. how it's usually done): do I sample a new dropout mask per example or per update? Right now I'm leaning towards the former, which would probably be easier to implement as well.

I imagine both approaches will work (I haven't tried it yet), but maybe one works significantly better than the other.

Just to clarify: I know that I need to sample a fresh dropout mask every time the same training example is reused (i.e. keeping it constant throughout training is incorrect).

asked Apr 02 '13 at 10:16

Sander Dieleman


One Answer:

Per example is how it is usually done. The intuition for why it should be better is the same as for why in SGD it makes sense to get a fresh minibatch of data instead of doing a second update on the same minibatch.
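In other words, the mask has the same shape as the minibatch of activations and is resampled on every presentation. A minimal sketch of this (NumPy, names and sizes invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout_forward(h, p=0.5):
    """Apply dropout with a fresh, independent mask per example on every call."""
    mask = rng.random(h.shape) >= p   # one Bernoulli mask entry per unit per example
    return h * mask, mask

K, n_hidden = 8, 5
h = rng.standard_normal((K, n_hidden))

out1, m1 = dropout_forward(h)
out2, m2 = dropout_forward(h)  # same examples presented again -> new masks are sampled
```

Because the mask is drawn with the full `(K, n_hidden)` shape, each example in the batch sees a different sub-model, and reusing an example later draws a new mask, matching the "fresh mask per presentation" point from the question.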

answered Apr 02 '13 at 18:59

gdahl ♦



User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.