Thanks.
There is no problem at all. Dropout is an elementwise operation. You need to generate a mask of the right size and do an elementwise multiplication to get the new activations. If you have a minibatch of m cases and n units, just generate a matrix of that shape with rand(), threshold it at the dropout probability, and multiply it into the matrix of activations. If you are still stuck, you can see how I do it in the fpropDropout method in the dbn.py file in the code downloadable from my website. Don't think of it as dropping connections: keep all the connections and just zero out the activations of dropped units. Then everything should be simple. If you want to use it inside a convolutional neural net as well, there should also be no problem.

Ah, I thought that some sparse matrix computation would be involved, because if the dropout rate is 0.5 the number of connections would be quartered. Could you also address my next question on using dropout with a convolutional layer? (question updated)
(Apr 10 '13 at 22:00)
Zer0ne
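For readers following along, here is a minimal NumPy sketch of the mask-and-multiply recipe described above. It is illustrative only: it is not the actual fpropDropout code from dbn.py, and the function and variable names are made up.

    import numpy as np

    def dropout_forward(activations, drop_prob, rng=np.random):
        # activations: (m, n) matrix for a minibatch of m cases and n units.
        # Keep a unit where the uniform draw exceeds the dropout probability,
        # so each unit is dropped (zeroed) with probability drop_prob.
        mask = (rng.rand(*activations.shape) > drop_prob).astype(activations.dtype)
        return activations * mask, mask

Keeping the mask around is convenient, because the same zeros are reused on the backward pass (see the next comments).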
Does dropout mean that we suppress the neurons' outputs during forward propagation, or only during backpropagation? If we multiply the dropout mask with all outputs of all neurons of the network after forward propagation, that means we only suppress the outputs during backprop. Or do we actually have a dropout mask for each layer that is applied right after calculating that layer's output?
(Apr 11 '13 at 05:20)
alfa
Dropout does not change the backward pass; just make sure you use the activations from the forward pass after they have been masked.
(Apr 18 '13 at 00:44)
gdahl ♦
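In code terms, the gradient flowing back through a dropout layer is simply multiplied by the same mask that was applied on the way forward, so no special handling is needed. A sketch under the same assumptions as the earlier example (hypothetical names, not the author's code):

    def dropout_backward(grad_output, mask):
        # Units that were zeroed on the forward pass receive no gradient;
        # reusing the forward mask is all the backward pass needs.
        return grad_output * mask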
@alfa, dropout is usually applied to both the forward and backward calculations. In my own implementations it's often possible to apply it during forward propagation and have it automatically applied during backward propagation, because the activity of a dropped neuron is now zero. @Zer0ne, this is an interesting question. Most of the literature I've read has not used dropout on CNNs, only applying it to the fully connected layers later in the network. Given the small size of CNNs I would worry about the effect of dropout on them; I think the worry is that there isn't a lot of redundancy in a single CNN. I would be very interested to read about any results you know of using dropout on a CNN, I think it has promise. If I were to apply it, I would want to block individual neurons in the CNN rather than pixels, my feeling being that I would not want to block the possibility of another one of my CNNs learning from that pixel on this example. Besides, with the masking suggestion above (which is the natural way to do it) this approach would be consistent across CNNs and FCNNs.

OK, thanks. :) So you actually multiply the dropout mask for each layer separately, after the forward propagation of the corresponding layer.
(Apr 12 '13 at 03:12)
alfa
I am open to correction but that is how I do it. Best of luck with your work.
(Apr 12 '13 at 12:09)
DwoaC
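As a concrete illustration of sampling a separate mask per layer and applying it right after that layer's forward computation, here is a hypothetical two-hidden-layer example. It assumes the numpy import and the dropout_forward helper sketched earlier in this thread, and tanh is just a placeholder nonlinearity.

    def forward_two_layer(x, W1, b1, W2, b2, drop_prob):
        # Hidden layer 1: affine transform, nonlinearity, then its own dropout mask.
        h1, mask1 = dropout_forward(np.tanh(x.dot(W1) + b1), drop_prob)
        # Hidden layer 2 gets a freshly sampled mask of its own.
        h2, mask2 = dropout_forward(np.tanh(h1.dot(W2) + b2), drop_prob)
        return h2, (mask1, mask2)

The masks are returned so the backward pass can reuse them, as discussed above.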
Have a look at the master's thesis of Nitish Srivastava: http://www.cs.toronto.edu/~nitish/msc_thesis.pdf. On page 3, the formula for the forward pass is given. You can derive the backward pass in a straightforward manner from it, and it should fit easily into your vectorized code. Furthermore, if you are interested in dropout for CNNs, have a look at 'Stochastic Pooling for Regularization of Deep Convolutional Neural Networks' by Zeiler and Fergus. You may also like 'Maxout Networks' by Ian Goodfellow et al. (http://arxiv.org/abs/1302.4389).
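For reference, a rough paraphrase of the standard dropout forward pass (my own restatement, not a quotation from the thesis): for a layer with output y^{(l)} and retention probability p,

    r^{(l)}_j \sim \mathrm{Bernoulli}(p), \quad
    \tilde{y}^{(l)} = r^{(l)} \odot y^{(l)}, \quad
    z^{(l+1)} = W^{(l+1)} \tilde{y}^{(l)} + b^{(l+1)}, \quad
    y^{(l+1)} = f(z^{(l+1)}).

The backward pass then follows from the chain rule: the gradient with respect to y^{(l)} picks up the same factor r^{(l)}, so dropped units receive zero gradient, which matches the masking discussed in the comments above. Note that p here is the probability of keeping a unit, i.e. one minus the drop probability used in the earlier sketches.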