Does using max pooling (choosing the max value from a set of inputs) instead of subsampling (averaging a set of inputs) affect how backpropagation is performed for a convolutional neural net? I ask because max pooling is an unusual operation, but none of the papers that use max pooling mention differences in backpropagation.
Sorry, I'm in a hurry, so just a quick answer. Take a look at this paper: Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition. I think it goes into quite some detail. The simple answer is that the error signal backpropagates only through the "max" feature, which makes a lot of sense and should have been my first guess. The paper notes that this "results in sparse error signals", which is actually a big computational bonus.
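To make the routing concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) of non-overlapping 2x2 max pooling and its backward pass: each upstream gradient is sent to the argmax of its pooling window, and every other position gets zero, which is exactly the sparse error signal mentioned above.

```python
import numpy as np

def maxpool_forward(x, k=2):
    # x: (H, W) feature map; non-overlapping k x k pooling.
    H, W = x.shape
    return x.reshape(H // k, k, W // k, k).max(axis=(1, 3))

def maxpool_backward(x, grad_out, k=2):
    # Route each upstream gradient to the max position of its pooling
    # window; all other positions receive zero gradient.
    H, W = x.shape
    windows = x.reshape(H // k, k, W // k, k)
    maxes = windows.max(axis=(1, 3), keepdims=True)
    mask = (windows == maxes)  # ties route gradient to every max (a common convention)
    grad_in = mask * grad_out[:, None, :, None]
    return grad_in.reshape(H, W)

x = np.array([[1., 2.],
              [3., 4.]])
print(maxpool_forward(x))                       # [[4.]]
print(maxpool_backward(x, np.array([[5.]])))    # [[0. 0.] [0. 5.]]
```

Average pooling, by contrast, would spread the upstream gradient uniformly over all k*k positions, so its error signal is dense rather than sparse.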
(May 22 '11 at 19:56)
Jacob Jensen
Yes, you need to compute the Jacobian (the gradient of a vector-valued function) of each stacked element of your network to be able to do backpropagation. As the strict max operation is not differentiable (and not even continuous), you cannot use max pooling in a CNN. However, you could probably approximate it with a smooth version such as soft-max. I wonder if it brings any practical improvement over the much simpler averaging step. Edit: this answer is completely wrong, see the comments for details.
But people do use max pooling for CNNs. A lot of research in object recognition shows that max pooling works better than averaging, since rare features get "averaged out" by averaging but are preserved by max pooling. See: http://www.idsia.ch/~juergen/vision.html and http://deeplearning.net/tutorial/lenet.html. The latter seems to indicate a regular training process. It would be quite odd to train the net as if it were providing a different output than it is, but stranger heuristics have been used.
(May 20 '11 at 21:51)
Jacob Jensen
Indeed, I made a mistake: the multivariate max operator is perfectly differentiable except on some hyperplanes (where inputs tie), which can be ignored in practice when doing SGD.
(May 20 '11 at 22:05)
ogrisel
@ogrisel: I think this is the correct interpretation :)
(May 21 '11 at 05:53)
Andreas Mueller
Where can I find a paper that derives the gradient for max pooling and describes backprop with max pooling?
(Apr 27 '14 at 06:08)
twerdster