
Does using max pooling (choosing the max value from a set of inputs) instead of subsampling (averaging a set of inputs) affect how backpropagation is performed for a convolutional neural net? I ask because max pooling is an unusual operation, but none of the papers that use max pooling mention differences in backpropagation.

asked May 20 '11 at 20:59

Jacob Jensen


2 Answers:

Sorry, I'm in a hurry, so just a quick answer. Take a look at this paper: Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition.

I think it goes into quite some detail.

answered May 21 '11 at 05:52

Andreas Mueller

edited May 21 '11 at 06:07

ogrisel

The simple answer is that the error signal is backpropagated only through the "max" feature, which makes a lot of sense and should have been my first guess. The paper states that this "results in sparse error signals", which is actually a big computational bonus.

(May 22 '11 at 19:56) Jacob Jensen
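
To make that concrete, here is a minimal NumPy sketch of non-overlapping 2x2 max pooling with its backward pass (the function names and shapes are illustrative, not taken from the paper): the forward pass records the argmax of each window, and the backward pass routes each output gradient only to that winning position, leaving zeros everywhere else, which is the sparse error signal described above.

import numpy as np

def maxpool_forward(x, size=2):
    # Non-overlapping max pooling over a 2D feature map.
    # Returns the pooled map plus the argmax indices needed for backprop.
    h, w = x.shape
    windows = x.reshape(h // size, size, w // size, size).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // size, w // size, size * size)
    idx = windows.argmax(axis=-1)  # winning position within each window
    out = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return out, idx

def maxpool_backward(grad_out, idx, in_shape, size=2):
    # Route each output gradient to the single input that won the max;
    # every other position gets zero gradient (the "sparse error signal").
    h, w = in_shape
    grad_windows = np.zeros((h // size, w // size, size * size))
    np.put_along_axis(grad_windows, idx[..., None], grad_out[..., None], axis=-1)
    grad_in = grad_windows.reshape(h // size, w // size, size, size)
    return grad_in.transpose(0, 2, 1, 3).reshape(h, w)

x = np.random.randn(4, 4)
out, idx = maxpool_forward(x)
grad_in = maxpool_backward(np.ones_like(out), idx, x.shape)
print(grad_in)  # exactly one 1 per 2x2 window, zeros elsewhere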

Yes, you need to compute the Jacobian (gradient of a vector-valued function) of each stacked element of your network to be able to do back-propagation. As the strict max operation is not differentiable (and not even continuous), you cannot use max pooling in a CNN. However, you can probably approximate it with a smooth version such as soft-max. I wonder if it brings any practical improvement over the much simpler averaging step.

Edit: this answer is completely wrong, see the comments for details.

answered May 20 '11 at 21:38

ogrisel

edited May 21 '11 at 06:02


But people do use max pooling in CNNs. A lot of research in object recognition shows that max pooling works better than averaging, since rare features get "averaged out" by average pooling but are preserved by max pooling. See: http://www.idsia.ch/~juergen/vision.html http://deeplearning.net/tutorial/lenet.html

The latter seems to indicate a regular training process.

It would be quite odd to train the net as if it were providing a different output than it is, but stranger heuristics have been used.

(May 20 '11 at 21:51) Jacob Jensen

Indeed, I made a mistake: the multivariate max operator is perfectly differentiable except on some hyperplanes, which can be ignored in practice when doing SGD.

(May 20 '11 at 22:05) ogrisel
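
Spelled out, for a pooling window x = (x_1, ..., x_n) with a strict maximum at index k, the partial derivatives are

\frac{\partial}{\partial x_i} \max(x_1, \dots, x_n) =
\begin{cases}
1 & \text{if } i = k, \\
0 & \text{otherwise,}
\end{cases}

and the only points where this fails are ties x_i = x_j, i.e. the hyperplanes mentioned in the comment above, a measure-zero set that SGD essentially never hits exactly.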

@ogrisel: I think this is the correct interpretation :)

(May 21 '11 at 05:53) Andreas Mueller

Where can I find a paper that derives the gradient for max pooling and describes backprop with max pooling?

(Apr 27 '14 at 06:08) twerdster