I'm having some difficulty conceptualizing the translation invariance property of convolutional networks. Is the amount of translation invariance primarily dependent on the pooling size? Consider the following single-layer convnet as the reason for my asking:
Now suppose the network is trained on nothing but digits that reside in the top-left (16x16) corner. The 8 learned kernels will end up training to edges/contours/etc. However, through pooling, only the output neurons in the top left ever activate; there is nothing elsewhere in the image. If we test on a digit that resides in the bottom-right (16x16) corner, do we expect it to fail because no output units in the bottom right were ever activated during training?
Yes, you are right. A convolutional network doesn't provide translation invariance by itself; it provides what is called equivariance: the activations shift correspondingly with the input. Adding the pooling layer is what provides the (local) invariance. You can check Hinton's video on YouTube on this: Achieving viewpoint invariance

Thanks for the lecture link. Hinton discusses, in part, drawing bounding boxes. I suppose this is often a necessity when working with large, dense images that may have many objects in different locations, i.e. localization akin to what OverFeat recently accomplished? The only alternative I see is to have many convolutional layers that eventually feed into a final pooling layer with many feature maps but a single-unit output that covers the entire image. But I can imagine that may wash out any interesting structure discovered?
(Mar 27 '14 at 06:54)
Tom Szumowski
I am not sure what OverFeat accomplished; I didn't read the paper. You can also check deconvolutional networks: they have a way to unpool! Otherwise, generally, it's a convolutional layer followed by a pooling layer.
(Mar 27 '14 at 06:58)
Sharath Chandra
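The equivariance point above can be sketched numerically. Below is a minimal NumPy illustration (the names `conv2d_valid`, `img_a`, `img_b` are made up for this sketch): the same pattern placed at two locations produces the same filter responses, just shifted, so kernels learned on top-left patterns do respond to bottom-right patterns.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Plain 'valid' cross-correlation: no padding, stride 1.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))

# The same 4x4 pattern placed at two different locations in a 12x12 image.
patch = rng.standard_normal((4, 4))
img_a = np.zeros((12, 12)); img_a[0:4, 0:4] = patch   # top-left
img_b = np.zeros((12, 12)); img_b[5:9, 5:9] = patch   # shifted by (5, 5)

fa = conv2d_valid(img_a, kernel)
fb = conv2d_valid(img_b, kernel)

# Equivariance: the response to the shifted input is the shifted response.
assert np.allclose(fa[0:5, 0:5], fb[5:10, 5:10])
```

So the feature maps themselves never "fail" on the bottom right; the question is what the layers *after* the convolution do with activations in a region they never saw during training.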
I think I may have answered my own question, but I'd like others' thoughts for confirmation. I was looking at the UFLDL Tutorial on Pooling. It says:
Given that, I suspect the convolutional network certainly doesn't provide global translation invariance, unless your pool covers the entire image (in which case too much spatial information is lost to do anything useful anyway). Thoughts?
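This "invariance only up to the pool size" idea can be checked directly. A minimal NumPy sketch (the `max_pool` helper and the feature maps are illustrative, not from the tutorial): a shift within a pool cell leaves the pooled output unchanged, a shift into a different cell does not, and pooling over the whole map is fully invariant but discards all location information.

```python
import numpy as np

def max_pool(fmap, p):
    # Non-overlapping p x p max pooling.
    H, W = fmap.shape
    return fmap[:H - H % p, :W - W % p].reshape(H // p, p, W // p, p).max(axis=(1, 3))

fmap = np.zeros((8, 8))
fmap[1, 1] = 1.0                                              # one strong activation

shifted_small = np.zeros((8, 8)); shifted_small[2, 2] = 1.0   # stays in the same 4x4 cell
shifted_large = np.zeros((8, 8)); shifted_large[5, 5] = 1.0   # lands in a different cell

p4 = max_pool(fmap, 4)
assert np.array_equal(p4, max_pool(shifted_small, 4))         # invariant within a cell
assert not np.array_equal(p4, max_pool(shifted_large, 4))     # not invariant across cells

# Global pooling is invariant to any shift, but the location is gone entirely.
assert fmap.max() == shifted_large.max()
```

This matches the intuition above: stacking conv+pool layers grows the invariance region gradually, whereas one giant pool buys full invariance at the cost of all spatial structure.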