Developed by Masci et al., the convolutional auto-encoder algorithm is two-fold: feature map construction and reconstruction. Let us forget pooling for brevity.

1) Feature Maps Construction (understood)

Suppose x is an n x n input image and W^k is the m x m filter for the k-th feature map (m < n). The k-th feature map is

h^k = sigma(x * W^k + b^k),

where sigma is the activation function and * denotes 2-D convolution. So, h^k is of size (m - n + 1) x (m - n + 1).

2) Reconstruction (problem lies here)

After obtaining the feature maps, the reconstruction is

y = sigma( sum over k in H of h^k * W~^k + c ),

where H is the set of feature maps and W~^k is W^k flipped over both dimensions. Knowing that each h^k is smaller than x, how would y come out the same size as the n x n input x? Thanks in advance.
As you mentioned, the convolution in the forward direction is a valid convolution, but the convolution for reconstruction should be a full convolution to get the same size as the input. I'm not familiar with this particular model, but that is how it's done in the case of convolutional RBMs. By the way, the size of h^k is (n - m + 1) x (n - m + 1); I think you switched m and n. The output size of a full convolution is input_size + filter_size - 1, so in this case you get (n - m + 1) + m - 1 = n, as desired. If you only have an implementation of the valid convolution, you can use it to do a full convolution by padding the input with m - 1 zeros on each side.

I guessed the reconstruction involved full convolution - thanks for confirming this. :)
(Mar 10 '14 at 06:37)
Issam Laradji
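The size bookkeeping in the answer is easy to check numerically. Below is a small numpy sketch (the helper names `conv2d_valid`/`conv2d_full` are mine, not from any library), including the zero-padding trick for building a full convolution out of a valid one:

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D convolution: flip the filter, then slide it over
    every position where it fully overlaps the input."""
    n, m = x.shape[0], w.shape[0]
    wf = w[::-1, ::-1]                       # flip over both dimensions
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(x[i:i+m, j:j+m] * wf)
    return out

def conv2d_full(x, w):
    """'Full' convolution = valid convolution of the input padded
    with m - 1 zeros on each side, as described in the answer."""
    return conv2d_valid(np.pad(x, w.shape[0] - 1), w)

rng = np.random.default_rng(0)
n, m = 8, 3
x = rng.standard_normal((n, n))
w = rng.standard_normal((m, m))

h = conv2d_valid(x, w)    # (n - m + 1) x (n - m + 1)
y = conv2d_full(h, w)     # (n - m + 1) + m - 1 = n on each side
print(h.shape, y.shape)   # (6, 6) (8, 8)
```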
What I don't understand is why W needs to be flipped.
(Mar 11 '14 at 05:11)
Ng0323
In the default definition of 'convolution', it is required that the filters are flipped (I always assumed this is because it makes the math prettier). If you don't flip the filters, what you are computing is called a 'correlation'. However, when you are learning the filters from data, it doesn't actually matter whether the filters are flipped or not (if you flip them in your code, the learnt filters will just be flipped as well). People usually include this flip so what they do adheres to the standard definition of a convolution, but in this context there really is no need for that. In fact, I believe the convolutions in cuda-convnet don't do it, for example (so they are actually correlations).
(Mar 11 '14 at 06:14)
Sander Dieleman
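Sander's point - that convolution is just correlation with a flipped filter - can be verified numerically. A sketch with assumed helper names, implementing both operations by their definitions:

```python
import numpy as np

def correlate2d_valid(x, w):
    """'Valid' cross-correlation: slide the filter over the input as-is."""
    n, m = x.shape[0], w.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(x[i:i+m, j:j+m] * w)
    return out

def convolve2d_valid(x, w):
    """'Valid' convolution: the same sliding sum, but with the patch
    reversed along both axes (the textbook flip)."""
    n, m = x.shape[0], w.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(x[i:i+m, j:j+m][::-1, ::-1] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))
w = rng.standard_normal((3, 3))

# Convolving with w equals correlating with w flipped, so a network
# that learns its filters can use either operation: the learnt filters
# simply come out flipped relative to one another.
print(np.allclose(convolve2d_valid(x, w),
                  correlate2d_valid(x, w[::-1, ::-1])))   # True
```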
Maybe he is asking why the decoders are defined as the flipped version of the encoders... If that is the case, this is just a regularization choice.
(Mar 11 '14 at 20:06)
eder
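Under eder's reading, the decoder filters are simply the encoder filters flipped (tied weights). A minimal sketch of that scheme, assuming the Masci et al. formulation (sigmoid activations, valid convolution to encode, full convolution of the flipped filters to decode); all helper names here are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d(x, w, mode):
    """Plain 2-D convolution (filter flipped); mode is 'valid' or 'full'.
    'Full' is done by zero-padding with m - 1 on each side, then 'valid'."""
    m = w.shape[0]
    if mode == 'full':
        x = np.pad(x, m - 1)
    n = x.shape[0]
    wf = w[::-1, ::-1]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(x[i:i+m, j:j+m] * wf)
    return out

rng = np.random.default_rng(0)
n, m, K = 8, 3, 4                      # input size, filter size, # of maps
x = rng.standard_normal((n, n))
W = rng.standard_normal((K, m, m))
b = rng.standard_normal(K)
c = 0.0

# Encoder: h^k = sigma(x *_valid W^k + b^k)
h = [sigmoid(conv2d(x, W[k], 'valid') + b[k]) for k in range(K)]

# Decoder with tied, flipped weights:
# y = sigma(sum_k h^k *_full flip(W^k) + c)
y = sigmoid(sum(conv2d(h[k], W[k][::-1, ::-1], 'full') for k in range(K)) + c)
print(y.shape)   # (8, 8) -- same size as the input
```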
Thanks for the inputs! So then whether to flip the decoder W is another hyperparameter. I'm quite surprised to learn that in code it can be either a convolution or a correlation.
(Mar 11 '14 at 22:13)
Ng0323

