This question is based on Honglak Lee's paper "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations".

In Section 4.3, "Handwritten digit classification", the authors write:

"We trained 40 first layer bases from MNIST digits, each 12x12 pixels, and 40 second layer bases, each 6x6. The pooling ratio C was 2 for both layers."

The MNIST dataset consists of 28x28 images.

With a 12x12 filter in the first layer, this gives NH = NV - NW + 1 = 28 - 12 + 1 = 17, and therefore 40 feature maps of 17x17 as output. How do we apply pooling with C = 2 to an odd size? NP = NH / C = 17 / 2 = 8.5, which is not an integer.

Even if some technique (I don't know which) is used to get NP = 9, the next layer then has NH = 9 - 6 + 1 = 4, so NP = 4 / 2 = 2, and we are left with only 40 feature maps of 2x2 in the last layer.
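To make the arithmetic above easy to check, here is a minimal sketch of the size computation. It assumes a "valid" convolution and non-overlapping C x C pooling blocks, which is my reading of the setup and not something the quoted passage states; the function names are only illustrative.

```python
# Assumptions (not confirmed by the passage quoted from the paper):
# "valid" convolution and non-overlapping C x C pooling blocks.

def hidden_size(n_v, n_w):
    """Feature-map side length for a valid convolution: NH = NV - NW + 1."""
    return n_v - n_w + 1

def pooled_sizes(n_h, c):
    """Return (floor, ceil) of NH / C; they differ when C does not divide NH."""
    return n_h // c, -(-n_h // c)

# Layer 1: 28x28 input, 12x12 filters, C = 2.
n_h1 = hidden_size(28, 12)
n_p1_floor, n_p1_ceil = pooled_sizes(n_h1, 2)

# Layer 2: taking NP = 9 (rounding up) as input, 6x6 filters, C = 2.
n_h2 = hidden_size(n_p1_ceil, 6)
n_p2_floor, n_p2_ceil = pooled_sizes(n_h2, 2)

print("layer 1: NH =", n_h1, "NP (floor, ceil) =", (n_p1_floor, n_p1_ceil))
print("layer 2: NH =", n_h2, "NP (floor, ceil) =", (n_p2_floor, n_p2_ceil))
```

Rounding down instead (NP = 8) would give NH = 8 - 6 + 1 = 3 in the second layer, which again is not divisible by C = 2.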

If we instead assume that 12x12 and 6x6 are not the filter shapes but the sizes of the hidden layers, the last layer ends up with a 1x1 filter (6 - 6 + 1 = 1), which seems even more wrong.
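The same kind of check for this alternative reading, again only a sketch under the assumption of a valid convolution and non-overlapping pooling, with 12x12 and 6x6 taken as hidden layer sizes and the filter size solved for:

```python
# Alternative reading: 12x12 and 6x6 are hidden layer sizes, not filter shapes.
# Assumes a "valid" convolution, so NW = NV - NH + 1.

def filter_size(n_v, n_h):
    """Filter side length implied by visible size NV and hidden size NH."""
    return n_v - n_h + 1

C = 2

# Layer 1: 28x28 visible, 12x12 hidden.
n_w1 = filter_size(28, 12)
n_p1 = 12 // C  # 12 is divisible by 2, so pooling is exact here

# Layer 2: visible size is the pooled output of layer 1, 6x6 hidden.
n_w2 = filter_size(n_p1, 6)
n_p2 = 6 // C

print("layer 1: NW =", n_w1, "NP =", n_p1)
print("layer 2: NW =", n_w2, "NP =", n_p2)
```

Under this reading the pooling divides evenly in both layers, but the second-layer filter degenerates to 1x1.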

Either there is something I have not considered, or my reasoning is wrong.

Can someone help me work out the filter shapes and the number of visible and hidden units in each layer?

asked Oct 20 '14 at 11:04

Baptiste Wicht

