I am unsure what "filter" and "feature map" mean when the architecture of a convolutional neural network is described. Some papers say "the layer has M feature maps," while others say "the layer convolves N filters." What is the relation between these numbers, M (the number of feature maps) and N (the number of filters)?

For example, if

  • (m-1)-th layer has M1 feature maps and
  • m-th layer (which is a convolution layer) has M2 feature maps,

then does the m-th layer convolve M1*M2 filters, or M2 filters?

As another example, if

  • the input of the network is 100x100x4 and
  • the 1st hidden layer convolves 16 10x10 filters,

then does the 1st hidden layer have 4 (=16/4) feature maps, or 16? Is it possible that they are, in fact, 10x10x4 3D filters?

asked Jan 21 '14 at 14:44

aaron


One Answer:

The term 'feature map' is usually used to mean the result of convolving one of the filters with the input. So in general the number of feature maps will be equal to the number of filters. As an example, if you have 32x32 input images and a convolutional layer with 16 5x5 filters (stride 1, no padding), you will get 16 28x28 feature maps at the output of this layer, since 32 - 5 + 1 = 28.
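To make the arithmetic in this example concrete, here is a minimal sketch in Python. The function name `valid_conv_output_shape` is made up for illustration, and it assumes a stride of 1 and no padding, which is what the 32 to 28 reduction implies.

```python
def valid_conv_output_shape(input_hw, filter_hw, num_filters):
    """Shape (maps, height, width) of a stride-1, unpadded convolution layer's output."""
    in_h, in_w = input_hw
    f_h, f_w = filter_hw
    if f_h > in_h or f_w > in_w:
        raise ValueError("filter must not be larger than the input")
    # One feature map per filter; each spatial dimension shrinks by (filter size - 1).
    return (num_filters, in_h - f_h + 1, in_w - f_w + 1)


print(valid_conv_output_shape((32, 32), (5, 5), 16))  # (16, 28, 28)
```

With other strides or with padding the spatial size changes, but the number of feature maps still equals the number of filters.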

So to answer your questions: the m-th layer convolves M2 filters. In the usual case, each of these filters sees all M1 feature maps of the previous layer. Sometimes, filters are restricted to see only part of the feature maps of the previous layer (this is called a 'sparse convolution layer' in cuda-convnet).
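The "usual case" described above can be sketched as a naive pure-Python layer. This is only an illustration under some assumptions: stride 1, no padding, no bias, and no kernel flipping (so strictly speaking it is cross-correlation, which is what most CNN libraries compute). The name `conv_layer` and the toy sizes are hypothetical.

```python
def conv_layer(input_maps, filters):
    """Naive stride-1, unpadded convolution layer.

    input_maps: list of M1 two-dimensional lists (H x W each).
    filters:    list of M2 filters, each a list of M1 kernels (k x k each).
    Returns a list of M2 feature maps.
    """
    in_h, in_w = len(input_maps[0]), len(input_maps[0][0])
    output_maps = []
    for filt in filters:
        if len(filt) != len(input_maps):
            raise ValueError("each filter needs one kernel per input feature map")
        k_h, k_w = len(filt[0]), len(filt[0][0])
        out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                total = 0.0
                # Sum over every input feature map and every kernel position.
                for in_map, kernel in zip(input_maps, filt):
                    for dy in range(k_h):
                        for dx in range(k_w):
                            total += in_map[y + dy][x + dx] * kernel[dy][dx]
                fmap[y][x] = total
        output_maps.append(fmap)
    return output_maps


# Toy sizes (hypothetical): M1 = 3 input maps of 6x6, M2 = 2 filters of 3x3.
inputs = [[[1.0] * 6 for _ in range(6)] for _ in range(3)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
outputs = conv_layer(inputs, filters)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # 2 4 4
```

Each of the M2 filters produces exactly one output map, no matter how many input maps it sums over. A sparse layer would simply skip some of the (input map, kernel) pairs in the inner sum.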

The 1st hidden layer in your second example will output 16 feature maps, because it has 16 filters. Since each filter normally sees all 4 input channels, they are indeed 10x10x4 3D filters.
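The bookkeeping for the second example can be written out as a small sketch. `ConvLayerSpec` is a hypothetical helper, not part of any library, and it assumes the usual fully connected case (every filter sees every input map) with biases left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayerSpec:
    in_maps: int      # M1: feature maps (channels) coming into the layer
    num_filters: int  # M2: filters, and therefore output feature maps
    filter_size: int  # side length of the square filters

    @property
    def filter_shape(self):
        # Each filter spans all input maps, so it is a 3D block.
        return (self.filter_size, self.filter_size, self.in_maps)

    @property
    def num_2d_kernels(self):
        # M1 * M2 two-dimensional slices, grouped into M2 filters.
        return self.in_maps * self.num_filters

    @property
    def num_weights(self):
        return self.num_2d_kernels * self.filter_size ** 2


layer = ConvLayerSpec(in_maps=4, num_filters=16, filter_size=10)
print(layer.filter_shape)    # (10, 10, 4)
print(layer.num_2d_kernels)  # 64
print(layer.num_weights)     # 6400
```

Seen this way, the M1*M2 count in the question is the number of 2D kernel slices, while the number of filters (and of output feature maps) is M2.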

answered Jan 22 '14 at 08:46

Sander Dieleman


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.