I have a CNN with the following specifications:

Input layer = 29x29 (MNIST handwritten digit database, resized data)
Hidden layers:
(1) convolution layer, 5 feature maps (25x25), receptive field size = 5x5
(2) max-pooling layer, 5 feature maps (13x13)
(3) convolution layer, 50 feature maps (9x9)
(4) max-pooling layer, 50 feature maps (5x5)
(5) convolution layer, 100 feature maps (1x1), receptive field size = 5x5
Output layer = 10 (referring to MNIST, 10 output classes)
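For reference, here is a minimal sketch of this layer stack in Keras (not the asker's implementation; the 2x2 pooling with 'same' padding is an assumption chosen so that 25x25 pools to 13x13 and 9x9 pools to 5x5):

```python
# Hypothetical Keras sketch of the layer stack described above.
# Assumption: 2x2 max pooling with 'same' padding, which reproduces the
# 25x25 -> 13x13 and 9x9 -> 5x5 map sizes (ceiling division by 2).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(29, 29, 1)),                    # 29x29 resized MNIST digits
    layers.Conv2D(5, (5, 5), activation="sigmoid"),    # 5 maps, 25x25
    layers.MaxPooling2D((2, 2), padding="same"),       # 5 maps, 13x13
    layers.Conv2D(50, (5, 5), activation="sigmoid"),   # 50 maps, 9x9
    layers.MaxPooling2D((2, 2), padding="same"),       # 50 maps, 5x5
    layers.Conv2D(100, (5, 5), activation="sigmoid"),  # 100 maps, 1x1
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),            # 10 output classes
])
model.summary()  # the printed shapes should match the sizes listed above
```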

Training is done using backpropagation, with a mini-batch size of 200 (learning rate = 0.3, momentum = 0.8). The CNN gets stuck at an MSE of 0.89~0.90 even after 100 epochs. What could be the possible reasons?

Thanks for your time and help.

asked Oct 16 '13 at 00:10

gul


3 Answers:
-1

Thanks for the reply. It starts from some point, say 2.0, then starts decreasing, and once it reaches 0.9 (usually during the first epoch) it oscillates in the range 0.89~0.90. One of the problems was related to back-propagating the error delta through the max-pooling layers using argmax; however, it seems the problem is somewhere else. I suspect my error back-propagation is not correct. Can you suggest an easy-to-follow tutorial for backpropagation in the context of CNNs? Thanks for your time.
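For reference, a minimal NumPy sketch (not the code from the question) of the usual argmax routing of deltas through non-overlapping 2x2 max pooling, which is what back-propagation through the pooling layers needs to do:

```python
import numpy as np

def maxpool_backward_2x2(x, grad_out):
    """Route gradients through non-overlapping 2x2 max pooling.

    x:        input feature map, shape (H, W), H and W even (illustrative assumption)
    grad_out: gradient w.r.t. the pooled output, shape (H//2, W//2)
    Only the argmax position in each 2x2 window receives the incoming
    delta; the other three positions get zero gradient.
    """
    grad_in = np.zeros_like(x)
    H, W = x.shape
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            window = x[i:i+2, j:j+2]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_in[i + r, j + c] = grad_out[i // 2, j // 2]
    return grad_in
```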

answered Oct 16 '13 at 06:48

gul

Could you please reply in the comments and not as new answers? That is not the correct way to use this site.

(Oct 17 '13 at 14:01) Leon Palafox ♦

OK, sorry! In the future I will take care of this.

(Oct 18 '13 at 04:50) gul
-1

I found that the 3rd convolution layer (the 50 -> 100 map layer, with 5x5 input feature maps and a 5x5 convolution kernel) produces large input values for the next (100-unit) layer. The sigmoid output in this case saturates at +1 and its derivative is close to zero, so the error deltas are not propagated well. How can I avoid this problem?
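One standard heuristic for keeping the weighted sums out of the sigmoid's flat region is to scale the initial weights by roughly 1/sqrt(fan_in); this is not something stated in the thread, but the sketch below illustrates both the saturation effect and the scaling:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
fan_in = 50 * 5 * 5  # inputs feeding each unit of the 100-map layer

# Large weights push the pre-activation into the saturated region,
# where sigmoid'(z) = s * (1 - s) is nearly zero and deltas vanish.
x = rng.uniform(0.0, 1.0, fan_in)
w_big = rng.normal(0.0, 1.0, fan_in)
w_scaled = rng.normal(0.0, 1.0 / np.sqrt(fan_in), fan_in)  # ~1/sqrt(fan_in) init

for name, w in [("unscaled", w_big), ("1/sqrt(fan_in)", w_scaled)]:
    z = w @ x
    s = sigmoid(z)
    print(f"{name:>15}: z = {z:8.2f}, sigmoid'(z) = {s * (1 - s):.6f}")
```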

answered Oct 17 '13 at 10:53

gul

I would try different learning rates. 0.3 seems pretty high. Try 0.1, 0.01, and 0.001. Maybe also let it decay exponentially until some minimum value is reached.
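A minimal sketch of the exponential decay with a floor suggested here (the decay factor and minimum are illustrative choices, not values from the comment):

```python
def decayed_learning_rate(initial_lr, epoch, decay=0.95, min_lr=0.001):
    """Exponentially decay the learning rate per epoch, with a lower bound."""
    return max(initial_lr * decay ** epoch, min_lr)

# e.g. starting from 0.1 as suggested above
for epoch in (0, 10, 50, 100):
    print(epoch, decayed_learning_rate(0.1, epoch))
```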

(Oct 26 '13 at 09:40) alfa
-1

Or any idea how can I debug this problem ?

answered Oct 16 '13 at 03:14

gul

Do you mean it starts out at 0.89 and never decreases, i.e. a horizontal line? Or does it decrease to a plateau?

(Oct 16 '13 at 04:25) Ng0323
1

Maybe use a different activation function. Rectified linear units (ReLU, Glorot et al. 2011) in the hidden (encoding) layers and linear units (just f(x) = x and f'(x) = 1) in the decoding layer work fine for me. In the literature, ReLUs are often reported to be superior to tanh and sigmoid. Another option would be pretraining: a greedy layer-wise algorithm that defines a local unsupervised training criterion at every layer, so you learn the weights of each layer one after another. Then you can initialize the whole network with these weights and fine-tune with backpropagation. It helps avoid vanishing (close to zero) gradients.
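For illustration, a small NumPy sketch of the ReLU and linear units mentioned above; unlike the sigmoid, the ReLU derivative is exactly 1 for active units, so it does not saturate:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    # gradient is 1 wherever the unit is active, 0 otherwise -- no saturation
    return (z > 0).astype(z.dtype)

def linear(z):        # f(x) = x, e.g. for the decoding/output layer
    return z

def linear_prime(z):  # f'(x) = 1
    return np.ones_like(z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z), relu_prime(z))
```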

(Oct 26 '13 at 07:01) gerard