I am using a CNN to learn a two-class problem: the inputs are images labeled by the presence or absence of a car. The algorithm works beautifully when the images are square (e.g. 32x32). The problem is that some of my images are rectangular (e.g. 14x53). Is padding the only way to make the images square so that they work with the CNN? The kernels are 5x5. How should the convolution be treated in this case? If anyone has run into a similar situation, please suggest the best way to approach it.

asked Aug 29 '13 at 14:15

rogerthat


2 Answers:

Padding is certainly one way to do this, but it works well only when the difference in sizes is small. For large variations in size, as in your case, the best solution is to vary the size of the pooling operator. Pooling combines information over a neighborhood, which is where its invariance property comes from, so images of an object at different sizes, pooled over correspondingly different regions, can end up with similar representations.

As a rule of thumb, the top-layer pooling operation (the one just before the features are fed into a logistic regression for classification) can vary in size so that the inputs to the classifier always have the same dimensions.
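To make the rule of thumb concrete, here is a minimal sketch in plain Python. It is an illustration rather than code from any particular CNN library: the function names `valid_conv_size` and `adaptive_max_pool`, the 4x4 output grid, and the assumption of an unpadded ("valid") 5x5 convolution are all choices made for this example. The pooling regions are sized from the feature map itself, so a 32x32 and a 14x53 input both end up as a 4x4 grid for the classifier.

```python
import math


def valid_conv_size(height, width, kernel=5):
    """Output size of an unpadded ("valid") convolution with a square kernel."""
    return height - kernel + 1, width - kernel + 1


def adaptive_max_pool(feature_map, out_h, out_w):
    """Max-pool a 2-D list of numbers down to a fixed out_h x out_w grid.

    The pooling regions are derived from the input size, so feature maps
    of different shapes all produce the same output shape.
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    if in_h < out_h or in_w < out_w:
        raise ValueError("feature map is smaller than the requested output grid")
    pooled = []
    for i in range(out_h):
        # Row range covered by output row i (adjacent regions may overlap by one row).
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Feature-map sizes after one 5x5 valid convolution (dummy values as content).
for height, width in [(32, 32), (14, 53)]:
    fh, fw = valid_conv_size(height, width)
    fmap = [[r * fw + c for c in range(fw)] for r in range(fh)]
    pooled = adaptive_max_pool(fmap, 4, 4)
    print((height, width), "->", (fh, fw), "->", (len(pooled), len(pooled[0])))
```

The input still has to be at least as large as the output grid in each dimension after the convolutions, so very thin images (14 pixels high here) limit how many 5x5 valid convolution layers can be stacked before pooling.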

answered Sep 05 '13 at 04:18

Rakesh Chalasani

I have been wondering about this myself and haven't seen the issue addressed. Padding is the only solution I can think of. Changing the aspect ratio is possible, but it makes the detection stage harder: you then need to consider every aspect ratio you used, since at run time you don't know the actual aspect ratio in advance.
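For reference, the padding approach can be sketched as follows in plain Python. The function name `pad_to_square`, the centering of the image, and the zero fill value are assumptions made for this illustration; another fill, such as the mean pixel value, may suit a given dataset better.

```python
def pad_to_square(image, fill=0):
    """Center a rectangular 2-D list inside a square of side max(height, width)."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    top = (side - height) // 2    # rows of padding above the image
    left = (side - width) // 2    # columns of padding to its left
    padded = [[fill] * side for _ in range(side)]
    for r in range(height):
        for c in range(width):
            padded[top + r][left + c] = image[r][c]
    return padded


# A 14x53 image of ones becomes 53x53, with the original rows centered vertically.
image = [[1] * 53 for _ in range(14)]
square = pad_to_square(image)
print(len(square), len(square[0]))
```

With a 14x53 input, about three quarters of the padded square is fill, which gives a sense of how much of the network's input carries no information when the aspect ratio is far from square.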

answered Sep 04 '13 at 22:33

Ng0323


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.