|
I am using a CNN to learn a two-class problem: the inputs are images that either contain a car or do not. The algorithm works beautifully when the images are square (e.g. 32x32). The problem is that some of my images are rectangular (e.g. 14x53). Is padding the only way to make the images square so that they work with the CNN? The kernels are 5x5. How should the convolution be handled in this case? If anyone has run into a similar situation, please suggest the best way to approach it. |
|
Padding is certainly one way to do this, but it works well only when the difference in sizes is limited. For large variations in size, as in your case, the best solution is to vary the size of the pooling operator. Pooling combines neighborhood information, which is where its invariance property comes from, so images of an object at different sizes, pooled with correspondingly different window sizes, may lead to similar representations. As a rule of thumb, the top-layer pooling operation (the one just before the features are fed into a logistic regression for classification) can have a varying size, chosen so that the input to the classifier always has the same dimensions. |
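To make the varying-size pooling idea concrete, here is a minimal pure-Python sketch. It is only an illustration under stated assumptions: the function name `adaptive_max_pool` and the 4x4 output grid are made up for this example, and the way the bins are split (floor for the start, ceiling for the end) is one common convention, the same one used by adaptive pooling layers such as PyTorch's `nn.AdaptiveMaxPool2d`. The pooling window is derived from the input size, so feature maps of any shape come out with the same fixed shape.

```python
import math


def adaptive_max_pool(feature_map, out_h, out_w):
    """Max-pool a 2D feature map (list of rows) down to out_h x out_w.

    Hypothetical helper: the window size is not fixed, it is derived
    from the input size so the output shape is always the same.
    """
    in_h = len(feature_map)
    in_w = len(feature_map[0])
    pooled = []
    for i in range(out_h):
        # Row range covered by output row i (bins may overlap slightly).
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            # Column range covered by output column j.
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Stand-ins for top-layer feature maps from a square and a rectangular input.
square = [[r * 32 + c for c in range(32)] for r in range(32)]
rect = [[r * 53 + c for c in range(53)] for r in range(14)]

pooled_square = adaptive_max_pool(square, 4, 4)
pooled_rect = adaptive_max_pool(rect, 4, 4)
print(len(pooled_square), len(pooled_square[0]))
print(len(pooled_rect), len(pooled_rect[0]))
```

Both calls produce a 4x4 grid, so the classifier after this layer sees 16 values per feature map regardless of whether the input was 32x32 or 14x53. The convolutional layers below it can run on the rectangular input unchanged, since a 5x5 kernel does not require a square image.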
|
I have been wondering about this myself and haven't seen the issue addressed. Padding is the only solution I can think of. Changing the aspect ratio is possible, but it makes the detection stage harder: you then need to consider every aspect ratio you used, since at detection time you don't know the actual aspect ratio in advance. |
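For reference, a minimal sketch of the padding approach in plain Python. The helper name `pad_to_square`, the choice to center the image, and the zero fill value are all assumptions for illustration; in practice the fill might be the dataset mean or a border replication instead.

```python
def pad_to_square(image, fill=0):
    """Pad a 2D image (list of rows) to a square, centering the original.

    Hypothetical helper: `fill` is the value used for the added border.
    """
    h = len(image)
    w = len(image[0])
    side = max(h, w)
    top = (side - h) // 2    # rows of padding above the image
    left = (side - w) // 2   # columns of padding left of the image
    padded = [[fill] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            padded[top + r][left + c] = image[r][c]
    return padded


# A 14x53 image becomes 53x53, with the original rows centered vertically.
rect_image = [[1] * 53 for _ in range(14)]
square_image = pad_to_square(rect_image)
print(len(square_image), len(square_image[0]))
```

Note that for a 14x53 input most of the padded 53x53 image is fill, which is the cost of padding when the aspect ratio is far from square.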