In Krizhevsky et al: "We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel. So we trained our network on the (centered) raw RGB values of the pixels" Assuming my data matrix, X, is n-samples by p-features, does this mean I center each column of X? What's the reasoning behind centering each pixel with respect to the dataset? I expect this would remove a bit of the correlation between neighboring pixels for each sample. Is this just to prevent the network from saturating? But it seems like demeaning the rows of the rows of X would have the same effect. |
There are numerous reasons why one might perform mean subtraction prior to training. For a neural network, subtracting the mean makes it easier to set initial random weights and can reduce training time. Your quote mentions the use of "raw RGB values," so they may be subtracting the mean to reduce the effects of unknown scene illumination conditions. But note that subtracting the mean will not remove correlation between pixels (calculations of correlation and covariance already remove the mean).

Thanks for correcting my statement about correlations. So for a full RGB image representation, would your matrix be n x (p*k), where k is the total number of pixels? What I meant to say was that the inherent smoothness/consistency between neighboring pixels would be lost for each individual image (i.e., the normalization would make it look much less like an image). It seems like it would harm things like rotation/shift invariance, since the normalization assumes each pixel globally represents the same feature in the dataset. I can understand this centering being more useful in non-image tasks than in image-related ones.
(Apr 03 '13 at 11:21)
cdrn
From just the quote you provided, I don't know if they're operating on individual pixels (ignoring neighbors) or using the neighborhood around each pixel during training/classification. I'd be careful about using the term "normalization" to refer to mean subtraction, because normalization typically involves scaling (stretching or contracting) of the data values, which does not happen with mean subtraction. From an algorithmic perspective, there is no loss of smoothness or consistency due to mean subtraction (vector differences between pixels in an image remain unchanged); of course, the image would visually appear different if rendered after mean subtraction. If you use one image (or set of images) for training and then try to classify pixels in a new image, you don't want your algorithm to perform poorly due to a mean offset between the training data and the new data (e.g., because the overall scene is brighter in the new image). By subtracting the RGB means separately from each image, you can mitigate that effect (to the extent that it can be represented by an additive offset).
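For example, per-image subtraction of the RGB channel means could look like this (a sketch only, assuming each image is stored as an H x W x 3 array; the function name is just for illustration):

```python
import numpy as np

def subtract_channel_means(image):
    """Subtract each RGB channel's mean from a single H x W x 3 image.

    This removes an additive brightness offset per channel, which helps
    mitigate differences in overall scene illumination between images
    (to the extent the difference is an additive offset).
    """
    image = image.astype(np.float64)
    channel_means = image.mean(axis=(0, 1))  # shape (3,): mean R, G, B over this image
    return image - channel_means             # broadcasts over height and width
```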
(Apr 03 '13 at 12:48)
bogatron
I think we might be referring to different types of centering. To simplify things, let's consider 300x300 grayscale images. For a dataset of 10000 images, my data structure would be 10000x90000. Would I center the columns of this matrix by subtracting the same 90000-element vector from each row?
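In code, I mean something like this (a NumPy sketch; the real matrix would have the shape above):

```python
import numpy as np

# In my case X would be 10000 x 90000 (10000 flattened 300x300 grayscale images);
# a smaller random matrix is used here just to illustrate the operation.
X = np.random.rand(100, 900)

mean_image = X.mean(axis=0)   # one vector (length 90000 in my case): the per-pixel mean over the dataset
X_centered = X - mean_image   # the same vector is subtracted from every row
```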
(Apr 03 '13 at 15:02)
cdrn
If you are trying to reproduce the quoted experiment, given the example you just stated, it isn't clear whether you would subtract a scalar from your 10000x90000 matrix (mean pixel value subtraction) or subtract a common length-90000 vector from all rows of the matrix (mean image subtraction). I would need to know more about what the author did in his/her experiment (details are missing from the quote).
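To make the two possibilities concrete, here is a sketch of both (which one the authors actually used isn't clear from the quote):

```python
import numpy as np

# Stand-in for the 10000 x 90000 matrix (smaller here for illustration).
X = np.random.rand(100, 900)

# Option 1: mean pixel value subtraction -- subtract a single scalar.
X_scalar_centered = X - X.mean()

# Option 2: mean image subtraction -- subtract a common per-column vector from all rows.
X_image_centered = X - X.mean(axis=0)
```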
(Apr 03 '13 at 20:26)
bogatron
Both row-wise and column-wise normalisation (and, I guess by extension, subtracting the mean, which is normalisation without the scaling step) can be useful. If each row is an example, as in your description, then normalising each row boils down to a form of brightness/contrast normalisation in the case of image data; I suppose leaving out the scaling step affects the brightness only. Normalising each column of the matrix amounts to 'feature-wise' (or in this case pixel-wise) normalisation, which I think is particularly useful for models trained with gradient descent, since that tends to work better if all the input features have the same scale. Of course, if the scaling step is left out, I suppose this only affects the bias terms in such models (which can also be useful, since you save time by not having to learn the right biases first).
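As a sketch of the two variants (assuming NumPy, one example per row, and a small epsilon just to avoid division by zero):

```python
import numpy as np

X = np.random.rand(100, 900)   # one example per row (stand-in for the image matrix)
eps = 1e-8                     # guard against division by zero

# Row-wise: per-example normalisation (mean ~ brightness, scale ~ contrast).
row_means = X.mean(axis=1, keepdims=True)
row_stds = X.std(axis=1, keepdims=True)
X_rownorm = (X - row_means) / (row_stds + eps)

# Column-wise: per-feature (per-pixel) normalisation across the dataset.
col_means = X.mean(axis=0)
col_stds = X.std(axis=0)
X_colnorm = (X - col_means) / (col_stds + eps)

# Leaving out the scaling step gives the plain mean subtraction discussed above.
X_row_centered = X - row_means
X_col_centered = X - col_means
```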
(Apr 04 '13 at 05:46)
Sander Dieleman
In the quoted experiment, they subtracted the mean pixel value for the batch.
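If so, a per-batch version would look something like the sketch below (an assumption about what was done, not the authors' code):

```python
import numpy as np

def center_batch(batch):
    """Subtract the batch's mean pixel value (a single scalar) from every pixel.

    batch: array of shape (batch_size, height, width, channels).
    Note: this assumes "mean pixel value for the batch" means one scalar;
    another reading would be a per-pixel mean image, batch.mean(axis=0).
    """
    batch = batch.astype(np.float64)
    return batch - batch.mean()
```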