Hi there!

During pre-processing you usually apply some kind of normalisation and reduce the dimensionality via PCA. I was wondering whether it's better to compute a separate PCA on the test set, or to re-use the principal components that I computed on the training set. The same goes for other statistics such as the mean, median, standard deviation or whatever else is used for normalisation. Is it common practice to apply the values computed on the training set to the test set as well, or not?
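To make the two options concrete, here is a minimal NumPy sketch. The data and the helper names `fit_pca` and `transform` are hypothetical, for illustration only. It standardises the features, computes the principal axes via an SVD, and then projects the test set once with the training-set statistics and once with statistics estimated from the test set itself.

```python
import numpy as np

def fit_pca(X, n_components):
    """Estimate the per-feature mean/std and the top principal axes from X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # The principal axes are the right singular vectors of the standardised data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]

def transform(X, mean, std, axes):
    """Standardise X with the given statistics and project it onto the given axes."""
    return ((X - mean) / std) @ axes.T

rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))  # hypothetical: makes the 5 features correlated
X_train = rng.normal(size=(200, 5)) @ mixing
X_test = rng.normal(size=(30, 5)) @ mixing

train_params = fit_pca(X_train, n_components=2)
test_params = fit_pca(X_test, n_components=2)

# Option A: re-use the statistics and components estimated on the training set.
Xte_reused = transform(X_test, *train_params)
# Option B: estimate a separate normalisation and PCA on the test set.
Xte_separate = transform(X_test, *test_params)

print(Xte_reused.shape, Xte_separate.shape)  # (30, 2) (30, 2)
```

Option B estimates its own mean, standard deviation and axes (and the sign of each singular vector is arbitrary), so the same test point generally ends up with different coordinates than it would get under the training-set transform.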

asked Jan 13 '12 at 05:35

Untom


One Answer:

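Of course you should. Otherwise the test data is normalised and projected into a different coordinate system from the one the classifier was trained on, and your classifier will effectively make random predictions :) There is an example of how to do this using scikit-learn.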
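The scikit-learn example mentioned in the answer is not reproduced here; the following is a minimal sketch of the same idea, assuming the bundled iris data set and an arbitrary choice of `StandardScaler`, `PCA(n_components=2)` and `LogisticRegression`. Wrapping the steps in a pipeline means `fit` estimates the scaling statistics and the principal components on the training split only, while `score` and `predict` reuse those fitted parameters on the test split.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = make_pipeline(
    StandardScaler(),                   # mean/std learned from X_train only
    PCA(n_components=2),                # components learned from X_train only
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# The test set is transformed with the training-set mean, std and components.
accuracy = model.score(X_test, y_test)
print("test accuracy:", accuracy)
```

The same pipeline object can be passed to `cross_val_score`, which refits every step on each training fold, so no statistics from the held-out fold leak into the preprocessing.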

answered Jan 13 '12 at 05:50

ogrisel

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.