In "A Practical Guide to Support Vector Classification", the section on scaling data says: "Of course we have to use the same method to scale both training and testing data. For example, suppose that we scaled the first attribute of training data from [-10, +10] to [-1, +1]. If the first attribute of testing data lies in the range [-11, +8], we must scale the testing data to [-1.1, +0.8]." In the past, I simply scaled the training data and the test data to [-1, +1] separately and assumed that would be fine. Now it seems that is not right, but why must we do it the way the guide describes? Can anyone explain it in detail?

asked Sep 11 '11 at 08:10

lpvoid


One Answer:

To keep it simple, imagine the same data point appears in both the training and the test set. If you used different scaling factors for the two datasets, the two scaled copies would no longer be identical, even though they represent the same point. This is obviously a bad thing, so you need to apply the same scaling to both datasets.

If your datasets are large and there are no big outliers, then the minimum and maximum of the training set and the test set should be similar, so you might get lucky and still get reasonable results by scaling the datasets separately. But there is really no reason to do that.
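As a rough illustration, here is a minimal Python sketch using the ranges quoted from the guide: a training feature spanning [-10, +10] and a test feature spanning [-11, +8]. The helper names `fit_min_max` and `scale` and the individual data values are made up for this example. It scales the test data once with the training bounds and once with bounds refitted on the test data.

```python
def fit_min_max(values):
    """Return the (min, max) of one feature column."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Linearly map the interval [lo, hi] to [-1, +1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Made-up values for a single feature (assumption, not from the guide).
train = [-10.0, -4.0, 0.0, 6.0, 10.0]
test = [-11.0, 0.0, 8.0]

# Bounds are estimated on the training data only.
lo, hi = fit_min_max(train)

train_scaled = scale(train, lo, hi)
test_shared = scale(test, lo, hi)                 # same mapping as training
test_separate = scale(test, *fit_min_max(test))   # refitted on test: different mapping

print("train, training bounds:", train_scaled)
print("test, training bounds: ", test_shared)
print("test, own bounds:      ", test_separate)
```

With the shared bounds, the raw value 0 ends up at the same place in both sets, and the test range becomes roughly [-1.1, +0.8], as in the guide. With separately fitted bounds the same raw value 0 lands at about 0.16 in the test set, so identical points no longer coincide.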

This answer is marked "community wiki".

answered Sep 11 '11 at 13:44

Andreas Mueller

Why not treat the mean and variance as parameters to be estimated? You're implicitly saying that the differences between train and test are so large that you can't even confidently estimate the mean and variance of each feature. If so, how would you learn even more complicated associations?

(Sep 14 '11 at 23:12) digdug

I think I'm trying to say two things: 1) Axis-wise minimum and maximum are not very robust statistics. 2) Estimating scaling on the training and test set separately is a bad idea.

Even if you can estimate the variance on the training set reliably, it might be a very bad idea to estimate it again on the test set, which is what lpvoid did. Imagine you only have one test point. How do you estimate the scaling there?

(Sep 15 '11 at 03:02) Andreas Mueller
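The single-test-point case from the last comment can be made concrete with a small Python sketch. The function name `min_max_scale` and the data values are assumptions chosen for illustration. With only one point, the minimum and maximum coincide, so min-max scaling fitted on the test set is undefined, while the training bounds still work.

```python
def min_max_scale(values, lo, hi):
    """Linearly map [lo, hi] to [-1, +1]; undefined when lo == hi."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


train = [-10.0, 0.0, 10.0]   # made-up training values for one feature
single_test = [8.0]          # a test set with exactly one point

# Bounds fitted on the training set give a well-defined result.
scaled_with_train_bounds = min_max_scale(single_test, min(train), max(train))

# Bounds fitted on the single test point collapse to lo == hi,
# so the scaling divides by zero.
try:
    min_max_scale(single_test, min(single_test), max(single_test))
    refit_failed = False
except ZeroDivisionError:
    refit_failed = True

print(scaled_with_train_bounds, refit_failed)
```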

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.