It seems that in some cases where there is some monotonic relationship between feature values, one would prefer scaling all features by taking the log rather than standardising. Standardisation does not preserve the ordering of values across features, because each feature is shifted and scaled by its own mean and standard deviation (i.e., if feature_1_value > feature_2_value in the original space, the order can be reversed after standardising, but not after taking the log). Do you know of cases where it matters whether you choose the log or standardisation?
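To make the ordering point concrete, here is a small sketch with made-up numbers (the two features and their values are purely hypothetical). Within a single feature, standardising keeps the order of the samples; the reversal can only happen when comparing values across features.

```python
import math
import statistics

# Hypothetical toy data: two features measured on the same three samples.
feature_1 = [10.0, 12.0, 11.0]   # large values, small spread
feature_2 = [1.0, 2.0, 9.0]      # smaller values, large spread

def standardise(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

z1 = standardise(feature_1)
z2 = standardise(feature_2)
log1 = [math.log(v) for v in feature_1]
log2 = [math.log(v) for v in feature_2]

# Compare the two features on the last sample (raw values 11 and 9).
i = 2
raw_order = feature_1[i] > feature_2[i]   # ordering in the original space
std_order = z1[i] > z2[i]                 # ordering after standardising
log_order = log1[i] > log2[i]             # ordering after taking the log

print(raw_order, std_order, log_order)
```

Here the raw value of feature 1 is larger, but it sits exactly at its own mean (z-score 0), while the value of feature 2 is well above its mean, so the comparison flips after standardising. The log is one fixed increasing function applied to every feature, so it cannot flip the comparison.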

asked Nov 24 '11 at 07:25

Georgiana Ifrim


2 Answers:

I think the log is interesting when the individual (marginal) feature distributions are highly non-Gaussian with fat tails (e.g. power laws for word frequencies in text documents). In that case, taking the log makes the feature value distributions more Gaussian-like, which can make some statistical models behave better, as the feature values no longer span 30 orders of magnitude.
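A rough sketch of that effect, assuming Zipf-like values (value proportional to 1/rank) as a stand-in for word frequencies; the constants and the `skewness` helper are illustrative choices, not taken from any real corpus:

```python
import math
import statistics

def skewness(values):
    """Sample skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical power-law feature: frequency of the word at each rank.
raw = [1e6 / rank for rank in range(1, 1001)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

# How many orders of magnitude the raw values cover.
orders_raw = math.log10(max(raw) / min(raw))

print("skewness raw:", skew_raw)
print("skewness log:", skew_log)
print("orders of magnitude spanned by raw values:", orders_raw)
```

The logged values are still not exactly Gaussian, but the extreme right tail is compressed and the skewness drops substantially.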

answered Nov 24 '11 at 09:44

ogrisel

Variable transformation is usually done to stabilize variance, which helps to improve the efficiency of certain estimators.
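As a minimal sketch of variance stabilization, assume a response with multiplicative noise, so its spread grows with its level. The levels, noise scale, and sample size below are arbitrary illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is reproducible
noise_sd = 0.3           # assumed noise scale on the log scale
levels = [1.0, 100.0, 10000.0]

raw_std = []
log_std = []
for level in levels:
    # Multiplicative noise: y = level * exp(eps), with eps ~ N(0, noise_sd^2).
    ys = [level * math.exp(rng.gauss(0.0, noise_sd)) for _ in range(2000)]
    raw_std.append(statistics.pstdev(ys))
    log_std.append(statistics.pstdev([math.log(y) for y in ys]))

for level, r, l in zip(levels, raw_std, log_std):
    print(f"level={level:>8}: std(y)={r:.3f}  std(log y)={l:.3f}")
```

On the raw scale the standard deviation grows with the level, while on the log scale it stays close to `noise_sd` for every group.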

As for standardization, it is a prerequisite for certain estimators such as Elastic Net or SVM to work correctly (pretty much whenever an estimator applies the same regularization penalty to all coefficients). Other than that, standardization is not required (some people still do it for convenience, though).
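To illustrate why a shared penalty interacts with feature scale, here is a sketch using one-feature ridge regression as a simpler stand-in for Elastic Net or SVM (the data, the units, and the helper names are hypothetical). For a centred feature, the ridge coefficient equals the least-squares coefficient multiplied by Sxx / (Sxx + lam), so the amount of shrinkage depends on the units the feature is recorded in.

```python
import statistics

def ridge_shrinkage(x, lam):
    """Factor by which one-feature ridge shrinks the least-squares fit.

    With centred x and no intercept, ridge gives w = Sxy / (Sxx + lam),
    which is the least-squares coefficient times Sxx / (Sxx + lam).
    """
    mean = statistics.mean(x)
    sxx = sum((v - mean) ** 2 for v in x)
    return sxx / (sxx + lam)

def standardise(x):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(x)
    std = statistics.pstdev(x)
    return [(v - mean) / std for v in x]

# Hypothetical feature: the same lengths recorded in metres and in millimetres.
metres = [1.2, 0.8, 1.5, 1.1, 0.9]
millimetres = [1000.0 * v for v in metres]
lam = 1.0  # the same penalty strength is applied to every coefficient

shrink_m = ridge_shrinkage(metres, lam)
shrink_mm = ridge_shrinkage(millimetres, lam)
shrink_m_std = ridge_shrinkage(standardise(metres), lam)
shrink_mm_std = ridge_shrinkage(standardise(millimetres), lam)

print("raw units:    ", shrink_m, shrink_mm)
print("standardised: ", shrink_m_std, shrink_mm_std)
```

In raw units the metres version is shrunk heavily and the millimetres version hardly at all, even though they carry the same information. After standardising, both are penalised identically.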

answered Dec 14 '11 at 12:34

Yevgeny

edited Dec 28 '11 at 17:28

For some models it also helps numerically. E.g. if sigmoids are used in the model, extreme input values make the sigmoid saturate quickly, and with gradient-based learning there will then be effectively no gradient. This holds for RBMs, neural networks, etc.
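A quick sketch of the saturation effect, with a made-up raw feature value and made-up mean and standard deviation for the scaling:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Hypothetical unscaled input and assumed feature statistics.
raw_value = 1000.0
feature_mean, feature_std = 400.0, 300.0
scaled_value = (raw_value - feature_mean) / feature_std   # 2.0

grad_raw = sigmoid_grad(raw_value)        # saturated: gradient underflows to zero
grad_scaled = sigmoid_grad(scaled_value)  # still in the responsive range

print("gradient at raw input:   ", grad_raw)
print("gradient at scaled input:", grad_scaled)
```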

(Dec 14 '11 at 16:16) Justin Bayer
Last updated: Dec 28 '11 at 17:29

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.