|
It seems that in some cases where the ordering between feature values is meaningful, one would prefer to scale all features by taking the log. Standardisation is monotonic within a single feature, but it shifts and scales each feature separately, so it does not preserve the ordering between features (i.e., if feature_1_value > feature_2_value in the original space, the order can be reversed after standardising, but not after taking the log). Do you know of cases where it matters whether you choose the log or standardisation? |
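To make the ordering point concrete, here is a minimal sketch in plain Python. The two feature columns are hypothetical values chosen only for illustration, and `standardise` is a helper defined here, not a library function.

```python
import math
import statistics


def standardise(column):
    """Subtract the column mean and divide by the column standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]


# Hypothetical data: two features measured on the same three samples.
feature_1 = [10.0, 11.0, 15.0]
feature_2 = [1.0, 2.0, 9.0]

z1 = standardise(feature_1)
z2 = standardise(feature_2)
log1 = [math.log(x) for x in feature_1]
log2 = [math.log(x) for x in feature_2]

# For each sample, is feature_1 larger than feature_2?
raw_order = [a > b for a, b in zip(feature_1, feature_2)]
std_order = [a > b for a, b in zip(z1, z2)]
log_order = [a > b for a, b in zip(log1, log2)]

print("raw:         ", raw_order)
print("standardised:", std_order)
print("log:         ", log_order)
```

In this toy data feature_1 exceeds feature_2 in every sample. The log keeps that ordering because the same increasing function is applied to both columns, while standardisation flips it for some samples because each column gets its own mean and standard deviation.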
|
I think the log is interesting when the individual (marginal) feature distributions are highly non-Gaussian with fat tails (e.g. power laws for word frequencies in text documents). In that case, taking the log makes the feature value distributions more Gaussian-like, which can make some statistical models behave better because the feature values no longer span 30 orders of magnitude. |
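A rough illustration of this effect, assuming a synthetic Pareto (power-law) sample as a stand-in for real word-frequency data. The sample size, seed, shape parameter and the `skewness` helper are all choices made for this sketch.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean([((x - mean) / std) ** 3 for x in values])


random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic fat-tailed feature: Pareto draws with shape 1.0 (all values >= 1).
raw_values = [random.paretovariate(1.0) for _ in range(10_000)]
log_values = [math.log(x) for x in raw_values]

print("max/min ratio, raw:", max(raw_values) / min(raw_values))
print("range after log:   ", max(log_values) - min(log_values))
print("skewness, raw:", skewness(raw_values))
print("skewness, log:", skewness(log_values))
```

The log of a Pareto variable is exponentially distributed, so the result is still not Gaussian, but the range and the skewness shrink considerably, which is the sense of "more Gaussian-like" here.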
|
Variable transformation is usually done to stabilize variance, which helps to improve the efficiency of certain estimators. Standardization, on the other hand, is a prerequisite for certain estimators such as Elastic Net or SVM to work correctly (pretty much whenever an estimator applies the same regularization penalty to all features). Other than that, standardization is not required (some people still do it for convenience, though).
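A small sketch of why a shared penalty makes feature scale matter. It uses one-feature ridge regression without an intercept as a simplified stand-in for the L2 part of Elastic Net, which is an assumption of this example and not something the answer specifies. The data and the `ridge_coefficient` helper are hypothetical.

```python
def ridge_coefficient(x, y, penalty):
    """Closed-form ridge solution for one feature and no intercept:
    w = sum(x * y) / (sum(x * x) + penalty)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


# Hypothetical data: the same quantity expressed in metres and in kilometres.
x_metres = [-2.0, -1.0, 0.0, 1.0, 2.0]
x_km = [v / 1000.0 for v in x_metres]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]  # exactly y = 2 * x_metres

# Without a penalty, the prediction at x = 2 m does not depend on the units.
ols_pred_m = ridge_coefficient(x_metres, y, 0.0) * 2.0
ols_pred_km = ridge_coefficient(x_km, y, 0.0) * 0.002

# With the same penalty, the small-scale version is shrunk almost to zero.
penalty = 1.0
ridge_pred_m = ridge_coefficient(x_metres, y, penalty) * 2.0
ridge_pred_km = ridge_coefficient(x_km, y, penalty) * 0.002

print("no penalty:  ", ols_pred_m, ols_pred_km)
print("with penalty:", ridge_pred_m, ridge_pred_km)
```

Because the penalty acts on the coefficient, and the coefficient's size depends on the units of the feature, the same penalty strength is mild for one scaling and overwhelming for another. Standardizing first puts all features on a comparable footing.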
For some models it also helps numerically. E.g. if sigmoids are used in the model, extreme input values make the sigmoid saturate quickly. If gradient-based learning is used, the gradient then becomes vanishingly small. This holds for RBMs, neural networks, etc.
(Dec 14 '11 at 16:16)
Justin Bayer
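The saturation effect can be sketched in a few lines. The input values 0.5 and 50.0 are hypothetical, standing in for a standardized and an unscaled feature value feeding a unit with weight 1.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid with respect to its input."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Hypothetical pre-activations: a standardized input versus an unscaled one.
scaled_input = 0.5
unscaled_input = 50.0

print("gradient at 0.0: ", sigmoid_grad(0.0))
print("gradient at 0.5: ", sigmoid_grad(scaled_input))
print("gradient at 50.0:", sigmoid_grad(unscaled_input))
```

Near zero the derivative is close to its maximum of 0.25, while for the large input it is negligible, so a gradient step barely changes the weights feeding that unit.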
|