A while ago, someone posted a question that sparked a great discussion but never got answered (aside from the comments), and I think the topic is important enough to deserve its own question.

Original question

Why does removing correlated features have a positive effect in higher dimensions, and why do correlated features work better in lower dimensions?

(I already have my own answer, but if anyone has intuitive explanations that I might steal... borrow, that would be great.)

Regards

asked Nov 29 '12 at 03:02 by Leon Palafox ♦


2 Answers:

With a linear kernel, correlated inputs allow you to filter out independent noise by averaging them.
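A minimal sketch of that averaging effect (synthetic data, my own illustration, not from the original question): when several features are noisy copies of the same underlying signal, averaging them shrinks the independent noise roughly by a factor of the number of copies.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 10000, 8                        # samples, correlated noisy copies
    signal = rng.normal(size=n)            # the underlying quantity of interest
    copies = signal[:, None] + rng.normal(size=(n, k))   # each copy = signal + independent noise

    single = copies[:, 0]                  # one noisy feature
    averaged = copies.mean(axis=1)         # average of the k correlated features

    print("noise variance, single feature:", np.var(single - signal))    # ~1
    print("noise variance, averaged      :", np.var(averaged - signal))  # ~1/k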

However, adding more dimensions by definition leads to the curse of dimensionality, i.e. you need exponential growth in your data to fill the space. Put differently, since you typically have a fixed data set and are free to choose the number of features, adding extra variables makes it more likely that you overfit. So the answer is to apply some form of principal component analysis (PCA) to your features, to get the benefit of averaging without inducing the overfitting problems of adding too many variables.
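Something along these lines shows the shape of that suggestion (a hedged sketch with scikit-learn; the data, component count and classifier are made up for illustration, not from the original question):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 500
    latent = rng.normal(size=(n, 3))                    # 3 "true" underlying factors
    X = latent @ rng.normal(size=(3, 30)) \
        + 0.5 * rng.normal(size=(n, 30))                # 30 correlated, noisy features
    y = (latent[:, 0] > 0).astype(int)                  # label depends on one factor

    raw = LogisticRegression(max_iter=1000)
    pca = make_pipeline(PCA(n_components=3), LogisticRegression(max_iter=1000))

    print("CV accuracy, raw features:", cross_val_score(raw, X, y, cv=5).mean())
    print("CV accuracy, PCA features:", cross_val_score(pca, X, y, cv=5).mean())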

BTW, the original question seemed bogus: the difference was from 89% to 87%?! That is probably not statistically significant.

answered Dec 04 '12 at 03:31 by SeanV

The main problem with correlated features is that they over-emphasize certain aspects of the data. In particular, many machine learning algorithms implicitly assume a Euclidean metric on the feature space, since they use dot products of feature vectors to measure similarity or, equivalently, Euclidean distance to measure difference.

The extreme case of correlated features is duplicate features, so you can think about the effect of duplicating a feature. This is equivalent to changing the metric on the original data by doubling the weight of the duplicated feature. The exact effect depends on the algorithm: L2-penalized methods will start to favour the duplicated feature, since they can use it at a reduced penalty, and k-means clustering will produce clusters that are elongated in the original space.

This is not always bad. Sometimes correlated features may indicate that some aspect of the data really is more important, particularly if the methods of obtaining those features are independent. However, it is usually worth trying to decorrelate the data (using something like SVD), at least to see if it helps.
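For the L2 point, here is a small sketch of the reduced-penalty effect (my own illustration, with made-up data and penalty strength): duplicating a feature lets a ridge model split its weight across the copies, so that direction pays roughly half the penalty and its total coefficient grows.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 3 * x1 + 2 * x2 + rng.normal(size=n)

    X     = np.column_stack([x1, x2])
    X_dup = np.column_stack([x1, x1, x2])      # the same feature duplicated

    alpha = 50.0                               # strong L2 penalty makes the effect visible
    plain = Ridge(alpha=alpha).fit(X, y)
    dup   = Ridge(alpha=alpha).fit(X_dup, y)

    print("coef on x1, no duplicate    :", plain.coef_[0])
    print("total coef on x1, duplicated:", dup.coef_[0] + dup.coef_[1])   # larger: penalty is shared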

answered Dec 10 '12 at 03:14 by Daniel Mahler

edited Jul 16 '13 at 13:54
