
I don't have real-life practical experience with ML; I've only done a basic course, and I have a couple of conceptual questions:

The VC result and the bias-variance decomposition imply that if the number of features is very large, then unless the number of training samples is correspondingly high there is the spectre of overfitting. Hence the problem of feature selection, which has to be done systematically.
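(Roughly, the VC bound I have in mind is something like the following; the exact constants differ from textbook to textbook, so take it as illustrative only: with probability at least $1-\delta$,

$$ E_{\text{out}}(h) \;\le\; E_{\text{in}}(h) \;+\; \sqrt{\frac{8}{n}\left(d_{\mathrm{VC}}\,\ln\frac{2en}{d_{\mathrm{VC}}} + \ln\frac{4}{\delta}\right)}, $$

which grows with the VC dimension $d_{\mathrm{VC}}$ — and hence, for linear models, with the number of features — and shrinks with the number of samples $n$.)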

However, it seems that if one uses regularization in some form, it can serve as a generic antidote to overfitting, and consequently one can ignore the feature dimensionality (setting aside for a moment the computing overhead of a large feature set). I got that impression from the online notes of a couple of courses, and I also saw in a recent Google paper that they used logistic regression with regularization on a billion-dimensional (highly sparse) feature set. Is it correct, from a statistical standpoint, that if one uses regularization and is willing to pay the computing cost, one can be lax about feature selection?
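To make the setting concrete, here is a minimal sketch of the kind of setup I mean (synthetic data, assuming scikit-learn; the sizes and parameters are purely illustrative and nowhere near the billion-dimensional scale of the paper): L2-regularized logistic regression fit directly on a sparse, high-dimensional feature matrix with no explicit feature selection.

```python
# Minimal sketch: L2-regularized logistic regression on sparse, high-dimensional
# synthetic data, with far fewer samples than features and no feature selection.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features = 5_000, 100_000          # many more features than samples
X = sparse_random(n_samples, n_features, density=1e-3, format="csr", random_state=0)
w_true = np.zeros(n_features)
w_true[:50] = rng.normal(size=50)               # only a handful of features carry signal
y = (X @ w_true + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# C is the inverse regularization strength; smaller C means a stronger penalty.
clf = LogisticRegression(penalty="l2", C=1.0, solver="liblinear")
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```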

Is there a theoretical result about this (the effect of feature dimensionality and regularization on generalization error)?

asked Oct 04 '13 at 01:47

hsolo


One Answer:

LASSO can be viewed as a feature selector in itself; for a more formal discussion I recommend Section 3.4 of Hastie's The Elements of Statistical Learning.

In theory you can use LASSO on a large dataset to do your feature selection for you. I'm not particularly fond of using LASSO directly on large sparse datasets, though, because during the optimization you end up processing an enormous number of zeros that contribute nothing; I'd rather condense the dataset first and then operate on it.
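As a rough sketch of what I mean (assuming scikit-learn; the dataset and parameters are just for illustration, not a recipe): fit a LASSO model, keep the features with nonzero coefficients, and then train whatever model you like on the condensed feature set.

```python
# Sketch of L1-based feature selection: fit a LASSO model, drop the features
# whose coefficients are (near) zero, then fit a downstream model on the rest.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression

X, y = make_regression(n_samples=500, n_features=2000, n_informative=20,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)              # alpha controls how sparse the solution is
selector = SelectFromModel(lasso, prefit=True)  # keeps features with nonzero coefficients
X_reduced = selector.transform(X)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")

# Train the downstream model on the condensed dataset.
model = LinearRegression().fit(X_reduced, y)
```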

answered Oct 07 '13 at 21:21

Leon Palafox ♦

edited Oct 07 '13 at 21:21
