I am trying to speed up greedy forward feature selection, and I'm looking for tips, ideas, and references. For a D-dimensional set of features, the naive implementation of greedy forward selection is O(D^2): there are D rounds, and in the i-th round D-i+1 candidate features need to be tested, so roughly D^2/2 models are trained in total. My two main ideas to improve speed are:
Does anyone have experience with greedy forward feature selection? Any other ideas to speed it up? Thanks.
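For reference, here is a minimal sketch of the naive O(D^2) loop described above, assuming a scikit-learn-style workflow; the Ridge estimator and 3-fold cross-validation scoring are placeholder choices, not from the original post:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, n_features, estimator=None):
    """Naive greedy forward selection: roughly D^2/2 model fits."""
    if estimator is None:
        estimator = Ridge()  # placeholder base model
    D = X.shape[1]
    selected, remaining = [], list(range(D))
    for _ in range(n_features):          # up to D rounds
        best_score, best_j = -np.inf, None
        for j in remaining:              # D - i + 1 candidates in round i
            cols = selected + [j]
            score = cross_val_score(estimator, X[:, cols], y, cv=3).mean()
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Every candidate in every round triggers a full fit inside `cross_val_score`, which is exactly the cost the question is trying to reduce.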
There's another idea I read somewhere that assumes the features are roughly independent: when you evaluate the (i+1)-th feature, you optimize only that one, keeping the previously selected dimensions fixed. For example, say you have already chosen features 1 and 2, and now you want to test features 3 through D to see which one improves performance. In normal forward selection you select one candidate and fully re-optimize [1,2,3], [1,2,4], ..., [1,2,D]. That paper suggested optimizing only the test feature: keep the weights for [1,2] fixed and fit a single weight for each new dimension 3, 4, ..., D. This is much faster, since each candidate is a 1D optimization problem. When you finish the round and have found the best feature, run a full training on [1, 2, + selected feature].
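A rough sketch of that trick for a linear model with squared loss: hold the already-fitted model fixed and solve for a single weight per candidate against the current residual. The function name and the use of LinearRegression are illustrative assumptions, not from the original answer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def score_candidates_1d(X, y, selected, remaining):
    """Score each remaining feature by fitting only one new weight,
    keeping the already-selected features' weights fixed."""
    if selected:
        base = LinearRegression().fit(X[:, selected], y)
        residual = y - base.predict(X[:, selected])
    else:
        residual = y - y.mean()
    scores = {}
    for j in remaining:
        xj = X[:, j]
        denom = xj @ xj
        # Closed-form optimal 1D weight for the candidate against the residual.
        w = (xj @ residual) / denom if denom > 0 else 0.0
        # Reduction in squared error achieved by adding just this one weight.
        scores[j] = np.sum(residual**2) - np.sum((residual - w * xj)**2)
    return scores
```

After picking the highest-scoring candidate, refit the full model on the enlarged feature set, as the answer describes.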
thank you!
(Jun 24 '12 at 20:06) sergeyfeldman
Very quickly, here are two references that might be of some use:
Also, I should add that the latest scikit-learn supports randomized linear models, so the two options above should be fairly easy to implement.
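As a rough illustration of the randomized-linear-model idea (scikit-learn releases of that era shipped a RandomizedLasso estimator, which has since been removed), here is a hand-rolled stability-selection sketch using plain Lasso on random subsamples; all parameter values are placeholders, not from the answer:

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.01, n_resamples=100, subsample=0.5, seed=0):
    """Estimate how often each feature is selected by a Lasso fit
    on random subsamples of the data (stability selection)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        model = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx])
        counts += (np.abs(model.coef_) > 1e-8)
    return counts / n_resamples  # selection frequency per feature
```

Features with a high selection frequency can then be kept directly, which sidesteps the D rounds of forward selection entirely.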
thank you!
(Jun 24 '12 at 20:06) sergeyfeldman