i am trying to speed up greedy forward feature selection, and i'm looking for some tips, ideas, and references.

for a D-dimensional set of features, the naive implementation of greedy forward selection costs O(D^2) model fits - there are D rounds, and in the i-th round the D-i+1 remaining features each need to be tested.
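for concreteness, here is a rough sketch of the naive version i have in mind (the `evaluate` helper is hypothetical - think cross-validated score of a model trained on the given columns):

    import numpy as np

    def greedy_forward_selection(X, y, evaluate):
        """naive greedy forward selection: O(D^2) calls to `evaluate`.

        `evaluate(X_subset, y)` is assumed to return a score to maximize,
        e.g. the cross-validated score of a model trained on those columns.
        """
        D = X.shape[1]
        selected, remaining = [], list(range(D))
        while remaining:
            # test every remaining feature appended to the current subset
            scores = [evaluate(X[:, selected + [j]], y) for j in remaining]
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected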

my two main ideas to improve speed are:

  1. early stopping. if the algorithm has reached a level of performance that has not improved in the last K rounds, then stop.

  2. random subsampling of features in each round. instead of testing all D-i+1 remaining features during the i-th round, choose a smaller number of features at random and test only those. this may be improved if there is a rough probability distribution over the "usefulness" of features from which the candidates can be drawn. (a sketch combining both tweaks follows this list.)
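
here is a rough sketch of how both tweaks could be bolted onto the naive loop above (same hypothetical `evaluate` helper; the names `patience` and `n_candidates` are just mine, and the uniform sampling could be swapped for weighted sampling via the `p` argument of `rng.choice`):

    import numpy as np

    def greedy_forward_selection_fast(X, y, evaluate, patience=5,
                                      n_candidates=None, seed=0):
        """greedy forward selection with early stopping (idea 1) and
        per-round random subsampling of candidate features (idea 2).

        `patience` = K rounds without improvement before stopping.
        `n_candidates` = how many remaining features to test per round
        (None means test them all).
        """
        rng = np.random.default_rng(seed)
        D = X.shape[1]
        selected, remaining = [], list(range(D))
        best_score, stale_rounds = -np.inf, 0
        while remaining and stale_rounds < patience:
            if n_candidates is not None and n_candidates < len(remaining):
                # uniform subsample of the remaining features
                candidates = [int(j) for j in
                              rng.choice(remaining, size=n_candidates, replace=False)]
            else:
                candidates = remaining
            scores = [evaluate(X[:, selected + [j]], y) for j in candidates]
            round_best = candidates[int(np.argmax(scores))]
            # the round's best candidate is still added so the search can
            # continue, but the loop stops after `patience` stale rounds
            if max(scores) > best_score:
                best_score, stale_rounds = max(scores), 0
            else:
                stale_rounds += 1
            selected.append(round_best)
            remaining.remove(round_best)
        return selected, best_score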

does anyone have experience with greedy forward feature selection? any other ideas to speed it up?

thanks.

asked Jun 22 '12 at 18:38 by sergeyfeldman (edited Jun 22 '12 at 18:39)


2 Answers:

There's another idea I read somewhere that assumes the features are roughly independent: when you add the (i+1)-th feature, you optimize only that feature's weight, without re-optimizing the previously selected dimensions.

For example, let's say you have already chosen features 1 and 2. Now you want to test features 3 to D to see which one improves performance the most. In normal forward selection you add one dimension at a time and fully re-optimize [1,2,3], [1,2,4], ..., [1,2,D], and so on.

That paper suggested optimizing only the test feature: keep the weights for [1,2] fixed and find a single weight for each new candidate dimension 3, 4, ..., D.

This is much faster since each candidate only requires solving a 1D optimization problem. When you finish the round and find the best feature, run a full training on [1, 2, selected feature].
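
To make the trick concrete, here is a minimal least-squares sketch (my own illustration, not necessarily the exact formulation from that paper): with the already-selected weights held fixed, the best single weight for a candidate column has a closed form, so every candidate can be scored cheaply from the current residual.

    import numpy as np

    def score_candidates_1d(X, y, selected, w_selected):
        """Score every unselected column by the best it can do with a single
        new weight, keeping the already-selected weights fixed.

        Least-squares version: with residual r = y - X[:, selected] @ w_selected,
        the optimal weight for column x is (x . r) / (x . x), and the resulting
        reduction in squared error is (x . r)^2 / (x . x).
        """
        residual = y - X[:, selected] @ w_selected
        scores = {}
        for j in range(X.shape[1]):
            if j in selected:
                continue
            x = X[:, j]
            denom = x @ x
            scores[j] = (x @ residual) ** 2 / denom if denom > 0 else 0.0
        return scores  # take the argmax, then retrain fully on selected + [best]

This is essentially the same flavor of shortcut used in matching pursuit and stagewise regression.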

answered Jun 23 '12 at 04:29 by rm9

thank you!

(Jun 24 '12 at 20:06) sergeyfeldman

Very quickly, here are two references that might be of some use:

Also, I should add that with the latest scikit-learn, randomized linear models are supported, and thus the two options above should be fairly easy to implement.
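
As a rough illustration of the general recipe behind those randomized linear models (my own sketch using plain `Lasso` on random subsamples rather than any dedicated estimator, so treat the function name and parameter values as assumptions): features are scored by how often they receive a nonzero coefficient across resampled fits.

    import numpy as np
    from sklearn.linear_model import Lasso

    def stability_scores(X, y, alpha=0.01, n_resamples=50, frac=0.75, seed=0):
        """Fraction of subsampled Lasso fits in which each feature receives a
        nonzero coefficient; higher means more stably useful."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        counts = np.zeros(d)
        for _ in range(n_resamples):
            idx = rng.choice(n, size=int(frac * n), replace=False)
            coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
            counts += (coef != 0)
        return counts / n_resamples

Such scores could also serve as the rough "usefulness" distribution mentioned in idea 2 of the question.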

answered Jun 23 '12 at 05:37 by Gael Varoquaux (edited Jun 23 '12 at 05:39)

thank you!

(Jun 24 '12 at 20:06) sergeyfeldman