|
Hi, I am trying to analyse data using a boosted decision tree. For this purpose I have a variety of discriminating features, but I do not want to use all of them; rather, I want to select the most powerful ones. Furthermore, I need to select a model for my BDT, i.e. choose parameters such as the number of trees or the minimum number of events per leaf. The latter is done by a grid search with a 5-fold cross-validation procedure.

The problem for me is the correlation between these two optimisations: for one set of model parameters I might select a specific set of features, while for the next point in my grid I might select a different set. So what I would need to do, as I understand it, is run the feature selection at each grid point in order to find the optimal combination of parameters and features. This is of course very CPU-time consuming, and I'm not sure whether there is any straightforward way to implement it within scikit-learn.

So my question is: does anyone have experience with this kind of problem, and is there a solution? Or does it matter at all, or is the effect negligible? Many thanks in advance! Cheers, Marcus
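[For reference, here is a minimal sketch of how the setup described above can be wired together in scikit-learn: putting the feature selector and the classifier into a `Pipeline` and handing that to `GridSearchCV` makes the selection step refit at every grid point and CV fold. `SelectKBest` and `GradientBoostingClassifier` stand in here for the poster's unspecified selection method and BDT; the toy data and parameter values are illustrative assumptions.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data standing in for the real analysis sample.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Feature selection and BDT chained so the selector is refit per fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("bdt", GradientBoostingClassifier(random_state=0)),
])

# The number of kept features becomes a grid parameter, optimised
# jointly with the tree count and minimum leaf size.
param_grid = {
    "select__k": [5, 10, 20],
    "bdt__n_estimators": [50, 100],
    "bdt__min_samples_leaf": [1, 5],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

This is exactly the nested procedure the question describes, so it is expensive; it is still the straightforward way to express it in scikit-learn.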
|
Hi, thanks a lot for your replies (and the very interesting paper). I actually hadn't thought about the feature selection being done at each grid point, so this approach should be fine. Thanks!
|
You can do boosting with sparsity and structured sparsity, as shown in this paper by John Duchi and Yoram Singer. |
|
A decision tree implicitly performs feature selection as it is built. If a feature is not useful for discrimination, it will never be chosen during the node-splitting process, unless you request a very deep tree. So you really only have two parameters left for model selection: tree depth and number of trees.