
Where can I find an open-source implementation of ensembles of decision trees, preferably boosted decision trees?

I have seen code for implementing single decision trees. I have seen code for random forests. But where can I find an off-the-shelf implementation of (boosted) ensembles of decision trees?

[Note: My thesis parser minimizes the l1-regularized logistic loss by boosting ensembles of decision trees, and learns over the regularization path. The trees are quite sparse, given the l1-regularization. However, the code is coupled with the parser and does not work out-of-the-box.]
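To make concrete what a "boosted ensemble of decision trees" means here, below is a minimal sketch, assuming plain AdaBoost over depth-1 trees (decision stumps) in standard-library Python. It is not the l1-regularized logistic-loss method from the note above, and the names `stump_predict`, `fit_stump`, `adaboost` and `predict` are made up for illustration. It is a brute-force toy, far too slow for real data.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    # Depth-1 tree: predict `polarity` on one side of the threshold, the opposite on the other.
    return polarity if row[feature] <= threshold else -polarity

def fit_stump(X, y, w):
    # Exhaustively search for the stump with the lowest weighted training error.
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, n_rounds=10):
    # Labels in y must be +1 or -1. Returns a list of (alpha, feature, threshold, polarity).
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log below finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Up-weight misclassified examples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D problem that no single stump can fit: the negatives sit in the middle.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([predict(model, row) for row in X])
```

The toy data is chosen so that one stump necessarily misclassifies some points, while a weighted vote of a few stumps can separate the classes.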

asked Apr 08 '11 at 19:56

Joseph Turian ♦♦

edited Apr 08 '11 at 20:22

ogrisel


4 Answers:

I think OpenCV's implementation is pretty fast and usable (there are bindings for many languages), although it is not as easy to understand as I would like.

answered Apr 08 '11 at 20:23

Alexandre Passos ♦

The scikit-learn project just received two pull requests: the first one on random forests and the other on boosted decision trees. They are currently under review and those contributions will probably get combined soon.

Anyone knowledgeable about the algorithms can help review and comment on those contributions (even if you are not a scikit-learn contributor yet).

answered Apr 08 '11 at 20:27

ogrisel

edited Apr 08 '11 at 21:07

Can you link to the specific pull requests?

(Apr 08 '11 at 21:04) Joseph Turian ♦♦

I rephrased my answer to make it explicit that the links point to the pull requests (and not to Wikipedia articles, as you might have thought).

(Apr 08 '11 at 21:08) ogrisel

To state the obvious, WEKA provides a Java implementation. Also, if it ever becomes public, the Avatar project implements them in C or C++. You may be able to ask one of the maintainers for the code.

answered Apr 09 '11 at 10:40

Troy Raeder

For a quick test of the effectiveness of Weka's random forest implementation, I suggest RapidMiner, which includes it.

(Apr 09 '11 at 16:07) Lucian Sasu

If you are still looking, there are a number of tree ensemble methods available in R, particularly randomForest, gbm and mboost. gbm and mboost implement stochastic gradient boosting; gbm is the most scalable of these. They are all available on CRAN, and you can search for more on crantastic.org.

answered Jul 27 '11 at 19:58

Daniel Mahler

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.