
Random Forests are ensembles of unpruned decision trees, which are said to be very accurate as well as relatively fast to train.

However, the issue of prediction-speed is rarely mentioned.

Assuming I learned a forest of 100 decision trees, the naive inference algorithm would be to apply all of these to my data point, and take the majority vote. This is 100 times slower than using just one tree. Are there tricks to make this faster?

asked Sep 17 '10 at 09:08


yoavg


4 Answers:

Try randomly sampling trees from the forest (with replacement) and taking a majority vote of the sampled trees. If p is the fraction of all trees that vote "yes", and p' is the fraction of sampled trees that vote "yes", then |p - p'| <= e with probability 1-delta, provided you sample at least (1/e^2) log (1/delta) trees. Note that this number of samples is independent of the number of trees in the forest.
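A minimal sketch of this subsampling trick, assuming scikit-learn-style tree objects whose predict method returns a 0/1 label for a single point; the eps and delta parameters and the function name are illustrative:

```python
import math
import random

def sampled_majority_vote(trees, x, eps=0.05, delta=0.01):
    """Approximate the full forest's majority vote by querying only a
    random subsample of trees, drawn with replacement.  With probability
    at least 1 - delta the sampled 'yes' fraction is within eps of the
    fraction over the whole forest."""
    # The number of samples needed is independent of the forest size.
    n = math.ceil((1.0 / eps ** 2) * math.log(1.0 / delta))
    sample = random.choices(trees, k=n)          # sampling with replacement
    yes_votes = sum(int(t.predict([x])[0]) for t in sample)
    return int(yes_votes > len(sample) / 2)
```

Note that for a small forest and tight eps/delta this n can exceed the number of trees, in which case you would just evaluate them all; the saving only kicks in for very large ensembles or looser tolerances.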

answered Sep 17 '10 at 17:40


Umar

Sampling may work, but it is an approximate solution. I was hoping for something more exact.

(Sep 20 '10 at 04:44) yoavg

Bucila et al.'s approach is effective, largely because it lets you generate a huge set of training data on which to train a replica model of the random forest.

There is a simpler method, however, that can produce reasonable compression more easily. Go through some held-out training data and keep only the trees whose predictions have the highest mutual information with your target variable. If you don't have additional labeled data, you can use unlabeled data and measure mutual information relative to the predictions of your original model. Many of the trees in a random forest are essentially emulating a random number generator and can be eliminated with no important loss of information; see the sketch below.
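A rough sketch of this pruning idea, assuming scikit-learn-style trees with a predict method and discrete labels; the keep parameter and function name are just illustrative:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def prune_forest(trees, X_heldout, y_heldout, keep=20):
    """Rank the trees by the mutual information between their predictions
    and the target on held-out data, and keep only the top scorers.
    If no labels are available, y_heldout can be the full forest's own
    predictions on unlabeled data."""
    scores = [mutual_info_score(y_heldout, t.predict(X_heldout)) for t in trees]
    best = np.argsort(scores)[::-1][:keep]       # indices of the highest-MI trees
    return [trees[i] for i in best]
```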

answered Sep 24 '10 at 02:24


Ted Dunning

If you have doubts about the efficacy of sampling trees from the ensemble, see Hernández-Lobato et al. (2009). A clearer presentation of the computation for the binary-classification case appears in Soto et al. (2010).

I suspect that the confidence bounds in the above papers will be tighter than the Chernoff bound Umar gives above, but that is just my intuition. Certainly the Chernoff bound will be easier to compute.

answered Sep 24 '10 at 15:47


Art Munson

What you seem to be after is described in this paper: Bucila et al., Model Compression. It's unfortunately also approximate, but you don't necessarily want an exact method here, or you'd be tied to the performance of the full ensemble in the worst case.
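A hedged sketch of that compression idea, assuming a fitted scikit-learn RandomForestClassifier and a large pool of unlabeled points; using a single decision tree as the compressed model is just one possible choice:

```python
from sklearn.tree import DecisionTreeClassifier

def compress_forest(forest, X_unlabeled, max_depth=None):
    """Model compression in the spirit of Bucila et al.: label a large
    pool of unlabeled (or synthetically generated) points with the forest,
    then fit a single fast model to those pseudo-labels."""
    pseudo_labels = forest.predict(X_unlabeled)
    student = DecisionTreeClassifier(max_depth=max_depth)
    student.fit(X_unlabeled, pseudo_labels)
    return student
```

At prediction time you only evaluate the single student model, so inference cost no longer grows with the size of the ensemble.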

answered Sep 20 '10 at 08:01


Alexandre Passos ♦
