When building the decision trees in a random forest, what happens if we always split on only the most informative attributes? Does this improve the random forest's performance?

asked Nov 01 '12 at 04:07 by khawar, edited Nov 01 '12 at 10:11 by Rob Renaud

One Answer:

Generally with ensemble methods, there is a tradeoff between the diversity and the accuracy of the individual ensemble members. In one limit, you could make every member as accurate as possible, but then they would all be nearly identical, and you'd get no advantage from adding extra trees. In the other limit, suppose every tree got the correct answer only 60% of the time, but independently of every other tree: with infinitely many trees, a majority vote would always be correct. Even with just 15 such trees, the majority vote would already be right about 79% of the time.
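That figure is just binomial arithmetic, and it's easy to check. A minimal sketch in plain Python (assuming independent binary classifiers and an odd number of trees so that a majority always exists):

```python
from math import comb

def majority_vote_accuracy(n_trees: int, p: float) -> float:
    """P(majority of n_trees independent classifiers is correct),
    where each classifier is correct with probability p.
    Assumes binary classification and odd n_trees."""
    need = n_trees // 2 + 1  # votes required for a majority
    return sum(comb(n_trees, k) * p**k * (1 - p)**(n_trees - k)
               for k in range(need, n_trees + 1))

for n in (1, 15, 101, 1001):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
# prints roughly: 1 -> 0.6, 15 -> 0.787, 101 -> 0.98, 1001 -> 1.0
```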

The reason that Random Forests select the best feature from only a random subset of the features at each node is precisely to inject this diversity: the restriction forces different trees to grow differently. It's really a tradeoff between the accuracy and the diversity of the individual trees.
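In scikit-learn this knob is exposed as the max_features parameter of RandomForestClassifier. A small comparison sketch (the synthetic dataset and the parameter values here are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=40,
                           n_informative=10, random_state=0)

# max_features controls how many features each split may choose from:
# None  = consider every feature (accurate but highly correlated trees),
# "sqrt" = a random subset at each split (weaker trees, more diversity).
for max_features in (None, "sqrt", 1):
    rf = RandomForestClassifier(n_estimators=200,
                                max_features=max_features,
                                random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={max_features!r}: {score:.3f}")
```

Considering every feature at every split tends to produce nearly identical trees, while very small subsets produce diverse but individually weak ones; the common sqrt(n_features) default sits between the two extremes.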

answered Nov 01 '12 at 10:09 by Rob Renaud, edited Nov 01 '12 at 10:10
