
Quote from this blog post:

Taking all the features and chucking them into a Random Forest works surprisingly well on a variety of real-world problems. This is demonstrated more empirically in this paper. I'm very interested in domains such as CV and NLP where this doesn't hold true.

This is also true in my experience, but why is it the case?

asked Oct 18 '13 at 21:39


Maarten

edited Oct 21 '13 at 03:23


One Answer:

Since no one has answered the question so far, let me try and present my own thoughts.

Let's pick a specific task: text classification using bag-of-words features. There will be a few features of high importance, but also a long tail of features: words that don't appear in many documents, or that have only a weak correlation with one of the classes. For instance, the word 'you' may be slightly more likely to appear in spam, because spammers are trying to sell you something. There are also more features than training examples, which is quite the opposite of problems where a complex relationship holds between a small number of features.
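To make that concrete, here is a minimal sketch (using scikit-learn's CountVectorizer on a hypothetical toy corpus) of how a bag-of-words representation quickly ends up with more features than examples, most of them rare:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; a real spam/ham dataset would have thousands of documents.
docs = [
    "buy cheap watches now, you deserve it",
    "meeting rescheduled to friday afternoon",
    "you have won a free prize, click here",
    "please review the attached quarterly report",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: n_documents x n_vocabulary

print(X.shape)  # already more columns (words) than rows (documents)

# Most words occur in only one document -- the long tail of weak, sparse features.
doc_freq = (X > 0).sum(axis=0)
print((doc_freq == 1).sum(), "of", X.shape[1], "words appear in a single document")
```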

My theory is that a random forest does work on such data in principle, but it needs far more trees than is computationally feasible to give good results. Linear classifiers, on the other hand, work quite well and are much more efficient, because the dataset is easily linearly separable.
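A rough way to probe this intuition (a sketch, not a rigorous benchmark; it assumes scikit-learn and uses the pre-vectorized 20 newsgroups bag-of-words data) is to compare a linear model against random forests with an increasing number of trees:

```python
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Pre-vectorized bag-of-words features: ~130k features, ~11k documents.
data = fetch_20newsgroups_vectorized(subset="train")
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)

# A linear classifier handles the sparse, (near-)linearly separable data cheaply.
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression:", linear.score(X_test, y_test))

# Each tree in the forest only sees a small fraction of the informative words,
# so accuracy keeps improving as trees are added, at growing computational cost.
for n_trees in (10, 100, 300):
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1, random_state=0)
    rf.fit(X_train, y_train)
    print(f"random forest, {n_trees} trees:", rf.score(X_test, y_test))
```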

answered Feb 21 '14 at 23:55


Maarten

edited Feb 22 '14 at 00:02
