Quote from this blog post:
This is also true in my experience, but why is it the case?

Since no one has answered the question so far, let me try to present my own thoughts. Let's pick a specific task: text classification using bag-of-words features. There will be a few features of high importance, but also a long tail of features: words that don't appear in many documents, or that are only weakly correlated with one of the classes. For instance, the word 'you' may be slightly more likely to appear in spam, because spammers are trying to sell you something. There are more features than training examples, quite the opposite of problems where a complex relationship holds between a small number of features. My theory is that random forests do work on such data in principle, but need far more trees than is computationally feasible to give good results. Linear classifiers, on the other hand, work quite well and are much more efficient, because the dataset is easily linearly separable.
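
To make this concrete, here is a minimal sketch (assuming scikit-learn and a two-class subset of 20 newsgroups as a stand-in for the spam example above): vectorizing the documents gives a sparse matrix with far more columns than rows, and a plain linear model trains on it in a fraction of the time a 100-tree forest needs. The exact scores will depend on the data; the point is the shape of the feature matrix and the relative cost, not any particular number.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Two-class subset of 20 newsgroups as a stand-in for a spam-like task
categories = ["rec.autos", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=categories,
                           remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", categories=categories,
                          remove=("headers", "footers", "quotes"))

# Bag-of-words features: a sparse matrix with far more columns (terms)
# than rows (documents), i.e. more features than training examples
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)
print("feature matrix:", X_train.shape)

# Compare a linear classifier against a modestly sized random forest
for name, clf in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest, 100 trees", RandomForestClassifier(n_estimators=100,
                                                        random_state=0)),
]:
    clf.fit(X_train, train.target)
    acc = accuracy_score(test.target, clf.predict(X_test))
    print(f"{name}: accuracy {acc:.3f}")
```

If the forest does close the gap at all, it tends to do so only as `n_estimators` grows, which would be consistent with the "needs far more trees than is feasible" intuition above.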