|
I asked this over at CrossValidated and have six up-votes so far, but no answers, so thought I'd try over here...

Say I've got a predictive classification model based on a random forest (using the randomForest package in R). I'd like to set it up so that end-users can specify an item to generate a prediction for, and it'll output a classification likelihood. So far, no problem. But it would be useful/cool to be able to output something like a variable importance graph, but for the specific item being predicted, not for the training set as a whole. Something like:

Item X is predicted to be a Dog (73% likely)
Because:
Legs=4
Breath=bad
Fur=short
Food=nasty

You get the point. Is there a standard, or at least justifiable, way of extracting this information from a trained random forest? Flipping each feature and dropping the |
|
This is a very interesting problem, and there are some classifier-independent approaches to it. I really like Baehrens et al., "How to explain individual classification decisions", but Štrumbelj et al., "Explaining individual classifications using game theory", is also interesting. Unfortunately I don't know of any library that prepackages these techniques.

The Baehrens et al. paper is indeed quite interesting. I'm not sure if it's the best approach for a DF, at least in my case (with hundreds of variables), but I definitely like their definitions. Thanks!
(Apr 08 '11 at 15:35)
Harlan Harris
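
For readers wondering what a classifier-independent, game-theoretic explanation might look like in practice, here is a minimal sketch of a sampling-based contribution estimate in the spirit of the Štrumbelj et al. approach. It is not code from either paper: the function name `shapley_contributions`, the toy `predict` model, and the `background` data are all made up for illustration. The only thing it assumes about the model is that it can be called as a function on a feature dictionary.

```python
import random

def shapley_contributions(predict, x, background, n_samples=2000, seed=0):
    """Monte Carlo estimate of each feature's contribution to predict(x).

    predict:    function mapping a dict of feature values to a number
    x:          the instance to explain (dict of feature -> value)
    background: list of dicts used to fill in "unknown" feature values
    """
    rng = random.Random(seed)
    features = list(x)
    phi = {}
    for i in features:
        total = 0.0
        for _ in range(n_samples):
            order = features[:]
            rng.shuffle(order)               # random feature ordering
            w = rng.choice(background)       # random reference instance
            before = set(order[:order.index(i)])
            # Features ordered before i take their values from x, the rest from w.
            # The two instances differ only in whether feature i comes from x or w.
            with_i = {f: (x[f] if f in before or f == i else w[f]) for f in features}
            without_i = {f: (x[f] if f in before else w[f]) for f in features}
            total += predict(with_i) - predict(without_i)
        phi[i] = total / n_samples
    return phi

# Hypothetical stand-in for a trained model's "probability of Dog".
def predict(item):
    return 0.2 + 0.5 * (item["legs"] == 4) + 0.3 * (item["fur"] == "short")

background = [
    {"legs": 4, "fur": "short", "color": "brown"},
    {"legs": 2, "fur": "long", "color": "black"},
    {"legs": 2, "fur": "short", "color": "white"},
    {"legs": 4, "fur": "long", "color": "brown"},
]
x = {"legs": 4, "fur": "short", "color": "brown"}

phi = shapley_contributions(predict, x, background)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

Each contribution estimates how much that feature's value moves the prediction away from the average prediction over the background data. A feature the model ignores (here `color`) gets a contribution of zero. The cost grows with the number of features times the number of samples, so with hundreds of variables this would likely need fewer samples per feature or a restriction to the most important variables.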
|
|
Here is a technique I use to perform EDA (exploratory data analysis) on the outputs of an ensemble of decision trees: The total score is the sum of the individual tree scores, where a tree's score is the score at the leaf node that the example percolates down to. So, sort the tree scores from largest to smallest. Then, in decreasing order of tree score, output the feature path from root to leaf. This technique will show you which compound features contributed the highest weight to the total score.

@JosephTurian, I am not sure I understand your comments. I would greatly appreciate it if you could explain with an example.
(Nov 27 '12 at 19:28)
beejay
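
Since an example was requested, here is a minimal sketch of the tree-score ranking described in the answer above. It uses a hand-built toy ensemble rather than a real trained forest. The tree representation, the helper names (`leaf`, `node`, `trace`, `explain`), and all feature names and scores are assumptions made for illustration, not the answerer's actual code or any library's API.

```python
# Toy ensemble: each tree is a nested dict. Internal nodes test one feature
# against a threshold; leaves carry a score (e.g. a vote for class "Dog").

def leaf(score):
    return {"score": score}

def node(feature, threshold, left, right):
    # Go left when x[feature] <= threshold, otherwise go right.
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}

def trace(tree, x):
    """Return (leaf score, list of tests on the root-to-leaf path) for x."""
    path = []
    while "score" not in tree:
        f, t = tree["feature"], tree["threshold"]
        if x[f] <= t:
            path.append(f"{f} <= {t}")
            tree = tree["left"]
        else:
            path.append(f"{f} > {t}")
            tree = tree["right"]
    return tree["score"], path

def explain(forest, x):
    """Per-tree (score, path) pairs, largest score first."""
    return sorted((trace(t, x) for t in forest), key=lambda sp: sp[0], reverse=True)

forest = [
    node("legs", 3, leaf(0.1), node("fur_length", 2, leaf(0.9), leaf(0.4))),
    node("fur_length", 2, node("legs", 3, leaf(0.2), leaf(0.8)), leaf(0.3)),
    node("weight", 50, leaf(0.6), leaf(0.1)),
]
item = {"legs": 4, "fur_length": 1, "weight": 20}

ranked = explain(forest, item)
total = sum(score for score, _ in ranked)  # the ensemble's total score
for score, path in ranked:
    print(f"{score:.1f}  " + " AND ".join(path))
```

Each printed line is one tree's leaf score followed by the conjunction of tests the item passed on its way to that leaf, so the top lines are the compound features that contributed most to the total. With a real forest of hundreds of trees, one would probably print only the top few paths, or count how often each feature test appears among them.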
|
|
For model exploration of random forests, I recommend the original paper by Breiman and Cutler. Specifically, look at the prototypes. |
I know this question is pretty old now but I need to solve the exact same problem. What did you end up doing here? Any pointers on your approach and its success would be greatly appreciated.