I asked this over at CrossValidated and have six up-votes so far, but no answers, so thought I'd try over here...

Say I've got a predictive classification model based on a random forest (using the randomForest package in R). I'd like to set it up so that end-users can specify an item to generate a prediction for, and it'll output a classification likelihood. So far, no problem. But it would be useful/cool to be able to output something like a variable importance graph, but for the specific item being predicted, not for the training set as a whole. Something like:

Item X is predicted to be a Dog (73% likely)
Because:
Legs=4
Breath=bad
Fur=short
Food=nasty

You get the point. Is there a standard, or at least justifiable, way of extracting this information from a trained random forest? Flipping each feature and dropping the
This is a very interesting problem, and there are some classifier-independent approaches to it. I really like Baehrens et al., "How to explain individual classification decisions", but Štrumbelj et al., "Explaining individual classifications using game theory", is also interesting. Unfortunately I don't know of any library that prepackages these techniques.

The Baehrens et al. paper is indeed quite interesting. I'm not sure if it's the best approach for a DF, at least in my case (with hundreds of variables), but I definitely like their definitions. Thanks!
(Apr 08 '11 at 15:35)
Harlan Harris
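For anyone who wants to experiment with the game-theoretic idea without a prepackaged library, here is a rough sketch of a sampling-based, Shapley-style contribution estimate for a single item. It is only an illustration in the spirit of that approach, not the authors' implementation. The names `sampled_contribution` and `predict_dog`, the feature names, and the one-row `background` are all made up for the example; `predict_dog` stands in for whatever trained classifier you actually have.

```python
import random


def sampled_contribution(predict, x, background, feature, n_samples=200, rng=None):
    """Estimate how much `feature` pushes predict(x) up or down.

    Each sample draws a random feature ordering and a random background row z.
    Features ordered before `feature` keep the values from x, the rest take
    the values from z. The change in the prediction when `feature` itself is
    switched from z's value to x's value is averaged over all samples.
    """
    rng = rng or random.Random(0)
    names = list(x)
    total = 0.0
    for _ in range(n_samples):
        order = names[:]
        rng.shuffle(order)
        z = rng.choice(background)
        preceding = set(order[:order.index(feature)])
        without = {k: (x[k] if k in preceding else z[k]) for k in names}
        with_feature = dict(without)
        with_feature[feature] = x[feature]
        total += predict(with_feature) - predict(without)
    return total / n_samples


def predict_dog(item):
    # Hypothetical additive stand-in for a trained model's "Dog" score.
    return 0.1 * item["legs"] + 0.3 * item["bad_breath"] + 0.05 * item["short_fur"]


item = {"legs": 4, "bad_breath": 1, "short_fur": 1}
background = [{"legs": 2, "bad_breath": 0, "short_fur": 1}]  # assumed reference data

contributions = {
    name: sampled_contribution(predict_dog, item, background, name)
    for name in item
}
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

With a real model you would pass a sample of training rows as `background`. With hundreds of variables the cost grows quickly, since every feature needs its own batch of samples.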
Here is a technique I use to perform EDA (exploratory data analysis) on the outputs of an ensemble of decision trees. The total score is the sum of the individual tree scores, where a tree's score is the score at the leaf node that the example percolates down to. So, sort the tree scores from largest to smallest. Then, in decreasing order of tree score, output the feature path from root to leaf. This will show you which compound features contributed the highest weight to the total score.
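The steps above could be sketched roughly as follows. This is a toy, hand-built ensemble rather than output from any real library. The `Leaf` and `Split` classes, the feature names, and the leaf scores are all hypothetical, and the sketch assumes each leaf stores a numeric score that is summed across trees.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    score: float


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"   # followed when value <= threshold
    right: "Node"  # followed when value > threshold


Node = Union[Leaf, Split]


def trace(tree, example):
    """Walk one tree and return (leaf score, root-to-leaf conditions)."""
    path = []
    node = tree
    while isinstance(node, Split):
        if example[node.feature] <= node.threshold:
            path.append(f"{node.feature} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} > {node.threshold}")
            node = node.right
    return node.score, path


def rank_paths(forest, example):
    """Trace every tree, then sort by leaf score, largest first."""
    return sorted((trace(t, example) for t in forest),
                  key=lambda pair: pair[0], reverse=True)


# Toy ensemble of three trees (made-up structure and scores).
forest = [
    Split("legs", 3, Leaf(0.1), Split("fur_length", 2, Leaf(0.9), Leaf(0.4))),
    Split("fur_length", 2, Leaf(0.7), Leaf(0.2)),
    Split("legs", 3, Leaf(0.0), Leaf(0.5)),
]
item = {"legs": 4, "fur_length": 1}

ranked = rank_paths(forest, item)
total_score = sum(score for score, _ in ranked)
for score, path in ranked:
    print(f"{score:.2f}  " + " AND ".join(path))
```

Each printed line is one compound feature (a conjunction of split conditions) together with the score it contributed, so the top lines are the paths that mattered most for this item.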