Say I have a program performing some statistical inference, for example one that suggests and ranks links for the user.
How can I explain to the end user, in plain English, why those links were suggested and why they were ranked this way? In a rules-based system it may be easier to do, and I can see how to do it. But for something that gives me a probability distribution, or a score based on a combination of factors, how can I do it?
Transform your representation of the underlying preference (which might be some vector) into a linguistic representation. This linguistic representation can be a term-weight vector. Display the terms with the highest weights. Why did you get recommended this song? Because you have a high weight for the "electro" component in your term-weight vector, and the song also has a high "electro" weight.
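To make the idea concrete, here is a minimal Python sketch (the genres, weights, and the explain_recommendation helper are all made up for illustration): it multiplies the user's term weights by the song's term weights and reports the terms with the largest products as the explanation.

```python
# Minimal sketch of a term-weight explanation (hypothetical genres and weights).

def explain_recommendation(user_weights, item_weights, top_k=2):
    """Return the terms that contribute most to the user-item match.

    The contribution of a term is taken to be the product of the user's
    weight and the item's weight for that term (a simple dot-product model).
    """
    shared_terms = set(user_weights) & set(item_weights)
    contributions = {
        term: user_weights[term] * item_weights[term] for term in shared_terms
    }
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return ranked[:top_k]

# Hypothetical user profile and song profile as term-weight vectors.
user = {"electro": 0.9, "jazz": 0.1, "vocal": 0.4}
song = {"electro": 0.8, "vocal": 0.3}

top_terms = explain_recommendation(user, song)
print("Recommended because you listen to a lot of:", ", ".join(top_terms))
# -> Recommended because you listen to a lot of: electro, vocal
```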
Do you use any kind of metric proximity value? Perhaps, similar to the Netflix and YouTube recommendation systems, you can say "because you liked x we suggest you see y".

That's another good idea. To be more precise, you can remove redundancy by trying to (sparsely) reconstruct the user preference vector from the contributions of individual songs. That way, you don't include multiple redundant songs in your explanation, just the most explanatory one.
(Jul 08 '10 at 03:02) Joseph Turian ♦♦
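As a rough sketch of that sparse-reconstruction idea, assuming made-up song names and feature vectors and using a non-negative Lasso as the sparse solver, the snippet below approximates the user preference vector by a sparse combination of the songs the user liked and keeps only the songs with non-zero coefficients as the explanation.

```python
# Sketch: pick the most explanatory liked songs by sparsely reconstructing
# the user preference vector from the songs' feature vectors.
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical 4-dimensional feature space (e.g. genre weights).
liked_songs = {
    "song_a": np.array([0.9, 0.1, 0.0, 0.2]),
    "song_b": np.array([0.8, 0.2, 0.1, 0.1]),  # nearly redundant with song_a
    "song_c": np.array([0.0, 0.1, 0.9, 0.0]),
}
user_pref = np.array([0.85, 0.15, 0.45, 0.15])  # hypothetical user vector

names = list(liked_songs)
S = np.column_stack([liked_songs[n] for n in names])  # columns = songs

# Non-negative Lasso: approximate user_pref ~ S @ w with a sparse w >= 0.
model = Lasso(alpha=0.01, positive=True, fit_intercept=False)
model.fit(S, user_pref)

# Songs with the largest coefficients explain the preference without redundancy.
ranked = sorted(zip(names, model.coef_), key=lambda pair: pair[1], reverse=True)
for name, weight in ranked:
    if weight > 0:
        print(f"{name}: contribution {weight:.2f}")
```

Because the sparsity penalty discourages redundant columns, two near-identical liked songs will typically not both receive large coefficients, which is exactly the redundancy removal described above.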
This is just one instance of a general machine learning challenge: how do we explain the predictions of an automated system in a way that humans can understand? Neural network weights, for instance, are rarely comprehensible. Decision trees are somewhat more so, since there's a simple series of tests that led to the prediction. This is also true of rule sets. For linear classifiers, listing all the features that contributed to the decision would be impractical, but listing the highest-weight features could be informative. For instance, a spam filter could explain its classification by saying: "This email was marked as spam because it contained the words mortgage and cheap and was sent by an unknown sender."

Sometimes understanding the prediction may be more important than obtaining the absolute highest accuracy, since it allows a user to integrate the prediction with their other knowledge. In such cases, it may be worth using a less accurate but easier-to-communicate model, such as Amazon.com's "Users who bought x also bought y", "Users who viewed x also viewed y", etc. One would expect the best predictions to combine information about everything you've bought, viewed, and rated, but focusing on one dimension at a time may provide a better browsing experience.

If you want to use a complex model with many hard-to-explain components, you can still use a simpler model to generate the explanations. For instance, if you find that 2-layer neural networks give the best accuracy on your problem, you can use the neural network to generate the predictions and a perceptron or decision tree to explain them. (This should work well when the complex and simple model agree; when they disagree, it may be a bit trickier.)

Overall, this is a hard problem and an active area of research. Here's one recent paper on explaining recommendations using tags: Vig et al., IUI'09. "Tagsplanations: Explaining Recommendations Using Tags". Hope this helps!
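As a concrete sketch of the "highest-weight features" explanation for a linear classifier, the toy spam filter below (the training emails, labels, and the explain helper are invented for illustration) ranks the words of a message by their contribution to the spam score, i.e. the learned weight times the word count, and reports the top contributors alongside the predicted label.

```python
# Sketch: explain a linear spam classifier by its highest-contribution words.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented training set.
emails = [
    "cheap mortgage offer act now",
    "lowest mortgage rates cheap loans",
    "meeting notes attached see agenda",
    "lunch tomorrow with the project team",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
clf = LogisticRegression()
clf.fit(X, labels)

def explain(message, top_k=3):
    """List the words that contributed most to the spam score of a message."""
    x = vectorizer.transform([message]).toarray()[0]
    contributions = clf.coef_[0] * x  # learned weight * word count
    words = np.array(vectorizer.get_feature_names_out())
    order = np.argsort(contributions)[::-1]
    top = [(words[i], contributions[i]) for i in order[:top_k] if contributions[i] > 0]
    label = "spam" if clf.predict(vectorizer.transform([message]))[0] == 1 else "not spam"
    return label, top

label, reasons = explain("cheap mortgage deal just for you")
print(f"Marked as {label} because of:", ", ".join(word for word, _ in reasons))
```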
Also, if your model is nonlinear and/or not directly based on a distance measure (boosted classifiers, for example), and you want to tell the user which of their features point most strongly towards the recommendation, you can try perturbing the user feature vector by zeroing some of its features and reporting the ones that affect the result the most. This lets you produce justifications that differ on an item-by-item basis and can pick up features whose raw values are small but still influential. If you can, you might also compute the gradient of the prediction with respect to the user features and pick features with a high combination of gradient and value. This can even allow you to say "we recommended this to you because you don't like X". Replace single features with feature pairs if you expect your model to handle them well.
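A minimal sketch of the zeroing-out perturbation idea, with a made-up black-box score function standing in for the real model and hypothetical feature names: each user feature is set to zero in turn, the change in the predicted score is recorded, and the features whose removal moves the score the most are reported as the item-specific justification.

```python
# Sketch: perturbation-based explanation for a black-box recommender score.
import numpy as np

def score(user_features):
    """Stand-in for a nonlinear black-box model (e.g. a boosted ensemble)."""
    electro, jazz, vocal, live = user_features
    return 2.0 * electro + 0.3 * jazz - 1.5 * vocal + electro * live

def explain_by_perturbation(user_features, feature_names, top_k=2):
    """Zero each feature in turn and rank features by how much the score moves."""
    base = score(user_features)
    effects = []
    for i, name in enumerate(feature_names):
        perturbed = np.array(user_features, dtype=float)
        perturbed[i] = 0.0
        effects.append((name, base - score(perturbed)))
    # Largest absolute change = most influential feature for this item.
    effects.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return effects[:top_k]

features = np.array([0.9, 0.1, 0.05, 0.7])    # hypothetical user vector
names = ["electro", "jazz", "vocal", "live"]  # hypothetical feature names

for name, effect in explain_by_perturbation(features, names):
    verb = "raised" if effect > 0 else "lowered"
    print(f"Your '{name}' preference {verb} this item's score by {abs(effect):.2f}")
```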