Say I have a program performing some statistical inference, such as:

  • Suggesting songs to me, e.g. Pandora
  • Finding friends for me, e.g. Facebook
  • Giving me search results

How can I explain to the end user, in plain English, why those items were suggested and why they were ranked the way they were?

In a rule-based system this is easier, and I can see how to do it. But for something that gives me a probability distribution, or a score based on a combination of factors, how can I do it?

asked Jul 08 '10 at 01:59 by ashish


4 Answers:

Transform your representation of the underlying preference (which might be some vector) into a linguistic representation. This linguistic representation can be a term-weight vector. Display the terms with the highest weights.

Why were you recommended this song? Because you have a high weight for the "electro" component in your term-weight vector, and the song also has a high "electro" weight.
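
For concreteness, here is a minimal sketch of that idea in Python, assuming both the user profile and the song are represented as tag-to-weight dictionaries (all names and weights below are made up for illustration):

    # Hypothetical term-weight vectors for a user profile and a recommended song.
    user_profile = {"electro": 0.9, "vocals": 0.4, "acoustic": 0.1}
    song_tags    = {"electro": 0.8, "synth": 0.7, "vocals": 0.3}

    def explain(user, item, top_k=2):
        # Score each shared term by the product of its user and item weights,
        # then report the highest-scoring terms as the explanation.
        shared = {t: user[t] * item[t] for t in user.keys() & item.keys()}
        top = sorted(shared, key=shared.get, reverse=True)[:top_k]
        return "Recommended because you like " + ", ".join(top)

    print(explain(user_profile, song_tags))
    # -> Recommended because you like electro, vocals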

answered Jul 08 '10 at 02:23 by Joseph Turian ♦♦

Do you use any kind of proximity metric? Similar to the Netflix and YouTube recommendation systems, you could say "because you liked x, we suggest you see y".
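
A rough sketch of that item-to-item style of explanation, assuming item feature vectors and cosine similarity (the vectors and names are placeholders):

    import numpy as np

    # Hypothetical item feature vectors and the set of items the user liked.
    item_vectors = {
        "x": np.array([0.9, 0.1, 0.0]),
        "y": np.array([0.8, 0.2, 0.1]),
        "z": np.array([0.0, 0.9, 0.5]),
    }
    liked = ["x"]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def explain_recommendation(candidate, liked_items):
        # Find the liked item closest to the recommended one and cite it.
        best = max(liked_items,
                   key=lambda i: cosine(item_vectors[i], item_vectors[candidate]))
        return f"Because you liked {best}, we suggest {candidate}"

    print(explain_recommendation("y", liked))
    # -> Because you liked x, we suggest y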

answered Jul 08 '10 at 02:24 by Mark Alen

That's another good idea. To be more precise, you can remove redundancy by trying to (sparsely) reconstruct the user preference vector on the basis of contributions from individual songs. That way, you don't include multiple redundant songs in your explanation, just the most explanatory ones.
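
One way this could be sketched, assuming the user preference vector and the liked songs' tag vectors live in the same space, and using an off-the-shelf Lasso for the sparse reconstruction (all data here is illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso

    # Columns of `songs` are the tag vectors of songs the user has liked;
    # `user_pref` is the user's aggregate preference vector in the same tag space.
    songs = np.array([[0.9, 0.8, 0.0],
                      [0.1, 0.2, 0.9],
                      [0.0, 0.1, 0.5]])          # shape: (n_tags, n_songs)
    song_names = ["song_a", "song_b", "song_c"]
    user_pref = np.array([0.85, 0.15, 0.05])

    # Sparse, non-negative reconstruction: which few songs best explain the profile?
    model = Lasso(alpha=0.01, positive=True, fit_intercept=False)
    model.fit(songs, user_pref)

    explanatory = [name for name, coef in zip(song_names, model.coef_) if coef > 1e-6]
    print("Most explanatory songs:", explanatory)

Songs whose coefficients are driven to zero are the redundant ones; only the surviving songs need to appear in the explanation.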

(Jul 08 '10 at 03:02) Joseph Turian ♦♦

This is just one instance of a general machine learning challenge: How do we explain the predictions of an automated system in a way that humans can understand? Neural network weights, for instance, are rarely comprehensible. Decision trees are somewhat more so, since there's a simple series of tests that led to the prediction. This is also true of rule sets. For linear classifiers, listing all the features that contributed to the decision would be impractical, but listing the highest-weight features could be informative. For instance, a spam filter could explain its classification by saying: "This email was marked as spam because it contained the words mortgage and cheap and was sent by an unknown sender."
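
For the linear-classifier case, a minimal sketch of "report the highest-weight features that were present" might look like this (the weights and feature names are invented for illustration):

    # Illustrative weights of a linear spam classifier (positive = more spammy).
    weights = {"mortgage": 2.1, "cheap": 1.7, "unknown_sender": 1.3,
               "meeting": -1.5, "attached_report": -0.9}

    def explain_spam_decision(active_features, top_k=3):
        # Contribution of each feature present in the email, sorted by weight.
        contributions = {f: weights.get(f, 0.0) for f in active_features}
        top = sorted(contributions, key=contributions.get, reverse=True)[:top_k]
        return ("This email was marked as spam because it contained: "
                + ", ".join(top))

    print(explain_spam_decision({"mortgage", "cheap", "unknown_sender"}))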

Sometimes understanding the prediction may be more important than obtaining the absolute highest accuracy, since it allows a user to integrate the prediction with their other knowledge. In such cases, it may be worth using a less accurate but easier-to-communicate model, such as Amazon.com's "Users who bought x also bought y" and "Users who viewed x also viewed y." One would expect the best predictions to combine information about everything you've bought, viewed, and rated, but focusing on one dimension at a time may provide a better browsing experience.

If you want to use a complex model with many hard-to-explain components, you can still use a simpler model to generate the explanations. For instance, if you find that 2-layer neural networks give the best accuracy on your problem, you can use the neural network to generate the predictions and a perceptron or decision tree to explain them. (This should work well when the complex and simple model agree; when they disagree, it may be a bit trickier.)
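
A small sketch of that surrogate-model idea, using scikit-learn: the network makes the actual predictions, and a shallow decision tree trained to mimic those predictions is what gets shown to the user (the toy data and model sizes are arbitrary):

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy data: 200 examples, 4 features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Complex model produces the predictions the user actually sees.
    net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000).fit(X, y)

    # Simple surrogate is fit to the network's predictions (not the true labels)
    # and serves only to generate human-readable explanations.
    surrogate = DecisionTreeClassifier(max_depth=3).fit(X, net.predict(X))
    print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))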

Overall, this is a hard problem and an active area of research. Here's one recent paper on explaining recommendations using tags:

Vig et al. "Tagsplanations: Explaining Recommendations Using Tags." IUI '09.

Hope this helps!

answered Jul 08 '10 at 03:38 by Daniel Lowd

Also, if your model is nonlinear and/or not directly based on a distance measure (boosted classifiers, for example), and you want to tell the user which of their features pushed the prediction the most, you can perturb the user feature vector by zeroing out some features and report the ones whose removal changes the result the most. This lets you give justifications that differ on an item-by-item basis and can pick up features with small values that still have a large effect. If you can, you might also compute the gradient of the prediction with respect to the user features and pick features with a high combined gradient and value. This can even allow you to say "we recommended this to you because you don't like X".

You can do the same with feature pairs if you expect your model to handle interactions between features well.
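
A rough sketch of the zeroing-out idea, with a placeholder predict function standing in for whatever nonlinear model is being used (all names and numbers are illustrative):

    import numpy as np

    def predict(x):
        # Placeholder for an arbitrary nonlinear scoring model.
        return float(np.tanh(2.0 * x[0] - 1.5 * x[2] + x[1] * x[3]))

    def perturbation_explanation(x, feature_names, top_k=2):
        base = predict(x)
        effects = {}
        for i, name in enumerate(feature_names):
            perturbed = x.copy()
            perturbed[i] = 0.0                    # zero out one feature
            effects[name] = abs(base - predict(perturbed))
        # Features whose removal moves the score the most form the explanation.
        return sorted(effects, key=effects.get, reverse=True)[:top_k]

    user = np.array([0.8, -0.2, 0.5, 0.9])
    print(perturbation_explanation(user, ["rock", "pop", "jazz", "electro"]))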

answered Jul 08 '10 at 06:56 by Alexandre Passos ♦
