In the context of item recommender systems, I'm having some trouble expressing the prediction computation step. For example, I refer to the following paragraph from Amazon.com's paper:

Given a similar-items table, the algorithm finds items similar to each of the user's purchases and ratings, aggregates those items, and then recommends the most popular or correlated items. This computation is very quick, depending only on the number of items the user purchased or rated.

I'm looking only at browsing history, so the item-user relationship is binary. Once the item similarities are computed using one of the well-known formulas, it's possible to get the top N items most similar to each item. So, if a user browses, say, three items, it's possible to aggregate a list of up to 3N candidate items. Do you just order that list by item popularity?
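
Concretely, I imagine something like the sketch below, where `top_n` maps each item to its N most similar items and `popularity` maps each item to a browse count (both names are just placeholders for whatever structures hold this):

browsed = ["item_a", "item_b", "item_c"]    # the user's browsing history
candidates = set()
for item in browsed:
    candidates.update(top_n[item])          # up to N similar items per seed
candidates -= set(browsed)                  # don't recommend what was already browsed

# order the aggregated (up to 3N) candidates by overall popularity
recommendations = sorted(candidates, key=lambda c: popularity[c], reverse=True)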

Furthermore, how do you test the scoring function on just browsing data?

Sarwar's paper describes the prediction computation step as a weighted sum (Section 3.2), but the experiments described in that paper use movie ratings. Does that prediction step also translate to the case where only browsing data is available?
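
If I'm reading Section 3.2 correctly, with binary browse data (R = 1 if browsed, 0 otherwise) the weighted sum reduces to summing the similarities of the browsed neighbours and normalizing, something like this (my own notation, with `sims[item]` holding that item's neighbours and their similarities):

def predict(browsed, item, sims):
    # Sarwar-style weighted sum with binary "ratings": only browsed
    # neighbours contribute to the numerator, and the denominator
    # normalizes by the total similarity mass around `item`
    num = sum(sim for neighbour, sim in sims[item].items() if neighbour in browsed)
    den = sum(abs(sim) for sim in sims[item].values())
    return num / den if den else 0.0

Is that the right way to carry it over, or does the formula only make sense with graded ratings?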

asked Jul 06 '12 at 18:23

Saurabh


One Answer:

The aggregation you describe is a classic item/item collaborative filter (CF) and, as you mention, in its basic form it's very simple:

from collections import defaultdict

score = defaultdict(float)              # candidate item -> accumulated similarity
for seed in seeds:                      # `seeds`: the items the user browsed
    for candidate, sim in sims[seed].items():
        score[candidate] += sim
# pick top N from `score`
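
The "pick top N" bit is nothing special, for example (with `N` being however many recommendations you want):

import heapq

# take the N highest-scoring candidates, skipping the seeds themselves
recommendations = heapq.nlargest(
    N, (item for item in score if item not in seeds), key=score.get)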

The "real" secret sauce part of a CF is how you built sims. How do you ensure the item/item sim space is dense enough you have reasonable sims for each item? How do you ensure that it's not so dense that the online performance of the dot product loop (pseudo code above) is slow?

This straight-up summing of the sims has some problems though, with diversity and with biases around the number of contributors.

Say you have items A, B, C, D, E, F with sims(A,E)=0.2, sims(B,E)=0.3, sims(C,E)=0.2 and sims(D,F)=0.6. With seeds A, B, C, D, then E is going to be scored 0.7 whereas F is going to be scored 0.6, so E is considered better even though F was actually the most similar single item. Luckily there are lots of ways to dampen this: either directly, by scaling with a function of the number of contributors (even something simple like log(#contribs)), or with a fancier mechanism like maximal marginal relevance (MMR) [1]. I've found MMR gives reasonable results.
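
As a tiny illustration of the first option (using log(1 + #contribs) rather than log(#contribs) so a single contributor doesn't divide by zero):

from math import log

def dampened(score, contributors):
    # scale down scores that are built up from many small contributions
    return score / log(1 + contributors)

print(dampened(0.7, 3))   # E: ~0.50
print(dampened(0.6, 1))   # F: ~0.87 -> F now outranks E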

Evaluation of the CF is also a bit of a black art. I've always found it useful to phrase the recommendations as a kind of search problem, since there are lots of established approaches to evaluating search results (and I come from a search background originally).
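
For example, on browse data alone you can do a leave-one-out style check: hide each user's last browsed item, recommend from the rest, and count how often the hidden item makes the top N (essentially a hit rate / precision@N, the same kind of metric you'd use for search results). A rough sketch:

def hit_rate_at_n(histories, recommend, n=10):
    # histories: one list of browsed items per user
    # recommend(seeds, n): your scoring function, returning a ranked list
    hits = evaluated = 0
    for history in histories:
        if len(history) < 2:
            continue
        *seeds, held_out = history          # hide the user's last browse
        if held_out in recommend(seeds, n)[:n]:
            hits += 1
        evaluated += 1
    return hits / evaluated if evaluated else 0.0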

Lots to play with! Mat

[1] http://users.cecs.anu.edu.au/~sguo/sigir2010.pdf

answered Jul 06 '12 at 19:27

mat kelcey
