I have items for sale, along with statistics on how many times each was purchased. I want to use these counts to rank the items. The simplest guess is log(2 + num_purchased). I am looking for a better formula, and most importantly I want it to have a probabilistic justification and interpretation.

asked Dec 11 '11 at 00:17


yura

Naive question here, but why do you need anything other than frequency? The only alternative I can think of is to count not the number of units sold but the number of transactions in which the product was sold (much like term frequency versus document frequency in text mining).

(Dec 11 '11 at 16:47) Robert Layton

One Answer:

The simplest thing to do is point-wise ranking -- that is, estimate the probability of purchase for each item and list the items from highest to lowest. Since you'll have a different number of samples for each item, you should adjust your estimates to account for your confidence in each one. A simple way to do this is to form a, say, 95% confidence region about the mean (using, e.g., a Chernoff-Hoeffding bound) and replace the mean with the upper bound of that region. This is essentially how "upper confidence bound" algorithms for the bandit problem work, and the bandit problem is more or less what you're trying to solve.
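To make the idea concrete, here is a minimal sketch in Python. It assumes that for each item you know not only the purchase count but also a number of trials (for example, how many times the item was shown), which the question does not mention; without some trial count there is no purchase probability to estimate. The function names, the `stats` layout, and the choice to score items with no data at 1.0 are illustrative assumptions. The radius comes from the two-sided Hoeffding bound, sqrt(ln(2/delta) / (2n)).

```python
import math


def hoeffding_ucb(purchases, trials, delta=0.05):
    """Upper end of a two-sided (1 - delta) Hoeffding interval
    around the empirical purchase rate purchases / trials."""
    if trials <= 0:
        # Assumption: with no data, be maximally optimistic.
        return 1.0
    mean = purchases / trials
    # Hoeffding: P(|mean - p| >= r) <= 2 * exp(-2 * trials * r**2);
    # setting the right-hand side to delta and solving for r gives:
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    # A probability cannot exceed 1, so clip the bound.
    return min(1.0, mean + radius)


def rank_items(stats, delta=0.05):
    """Return item names sorted by upper confidence bound, highest first.

    stats maps item name -> (purchases, trials); this layout is hypothetical.
    """
    return sorted(
        stats,
        key=lambda name: hoeffding_ucb(stats[name][0], stats[name][1], delta),
        reverse=True,
    )


# Made-up numbers, purely for illustration.
stats = {"A": (50, 1000), "B": (3, 20), "C": (0, 0)}
ranking = rank_items(stats)
```

Note that items with few trials get a wide interval and therefore rank optimistically high. That is the intended exploration behaviour in a bandit setting; if you only want a conservative static ranking, the lower bound (mean minus radius) may be the better choice.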

answered Dec 13 '11 at 10:42


Noel Welsh


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.