|
I have items for purchase and statistics on how many times each was purchased. I want to use these counts to rank the items. The simple guess is log(2 + num_purchased). I am thinking about a better formula for this, and most importantly I want it to have some probabilistic background and meaning behind it. |
|
The simplest thing to do is point-wise ranking -- that is, estimate the probability of purchase for each item and list the items from highest to lowest. Since you'll have a different number of samples for each item, you should adjust your estimates to account for your confidence in each one. A simple way to do this is to form, say, a 95% confidence interval about the mean (using, e.g., a Chernoff-Hoeffding bound) and replace the mean with the upper bound of that interval. This is essentially how "upper confidence bound" algorithms for the bandit problem work, and the bandit problem is more or less what you're trying to solve. |
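To make the upper-bound idea concrete, here is a minimal sketch in Python. It assumes you also know how many times each item was shown (called `views` here), which the question does not mention; without some count of trials there is no purchase probability to estimate. The function names and the sample numbers are made up for illustration. The radius comes from the one-sided Hoeffding inequality: with probability at least 1 - delta, the true purchase rate is below the sample mean plus sqrt(ln(1/delta) / (2n)).

```python
import math

def hoeffding_ucb(purchases, views, delta=0.05):
    """Upper end of a one-sided (1 - delta) Hoeffding interval for the purchase rate.

    `views` (number of times the item was shown) is an assumed input;
    the original question only mentions purchase counts.
    """
    if views <= 0:
        return 1.0  # no data at all: the bound is vacuous
    mean = purchases / views
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * views))
    return min(1.0, mean + radius)  # a probability cannot exceed 1

def rank_items(stats, delta=0.05):
    """Sort item names by their upper confidence bound, highest first."""
    return sorted(stats, key=lambda name: hoeffding_ucb(*stats[name], delta), reverse=True)

# Hypothetical data: item -> (purchases, views)
stats = {"A": (50, 1000), "B": (3, 20), "C": (0, 5)}
ranking = rank_items(stats)
print(ranking)
```

Note that with these numbers the barely-shown item C ranks first even though it has no purchases: the upper bound is optimistic about items with little data, which is exactly the exploration behaviour of bandit algorithms. If you would rather be pessimistic about under-sampled items, rank by the lower bound (mean minus radius) instead.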
Naive question here, but why do you need anything other than frequency? The only alternative I can think of is to use not the number of units sold but the number of transactions in which the product was sold (much like term frequency versus document frequency in text mining).