|
I am interested in comparing ranked lists and quantifying how "similar" the rankings are. What are some good scores/measures to do that? The rankings are produced by different runs of a system. On each run, it is given n instances and assigns a score to each, resulting in a ranking of the instances. But there is non-determinism, so the system comes up with slightly different results each time. I'm not so much interested in the actual scores that were assigned to the instances, but I want to evaluate the order that is produced on the different runs. |
|
If all positions in the ranking are equally important, the standard measure is Kendall's tau, which counts the number of inversions (discordant pairs) between two permutations. The raw inversion count ranges from 0 to n(n−1)/2 and defines a metric on permutations; it is usually normalized to the interval [−1, 1]. If you only care that elements end up in the right contiguous blocks (but don't care whether the blocks themselves are in the right order), you can try a variant of Kendall's tau that only penalizes adjacent swaps. I'm not sure that variant is still a metric, however.

If only the top k positions are of interest, you can use precision@k (among the top k returned elements, how many are actually among the true top k) or recall@k (among the true top k elements, how many appear in the top k returned). You can also compute precision@k and recall@k for all values of k, plot the resulting precision-recall curve, and take the area under it. If your elements carry relevance weights you can use NDCG or ERR, possibly truncated at k; these only require relevance scores for the reference ordering, not for the ordering being evaluated.
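As a concrete sketch of the two ideas above (pure Python, assuming both runs produce an ordering of the same item set, best first; the helper names are my own, not from any standard library):

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Number of discordant pairs (inversions) between two orderings
    of the same items; ranges from 0 to n*(n-1)/2."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)  # x precedes y in order_a...
        if pos_b[x] > pos_b[y]                # ...but y precedes x in order_b
    )

def precision_at_k(order_a, order_b, k):
    """Fraction of order_b's top-k items that are also in order_a's top-k."""
    return len(set(order_a[:k]) & set(order_b[:k])) / k

# Two runs ranking the same five instances, best first.
run_a = ["c", "a", "d", "e", "b"]
run_b = ["c", "d", "a", "b", "e"]

d = kendall_tau_distance(run_a, run_b)
n = len(run_a)
tau = 1 - 4 * d / (n * (n - 1))  # normalize the distance to [-1, 1]
print(d, tau, precision_at_k(run_a, run_b, 2))
```

For many runs you could compute the normalized tau for every pair of runs and report the mean as a single stability score.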
|
If you want a probability distribution over rankings given the correct one, the Mallows model is a pretty standard one. You can find Mallows's original paper on it here .
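A minimal sketch of that model, assuming its standard form P(σ) ∝ exp(−θ · d(σ, σ₀)), where σ₀ is the modal ranking, θ ≥ 0 a dispersion parameter, and d the Kendall tau (inversion) distance; the brute-force normalization over all n! permutations is only feasible for small n, and the function names are my own:

```python
import math
from itertools import combinations, permutations

def kendall_tau_distance(order_a, order_b):
    """Number of inversions between two orderings of the same items."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2) if pos_b[x] > pos_b[y])

def mallows_pmf(center, theta):
    """Probability of every permutation under a Mallows model with
    modal ranking `center` and dispersion theta >= 0."""
    perms = list(permutations(center))
    weights = [math.exp(-theta * kendall_tau_distance(p, center)) for p in perms]
    z = sum(weights)  # normalizing constant, computed by brute force
    return {p: w / z for p, w in zip(perms, weights)}

pmf = mallows_pmf(("a", "b", "c"), theta=1.0)
# The modal ranking gets the highest probability; permutations at the
# same distance from it are equally likely; theta=0 gives the uniform model.
```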
|
Further to Alex's comment, there are also Spearman's correlation and Goodman-Kruskal's gamma, which are non-parametric correlation measures (Kendall's tau is one as well). A description can be found here. I use these to compare word lists (which are definitely not Gaussian, so Pearson's correlation can't be used).
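A small sketch of Spearman's correlation on two orderings, using the classic no-ties formula ρ = 1 − 6 Σ dᵢ² / (n(n² − 1)), where dᵢ is the difference between the two ranks of item i; the names are illustrative:

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation between two orderings of the same
    items, via 1 - 6*sum(d_i^2) / (n*(n^2 - 1)); valid when there
    are no ties, which holds for strict orderings like these."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    n = len(ranking_a)
    d2 = sum((pos_a[item] - pos_b[item]) ** 2 for item in ranking_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

run_a = ["c", "a", "d", "e", "b"]
run_b = ["c", "d", "a", "b", "e"]
print(spearman_rho(run_a, run_b))
```

Identical orderings give ρ = 1 and exactly reversed ones give ρ = −1; Spearman tends to penalize a few large displacements more heavily than Kendall's tau does, since the rank differences are squared.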