I think I may have found a solution to my problem, even though it may not completely help for comparison with existing precision/recall curves. I have come to the conclusion that P/R can't nicely cope with multiple labels, and even less so with unbalanced classes.
However, I think my solution has nice properties, mostly that it can be expressed in terms of precision and recall too (and not as an ROC, for instance).
First of all, imagine that any IR system's result actually falls between a best-case scenario and a worst-case scenario. The best case is what I described in the question. The worst case would be this:
Q1 : 0 0 0 0 1
Q2 : 0 1 1 1 1
What I propose is simply that precision, instead of being expressed without knowledge of the classes and their arity, should be expressed relative to the worst-case/best-case scenarios.
With this in mind, we just have to compute the cumulative sums of the best/worst cases:
Best : 2 3 4 5 5
Worst : 0 1 2 3 5
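As a quick sanity check, a minimal sketch of how these cumulative sums are obtained (the per-query vectors are the ones from the example above; summing across queries and then cumulating over ranks reproduces the Best/Worst rows):

```python
import numpy as np

# Worst-case orderings from the example (relevant documents ranked last):
worst_q1 = [0, 0, 0, 0, 1]
worst_q2 = [0, 1, 1, 1, 1]
# Best-case orderings: same relevant counts, but ranked first.
best_q1 = sorted(worst_q1, reverse=True)  # [1, 0, 0, 0, 0]
best_q2 = sorted(worst_q2, reverse=True)  # [1, 1, 1, 1, 0]

# Sum over queries, then take the cumulative sum over ranks.
best = np.cumsum(np.add(best_q1, best_q2))    # -> [2, 3, 4, 5, 5]
worst = np.cumsum(np.add(worst_q1, worst_q2)) # -> [0, 1, 2, 3, 5]
```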
By construction, Best(N-1) == Worst(N-1): both equal the total number of relevant documents.
And express:
- Precision(k) as (TP(k) - Worst(k)) / (Best(k) - Worst(k)), for k in [0 .. N-2]
- Recall(k) as simply TP(k) / Worst(N-1)
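A small sketch of these two formulas (`tp` here stands for the cumulative true-positive counts of the system being evaluated; the variable names are mine, not part of the original definition):

```python
import numpy as np

best = np.array([2, 3, 4, 5, 5])   # cumulative best case from the example
worst = np.array([0, 1, 2, 3, 5])  # cumulative worst case from the example

def normalized_pr(tp, best, worst):
    # Precision(k) = (TP(k) - Worst(k)) / (Best(k) - Worst(k)),
    # for k in [0 .. N-2]; the last rank is excluded because
    # Best(N-1) == Worst(N-1) would make the denominator zero.
    precision = (tp[:-1] - worst[:-1]) / (best[:-1] - worst[:-1])
    # Recall(k) = TP(k) / Worst(N-1): Worst(N-1) is the total
    # number of relevant documents.
    recall = tp[:-1] / worst[-1]
    return precision, recall

# Feeding in the best case itself yields the flat 100% precision line:
p, r = normalized_pr(best.astype(float), best, worst)
```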
With such a solution, the PR curve of the best-case scenario would be a flat line at 100%.
Another nice property of this solution is that the best-case and worst-case scenarios can be computed even with non-binary relevance. So an implementation of the document/document distance as a scalar product of the class vectors could be used.
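To illustrate the non-binary case, a hypothetical sketch: given graded per-rank relevance scores (e.g. obtained as a scalar product between class vectors; the `gains` values below are made up), the best case is the descending ordering of the scores and the worst case the ascending one, and the two cumulative sums still meet at the same total:

```python
import numpy as np

# Hypothetical graded relevance of 5 retrieved documents,
# e.g. scalar products between the query's and each document's class vector.
gains = np.array([0.9, 0.1, 0.6, 0.0, 0.4])

best = np.cumsum(np.sort(gains)[::-1])  # highest gains ranked first
worst = np.cumsum(np.sort(gains))       # highest gains ranked last
# As in the binary case, best[-1] == worst[-1] (the total gain),
# so the same normalized precision/recall formulas apply.
```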
