Revision history
Revision n. 1 (Oct 06 '11 at 10:59, Guillaume Pitel)

Revision n. 2 (Oct 06 '11 at 11:03, Guillaume Pitel)

Revision n. 3 (Oct 07 '11 at 08:10, Guillaume Pitel)
Revision n. 4: sorry forgot a NOT

Oct 11 '11 at 02:44

Guillaume Pitel

I think I may have found a solution to my problem, even though it may not completely help for comparison with existing precision/recall curves. I have actually come to the conclusion that P/R can't nicely cope with multiple labels, and even less with unbalanced classes.

However, I think my solution has nice properties, mostly that it can be expressed in terms of precision and recall too (which is not the case of ROC, for instance).

First of all, imagine that any IR system's result actually fits between a best-case scenario and a worst-case scenario. The best case is what I have described in the question. The worst case would be this:

Q1 : 0 0 0 0 1

Q2 : 0 1 1 1 1

What I propose is simply that precision, instead of being expressed without knowledge of the classes and their arity, should be expressed relative to the worst-case/best-case scenarios.

With this in mind, we just have to compute the cumulative sums of the best/worst cases:

Best : 2 3 4 5 5

Worst : 0 1 2 3 5

By construction, Best(N-1) == Worst(N-1)
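As a minimal Python sketch of this construction (assuming, from the lists above, that Q1 has 1 relevant document out of 5 retrieved and Q2 has 4 out of 5):

```python
# Build the best-case/worst-case cumulative true-positive curves for a set
# of queries, given (number of relevant docs, number of retrieved docs) pairs.
from itertools import accumulate

def best_worst_curves(queries):
    """queries: list of (n_relevant, n_retrieved) pairs.
    Returns cumulative Best and Worst TP counts per rank."""
    n = max(n_ret for _, n_ret in queries)
    best_per_rank = [0] * n
    worst_per_rank = [0] * n
    for n_rel, n_ret in queries:
        for k in range(n_ret):
            # Best case: all relevant documents are ranked first.
            if k < n_rel:
                best_per_rank[k] += 1
            # Worst case: all relevant documents are ranked last.
            if k >= n_ret - n_rel:
                worst_per_rank[k] += 1
    return list(accumulate(best_per_rank)), list(accumulate(worst_per_rank))

# Q1: 1 relevant of 5 retrieved; Q2: 4 relevant of 5 retrieved.
best, worst = best_worst_curves([(1, 5), (4, 5)])
print(best)   # [2, 3, 4, 5, 5]
print(worst)  # [0, 1, 2, 3, 5]
```

The last entries agree by construction, since both orderings eventually retrieve every relevant document.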

And express (EDIT: this does NOT work well)

  • Precision(k) as (TP(k) - Worst(k)) / (Best(k) - Worst(k)), for k in [0 .. N-2]
  • Recall(k) as TP(k) / Worst(N-1)
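A minimal sketch of these two definitions in Python, reusing the cumulative Best/Worst sums from the example above (`rescaled_pr` is a name chosen here for illustration):

```python
# Rescaled precision/recall: precision is measured relative to the
# worst-case/best-case envelope rather than in absolute terms.
best = [2, 3, 4, 5, 5]
worst = [0, 1, 2, 3, 5]
n = len(best)

def rescaled_pr(tp):
    """tp: cumulative true-positive counts per rank.
    Precision(k) = (TP(k) - Worst(k)) / (Best(k) - Worst(k)), k in [0 .. N-2];
    k = N-1 is excluded because Best(N-1) == Worst(N-1) makes the
    denominator zero there."""
    precision = [(tp[k] - worst[k]) / (best[k] - worst[k]) for k in range(n - 1)]
    recall = [tp[k] / worst[n - 1] for k in range(n - 1)]
    return precision, recall

# Best-case run: TP == Best, so rescaled precision is flat at 100%.
p, r = rescaled_pr(best)
print(p)   # [1.0, 1.0, 1.0, 1.0]
print(r)   # [0.4, 0.6, 0.8, 1.0]

# Worst-case run: TP == Worst, so rescaled precision is flat at 0%.
p0, _ = rescaled_pr(worst)
print(p0)  # [0.0, 0.0, 0.0, 0.0]
```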

With such a solution, the PR curve of the best case scenario would be a flat 100% line.

Another nice property of this solution is that the best-case and worst-case scenarios can be computed even with non-binary results, so a scalar-product implementation of the document/document distance based on the class vector could be used.
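One hypothetical sketch of that idea in Python (the class vectors and the dot-product scoring below are illustrative assumptions, not taken from the original): each retrieved document gets a graded relevance score, and the envelope comes from sorting those scores.

```python
# Non-binary relevance: score each retrieved document by the dot product of
# its class vector with the query document's class vector. The best/worst
# case curves are the cumulative sums of the scores sorted in
# descending/ascending order.
from itertools import accumulate

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

query_classes = [1, 0, 1]  # hypothetical class vector of the query document
retrieved = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 1]]

scores = [dot(query_classes, d) for d in retrieved]    # graded relevance
best = list(accumulate(sorted(scores, reverse=True)))  # best-case ordering
worst = list(accumulate(sorted(scores)))               # worst-case ordering
print(scores)  # [1, 0, 2, 1, 1]
print(best)    # [2, 3, 4, 5, 5]
print(worst)   # [0, 1, 2, 3, 5]
```

With these particular vectors the cumulative curves happen to match the binary example above; in general the envelope is simply the sorted graded scores.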

Example on Reuters-21578: Best (red), Worst (green), and the random baseline (cyan) are drawn together with the TP count (blue).

[Image: test method on Reuters]

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.