
I am trying out micro-/macro-averaging for precision/recall/f1 in multi-class classification problems.

As defined in Scholarpedia:

In multi-label classification, the simplest method for computing an aggregate score across categories is to average the scores of all binary tasks. The resulting scores are called macro-averaged recall, precision, F1, etc. Another way of averaging is to sum over TP, FP, TN, FN and N over all the categories first, and then compute each of the above metrics. The resulting scores are called micro-averaged. Macro-averaging gives an equal weight to each category, and is often dominated by the system's performance on rare categories (the majority) in a power-law like distribution. Micro-averaging gives an equal weight to each document, and is often dominated by the system's performance on the most common categories. The two ways of measuring performance are complementary to each other, and both are informative.
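For concreteness, here is a minimal Python/NumPy sketch of the two averaging schemes (my own illustration, not from the quote), assuming a square confusion matrix with true classes in rows and predicted classes in columns:

    import numpy as np

    def macro_micro(cm):
        # cm[i, j] = number of samples of true class i predicted as class j
        # (the row/column convention here is an assumption for this sketch)
        tp = np.diag(cm).astype(float)
        fn = cm.sum(axis=1) - tp   # per-class false negatives (missed samples)
        fp = cm.sum(axis=0) - tp   # per-class false positives (wrongly assigned samples)

        # Macro: compute the metric per class, then average the scores
        macro_p = np.mean(tp / (tp + fp))
        macro_r = np.mean(tp / (tp + fn))

        # Micro: pool TP/FP/FN over all classes first, then compute the metric
        micro_p = tp.sum() / (tp.sum() + fp.sum())
        micro_r = tp.sum() / (tp.sum() + fn.sum())
        return macro_p, macro_r, micro_p, micro_r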

I followed the examples in chapter 8 of the LingPipe book. (You may not need to open it; the example below is self-explanatory.)

To my surprise, under micro-averaging, precision always seems to equal recall.

E.g. a 3-class problem with the confusion matrix below:

[[9 3 0]
 [3 5 1]
 [1 1 4]]

Class by class, the confusion matrices are:

[[9  3] 
 [4 11]]

[[5 4] 
 [4 14]]

[[4 2] 
 [1 20]]

Then I added these three to get the micro-averaged confusion matrix:

[[18 9] 
 [9 45]]

So the precision and recall are the same.
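As a quick numerical sanity check (my own sketch, not part of the original post), pooling the per-class counts directly from the 3x3 matrix gives the same numbers:

    import numpy as np

    cm = np.array([[9, 3, 0],
                   [3, 5, 1],
                   [1, 1, 4]])   # rows = true class, columns = predicted class

    tp = np.diag(cm).sum()                      # 9 + 5 + 4 = 18
    fn = (cm.sum(axis=1) - np.diag(cm)).sum()   # 3 + 4 + 2 = 9
    fp = (cm.sum(axis=0) - np.diag(cm)).sum()   # 4 + 4 + 1 = 9

    print(tp / (tp + fp))   # micro-precision = 18/27
    print(tp / (tp + fn))   # micro-recall    = 18/27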

Also, in the micro-averaged confusion matrix, the upper-right (FN) and bottom-left (FP) entries are just the sums of the upper-right and bottom-left entries of the per-class confusion matrices, so it looks like they will always be equal no matter what the inputs and the original confusion matrix are.

I wonder, is this calculation right? Is it true that for multi-class classification, micro-averaging always produces the same precision and recall? If so, why do people use it?

Thanks!


Disclaimer: this is a duplicate of the same question posted on stats.SE.

asked Dec 01 '11 at 03:29


flake

And to take this a step further, would it be true that P = R = F1-Score = Accuracy if everything must be labelled and there are no TN?

(May 09 '14 at 17:04) GregWerner
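A quick check of that comment, as a sketch under its stated assumption that every sample gets exactly one label: each misclassified sample then contributes one FP (to the predicted class) and one FN (to the true class), so pooled FP = FN = E, and with n = TP + E samples in total:

    P = TP / (TP + FP) = TP / (TP + E)
    R = TP / (TP + FN) = TP / (TP + E) = P
    F1 = 2PR / (P + R) = P
    Accuracy = TP / n = TP / (TP + E) = P

so in that setting P = R = F1 = Accuracy does indeed hold.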

2 Answers:

You are right: micro-precision will be equal to micro-recall and to micro-F1. This is because in your case each sample is always classified as belonging to exactly one of the classes. If you had some samples that were not classified as belonging to any of the known classes, micro-precision would differ from micro-recall, and neither would be the same as micro-F1.
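For instance (a small illustrative sketch with hypothetical numbers, not from the original answer), with an abstaining classifier the pooled FP and FN counts can differ:

    # 10 samples, all truly class A; the classifier labels 7 as A,
    # 1 as B, and abstains on the remaining 2.
    tp = 7
    fp = 1        # the sample labelled B is a false positive for class B
    fn = 1 + 2    # that sample plus the 2 unlabelled ones are false negatives for class A

    micro_precision = tp / (tp + fp)   # 7/8  = 0.875
    micro_recall    = tp / (tp + fn)   # 7/10 = 0.7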

answered Jan 24 '12 at 22:13


Vadim Zaliva

I figured out a simple proof of your claim, i.e. that micro-averaged precision and recall are the same. Since Precision = TP/(TP+FP) and Recall = TP/(TP+FN), the real question is whether FP = FN.

Let's say we have N classes, so the confusion matrix has N rows and N columns. Let x[i,j] be the cell in the i-th row and j-th column. Following your example, we can find the micro-averaged value of FP by summing the FPs class by class (and the same reasoning applies to FN as well). Let FP_i (FN_i) be the FP (FN) value for class i. We can see that:

FP_i = \sum_{j=1, j \neq i}^{N} x[i,j]

FN_i = \sum_{k=1, k \neq i}^{N} x[k,i]

So we have:

FP = \sum_{i=1}^{N} FP_i = \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} x[i,j]

FN = \sum_{i=1}^{N} FN_i = \sum_{i=1}^{N} \sum_{k=1, k \neq i}^{N} x[k,i]

Hmm, the sums look very similar. Can we make the second one look exactly like the first one? First of all, let's rename the variables in the second (FN) sum, so that variable i appears in the first index and j in the second (it's just changing k to i and i to j).

FN = \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} x[i,j]

Now let's add zero to the sum - we won't change the value, just the appearance:

FN = \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} x[i,j] + \sum_{j=1}^{N} x[j,j] - \sum_{j=1}^{N} x[j,j]

We can join the double sum and the "plus" sum.

FN = \sum_{j=1}^{N} (\sum_{i=1, i \neq j}^{N} x[i,j] + x[j,j]) - \sum_{j=1}^{N} x[j,j]

FN = \sum_{j=1}^{N} \sum_{i=1}^{N} x[i,j] - \sum_{j=1}^{N} x[j,j]

Now we change the order of summation in the double sum and rename the j variable to i in the second sum.

FN = \sum_{i=1}^{N} \sum_{j=1}^{N} x[i,j] - \sum_{i=1}^{N} x[i,i]

It's pretty straightforward from there, just join the sums again.

FN = \sum_{i=1}^{N} (\sum_{j=1}^{N} x[i,j] - x[i,i])

FN = \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} x[i,j]

And we're done, FN is exactly the same expression as FP, Q.E.D.
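A quick numerical check of this identity (my own sketch, using the definitions of FP_i and FN_i above): for any square matrix, pooling the off-diagonal row sums and the off-diagonal column sums gives the same total, namely the whole sum minus the diagonal.

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        n = int(rng.integers(2, 8))
        x = rng.integers(0, 20, size=(n, n))
        fp = sum(x[i, j] for i in range(n) for j in range(n) if j != i)   # sum of FP_i
        fn = sum(x[k, i] for i in range(n) for k in range(n) if k != i)   # sum of FN_i
        assert fp == fn == x.sum() - np.trace(x)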

But I don't know "why people use it"; I just kind of liked this "exercise".

answered Dec 03 '11 at 12:18


Peter Kováč
