# Mutual Information between two data sets with one item each

I'm writing code that compares clustering assignments against a set of gold standard labels using Adjusted Mutual Information. In some of the datasets we have, there is only one gold standard label (so all data points are given only that label), and several clustering solutions similarly create just one cluster. From a scoring perspective, these one-cluster solutions should receive a perfect score, since their assignments match the gold standard exactly, but Mutual Information and its adjusted form, Adjusted Mutual Information, automatically score these results with a 0.

They both give a score of 0 because of the core computation in Mutual Information:

```
n_ij / N * log( (n_ij / N) / (n_i* * n_*j / N^2) )
```

where `n_ij` is the number of times events i and j happen together, `n_i*` counts the number of times event i happens, `n_*j` counts the number of times event j happens, and N is the total number of all events. When there's just one label and one cluster, `n_ij` = `n_i*` = `n_*j` = `N`, which results in

```
1.0 * log(1.0) = 0
```

In theory, Mutual Information measures the amount of knowledge gained about one distribution given another distribution: a high Mutual Information indicates that the two distributions are highly informative about each other, and a low Mutual Information indicates that the two are completely independent. With just one cluster and one label, the two interpretations become ambiguous: you have perfect information about the class labels given the clustering labels, but at the same time they are probabilistically independent of each other. MI and AMI both default to the latter interpretation, independence.

So my question is: is it ridiculous to assign an MI/AMI score of 1.0 to the odd case where there's only one event for both distributions, i.e. one cluster and one gold standard label? The situation itself is kind of silly, but that's somewhat out of my hands.
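To make the collapse concrete, here is a minimal sketch of the MI computation above in plain Python (the function name and argument names are my own, not from any library). Every term in the sum is of the form `p_ij * log(p_ij / (p_i * p_j))`, and in the one-label/one-cluster case there is exactly one term, `1.0 * log(1.0)`:

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))   # n_ij
    count_a = Counter(labels_a)                # n_i*
    count_b = Counter(labels_b)                # n_*j
    mi = 0.0
    for (a, b), n_ij in joint.items():
        p_ij = n_ij / n
        # each term: n_ij/N * log( (n_ij/N) / (n_i*/N * n_*j/N) )
        mi += p_ij * math.log(p_ij / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

# One gold label and one cluster: n_ij = n_i* = n_*j = N, so the single
# term is 1.0 * log(1.0) and the score collapses to 0.
print(mutual_information([0, 0, 0, 0], [0, 0, 0, 0]))  # 0.0

# By contrast, a perfect two-cluster match scores log(2):
print(mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

Note that the second call shows MI rewards a perfect match only when there is more than one category to be informative about, which is exactly the asymmetry the question is complaining about.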
asked Feb 06 '12 at 14:54 by Keith Stevens

> This sounds awfully similar to the problem of Gene Regulatory Network inference using ARACNE. (Feb 10 '12 at 02:30) Leon Palafox ♦

> I am totally unfamiliar with this field; I mostly focus on Word Sense Induction. What happens with Gene Regulatory Network inference? Maybe there are interesting cross-overs we should both know about. (Feb 10 '12 at 12:33) Keith Stevens

You said it yourself: mutual information is the amount of information gained through having new knowledge. Knowing that a data point exists (which is all you really have when there is only one label) does not give you extra information, because you already knew it existed.

answered Feb 07 '12 at 00:36 by Robert Layton

> Yes, it is just as I said. Still, that somehow bothers me as a scoring metric for clustering, which is partially why Adjusted Mutual Information was designed. If the case described here has an AMI of 0, then AMI would give any incorrect solution, say one that creates 2 clusters for whatever reason, a higher score. From a practical standpoint of rating solutions that just feels wrong, even if it's correct in theory. In any case, thank you for confirming why the Mutual Information should in fact be 0. (Feb 07 '12 at 12:16) Keith Stevens

> I correct myself: when there is just one cluster OR one class, AMI always assigns a score of 0, regardless of anything else, because the two distributions are still independent of each other. (Feb 07 '12 at 12:27) Keith Stevens

> No problem. I do a lot of clustering myself, and constantly find that things are not well defined around the fringes. While they have theoretical values, there is often a problem in practice with some things as the number of instances decreases. (Feb 09 '12 at 23:00) Robert Layton

> Evaluating clustering solutions is always infinitely frustrating. I don't think I've seen any single metric handle every use case elegantly, which becomes especially troublesome when you want to aggregate scores across a large number of datasets. AMI seems to be the best choice for the data I'm working with, so for the time being I'm going to just special-case this one situation and force the score to be 1 when there's one cluster and one class, even though it's technically wrong. (Feb 10 '12 at 12:36) Keith Stevens