Consider a binary classification task where negative labels are more frequent than positive labels, e.g., negatives are 10 times more likely a priori. My belief is that if I subsample the negatively labeled instances so that the test set is approximately balanced, and compute AUC on the subsampled test set, then in expectation I will get the same AUC as if I had computed it on the complete test set (i.e., with many more negatives than positives).
Is this known to be true?
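
One way to probe this belief empirically is a small simulation. The sketch below is only illustrative and rests on several assumptions that are not part of the question: classifier scores are drawn from two Gaussians (positives from N(1, 1), negatives from N(0, 1)), the class ratio is 1:10, negatives are subsampled uniformly at random without replacement, and AUC is computed with a hypothetical helper `auc` that uses the pairwise (Mann-Whitney) form, i.e., the fraction of positive-negative pairs in which the positive scores higher, with ties counted as half.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """AUC in its pairwise form: P(pos > neg) + 0.5 * P(pos == neg)."""
    neg_sorted = np.sort(neg_scores)
    # For each positive score, count the negatives strictly below it ...
    below = np.searchsorted(neg_sorted, pos_scores, side="left")
    # ... and the negatives tied with it.
    ties = np.searchsorted(neg_sorted, pos_scores, side="right") - below
    n_pairs = len(pos_scores) * len(neg_scores)
    return (below.sum() + 0.5 * ties.sum()) / n_pairs

rng = np.random.default_rng(0)

# Assumed setup: negatives are 10x more frequent than positives.
n_pos, n_neg = 1_000, 10_000
pos = rng.normal(1.0, 1.0, n_pos)  # assumed score distribution for positives
neg = rng.normal(0.0, 1.0, n_neg)  # assumed score distribution for negatives

# AUC on the complete, imbalanced test set.
full_auc = auc(pos, neg)

# AUC on many balanced test sets, each keeping all positives and a
# uniform random subsample of n_pos negatives.
sub_aucs = np.array([
    auc(pos, rng.choice(neg, size=n_pos, replace=False))
    for _ in range(200)
])

print(f"full test set AUC:        {full_auc:.4f}")
print(f"mean of subsampled AUCs:  {sub_aucs.mean():.4f}")
print(f"std of subsampled AUCs:   {sub_aucs.std():.4f}")
```

Under these assumptions, the mean of the subsampled AUCs should land very close to the full-set AUC, while each individual subsampled AUC fluctuates around it. The pairwise form suggests why: AUC is an average over positive-negative pairs, and uniform subsampling keeps every negative equally likely to appear, so it depends on the class-conditional score distributions rather than on the class ratio. The price of subsampling is extra variance, not bias.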