After clustering a training dataset, I want to evaluate/cross-validate the clustering result using a test dataset with the associated labels, for example by computing the V-measure or F-measure. What is the best way to label the obtained representatives (clusters) using the test dataset labels, in order to evaluate these results?
Just to make sure I have the question correct: you have a training dataset with no class values that has been clustered, and you have a test dataset with the actual classes that you want to use to see how good your clusters are. The evaluation metric will (or at least should) take care of the label matching for you. That is probably the main difference between supervised evaluations (which expect the predicted labels to match the true labels) and clustering evaluations (which only measure how consistent the cluster assignments are with the class labels, without requiring any particular correspondence between cluster IDs and class names). So, to answer your question: you use the cluster labels assigned by your clustering algorithm, and any good clustering evaluation metric takes care of the rest.

Let me ask the question differently: suppose you have a training and a testing dataset, both with the associated labels. We apply a clustering algorithm (e.g. k-means) to the training set. How can you compute the "recognition rate" by projecting the test dataset onto the obtained centers?
(May 16 '12 at 12:29)
shn
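
For concreteness, here is a minimal sketch of one way to do both evaluations, assuming scikit-learn, k-means, and a labelled train/test split; the Iris data, the number of clusters, and the majority-vote cluster-to-class mapping used for the "recognition rate" are illustrative assumptions, not something fixed by the thread.

```python
# Minimal sketch: cluster the training set, project the test set onto the
# learned centers, then score the assignment two ways (V-measure and a
# majority-vote "recognition rate"). Dataset and parameters are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import v_measure_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cluster the training data (class labels are not used at this stage).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# "Project" the test points onto the obtained centers: each test point is
# assigned to its nearest centroid.
test_clusters = kmeans.predict(X_test)

# Clustering evaluation: V-measure compares cluster assignments with the true
# test labels and needs no cluster-to-class correspondence.
print("V-measure:", v_measure_score(y_test, test_clusters))

# "Recognition rate": map each cluster to the majority class of the training
# points it contains, then score the mapped predictions against the test labels.
train_clusters = kmeans.labels_
cluster_to_class = {
    c: np.bincount(y_train[train_clusters == c]).argmax()
    for c in np.unique(train_clusters)
}
y_pred = np.array([cluster_to_class[c] for c in test_clusters])
print("Recognition rate:", np.mean(y_pred == y_test))
```

The V-measure needs no mapping between clusters and classes, whereas the recognition rate only makes sense once each cluster has been assigned a class, here by majority vote over the training points it contains.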