After clustering a training dataset, I want to evaluate (cross-validate) the clustering result using a test dataset with its associated labels, for example by computing the V-measure or F-measure.

What is the best way to label the obtained representatives (clusters) using the test dataset labels, in order to evaluate these results?

asked Feb 28 '12 at 06:38


shn

edited Feb 28 '12 at 10:32


One Answer:

Just to make sure I have the question correct: You have a training dataset with no class values, that has been clustered. You have a test dataset with the actual classes and want to use that to see how good your clusters were.

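The evaluation metric will (or at least should) take care of the labels for you. That is probably the main difference between supervised evaluations (which expect predicted labels to be the same as the true labels) and unsupervised evaluations (which only expect the cluster assignments to be internally consistent with the true labels, whatever the clusters happen to be called).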

So to answer your question, you use the cluster labels assigned by your clustering algorithm. Any good clustering evaluation will take care of the rest.
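To illustrate why the cluster names do not matter, here is a minimal sketch of the V-measure (harmonic mean of homogeneity and completeness) in plain Python. The function names and the toy labels are made up for this example, and it assumes you already have one cluster id per test point, for instance from assigning each test point to its nearest representative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """Conditional entropy H(a | b) for two paired label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * log(c / b_counts[bv]) for (_, bv), c in joint.items())


def v_measure(true_labels, cluster_ids):
    """Harmonic mean of homogeneity and completeness."""
    h_true = entropy(true_labels)
    h_clus = entropy(cluster_ids)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, cluster_ids) / h_true
    # Completeness: all members of a class end up in the same cluster.
    completeness = 1.0 if h_clus == 0 else 1.0 - conditional_entropy(cluster_ids, true_labels) / h_clus
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy data (hypothetical): true classes of six test points and two
# clusterings that differ only in how the clusters are numbered.
true_labels = ["a", "a", "b", "b", "c", "c"]
clusters = [2, 2, 0, 0, 1, 1]
renamed = [0, 0, 1, 1, 2, 2]
one_big_cluster = [0, 0, 0, 0, 0, 0]

score = v_measure(true_labels, clusters)
score_renamed = v_measure(true_labels, renamed)
score_single = v_measure(true_labels, one_big_cluster)
```

Both numberings of the same partition get the same score, which is the sense in which the metric "takes care of the labels". Putting everything into one cluster is complete but not homogeneous, so its score drops to zero.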

answered Feb 28 '12 at 16:50


Robert Layton

Let me ask the question differently: suppose you have a training dataset and a test dataset (both with the associated labels). We apply a clustering algorithm to the training set (e.g. k-means). How can you compute the "recognition rate" by projecting the test dataset onto the obtained centers?

(May 16 '12 at 12:29) shn
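
One possible reading of the "recognition rate" in this comment, sketched here under assumptions rather than taken from the thread: give each center the majority label of the training points assigned to it, assign each test point to its nearest center, and count how often the center's label matches the test label. The helper names, the Euclidean distance, and the toy data are all hypothetical; the centers would come from your own k-means run.

```python
from collections import Counter, defaultdict


def nearest_center(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def label_clusters(train_points, train_labels, centers):
    """Map each center index to the majority label of its training points."""
    votes = defaultdict(Counter)
    for point, label in zip(train_points, train_labels):
        votes[nearest_center(point, centers)][label] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}


def recognition_rate(test_points, test_labels, centers, cluster_to_label):
    """Fraction of test points whose nearest center carries their true label."""
    correct = 0
    for point, label in zip(test_points, test_labels):
        # A center that attracted no training points has no label (None),
        # so test points landing on it count as errors.
        predicted = cluster_to_label.get(nearest_center(point, centers))
        correct += predicted == label
    return correct / len(test_labels)


# Toy data (hypothetical): two centers as if returned by k-means.
centers = [(0.0, 0.0), (5.0, 5.0)]
train_points = [(0.1, 0.2), (0.3, -0.1), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["x", "x", "y", "y"]
test_points = [(0.2, 0.1), (4.9, 5.0), (0.4, 0.3), (5.5, 5.5)]
test_labels = ["x", "y", "y", "y"]

mapping = label_clusters(train_points, train_labels, centers)
rate = recognition_rate(test_points, test_labels, centers, mapping)
print(mapping, rate)
```

Note that this turns the clustering into a nearest-center classifier, so the number it reports is a supervised accuracy and depends on the majority-vote mapping, unlike the V-measure discussed in the answer.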

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.