1
1

I have a variation of spectral clustering I'd like to benchmark, but I'm not sure what's the best way to evaluate it. Is the standard distortion metric (used in the Ng, Jordan, and Weiss paper) meaningful when comparing (for example) algorithms that use the epsilon-close matrices versus k-nearest-neighbor matrices? If not, is there something standard, or must one fall back to extrinsic measures (such as evaluating a full system that depends on clustering and changing the clustering part)?

I'm not looking for a way to compare arbitrary clustering algorithms, as that's not really a good idea (as argued in the 2009 clustering nips workshop's opinion paper), so something specific to spectral algorithms will do. For example, if I keep the points but switch the matrix, is something like l2 distance between the smallest eigenvectors meaningful to represent the degradation of the algorithm when using that approximate matrix?

asked Sep 17 '10 at 16:47

Alexandre%20Passos's gravatar image

Alexandre Passos ♦
1893744214333

1

I have been trying to find a good distortion metric myself for sometime now... The popular papers on Spectral clustering through NCUT/AVG CUT etc have a Variation of Information measure. But i really dint think there is any one right way to do this.

(Sep 17 '10 at 18:24) kpx

I dont mean to hijack your question but I wanted to ask you. The graph Laplacian that you use is typically D^1/2. The spectral clustering using mincuts/avgcuts etc say that as the type of cut is varied this normalization changes. Do you consider that when you actually perform the clustering? They show that the clustering changes because of this and is some times better. Do you think the results are much different when(if) you tested this?

(Sep 17 '10 at 19:05) kpx

@kpx I thought the graph laplacian was D^1/2 A D^1/2 where A is the adjacency matrix (or the other definitions, when it's D-A or I - D^-1 A).

I've never tried many different variations of spectral clustering; to be honest, I'm a real noob in the area and I'm a bit afraid of wasting time on irrelevant experiments.

(Sep 18 '10 at 02:31) Alexandre Passos ♦

There was a typo in my earlier statement sorry for that. What i meant was that the Laplacian is obtained by multiplying W-D with D^1/2. There is a paper on Image segmentation using normalized cuts that uses the spectral clustering. Basically they say that Cuts in a graph is another view of spectral clustering. The normalized cuts reduce to The ordinary spectral clustering you were referring to. The other cuts like average and max reduce to something else.

Normalized cuts solve the eigen system: (D-W)x=aDx

Other cuts solve a variant of this. Average cuts solve: (D-W)x = ax

So they vary the D between 0 and 1/2 as a hyper-parameter cause one cant know where which cut would work best for what data...

(Sep 18 '10 at 04:29) kpx

That's the original spectral clustering paper, no? About these different matrices, you can find a discussion in section 8.4 of Ulrike Von Luxburg's tutorial on spectral clustering: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.1443&rep=rep1&type=pdf . She recommends using the random walk graph laplacian, I- D^-1 W for both theorectical and practical reasons.

(Sep 18 '10 at 08:10) Alexandre Passos ♦

Thanks for that!

(Sep 18 '10 at 21:57) kpx
showing 5 of 6 show all

One Answer:

Why not find a little bit of labeled data and use an external evaluation technique?

From the GLW opinion letter:

In our opinion, the major obstacle is the difficulty to evaluate a clustering algorithm without taking into account the context: why does the user cluster his data in the first place, and what does he want to do with the clustering afterwards?

It seems like the best way to answer these questions are either 1. have a user cluster the data or 2. use an extrinsic evaluation like you mentioned in the question. Extrinsic measures have an inherently attractive quality -- you can be sure that the improvement to the clustering component of your system corresponds to improved performance overall. Though it's more taxing if you only want to look at clustering, and you dont have an extrinsic system available for evaluation.

There are a fair variety of external techniques available, with a relatively small amount of manually labeled data. Amigo et al. has a nice review of some popular techniques -- though I'm not convinced by the claim of universality of the "Rag Bag" criteria. Meila's Variation of Information that you alluded to is great, but has a few issues with normalization in comparing across different data sets and different numbers of clusters. Full disclosure, I have a personal bias towards V-Measure -- but even this is imperfect -- it has a (IMHO justifiable) bias to prefer clustering solutions that make one cluster per data point over those which include ever data point in a single cluster. The Amigo et al paper concludes that B-cubed is the most attractive evaluation method. I haven't yet looked at it particularly closely, so it's worth a shot.

In short, there isn't a universally accepted technique to evaluate clustering approaches, though there are many contenders and a many approaches which can be discounted for a variety of reasons.

answered Sep 18 '10 at 08:42

Andrew%20Rosenberg's gravatar image

Andrew Rosenberg
156252135

Thanks for the references, I'll check them out.

My problem setting is a bit easier than this: I have an approximation to spectral clustering I'd like to test, and I'd like to see how much it "degrades" the cluster quality compared to doing the "right" thing.

But yes, I think I'm going to set up an extrinsic test and evaluate this approximation versus some heuristics to see what happens.

(Sep 18 '10 at 09:37) Alexandre Passos ♦
1

The part that is tough in that is defining "the cluster quality". That's what these external measures attempt to do, with the aid of labeled data. A full extrinsic evaluation is even more task specific, and requires more engineering to get off the ground.

(Sep 18 '10 at 10:23) Andrew Rosenberg

By the way I found a useful thing in Ng's original paper about also monitoring the eigen gap for noise. I mean is the clustering any good because noise can upset this algorithm. For me when I cluster gene expression data one deos not really know the true clusters, so two questions always are important as far as clustering Quality goes...

1) What is the comparison between the clusterings when we vary the number of clusters? Is 5-clusters better than 6? What is a measure for this? Maybe Variation of Information can help here.. Or V-Measures as Andrew suggests.

2) How much noise in the data can upset my clustering? How can I measure this.. i know Variation of information can help here... But what about using Eigen gaps?

If we have an answer to those questions, i think we can be a bit more specific on what is a good clustering... no? IMHO this may be the right way to go... Please correct me if you feel this is wrong...?

(Sep 18 '10 at 21:54) kpx

@Alexandre I had asked last time if the above approach seemed reasonable... I have not had the time to test it out, but I will in this week. Let me know what results you got with your clustering experiements, if any. -Tks

(Sep 20 '10 at 23:04) kpx

It seems reasonable and I think I'm going to run my experiments on a subset of MNIST digits and use v-measure to evaluate. I'll let you know when I get to it.

(Sep 21 '10 at 04:45) Alexandre Passos ♦

To add to this, I would suggest Adjusted Mutual Information (jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf) as a metric when evaluating labeled data points. They make some reasonable claims that it is less biased by systems with more/less clusters. Both the F-Score and V-Measure have such biases. They also provide matlab code for doing the AMI evaluation, you can also code it yourself but it's a little painful to do efficiently.

(Jan 25 at 18:51) Keith Stevens

Adjusted Rand Index (ARI) is another clustering metric for the "supervised" case that is adjusted for chance as AMI. If find it much easier to compute it efficiently on large evaluation sets.

Also for V-Measure (noted VM in the following) and the clustering pairs F-score, you can always adjust their value for your clustering C against chance empirically: generate some random clustering RC and then compute the adjusted variant by doing: AVM = (VM(C) - E[VM(RC)]) / (1 - E[VM(RC)]).

VM(RC) is a random variable and can be approximated by the empirical mean of VM for several random clusterings.

(Jan 26 at 05:30) ogrisel
showing 5 of 7 show all
Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.