Mean Shift clustering (see also Mean Shift: A Robust Approach to Feature Space Analysis) can produce clusters whose shapes depend on the topology of the data, and it does not need an a priori estimate of the number of clusters to find.

K-means, on the other hand, assumes isotropic clusters and needs to be told the number of clusters to extract in advance.

Still, K-Means seems to be the de facto baseline clustering algorithm. Are there reasons (for instance: computational efficiency, memory requirements, robustness) to prefer using k-means instead of Mean Shift? How do you explain that k-means is so popular and mean shift less well known?
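To make the contrast concrete, here is a minimal sketch using scikit-learn's implementations of both algorithms (this assumes scikit-learn is installed; the data and the bandwidth value are made up for illustration). K-means takes `n_clusters` up front, while mean shift takes a kernel bandwidth and infers the number of modes from the data:

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift

rng = np.random.RandomState(0)
# Three well-separated Gaussian blobs in 2-D (toy data).
X = np.vstack([rng.randn(50, 2) + c for c in ([0, 0], [8, 8], [0, 8])])

# K-means must be told k in advance.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Mean shift instead takes a kernel bandwidth; the number of
# clusters falls out of the number of density modes it finds.
ms = MeanShift(bandwidth=2.0).fit(X)

print(km.cluster_centers_.shape[0])  # 3, by construction
print(ms.cluster_centers_.shape[0])  # discovered from the density
```

Note that the burden does not disappear with mean shift: instead of choosing k, you now have to choose the bandwidth.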

asked Jul 06 '10 at 20:20

ogrisel


4 Answers:

I just found the answer to my own question in this very interesting blog post. To summarize: k-means might be simpler to implement, has a complexity that is linear w.r.t. the number of samples while Mean Shift is quadratic, and furthermore Mean Shift was long presented as a tool tailored for computer vision tasks and hence was not widely disseminated in the machine learning community at large.

answered Jul 06 '10 at 20:28

ogrisel


I've always used k-means and set k = n when I didn't know k ahead of time (plus some center-merging criterion, which seems to come down to h in eq. 1), but it's pretty expensive. Mean shift is probably the more powerful algorithm, but k-means is a lot simpler, and I imagine that's why it wins. Mean shift involves reasonably complex (or at least non-intuitive) math, and you have to choose h. Everyone learns bubble sort even though nobody uses it in practice (although, granted, everyone learns quicksort and mergesort too).

answered Jul 06 '10 at 20:42

sbirch


edited Jul 06 '10 at 20:45

Mean shift has a variety of problems that are somewhat complementary to the ones k-means has. For example, instead of having to pick the number of clusters, you have to pick the kernel. For some kernels with finite support, the mean shift iterates can get stuck in a region of uniform density. If you had two big spheres of uniformly distributed data points, k-means would be able to separate them, but a mean shift kernel using uniform weight over a small sphere would not.

It's not exactly true to say that the runtime of k-means is linear or that mean shift is quadratic. Those figures are for a single iteration of each algorithm, but what you really care about is the time to convergence, which requires a difficult-to-predict number of iterations. I don't know of any theoretical analysis of the runtime of mean shift, but k-means is actually superpolynomial in the worst case: see How Slow is the k-Means Method? by David Arthur & Sergei Vassilvitskii.
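One way to see that the iteration count, not the per-iteration cost, is the variable part: scikit-learn's `KMeans` exposes the number of iterations it actually ran via the `n_iter_` attribute, so you can observe how it varies across random initializations (a sketch, assuming scikit-learn is installed; the data here is arbitrary unstructured noise):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(500, 2)  # no real cluster structure; Lloyd's still converges

# One initialization per run so n_iter_ reflects that single run.
iters = [KMeans(n_clusters=5, n_init=1, max_iter=300,
                random_state=s).fit(X).n_iter_ for s in range(5)]
print(iters)  # the iteration count depends on the (random) initialization
```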

answered Jul 26 '10 at 14:06

Ian Goodfellow

edited Nov 06 '10 at 06:25

ogrisel


In my experience on large datasets (thousands of dimensions), KMeans is much faster than MeanShift. Ian's point about the linear vs quadratic complexity of KMeans is spot on: the total convergence time is not given by the cost of one iteration. While the worst-case complexity of KMeans is nasty, its average complexity is quite OK (see the notes in scikit-learn's implementation, and the references therein).

I do find that on small datasets, MeanShift tends to give more plausible results. In particular, KMeans enforces similar cluster sizes (actually, similar cluster variances), and this model may not fit your data.
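The similar-variance assumption can be demonstrated with a toy example (a sketch assuming scikit-learn; the cluster geometry is made up): with one tight and one very wide cluster, the k-means decision boundary is the perpendicular bisector of the two centroids, so the tight cluster "steals" the near tail of the wide one.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# A tight cluster and a much wider one: exactly the unequal-variance
# situation the k-means model does not account for.
tight = rng.randn(100, 2) * 0.1             # small variance, centred at (0, 0)
wide = rng.randn(100, 2) * 2.0 + [6, 0]     # large variance, centred at (6, 0)
X = np.vstack([tight, wide])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Count wide-cluster points that did not get the wide cluster's
# majority label, i.e. were grabbed across the bisector.
stolen = int(np.sum(labels[100:] != np.bincount(labels[100:]).argmax()))
print(stolen)  # some wide-cluster points end up mislabeled
```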

answered Nov 06 '10 at 06:08

Gael Varoquaux


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.