# Why use k-means in place of Mean Shift clustering?

 8 Mean Shift clustering (see also Mean Shift: A Robust Approach to Feature Space Analysis) can produce clusters whose shapes depend on the topology of the data and does not need an a priori estimate of the number of clusters to find. K-means, on the other hand, assumes isotropic clusters and needs to be told the number of clusters to extract in advance. Still, k-means seems to be the de facto baseline clustering algorithm. Are there reasons (for instance computational efficiency, memory requirements, robustness) to prefer k-means over Mean Shift? How do you explain that k-means is so popular while mean shift is less well known? asked Jul 06 '10 at 20:20 ogrisel 4989●9●55●91

 7 I just found the answer to my own question in this very interesting blog post. To summarize: k-means may be simpler to implement, has a complexity that is linear w.r.t. the number of samples while Mean Shift is quadratic, and furthermore Mean Shift was long presented as a tool tailored for computer vision tasks and hence was not disseminated in the machine learning community at large. answered Jul 06 '10 at 20:28 ogrisel 4989●9●55●91
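To make the "linear in the number of samples" point concrete, here is a minimal sketch of Lloyd's k-means in plain NumPy (the function name and the naive deterministic initialization are illustrative choices, not from the post): each iteration computes one distance from every sample to every center, which is O(n·k·d).

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm. Each iteration computes n*k distances,
    so the per-iteration cost is O(n * k * d), linear in n."""
    centers = X[:k].copy()  # naive deterministic init from the first k points
    for _ in range(n_iter):
        # assign every sample to its nearest center: O(n * k * d)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # move each center to the mean of its assigned samples
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# two well-separated blobs: k-means recovers one center per blob
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Note that the per-iteration cost being linear says nothing about the number of iterations needed, which is the caveat Ian raises below.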
 1 I've always used k-means and set k=n when I didn't know k ahead of time (plus some center-merging criterion, which seems to come down to h in eq. 1), but it's pretty expensive. Mean shift is probably a more powerful algorithm, but k-means is a lot simpler; I imagine that's why. Mean shift involves reasonably complex (at least non-intuitive) math, and you have to choose h. Everyone learns bubble sort even though nobody uses it in practice (although, granted, everyone learns quicksort and mergesort too). answered Jul 06 '10 at 20:42 sbirch 241●5●7●11
 4 Mean shift has a variety of problems that are somewhat complementary to the ones k-means has. For example, instead of having to pick the number of clusters, you have to pick the kernel. For some kernels with finite support, the mode-seeking procedure can get stuck in a region of uniform density: if you had two big spheres of uniformly distributed data points, k-means would be able to separate them, but a mean shift kernel using uniform weight over a small sphere would not. It's also not exactly true to say that the runtime of k-means is linear or that mean shift's is quadratic. That's just for one iteration of the algorithm, but what you really care about is the convergence time, which requires a difficult-to-predict number of iterations. I don't know of any theoretical analysis of the runtime of mean shift, but k-means is actually superpolynomial: see How Slow is the k-Means Method? by David Arthur & Sergei Vassilvitskii. answered Jul 26 '10 at 14:06 Ian Goodfellow 1072●16●27●34
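A sketch of the quadratic-per-iteration cost in plain NumPy (function name and bandwidth choice are illustrative assumptions): mean shift with a flat kernel repeatedly replaces each point with the mean of the data inside its window, so one sweep compares each of the n points against all n data points, O(n²·d). This is also the kernel Ian's uniform-density caveat applies to: once the window mean coincides with the point, the update stops moving, whether or not it has reached a meaningful mode.

```python
import numpy as np

def mean_shift(X, bandwidth, n_iter=50, tol=1e-3):
    """Mean shift with a flat kernel: every point climbs to the mean of
    the data inside its window. One sweep compares each of the n points
    against all n data points, so an iteration costs O(n**2 * d)."""
    points = X.copy()
    for _ in range(n_iter):
        moved = False
        for i, p in enumerate(points):
            # flat kernel: unweighted mean of all data within `bandwidth` of p
            mask = np.linalg.norm(X - p, axis=1) <= bandwidth
            new_p = X[mask].mean(axis=0)
            if np.linalg.norm(new_p - p) > tol:
                moved = True
            points[i] = new_p
        if not moved:
            break
    return points

# two tight blobs far apart: every point collapses onto its blob's mode,
# and the number of clusters falls out of the bandwidth, not a parameter k
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
modes = mean_shift(X, bandwidth=1.0)
```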
 2 In my experience on large datasets (thousands of dimensions), KMeans is much faster than Mean Shift. Ian's point about the linear vs. quadratic complexity of KMeans is spot on: the total convergence time is not given by the cost of one iteration. While the worst-case complexity of KMeans is nasty, its average complexity is quite OK (see the notes in scikit-learn's implementation, and the references therein). I do find that on small datasets, MeanShift tends to give more plausible results. In particular, KMeans enforces similar cluster sizes (actually, similar cluster variances), and this model may not fit your data. answered Nov 06 '10 at 06:08 Gael Varoquaux 921●4●14●26
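The similar-cluster-variance bias is easy to reproduce in a sketch (plain NumPy, with a Lloyd's k-means reimplemented inline; names and data are illustrative): give k-means one broad cluster and one tight one, and the decision boundary, which sits midway between the two centers, annexes the broad cluster's tail for the tight cluster's center.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    # plain Lloyd's algorithm with naive init from the first k points
    centers = X[:k].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# one broad cluster (std 2) and one tight cluster (std 0.1), 1-D for clarity
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 2, 200), rng.normal(4, 0.1, 50)])[:, None]
centers, labels = kmeans(X, k=2)
# the boundary at the midpoint of the two centers cuts through the broad
# cluster, so its upper tail gets assigned to the tight cluster's center
```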


Asked: Jul 06 '10 at 20:20

Seen: 8,609 times

Last updated: Nov 06 '10 at 06:25

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.