Mean shift has its own set of problems, largely complementary to the ones k-means has. For example, instead of having to pick the number of clusters, you have to pick the kernel. With a kernel of finite support, the algorithm can get stuck in a region of uniform density: if your data were two large spheres of uniformly distributed points, k-means could separate them, but mean shift with a uniform kernel over a small sphere would not, because inside either sphere the local mean is essentially the point itself.
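To make the stalling behavior concrete, here is a minimal sketch (not any particular library's implementation) of a single mean-shift update with a flat kernel, run on uniformly distributed 2-D data. The bandwidth of 0.1 and the starting point are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points distributed uniformly in a 2-D square: there is no density
# gradient for a small flat kernel to climb.
points = rng.uniform(-1.0, 1.0, size=(2000, 2))

def flat_kernel_shift(x, points, bandwidth):
    """One mean-shift step with a uniform (flat) kernel of finite
    support: move x to the mean of all points within `bandwidth`."""
    dists = np.linalg.norm(points - x, axis=1)
    neighbors = points[dists <= bandwidth]
    if len(neighbors) == 0:
        return x  # kernel support is empty; x cannot move at all
    return neighbors.mean(axis=0)

x = np.array([0.3, -0.2])
for _ in range(50):
    x = flat_kernel_shift(x, points, bandwidth=0.1)

# In a region of uniform density the local mean equals the query point
# up to sampling noise, so x just jitters in place and never converges
# toward any "mode" -- there is none to find.
print(x)
```

Running this, the point ends up roughly where it started, whereas on data with genuine density peaks the same update would march uphill toward one of them.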

It's not exactly true to say that the runtime of k-means is linear or that the runtime of mean shift is quadratic. Those are the costs of a single iteration; what you really care about is the time to convergence, which depends on a difficult-to-predict number of iterations. I don't know of any theoretical analysis of the runtime of mean shift, but k-means is actually superpolynomial in the worst case: theory.stanford.edu/~sergei/papers/kMeans-socg.pdf
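The per-iteration costs are easy to see from naive implementations. This sketch (my own toy code, with arbitrary choices of k = 5, n = 500, and a Gaussian kernel with bandwidth 0.5) performs one iteration of each algorithm; the k-means step touches an n-by-k distance array, while the mean-shift step needs the full n-by-n pairwise distance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
k = 5

# One k-means iteration: n * k squared distances, so O(nk) work.
centers = X[:k].copy()
labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
                   axis=1)
centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

# One mean-shift iteration: every point moves toward the kernel-weighted
# mean of ALL points, so the distance matrix is n * n -- O(n^2) work.
bandwidth = 0.5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # shape (n, n)
weights = np.exp(-d2 / (2 * bandwidth ** 2))          # Gaussian kernel
X_shifted = (weights @ X) / weights.sum(axis=1, keepdims=True)
```

Either loop repeats until the updates stop moving, and it is that iteration count, not the per-step cost, that the overall runtime hinges on.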