|
Is anyone aware of an incremental K-Means clustering algorithm (preferably open source)? I've found a few academic papers on the subject, but I haven't been able to find any code. |
|
There is an undergoing implementation (not sure how stable) at the scikits.learn. Edit (by OG): Here a link to the PDF of the paper Web Scale K-Means Clustering - very simple to read. The scikit-learn branch does not implement the optimized sparse representation, just the mini-batch part. Edit (by OG): the branch has been merged to master. This will be part of the 0.9 release. Yes, there is a pull request at https://github.com/scikit-learn/scikit-learn/pull/132 . Comments on the pull request or help with the code are most welcome.
(Apr 25 '11 at 16:02)
Gael Varoquaux
Isn't "batch k-means" the exact opposite of incremental/online k-means?
(Apr 25 '11 at 17:22)
Cerin
@Cerin technically yes, but the implementation uses mini-batches, and it's incremental if you use a batcksize of one.
(Apr 25 '11 at 17:25)
Alexandre Passos ♦
@Cerin the name of the branch is wrong, but the implementation is right.
(Apr 26 '11 at 03:34)
ogrisel
Should this be able to support incremental addition and removal of points? Looking through the code in scikits.learn, it only seems to support adding new points. Removing points requires rebuilding the clusters from scratch.
(May 03 '11 at 17:36)
Cerin
Just to followup, although there's no removal function, the incremental addition seems to work very well.
(May 18 '11 at 10:18)
Cerin
showing 5 of 6
show all
|
|
A very simple algorithm for online k-means is here, the code for which can be hacked up quite easily. The algorithm does require you to initialize the K cluster centroids. One way to initialize them would be with the first K examples in the stream. You can then update the centroids using the subsequent examples in the stream, as described in the algorithm. |
|
there are dozens of incremental/online kmeans code that I can list here, vfml has a C version. Which language are you seeking the code for? Preferably Python. I looked at VFML. Unfortunately, it hasn't been maintained in several years, and doesn't compile on most modern Linux distributions. Also, please don't double post.
(May 20 '11 at 11:19)
Cerin
sorry no idea why it was double posted :). About a year ago, I compiled and used it on 64 bit ubuntu 10.04 (I don't remember well but probably I may have changed the Makefile). But I didn't test on natty.
(May 20 '11 at 11:59)
cglr
|
|
Here they have an implementation applied to MRI images, the code seems to be pure matlab, which probably means is really slow, but on the upside, really easy to implement yourself. |