What are the current main methods for flattening hierarchical clusters? Specifically, given a dendrogram Z, how do you decide where to cut it to obtain k clusters? SciPy has the fcluster function, which can use an inconsistency metric to split clusters when the child clusters are different. However, there are a lot of different options, and each requires a manually chosen t parameter. I'm looking for something more automated, even if it isn't perfect.
I have code written against the Python hcluster package that uses this approach to find the smallest flat clustering, that is, the one with the fewest clusters while still having more than one. I can share it if you would like. You may not want the smallest flat clustering, though; perhaps you want a sample of flat clusterings at various thresholds.

Here is how I find the smallest flat clustering: I do a line search over the flattening threshold t. The lower end of the interval is a t small enough that every observation is its own cluster, and the upper end is a t large enough that only one cluster remains. The search then looks for the value of t that yields more than one cluster, but the minimum number of clusters. I cap the search at 15 steps. Here is the code:
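The search described above can be sketched roughly as follows. This is a reconstruction, not the original hcluster code: it assumes SciPy's scipy.cluster.hierarchy.fcluster with criterion="distance", and the function name smallest_flat_clustering and its max_steps argument are made up for illustration. It also assumes the number of clusters never increases as t grows, which holds for monotone linkages such as single, complete, average, and Ward, but not necessarily for centroid or median linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def smallest_flat_clustering(Z, max_steps=15):
    """Bisect on the distance threshold t (hypothetical helper).

    Returns (t, labels) for the largest threshold tried that still
    yields more than one flat cluster.
    """
    lo = 0.0                    # small t: (almost) every observation alone
    hi = float(Z[:, 2].max())   # height of the last merge: one cluster
    best_t = lo
    best_labels = fcluster(Z, t=lo, criterion="distance")
    for _ in range(max_steps):
        mid = 0.5 * (lo + hi)
        labels = fcluster(Z, t=mid, criterion="distance")
        if labels.max() > 1:
            # Still more than one cluster: keep it and try a larger t.
            best_t, best_labels = mid, labels
            lo = mid
        else:
            # Collapsed to a single cluster: back off to a smaller t.
            hi = mid
    return best_t, best_labels
```

Calling it as t, labels = smallest_flat_clustering(linkage(X, method="average")) gives the threshold and the flat cluster labels. If the top merges of the dendrogram are closer together than the final bisection interval, 15 steps may stop at a clustering with more clusters than the true minimum, so treat the result as approximate.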
This is similar to pruning a decision tree, so many of the techniques (e.g., this one, with suitable modifications) for that task are applicable here.
I've marked this question as answered, but if anyone stumbles across this question and has other suggestions, I'd be happy to hear them.