|
Suppose we have a conditional entropy that is a variant of the simple one defined for the V-measure, but based on distances between datapoints and cluster centers instead of class labels (so in the formula the set C of classes is replaced by the set X of datapoints, and a_ck is replaced by Dist(x, k)). My question is: is it possible to compute the conditional entropy H(X|Y) (respectively H(Y|X)) incrementally, i.e. update it each time a new datapoint x is considered, when the number of clusters (i.e. |Y|) is not fixed (it may or may not increase each time a new datapoint is added)? When I asked someone this question, he said the following, but I didn't really understand how to do it. So if someone could explain it, that would be great: "the only way I can imagine that is to place a Gaussian at each cluster center and find the conditional for a point to belong to it. For example there are 3 cluster centers given by the mean of the datapoints belonging to them, there is a mean x/y distance and standard variation, make a multidimensional gaussian/normal distribution on that, now you got probabilities for the data points. You got a likelihood, you can get the conditional too" |
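
To make the quoted suggestion concrete, here is how I currently read it, as a rough sketch and not a verified method. The assumptions are mine: every cluster gets an isotropic Gaussian with the same `sigma` and equal prior weight, every datapoint counts equally (p(x) = 1/N), and H(Y|X) is taken as the average entropy of the per-point posteriors p(k|x). The names `posteriors`, `entropy` and `RunningConditionalEntropy` are hypothetical.

```python
import math


def posteriors(x, centers, sigma=1.0):
    """p(k | x) for equal-weight isotropic Gaussians placed at each center (assumption)."""
    # Log-likelihood of x under each cluster, up to a constant shared by all clusters.
    logits = [
        -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * sigma ** 2)
        for c in centers
    ]
    m = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


class RunningConditionalEntropy:
    """Running estimate of H(Y|X) = (1/N) * sum over x of H(Y | X = x)."""

    def __init__(self):
        self.n = 0        # number of datapoints seen so far
        self.total = 0.0  # sum of per-point posterior entropies

    def update(self, x, centers, sigma=1.0):
        # The list of centers may be longer than at the previous call,
        # so |Y| does not have to be fixed.
        self.total += entropy(posteriors(x, centers, sigma))
        self.n += 1
        return self.total / self.n


h = RunningConditionalEntropy()
h.update((0.0, 0.0), [(0.0, 0.0)])               # one cluster so far
h.update((0.0, 0.0), [(-1.0, 0.0), (1.0, 0.0)])  # a second cluster has appeared
```

What I am unsure about is that earlier datapoints were scored against the clusters that existed when they arrived. Once a center moves or a new cluster is added, their contributions to the running sum are stale, so this looks like an approximation unless the old points are rescored.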