Suppose we have a conditional entropy that is a variant of the one defined for V-measure, but based on distances between datapoints and cluster centers instead of class labels (so, in the formula, the set C of classes is replaced by the set X of datapoints, and a_ck is replaced by Dist(x, k)).
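
For reference, the V-measure conditional entropy and the distance-based variant obtained by the substitution above can be written as follows. The normalizer N in the second formula is an assumption (taken here as the sum of all distances), since the substitution by itself does not say what N becomes.

```latex
H(C \mid K) = -\sum_{k=1}^{|K|} \sum_{c=1}^{|C|}
    \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{c'=1}^{|C|} a_{c'k}}

H(X \mid Y) = -\sum_{k \in Y} \sum_{x \in X}
    \frac{\mathrm{Dist}(x,k)}{N}
    \log \frac{\mathrm{Dist}(x,k)}{\sum_{x' \in X} \mathrm{Dist}(x',k)},
\qquad
N = \sum_{k \in Y} \sum_{x \in X} \mathrm{Dist}(x,k)
```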

My question: is it possible to compute the conditional entropy H(X|Y) (respectively H(Y|X)) incrementally, i.e. update it each time a new datapoint x is considered, when the number of clusters |Y| is not fixed (it may or may not increase each time a new datapoint is added)?
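
One possible way to make this incremental, sketched below under stated assumptions: treat each distance as a non-negative weight w(x, k) = Dist(x, k), take N as the sum of all weights, and keep two running sums per cluster. The class name, method names, and the choice of N are all hypothetical, not part of V-measure. The sketch also assumes cluster centers do not move once created; if a center moves, every distance to it changes and its sums must be rebuilt.

```python
import math
from collections import defaultdict


class IncrementalConditionalEntropy:
    """Running H(X|Y) = -sum_k sum_x (w_xk / N) * log(w_xk / W_k).

    Uses the identity
        H = -(1/N) * sum_k (S_k - W_k * log W_k)
    with W_k = sum_x w_xk, S_k = sum_x w_xk * log w_xk, N = sum_k W_k,
    so each new (datapoint, cluster) weight is an O(1) update.
    """

    def __init__(self):
        self.W = defaultdict(float)  # per-cluster sum of weights
        self.S = defaultdict(float)  # per-cluster sum of w * log(w)
        self.N = 0.0                 # assumed normalizer: sum of all weights

    def add(self, k, w):
        """Record the weight w = Dist(x, k) of one datapoint for cluster k."""
        if w < 0:
            raise ValueError("weights must be non-negative")
        self.W[k] += w               # also registers a previously unseen cluster
        if w > 0:                    # convention: 0 * log 0 = 0
            self.S[k] += w * math.log(w)
            self.N += w

    def value(self):
        """Current H(X|Y) in nats; costs O(number of clusters)."""
        if self.N == 0:
            return 0.0
        total = sum(self.S[k] - Wk * math.log(Wk)
                    for k, Wk in self.W.items() if Wk > 0)
        return -total / self.N


def batch_conditional_entropy(pairs):
    """Direct (non-incremental) evaluation over (cluster, weight) pairs."""
    W = defaultdict(float)
    for k, w in pairs:
        W[k] += w
    N = sum(W.values())
    return -sum(w / N * math.log(w / W[k]) for k, w in pairs if w > 0)
```

A new datapoint costs one `add` call per existing cluster. A new cluster is the expensive case: the weights of all earlier datapoints with respect to it still have to be fed in, so the datapoints (or a summary of them) need to be kept around.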

When I asked someone this question, he said the following, but I didn't really understand how to do it. If someone could explain it, that would be great:

"The only way I can imagine is to place a Gaussian at each cluster center and find the conditional probability of a point belonging to it. For example, say there are 3 cluster centers, each given by the mean of the datapoints belonging to it; each has a mean x/y distance and a standard deviation. Build a multidimensional Gaussian (normal) distribution from those, and now you have probabilities for the datapoints. You have a likelihood, so you can get the conditional too."
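
One way to read the suggestion above, as a minimal sketch rather than a definitive method: model each cluster as an isotropic Gaussian, get p(k | x) from Bayes' rule, and estimate H(Y|X) as the running average of the per-point posterior entropies. The isotropic covariance, the empirical weighting p(x) = 1/n, and all function and class names are assumptions made for illustration.

```python
import math


def log_gaussian(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 * I) at point x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2 * math.pi))


def posterior(x, means, sigmas, priors):
    """p(k | x) for every cluster k, via Bayes' rule (log-sum-exp for stability)."""
    logs = [math.log(p) + log_gaussian(x, m, s)
            for m, s, p in zip(means, sigmas, priors)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def entropy(probs):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class RunningConditionalEntropy:
    """Running average of H(Y | X = x) over the datapoints seen so far."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, x, means, sigmas, priors):
        """Add one datapoint; the cluster lists may differ in length between calls."""
        self.n += 1
        self.total += entropy(posterior(x, means, sigmas, priors))
        return self.total / self.n
```

Each update costs O(|Y|), and the number of clusters can change between calls. The limitation is that earlier datapoints keep the entropy computed with the cluster parameters in force when they arrived, so the result is an approximation unless those terms are recomputed after the clusters change.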

asked Feb 25 '12 at 12:12

shn


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.