I notice it is common practice to scale the observations in each dimension to have mean 0 and variance 1, so that the model can then assume all dimensions share the same unknown variance.

Assume $x$ is a multivariate variable with $D$ dimensions, generated from a mixture model

$$x \sim \sum_{k=1}^{K} \pi_k f_k(x),$$

where $\pi_k$ is the prior weight of component $k$ and $f_k$ is its density or mass function.

In a naive setting, the dimensions are assumed to be independent. The distributions of different dimensions may then vary greatly in scale: for example, $D_1$, the first dimension, might be a Gaussian with very large precision, $D_2$ a Gaussian with very small precision, $D_3$ a binomial distribution, and so on. Therefore,

$$p(x) = \prod_{d=1}^{D} p(x_d).$$

Let $X$ be the set of observations of $x$, and let the latent variable $z_i$ indicate the assignment of the $i$-th observation to one of the $K$ mixture components. The posterior of $z$ given $X$ is

$$p(z \mid X) = \frac{p(X \mid z)\, p(z)}{p(X)}.$$
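Under the independence assumption, this gives for a single observation $x_i$ the standard per-observation responsibility (writing $f_{k,d}$ for the density of dimension $d$ under component $k$):

$$p(z_i = k \mid x_i) = \frac{\pi_k \prod_{d=1}^{D} f_{k,d}(x_{i,d})}{\sum_{j=1}^{K} \pi_j \prod_{d=1}^{D} f_{j,d}(x_{i,d})}.$$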

My question is: do these different dimensions (with different scales) contribute equally to the result (inferring $z$)? Since the pdf of $D_1$ is much more sharply peaked than that of $D_2$, it feels that when changing the value of $z_i$ from one mixture component to another, $D_1$ has a much larger impact on the value of $p(X \mid z)$ than $D_2$. It therefore seems that $D_1$ carries a larger "weight" in the result than $D_2$, doesn't it?
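To make this concrete, here is a minimal sketch of how per-dimension log-densities combine under the independence assumption. All means, scales, and the observation are made-up numbers chosen to mimic a high-precision $D_1$ and a low-precision $D_2$:

```python
import numpy as np
from scipy.stats import norm

# Two components, two independent Gaussian dimensions (illustrative numbers).
# Dimension 1 has high precision (small sigma), dimension 2 low precision.
mu = np.array([[0.0, 0.0],    # component 1: means for (d1, d2)
               [0.1, 5.0]])   # component 2
sigma = np.array([0.01, 10.0])  # shared per-dimension scales
pi = np.array([0.5, 0.5])       # equal mixing weights

x = np.array([0.02, 2.0])       # one observation

# Per-dimension log densities for each component: shape (K, D)
log_pdf = norm.logpdf(x, loc=mu, scale=sigma)

# Under independence, log p(x | z=k) is a sum over dimensions, so each
# dimension contributes additively to the assignment decision.
log_lik = log_pdf.sum(axis=1)
log_post = np.log(pi) + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

# The gap for d1 is around -30 nats vs roughly -0.03 for d2, so the
# high-precision dimension alone effectively decides the assignment.
print("per-dimension log-density gap (comp2 - comp1):", log_pdf[1] - log_pdf[0])
print("responsibilities:", post)
```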

asked Oct 26 '10 at 10:31

Denzel


One Answer:

I'm not 100% sure I understood your question correctly, but the impact of a given dimension on the change of $p(X \mid z)$ after changing $z_i$ should also depend on how the corresponding component differs from the previous one with respect to that dimension. One could, for example, assume that the Gaussians in the mixture that generated a certain dimension are almost identical; in that case, other dimensions with lower-precision Gaussians but more widely separated means can still dominate the determination of $z_i$. If their means are far apart but also far from the data, they will again have very little influence. In the end, it depends on the densities of all the components and how they relate to the data you observe.
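To illustrate with the same kind of made-up numbers as above: if both components have nearly identical means in the high-precision dimension, that dimension's log-density gap cancels out, and the flat, low-precision dimension is what actually separates the components.

```python
import numpy as np
from scipy.stats import norm

# Now the high-precision dimension (d1) has nearly identical means
# across components, while the low-precision dimension (d2) has
# well-separated means. All numbers are illustrative.
mu = np.array([[0.0,  -5.0],
               [0.001, 5.0]])
sigma = np.array([0.01, 10.0])
x = np.array([0.0005, 4.0])

log_pdf = norm.logpdf(x, loc=mu, scale=sigma)
# d1's gap is essentially zero despite its large precision; d2's gap,
# though its density is flat, is what drives the assignment here.
print("per-dimension gaps (comp2 - comp1):", log_pdf[1] - log_pdf[0])
```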

To be honest, I don't directly see what this has to do with normalization, because the point of normalization is usually that the model you are presenting the data to does not know the variances of its dimensions. Feeding the data to a model that assumes particular variances anyway will indeed often lead to an exaggerated influence of the dimensions whose variance it assumed to be lower than it actually is.
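As a concrete sketch of the normalization point (using scikit-learn here, which is just one possible toolchain, on purely synthetic data): standardizing first puts the dimensions on a comparable scale, so a model that ties the variance across dimensions, as in the question's setup, does not let one dimension dominate by accident.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic two-component data with wildly different per-dimension scales.
X = np.vstack([
    rng.normal([0.0, -50.0], [0.01, 10.0], size=(200, 2)),
    rng.normal([0.1,  50.0], [0.01, 10.0], size=(200, 2)),
])

# Scale each dimension to mean 0, variance 1 before fitting a mixture
# with a single variance per component ('spherical').
Xs = StandardScaler().fit_transform(X)
gmm = GaussianMixture(n_components=2, covariance_type="spherical",
                      random_state=0).fit(Xs)
print(gmm.means_)
```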

Hope this made sense :)

answered Oct 26 '10 at 11:32

Philemon Brakel

edited Oct 26 '10 at 11:35
