Various forms of the correlation, e.g. $r = \frac{\sum_i x_i y_i}{n\,\sigma_x \sigma_y}$ or $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y}$, are popular similarity measures in many applications. Is there a probabilistic interpretation of this, such that either $r$ or $r^2$ is an approximate likelihood of $x$ and $y$ coming from the same or a similar distribution? That is, if we have some form of $P_{\theta_1}(x)$ and $P_{\theta_2}(y)$, is $r$ related to $P(\theta_1 = \theta_2 \mid x, y)$?
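As a minimal numerical sketch of the two forms above (the simulated data, the seed, and the population-standard-deviation convention are just illustrative, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.6 * x + 0.8 * rng.normal(size=1000)

n = len(x)
sx, sy = x.std(), y.std()  # population (1/n) standard deviations

# Centered form: r = sum((x_i - xbar)(y_i - ybar)) / (n * sigma_x * sigma_y)
r_centered = np.sum((x - x.mean()) * (y - y.mean())) / (n * sx * sy)

# Uncentered form: equals the correlation only when the means are (near) zero
r_uncentered = np.sum(x * y) / (n * sx * sy)

print(r_centered, np.corrcoef(x, y)[0, 1])  # these two agree
print(r_uncentered)                          # close here only because the means are ~0
```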
I think you will have better luck if you look at these values as unnormalized log-likelihoods (that is, the logarithm of an unnormalized likelihood function). Then you can look for exponential-family distributions whose sufficient statistics look like these terms and see what they imply.
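To make that concrete, here is one way the cross term shows up (a sketch assuming a zero-mean bivariate Gaussian model, which is an assumption, not something the question specifies):

$$
\log p(x, y \mid \sigma_x, \sigma_y, \rho)
= -\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2\rho\, x y}{\sigma_x \sigma_y} + \frac{y^2}{\sigma_y^2}\right) - \log\!\left(2\pi\,\sigma_x \sigma_y \sqrt{1-\rho^2}\right),
$$

so for an i.i.d. sample the sufficient statistics are $\sum_i x_i^2$, $\sum_i y_i^2$, and $\sum_i x_i y_i$, and the natural parameter multiplying $\sum_i x_i y_i$ is $\rho / \big((1-\rho^2)\,\sigma_x \sigma_y\big)$. That is where a term proportional to the (uncentered) correlation enters an exponential-family log-likelihood.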
@Alexandre, I agree, the unnormalized log-likelihood has a better chance of being close to this than the likelihood itself. I looked at the Gaussian and exponential distributions, but I don't think they look like this. Since the correlation is scale-invariant in either variable, I am sure it would require only part of the parameter to be equal, while the other part could be integrated out or replaced by its maximum-likelihood estimate. Since the correlation is such a basic quantity, I just feel someone must have thought about it before.
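For instance, one well-known connection (a sketch under a bivariate Gaussian assumption, with the means and scales treated as nuisance parameters and maximized out rather than integrated out): writing $\ell$ for the log-likelihood of $n$ i.i.d. pairs,

$$
2\left[\max_{\mu,\,\sigma_x,\,\sigma_y,\,\rho} \ell(\mu,\sigma_x,\sigma_y,\rho) \;-\; \max_{\mu,\,\sigma_x,\,\sigma_y} \ell(\mu,\sigma_x,\sigma_y,\rho=0)\right] = -\,n \log\!\left(1 - r^2\right),
$$

so under that model $r^2$ is a monotone transform of the likelihood ratio for dependence versus independence, with the location and scale parameters profiled out exactly as suggested.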
This seems like it should fit within the exponential family, so read up on that and see whether you find a known result that looks somewhat like this.
Tom Minka gave a great lecture on EP (expectation propagation), which approximates distributions by exponential-family (e.g., Gaussian) factors. You may want to check it out: http://videolectures.net/mlss09uk_minka_ai/