Suppose x and y are m-by-p and n-by-p matrices, respectively (x and y come from different multivariate normal distributions). Can I calculate Covariance[Join[x,y]] from Covariance[x] and Covariance[y]? Is there a closed form for Covariance[Join[x,y]] that does not require combining the two matrices? Note: Join[x,y] stacks x and y into a single (m+n)-by-p matrix. Thank you all.
I think this might be possible if your data is centered, since the ij-th entry of the covariance matrix is then just the expected product of features i and j. If your data is not centered, you have to keep other partial results to get the full product. For example, make one pass through x and y to compute the per-feature means; the means of x and y can be combined using only their values and the row counts m and n. Then center the data with the combined mean and make a second pass through x and y separately, accumulating, for every pair of features fi and fj, the sum of (fi - mean(fi))(fj - mean(fj)). Finally, add the two temporary matrices and divide each entry by n+m (or by n+m-1 if you want the unbiased sample covariance) to get your covariance matrix.
I think centering the data is a very good solution. Thank you.
(May 04 '11 at 11:10)
Omar Osama
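The two-pass scheme from the answer above could be sketched roughly as follows. This is only an illustration in plain Python rather than Mathematica; the names `column_sums`, `centered_scatter`, `pooled_covariance` and the `ddof` parameter are made up for this sketch and do not come from the thread. It assumes each block is a non-empty list of rows with the same p features.

```python
def column_sums(rows):
    """First pass over one block: per-feature sums and the row count."""
    sums = [0.0] * len(rows[0])
    for row in rows:
        for j, value in enumerate(row):
            sums[j] += value
    return sums, len(rows)


def centered_scatter(rows, mean):
    """Second pass over one block: sum of outer products of centered rows."""
    p = len(mean)
    scatter = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [value - mu for value, mu in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                scatter[i][j] += d[i] * d[j]
    return scatter


def pooled_covariance(x, y, ddof=1):
    """Covariance of the stacked data without ever joining x and y.

    ddof=1 divides by m+n-1 (unbiased); ddof=0 divides by m+n.
    """
    sums_x, m = column_sums(x)
    sums_y, n = column_sums(y)
    total = m + n
    # Combined mean needs only the per-block sums and the row counts.
    mean = [(a + b) / total for a, b in zip(sums_x, sums_y)]
    sx = centered_scatter(x, mean)
    sy = centered_scatter(y, mean)
    p = len(mean)
    return [[(sx[i][j] + sy[i][j]) / (total - ddof) for j in range(p)]
            for i in range(p)]


x = [[1, 2], [3, 4], [5, 9]]
y = [[2, 0], [4, 1]]
cov = pooled_covariance(x, y)
```

Each block is only ever read on its own, so the two passes can run wherever the block lives; only the small sums and p-by-p matrices have to be exchanged.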
Are there any restrictions on (or relationships between) m, n, or p? Do you know of any particular properties of x and y? Usually matrices need to be conformable with each other, and often at least one must be symmetric, for these kinds of computations. Can you tell us more about the problem you're working on?
Sorry for being ambiguous.
I edited the question. I think it is clear now.
If the two samples come from two different distributions, isn't their covariance 0, since they are independent?
After posting the question I did think about that.
What I mean is this: suppose I have m machines, each holding a dataset. The datasets have different numbers of observations but the same features (the same feature should come from the same distribution on every machine, with just some variation).
Suppose I want to apply an LDA classifier to these m datasets. Each machine will return a covariance matrix, and I want to combine them to obtain the "true" covariance matrix.
I hope that is clear. :)
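For the multi-machine setting described here, one possible sketch is to have each machine report its row count, its feature means, and its covariance matrix, and then merge these summaries pairwise. The covariance matrices alone are not enough; the counts and means are needed as well. The function name `merge_summaries` and the `(count, mean, cov)` tuple layout are assumptions made for this illustration, and the covariances are assumed to use the n-1 denominator, so each machine needs at least two observations.

```python
def merge_summaries(a, b):
    """Merge two (count, mean, cov) summaries into one.

    cov is assumed to be the sample covariance (n-1 denominator).
    """
    n_a, mean_a, cov_a = a
    n_b, mean_b, cov_b = b
    n = n_a + n_b
    p = len(mean_a)
    # Difference of the block means drives the correction term.
    delta = [mb - ma for ma, mb in zip(mean_a, mean_b)]
    mean = [ma + d * n_b / n for ma, d in zip(mean_a, delta)]
    weight = n_a * n_b / n
    cov = [[((n_a - 1) * cov_a[i][j]
             + (n_b - 1) * cov_b[i][j]
             + weight * delta[i] * delta[j]) / (n - 1)
            for j in range(p)]
           for i in range(p)]
    return n, mean, cov


# Summaries for two small blocks: x = [[1,2],[3,4],[5,9]], y = [[2,0],[4,1]]
summary_x = (3, [3.0, 5.0], [[4.0, 7.0], [7.0, 13.0]])
summary_y = (2, [3.0, 0.5], [[2.0, 1.0], [1.0, 0.5]])
n, mean, cov = merge_summaries(summary_x, summary_y)
```

With more than two machines, the merged summary can be fed back into `merge_summaries` together with the next machine's summary, so the raw observations never have to leave the machine that holds them.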
What do you mean by LDA (Latent Dirichlet Allocation or Linear Discriminant Analysis)?
Parallelism isn't really my cup of tea, but I would find it difficult to use LDA in such a scheme. Why don't you try using mixture models?
LDA here means Linear Discriminant Analysis.
Thank you very much.