In EM we use k different covariance matrices for the k different Gaussians, but in GDA we use the same covariance matrix for both Gaussians. Why is this so?
EM in itself is an optimization algorithm, not a model, so it doesn't say anything about Gaussians. From context I assume you are talking about mixtures of Gaussians. The assumption in GDA that all classes share the same covariance is just a simplification. This might be appropriate because you know something about your data, or it might just make things simpler. It means that you have fewer parameters, reducing the complexity of your classifier, and therefore avoiding over-fitting and simplifying parameter estimation.

Yes, I was talking about mixtures of Gaussians. One of my friends gave me this justification: same covariance because the features are the same. Applying the same logic, EM should then also use the same covariances. Can you elaborate on why this justification is misplaced? Also, what if we make the same simplifying assumption (of using the same covariance matrix) for EM?
(May 12 '11 at 06:34)
athena123
What do you mean by "the features are the same"? I would see the classes having the same covariance as an assumption about how the data is generated. If you expect the data to be Gaussian with some given covariance given the class, then this assumption is justified. For example, if the values of the means correspond to some "real" locations and the Gaussian distribution models some sensor noise, then, if the noise is independent of the location, this assumption would be justified; I think this is along the lines of what your friend means. Of course, the same assumption would then also be justified for mixture models. You can just do EM with this additional constraint and you will get some fit of the data (see the sketch below). If the assumption is valid, then your fit will be faster and more robust. If it is not, then your fit will not be as good. As with all model assumptions, you could also see this as a belief you have about how the data should be interpreted, or simply how you choose to interpret it. Usually I would think this is more of a simplifying assumption than a real belief about the data.
(May 12 '11 at 08:30)
Andreas Mueller
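To make "EM with this additional constraint" concrete, here is a minimal sketch, assuming the current scikit-learn API (the old scikits.learn mixture model is now sklearn.mixture.GaussianMixture); the toy data and parameter choices are illustrative only. Setting covariance_type="tied" fits one covariance matrix shared by all components, while "full" fits one per component:

    # Minimal sketch, assuming the modern scikit-learn API; toy data is made up.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    shared_cov = [[1.0, 0.3], [0.3, 1.0]]
    # Two clusters drawn with the *same* covariance, so the tied model is well specified.
    X = np.vstack([
        rng.multivariate_normal([0, 0], shared_cov, size=200),
        rng.multivariate_normal([4, 4], shared_cov, size=200),
    ])

    # EM with the constraint: one covariance matrix shared by all components.
    gmm_tied = GaussianMixture(n_components=2, covariance_type="tied", random_state=0).fit(X)
    # Unconstrained EM: a separate covariance matrix per component.
    gmm_full = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

    print("tied covariances_ shape:", gmm_tied.covariances_.shape)  # (2, 2): one matrix
    print("full covariances_ shape:", gmm_full.covariances_.shape)  # (2, 2, 2): one per component
    print("avg log-likelihood, tied:", gmm_tied.score(X))
    print("avg log-likelihood, full:", gmm_full.score(X))

When the shared-covariance assumption holds, as in this toy data, the tied fit uses fewer parameters and loses essentially nothing in likelihood; when it does not hold, the full fit will score noticeably better.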
If you use EM for mixtures of Gaussians with the same-covariance assumption you get something equally valid. For example, scikits.learn's implementation of mixtures of Gaussians allows this. Conversely, if you use GDA with a different covariance matrix per class, you get what is called quadratic discriminant analysis.
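A small sketch of that last point, assuming scikit-learn's current discriminant_analysis module (the toy data below is mine, not from the thread): GDA with one pooled covariance is LinearDiscriminantAnalysis and gives a linear decision boundary, while letting each class keep its own covariance gives QuadraticDiscriminantAnalysis and a quadratic boundary:

    # Minimal sketch, assuming scikit-learn's discriminant_analysis module.
    import numpy as np
    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis,
        QuadraticDiscriminantAnalysis,
    )

    rng = np.random.RandomState(0)
    # Deliberately different class covariances, so the shared-covariance assumption is violated.
    X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
    X1 = rng.multivariate_normal([3, 3], [[3.0, 1.5], [1.5, 2.0]], size=200)
    X = np.vstack([X0, X1])
    y = np.array([0] * 200 + [1] * 200)

    lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance -> linear boundary
    qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariances -> quadratic boundary

    print("LDA training accuracy:", lda.score(X, y))
    print("QDA training accuracy:", qda.score(X, y))

If the two classes truly share a covariance, the two models give essentially the same boundary; when they do not, as in this toy data, QDA can fit the boundary that LDA's pooled-covariance assumption rules out.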