What I want to do is discover latent clusters in my data. I don't think it matters whether the factors are positive or not for this. In this case is there any advantage in doing NMF over plain old PCA? They're both linear, as far as I understand.

Sounds like your data contains both positive and negative values. Given that you "don't think it matters whether the factors are positive or not", you should not use NMF: it decomposes an element-wise non-negative matrix, which means not only the factors but also the coefficients must be non-negative. In general, "plain" PCA and NMF can both be interpreted as maximum likelihood estimates under certain likelihood models: PCA has an underlying Gaussian assumption, while NMF can have a Gaussian/Poisson/Gamma assumption, depending on how the loss function is defined, with the extra constraint that both factors and coefficients are non-negative. Keeping this in mind may give you more insight into which one to choose. Nevertheless, it's pretty computationally cheap to try both on a reasonably big data set, so if you don't care about the details, just run both methods and see which one suits your particular application better.
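A minimal sketch of what "run both" could look like, assuming scikit-learn and count-like (non-negative) data, since NMF will reject negative entries:

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

# Hypothetical non-negative data matrix: 100 samples x 20 features (e.g. counts).
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(100, 20)).astype(float)

# PCA: centers the data and projects onto the top principal directions;
# the resulting coefficients can be negative.
pca = PCA(n_components=5)
Z_pca = pca.fit_transform(X)

# NMF: X ~ W @ H with W, H >= 0; the default loss is squared (Gaussian) error.
nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)   # non-negative coefficients
H = nmf.components_        # non-negative factors

print(pca.explained_variance_ratio_.sum(), nmf.reconstruction_err_)
```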
So it sounds like NMF is more suited for counts (which can't be negative), whereas PCA is more general?
(Feb 04 '14 at 04:32)
digdug
Yes: counts, energy, anything non-negative by nature. NMF is good at decomposing data with an additive structure, e.g. where the data is a (non-negative) weighted sum of factor 1 and factor 2 (see the sketch below). PCA, on the other hand, only changes the coordinate system into a new one formed by the eigenvectors of the covariance matrix.
(Feb 04 '14 at 12:57)
Dawen Liang
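A quick sketch of that additive picture, again assuming scikit-learn and purely synthetic data: build observations as non-negative weighted sums of two known factors and check that NMF recovers factors of the same shape (up to scaling and permutation).

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)

# Two hypothetical non-negative factors (e.g. spectral templates over 30 bins).
factors = np.abs(rng.normal(size=(2, 30)))

# 200 observations, each a non-negative weighted sum of the two factors plus small noise.
weights = rng.gamma(shape=2.0, scale=1.0, size=(200, 2))
X = weights @ factors + 0.01 * np.abs(rng.normal(size=(200, 30)))

nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(X)   # estimated non-negative weights
H = nmf.components_        # estimated non-negative factors

print(nmf.reconstruction_err_)
```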