I guess the question is not very well-phrased. Let me take one example. We can use LDA to model text, and it is loosely a kind of matrix factorization model. On the other hand, we can also use pure matrix factorization methods, like non-negative matrix factorization, to model text. So, what is the real difference between these two kinds of models? I have heard that Bayesian models are easier to interpret, while computational models are more computationally efficient. I would also like to see some work linking the two kinds of models (e.g., surveys, papers). I came across one paper, "A unified view of matrix factorization", which incorporates nearly everything into a Bregman divergence function. But I don't know how this view is usually pursued. Thanks.

IMO, the differences are mainly due to the kinds of generative assumptions (or the lack thereof) these models make, which can be crucial for the type of data being modeled. For example, it may not make much sense to assume a Gaussian likelihood if your data is text (in which case you should rather use a multinomial model). Also, some models can be seen as simpler versions of other models. For example, LDA can be seen as a Bayesian extension of probabilistic LSI, which in turn is a probabilistic extension of LSI, and so on. Another difference is whether the models are modular. For instance, generative models such as LDA can easily be used within more complicated models (think of the various extensions of LDA). This may not be true for many other (non-probabilistic) models. I should also mention that some very similar models were discovered around the same time with the same goals in mind (e.g., discrete/multinomial PCA, which is actually equivalent to LDA).

Thanks. But how can we easily convert one into the other (e.g., by changing the loss function)? Say, for LDA, what is the nearest computational model?
(Aug 17 '10 at 19:41)
Liangjie Hong
Usually, it can be done, say, by changing the generative model, the priors, or the likelihood function. For example, LDA reduces to probabilistic LSI if you treat the topic distribution theta as a parameter rather than as a random variable with a Dirichlet prior on it.
(Aug 17 '10 at 19:48)
spinxl39
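To make the reduction above concrete, here is a minimal numpy sketch of the two generative stories. Everything in it (matrix sizes, hyperparameter values, names) is illustrative, and pLSI's theta appears only as a fixed per-document parameter, not something actually fit by EM:

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size, doc_len = 5, 1000, 100
# Topic-word distributions, shared by both models (illustrative hyperparameters).
beta = rng.dirichlet(np.full(vocab_size, 0.01), size=n_topics)

# LDA: the document's topic distribution theta is a random variable
# drawn from a Dirichlet prior.
alpha = np.full(n_topics, 0.1)
theta_lda = rng.dirichlet(alpha)

# pLSI: theta is a free parameter estimated per document
# (a uniform placeholder here, standing in for the fitted value).
theta_plsi = np.full(n_topics, 1.0 / n_topics)

def generate_doc(theta):
    # For each word position: pick a topic from theta, then a word from that topic.
    z = rng.choice(n_topics, size=doc_len, p=theta)
    return np.array([rng.choice(vocab_size, p=beta[k]) for k in z])

doc = generate_doc(theta_lda)  # given theta, the pLSI story is identical
```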
I guess my question is not well-formed. pLSI is still probabilistic. Why not just use non-negative matrix factorization?
(Aug 17 '10 at 21:09)
Liangjie Hong
Well, if you compare LDA and NMF, both can be seen as doing matrix factorization: a term-by-document matrix is decomposed into a term-by-topic matrix and a topic-by-document matrix. However, in NMF this decomposition isn't unique, whereas in LDA it is. See this paper. Besides, non-probabilistic NMF isn't a generative model, whereas LDA is.
(Aug 17 '10 at 21:46)
spinxl39
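To see this factorization view side by side, here is a minimal scikit-learn sketch, assuming sklearn's decomposition API. The data is synthetic, and sklearn uses the transposed, document-by-term layout, so the two factors are document-by-topic and topic-by-term:

```python
import numpy as np
from sklearn.decomposition import NMF, LatentDirichletAllocation

# Toy document-term count matrix: 20 docs, 50 terms (synthetic counts).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 50))

n_topics = 3

# NMF: X ~ W @ H with W, H >= 0, fit by minimizing a reconstruction loss.
nmf = NMF(n_components=n_topics, init="nndsvda", random_state=0)
W_nmf = nmf.fit_transform(X)   # document-topic weights, shape (20, 3)
H_nmf = nmf.components_        # topic-term weights, shape (3, 50)

# LDA: factors with the same shapes, but the rows come from a generative
# multinomial/Dirichlet model rather than a plain least-squares-style fit.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
W_lda = lda.fit_transform(X)   # document-topic proportions
H_lda = lda.components_        # topic-term pseudo-counts
```

Refitting NMF from a different initialization will generally land on a different (W, H) pair with similar reconstruction error, which is one way to observe the non-uniqueness mentioned in the last comment.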