Is there any significant difference between LSI and PCA? I came across this but am not sure. Why don't we center the data in LSI, as we do in PCA? (The answer is hinted at in the link, but it is still not clear to me.) Wouldn't it be better to center the data first? I would expect centering to find a better subspace.

asked Nov 10 '10 at 21:09

Oliver Mitevski


3 Answers:

Assume X is a terms-by-documents matrix.

The SVD of X is given by X = USV^T, which implies that X^TX = V(S^TS)V^T and XX^T = U(SS^T)U^T. From this we can conclude that the documents projected onto the term directions are X^TU = VS^T, and the terms projected onto the document directions are XV = US. If we keep only the top-k singular values S_k (and the corresponding columns U_k and V_k), we get the reduced document and term spaces: X^TU_k = V_kS^T_k and XV_k = U_kS_k.
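The identities above can be checked numerically. Here is a minimal NumPy sketch, assuming a small random count matrix as a stand-in for a real terms-by-documents matrix (the sizes `m`, `n`, `k` and the Poisson data are illustrative choices, not part of the answer). It verifies the SVD identities and then builds the rank-k LSI coordinates for documents and terms.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 5, 2
# Hypothetical terms-by-documents count matrix (m terms, n documents)
X = rng.poisson(1.0, size=(m, n)).astype(float)

# Thin SVD: X = U diag(s) V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V, S = Vt.T, np.diag(s)

# Identities from the derivation
assert np.allclose(X.T @ X, V @ S @ S @ V.T)  # X^T X = V S^T S V^T
assert np.allclose(X @ X.T, U @ S @ S @ U.T)  # X X^T = U S S^T U^T
assert np.allclose(X.T @ U, V @ S)            # documents projected onto U
assert np.allclose(X @ V, U @ S)              # terms projected onto V

# Rank-k LSI: keep only the top-k singular values
doc_coords = V[:, :k] * s[:k]    # n x k, equals X^T U_k
term_coords = U[:, :k] * s[:k]   # m x k, equals X V_k
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approximation of X
```

A single SVD yields both `doc_coords` and `term_coords`, which is the point the answer makes about LSI giving both projections at once.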

Applying PCA to the centered terms-by-documents matrix corresponds to finding the eigenvectors of the covariance matrix (X-E[X])(X-E[X])^T, which is proportional to XX^T when X is already centered (E[X] = 0). These eigenvectors are the columns of U, and keeping the top-k eigenvalues gives the reduced document space X^TU_k = V_kS^T_k. Conversely, applying PCA to the centered documents-by-terms matrix corresponds to finding the eigenvectors of the covariance matrix (X^T-E[X^T])(X^T-E[X^T])^T, which under the same assumption is proportional to X^TX. These eigenvectors are the columns of V, and keeping the top-k eigenvalues gives the reduced term space XV_k = U_kS_k.
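To see the PCA/SVD correspondence concretely, the following sketch (an illustration under assumptions, using random data rather than real text) runs PCA with documents as observations via an eigendecomposition of the term covariance matrix. It then checks that the result matches an SVD of the *centered* matrix: the eigenvalues equal s^2/(n-1), and the document scores agree up to the sign of each component. Centering each term (row) across documents is the specific convention assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 8, 5, 2
X = rng.random((m, n))  # hypothetical terms-by-documents matrix

# PCA with documents as observations: center each term (row) over documents
Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / (n - 1)               # m x m term covariance matrix
evals, evecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1][:k]   # indices of the top-k eigenvalues
E_k = evecs[:, order]                 # top-k principal directions (like U_k)
pca_docs = Xc.T @ E_k                 # n x k document scores

# The same subspace via SVD of the centered matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_docs = Vt[:k].T * s[:k]           # V_k S_k

# Eigenvalues of C are s^2 / (n - 1); scores match up to a sign per column
print(np.allclose(evals[order], s[:k] ** 2 / (n - 1)))
print(np.allclose(np.abs(pca_docs), np.abs(svd_docs)))
```

Dropping the centering line turns this into plain LSI on X, which is exactly the difference the question asks about.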

Thus, ignoring the issue of centering, LSI, which is just rank-reduced SVD, gives us a way to project both documents and terms, while PCA only gives one of these projections.

answered Nov 17 '10 at 06:01

Oscar Täckström

There isn't really much of a difference. For text data, centering wouldn't change much when you compare LSI and PCA: because each document's feature vector is sparse, the mean of each feature is close to zero, so centering isn't really needed and LSI and PCA end up behaving almost identically. Section 2.1 of the paper "Translingual Document Representations from Discriminative Projections" from this year's EMNLP (2010) discusses this issue briefly.
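One way to check this argument on your own data is to compare the k-dimensional subspaces found with and without centering. Below is a rough sketch, assuming SciPy is available and using a synthetic sparse matrix from `scipy.sparse.random` as a hypothetical stand-in for real term counts. It reports how large the removed mean is relative to the data and the principal angles between the LSI and PCA term subspaces (small angles mean the two subspaces nearly coincide). The numbers depend entirely on the synthetic data; real corpora with very frequent terms may behave differently.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import subspace_angles

m, n, k = 300, 200, 5
# Hypothetical sparse terms-by-documents matrix (about 2% non-zeros)
X = sp.random(m, n, density=0.02, format="csr", random_state=0).toarray()

# LSI: SVD of the raw (uncentered) matrix
U_lsi, _, _ = np.linalg.svd(X, full_matrices=False)

# PCA: SVD after centering each term (row) over documents
mean = X.mean(axis=1, keepdims=True)
U_pca, _, _ = np.linalg.svd(X - mean, full_matrices=False)

# Frobenius norm of the removed mean, relative to the data itself
rel_mean = np.linalg.norm(np.broadcast_to(mean, X.shape)) / np.linalg.norm(X)

# Principal angles (radians) between the two k-dimensional term subspaces
angles = subspace_angles(U_lsi[:, :k], U_pca[:, :k])

print(f"relative size of the mean: {rel_mean:.3f}")
print("principal angles (degrees):", np.round(np.degrees(angles), 1))
```

Principal angles are used here because the individual singular vectors can reorder or flip sign, while the spanned subspace is what LSI and PCA actually keep.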

answered Nov 11 '10 at 00:55

spinxl39

Thanks for clearing up my doubts.

(Nov 11 '10 at 02:24) Oliver Mitevski

In LSI you factorize and truncate the document-word matrix, while in PCA you factorize and truncate the word covariance matrix. In any case, as spinxl39 pointed out, one of the factor matrices you get from LSI plays essentially the same role as the matrix you get from PCA.

answered Nov 11 '10 at 03:30

Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.