Is there a computational method to determine the compression rate in PCA from the number and dimensionality of the features? My idea: take the first n columns for the reduction, then reproject a percentage of the data and compute the difference with the original data. If the validation score exceeds a specified threshold, keep n columns; otherwise increase n and repeat the loop. Is this correct? Is there a better solution?
To determine the compression amount, you have to decide how much of the variance you want to retain. A common method is to sort the eigenvalues in descending order (most PCA implementations already do this for you). If you then compute the vector of cumulative sums and normalize it by dividing by the total sum, element i of that vector gives the fraction of total variance retained by keeping the first i eigenvectors. For example:
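A sketch of that computation in NumPy, using randomly generated data as a stand-in (the matrix `X` and the 95% threshold are hypothetical choices, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 10 correlated features, mean-centered
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)

# Eigenvalues of the covariance matrix, sorted in descending order
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# explained[i] = fraction of total variance retained by the first i+1 components
explained = np.cumsum(eigvals) / eigvals.sum()

# Smallest n that retains at least 95% of the variance
n = int(np.searchsorted(explained, 0.95) + 1)
print(n, explained[n - 1])
```

No repeated projection loop is needed: one pass over the sorted eigenvalues gives the retained-variance fraction for every possible n at once.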
Just to clarify bogatron's answer: the mean squared reconstruction error is given by the sum of the eigenvalues (variances) corresponding to the eigenvectors you reject. So you don't need to calculate the reconstruction error by repeatedly projecting onto the first i eigenvectors for each i; you get the mean squared reconstruction error for every possible choice of principal components just by looking at the eigenvalues.
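This identity can be checked numerically. A sketch with hypothetical data (note `bias=True` so the covariance divides by N, matching the mean over samples; the choice of `k` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical mean-centered data: 500 samples, 6 features
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)

# Eigendecomposition of the (biased) covariance matrix, sorted descending
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3
W = eigvecs[:, :k]            # keep the top-k principal components
X_rec = X @ W @ W.T           # project onto them, then reproject back

# Mean (over samples) squared reconstruction error
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))

# ...equals the sum of the rejected eigenvalues
print(mse, eigvals[k:].sum())
```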
(Aug 04 '13 at 19:21)
SeanV