I'm implementing PCA using eigenvalue decomposition for sparse data. I know MATLAB has PCA implemented, but writing the code myself helps me understand all the technicalities. I've been following the guidance from here, but I'm getting different results from the built-in function princomp.

Could anybody look at it and point me in the right direction?

Here's the code:

function [mu, Ev, Val ] = pca(data)

% mu - mean image
% Ev - matrix whose columns are the eigenvectors corresponding to the
% eigenvalues Val
% Val - eigenvalues

if nargin ~= 1
 error ('usage: [mu, Ev, Val] = pca(data)');
end

mu = mean(data)';

nimages = size(data,2);

for i = 1:nimages
 data(:,i) = data(:,i)-mu(i);
end

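L = data'*data;
[Ev, Vals]  = eig(L);
% sort eigenvalues in descending order and reorder eigenvectors to match
[Val, idx] = sort(diag(Vals), 'descend');
Ev = Ev(:, idx);

% computing eigenvector of the real covariance matrix
Ev = data * Ev;

% scale eigenvalues to match the covariance normalization
Val = Val / (nimages - 1);

% normalize Ev to unit length, zeroing directions with negligible variance
proper = 0;
for i = 1:nimages
 if Val(i) < 0.00001
  Ev(:,i) = zeros(size(Ev,1),1);
 else
  Ev(:,i) = Ev(:,i)/norm(Ev(:,i));
  proper = proper+1;
 end
end

% keep only the eigenvectors with non-negligible eigenvalues
Ev = Ev(:,1:proper);
Val = Val(1:proper);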
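The line Ev = data * Ev relies on a standard identity: if v is an eigenvector of the N x N matrix data'*data with eigenvalue lambda, then data*v is an eigenvector of the D x D matrix data*data' with the same lambda, which saves work when N is much smaller than D. The sketch below checks this numerically on hypothetical random data (D = 50, N = 5), centered per feature first. The relative threshold 1e-10 * lam(1) for discarding near-zero eigenvalues is an assumption, not taken from the code above. The eigenvalues lam belong to X*X', so divide them by N - 1 to get covariance eigenvalues.

```matlab
% Hypothetical example: many features (D = 50), few examples (N = 5)
D = 50;
N = 5;
data = randn(D, N);
X = data - repmat(mean(data, 2), 1, N);    % center each feature (row)

L = X' * X;                                % small N x N matrix
L = (L + L') / 2;                          % enforce exact symmetry for eig
[W, Lam] = eig(L);
[lam, idx] = sort(diag(Lam), 'descend');   % largest eigenvalue first
W = W(:, idx);

% Centering removes one degree of freedom, so at most N - 1 eigenvalues are nonzero
k = sum(lam > 1e-10 * lam(1));
U = X * W(:, 1:k);                         % map into the D-dimensional space
U = U ./ repmat(sqrt(sum(U.^2, 1)), D, 1); % normalize each column to unit length

% Each column of U should satisfy (X*X') * u = lambda * u
residual = X * (X' * U) - U * diag(lam(1:k));
```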

asked Dec 09 '10 at 14:24

matcheek

How is the data matrix stored? Is each row a different data point and each column a feature/pixel value? Or is each column a different data point?

(Dec 09 '10 at 15:33) Aman

One Answer:

From 'nimages = size(data,2);', it seems that each column is an example in your case, so your data is D x N, where D is the number of features and N is the number of examples. In that case, first off, your data mean mu should be mu = mean(data,2) (equivalently, mu = mean(data')'), which gives a D x 1 vector of per-feature means. You seem to be aiming for the second form, but the transpose is in the wrong place: mean(data)' averages each column, so it returns an N x 1 vector of per-example means instead.
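A small sketch of the shape difference, using a hypothetical random matrix with D = 5 features and N = 3 examples stored one per column:

```matlab
% Hypothetical example data: D = 5 features, N = 3 examples (one per column)
D = 5;
N = 3;
data = randn(D, N);

muWrong = mean(data)';    % N x 1: mean of each column (each example)
muRight = mean(data, 2);  % D x 1: mean of each row (each feature)
muAlt   = mean(data')';   % D x 1: same values as mean(data, 2)
```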

Once you have the mean, you should just do data(:,i) = data(:,i) - mu to center each data point (or use repmat with mu if you want a one-liner). You are subtracting mu(i), a single scalar, which again isn't right.
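The loop fix and the one-liners can be compared on hypothetical random data (D = 4, N = 6). bsxfun is shown as an alternative to repmat that avoids building the copied matrix; on MATLAB R2016b and later, data - mu also works through implicit expansion.

```matlab
% Hypothetical example data: D = 4 features, N = 6 examples (one per column)
D = 4;
N = 6;
data = randn(D, N);
mu = mean(data, 2);                 % D x 1 per-feature mean

% Loop version: subtract the whole mean vector from each column
centeredLoop = data;
for i = 1:N
    centeredLoop(:, i) = centeredLoop(:, i) - mu;
end

% One-liners: repmat copies mu into N columns; bsxfun expands it on the fly
centeredRepmat = data - repmat(mu, 1, N);
centeredBsxfun = bsxfun(@minus, data, mu);

% After centering, every feature (row) has zero mean
rowMeans = mean(centeredRepmat, 2);
```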

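Also, you use data'*data for the covariance, which (assuming columns are examples) is N x N. You should use data*data'/(N-1) on the centered data, which is how the covariance is defined, or use MATLAB's cov function. Note that cov treats rows as observations, so for D x N data you would call cov(data').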
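Putting the covariance step together, a minimal sketch on hypothetical random D x N data: it builds the covariance from the centered data, checks it against cov(data'), and sorts the eigenpairs from eig in descending order, since eig does not promise the largest-variance-first order that princomp uses. The symmetrization line is an extra safeguard, not something the answer requires.

```matlab
% Hypothetical example data: D = 4 features, N = 10 examples (one per column)
D = 4;
N = 10;
data = randn(D, N);

mu = mean(data, 2);                 % D x 1 per-feature mean
X = data - repmat(mu, 1, N);        % centered data, D x N

C = X * X' / (N - 1);               % D x D covariance
C = (C + C') / 2;                   % enforce exact symmetry for eig
Cref = cov(data');                  % cov expects rows = observations

[V, Lambda] = eig(C);               % columns of V are eigenvectors
[vals, idx] = sort(diag(Lambda), 'descend');
V = V(:, idx);                      % principal directions, largest variance first
```

If princomp is available, [COEFF, SCORE, latent] = princomp(data') (princomp also expects rows to be observations) should agree with V up to the sign of each column, and latent should agree with vals. Eigenvector signs are arbitrary, so a column that differs only by a factor of -1 is not an error.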

answered Dec 11 '10 at 03:25

spinxl39

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.