Principal Component Analysis / singular value decomposition: great with the ovariancancer dataset, terrible with my data

Hello,
I am currently getting to grips with PCA and came across a great tutorial from Steve Brunton on its use with MATLAB. The tutorial uses the ovariancancer dataset included with MATLAB and separates the data very well; however, when I apply it to my own data the separation is nowhere near as clear. So my questions:
  1. Is there a description of what each of the 4000 features in the ovariancancer dataset represents, and of any pre-processing done on them?
  2. For anyone maths-minded, what would cause this to work well for one dataset and not the other? I can see my rank is very high, but I cannot understand why.
What is my data? My data is 13-channel PSG recordings, which I split into 10-second windows with 5-second overlaps. For each window and channel I then calculate the mean, median, mode, variance, standard deviation, interquartile range, range, kurtosis and skewness. This gives me 117 features (9*13). I will include the first 1000 rows of features and clinical truth, as the data is open source anyway. The code, which works well with ovariancancer, is included below.
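To make the feature description above concrete, here is a minimal sketch of the windowing step, kept separate from the SVD script that follows. It is not the code used to produce the uploaded features: the sampling rate `fs`, the placeholder signal `sig`, and the column ordering of the features are all assumptions, and `iqr`, `kurtosis` and `skewness` are assumed to be available (Statistics and Machine Learning Toolbox).

```matlab
% Hypothetical inputs: fs and sig are stand-ins, not values from the post.
fs  = 100;                          % assumed sampling rate in Hz
sig = randn(60*fs, 13);             % placeholder: 60 s of 13-channel data

winLen = 10*fs;                     % 10-second window
hop    = 5*fs;                      % 5-second step, so consecutive windows overlap by 5 s
starts = 1:hop:(size(sig,1) - winLen + 1);

nStats = 9;
obs = zeros(numel(starts), nStats*size(sig,2));
for k = 1:numel(starts)
    w = sig(starts(k):starts(k)+winLen-1, :);        % winLen-by-13 block
    f = [mean(w); median(w); mode(w); var(w); std(w); ...
         iqr(w); max(w)-min(w); kurtosis(w); skewness(w)];   % 9-by-13
    obs(k,:) = f(:)';               % 117 features, grouped by channel (assumed ordering)
end
```

Each row of `obs` is one window and each column one feature, which is the layout the SVD script expects. Note that these nine statistics live on very different scales (a variance versus a kurtosis, for example), and several of them are strongly related within a channel.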
%load ovariancancer % works great with this feature set but poorly with others; I have renamed
% my uploaded variables to match this example
[U,S,V] = svd(obs,'econ');
figure
subplot(1,2,1)
semilogy(diag(S),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
subplot(1,2,2)
plot(cumsum(diag(S))./sum(diag(S)),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
set(gcf,'Position',[1440 100 3*600 3*250])
figure, hold on
for i = 1:size(obs,1)
    x = V(:,1)'*obs(i,:)';
    y = V(:,2)'*obs(i,:)';
    z = V(:,3)'*obs(i,:)';
    if (grp(i) == 1)
        plot3(x,y,z,'rx','LineWidth',1);
    else
        plot3(x,y,z,'bo','LineWidth',1);
    end
end
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on, set(gca,'FontSize',15)
set(gcf,'Position',[1400 100 1200 900])
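The script above passes `obs` to `svd` without subtracting the column means, whereas PCA is conventionally defined on mean-centred data. Without centring, the leading singular vector tends to point towards the data mean rather than along the direction of greatest spread. The following is a minimal sketch of a centred, vectorised variant of the same projection. The random `obs` and `grp` are placeholders so that the block runs on its own, and whether centring is what limits the separation for the PSG features is an assumption, not something established here.

```matlab
% Placeholder data so the sketch runs on its own; with real data, obs
% (rows = windows, columns = features) and grp come from the workspace.
obs = randn(200, 117) .* (1:117) + 50;   % features with different scales and a large offset
grp = randi([0 1], 200, 1);

obsC = obs - mean(obs, 1);               % centre each feature (column)
[U, S, V] = svd(obsC, 'econ');

scores    = obsC * V(:, 1:3);            % project all rows onto PC1..PC3 at once
s2        = diag(S).^2;
explained = s2 / sum(s2);                % fraction of variance per component

isPos = (grp == 1);
figure, hold on
plot3(scores(isPos,1),  scores(isPos,2),  scores(isPos,3),  'rx', 'LineWidth', 1)
plot3(scores(~isPos,1), scores(~isPos,2), scores(~isPos,3), 'bo', 'LineWidth', 1)
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on
```

Here `explained` uses the squared singular values, which are proportional to the variance captured by each component; the cumulative plot in the original script uses the unsquared values, so the two curves will not match exactly.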
  1 Comment
Christopher McCausland on 17 Dec 2021
For anyone who comes across this: I believe my problem was the large difference in scale (variance) between the features. I tried to resolve this with the normalize function; however, this leads to U, S and V being returned as arrays of NaN values. I will continue to update this if I make any progress, but I am not sure why normalisation causes this NaN output.
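One possible explanation, offered as an assumption rather than a confirmed diagnosis: z-scoring divides each column by its standard deviation, so a feature that is constant over all rows gives 0/0 = NaN, and a feature that already contains NaN (the kurtosis or skewness of a constant window can be 0/0, for instance) stays NaN. A minimal sketch of how one might check for both cases and standardise only the usable columns is shown below. The small `obs` matrix is a made-up placeholder, and the z-score is written out by hand rather than relying on the exact behaviour of `normalize`.

```matlab
% Placeholder matrix with one constant column and one missing entry,
% built to reproduce the suspected failure mode; replace with the real obs.
obs = [1 5 2; 2 5 4; 3 5 NaN; 4 5 8];

mu    = mean(obs, 1);
sigma = std(obs, 0, 1);

constCols = find(sigma == 0);            % zero spread: (x - mu)/0 gives 0/0 = NaN
nanCols   = find(any(isnan(obs), 1));    % features that already contain NaN
fprintf('constant columns: %s\n', mat2str(constCols));
fprintf('columns with NaN: %s\n', mat2str(nanCols));

% One simple choice (an assumption, not the only option): drop both kinds
% of column, then z-score what is left.
keep = (sigma > 0) & ~any(isnan(obs), 1);
Z = (obs(:, keep) - mu(keep)) ./ sigma(keep);

[U, S, V] = svd(Z, 'econ');              % no NaN reaches svd now
```

Dropping columns is the bluntest fix; removing the affected rows or imputing the missing values are alternatives, depending on how many entries are involved.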


Answers (0)

Release

R2020b
