find a correlation

I have a matrix X and a vector Y:
X=[0.231914928 3.126057882 -1.752476846
-0.779092587 2.143243132 -1.944363312
-1.744892449 1.206497824 -2.267829067
-0.276817947 1.774687601 -1.768924258
-0.367233254 1.697905199 -1.508506912
-0.367233254 1.697905199 -1.508506912
-1.378240769 0.814907572 -1.700393377
-2.389248284 -0.060411815 -1.892279842
-1.333033116 0.860977013 -1.831972668
0.135041386 1.40613207 -1.333067858]
Y=[0.253549664
-0.231692981
0.768395971
2.988670669
-0.038625616
-0.038625616
-0.525155376
-1.011685136
0.961463336
3.181738034]
First I want to calculate the correlation coefficients between all columns of X, which can be done like this:
[R]=corrcoef(X)
Then I want to see which pair of columns has the highest correlation, for example column 1 with 2, 1 with 3, or 2 with 3.
Then, for a pair with a correlation above 0.5 (say columns 1 and 2), I want to check the correlation of each of its columns with Y and report which one is more correlated.

2 Comments

Mohammad, in general I think your questions are great questions for Answers, but I often feel that I do not understand them. I think it would be easier for those of us who choose to answer questions if you could spend a little more time composing your questions.
Niki on 5 Sep 2011
Thanks for your comment, Daniel. For sure, I will do my best. Thanks.


 Accepted Answer

For highest correlation:
maxR = max(max(triu(R,1))) % highest correlation
[row,col] = find(R==maxR,1,'first')
The greatest correlation occurs between the columns row and col.
Second question:
[r w] = max([corr(X(:,row),Y) corr(X(:,col),Y)])
w equal to 1 means that X(:,row) has a higher correlation with Y than X(:,col). w equal to 2 means that X(:,col) has a higher correlation with Y than X(:,row).
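For reference, here is a minimal sketch of the two steps above that uses only corrcoef, in case corr (Statistics Toolbox) is not available. It is one possible way to do it, not the answerer's code. The names mask, Rup, c1, c2 and rY are my own, and the -Inf masking is an assumption I added so that the search still works when every off-diagonal correlation is negative.

```matlab
% Data copied from the question
X = [ 0.231914928  3.126057882 -1.752476846
     -0.779092587  2.143243132 -1.944363312
     -1.744892449  1.206497824 -2.267829067
     -0.276817947  1.774687601 -1.768924258
     -0.367233254  1.697905199 -1.508506912
     -0.367233254  1.697905199 -1.508506912
     -1.378240769  0.814907572 -1.700393377
     -2.389248284 -0.060411815 -1.892279842
     -1.333033116  0.860977013 -1.831972668
      0.135041386  1.40613207  -1.333067858];
Y = [ 0.253549664; -0.231692981;  0.768395971;  2.988670669; -0.038625616
     -0.038625616; -0.525155376; -1.011685136;  0.961463336;  3.181738034];

R = corrcoef(X);                      % correlation matrix of the columns of X
mask = triu(true(size(R)), 1);        % logical mask of the strict upper triangle
Rup = R;
Rup(~mask) = -Inf;                    % ignore the diagonal and the lower triangle
[maxR, k] = max(Rup(:));              % highest pairwise correlation and its linear index
[row, col] = ind2sub(size(R), k);     % the pair of columns, with row < col

% Correlation of each column of that pair with Y
c1 = corrcoef(X(:, row), Y);          % 2-by-2 matrix; the coefficient is the off-diagonal entry
c2 = corrcoef(X(:, col), Y);
rY = [c1(1, 2), c2(1, 2)];
[r, w] = max(rY)                      % w == 1 -> X(:,row), w == 2 -> X(:,col)
```

If the strength of the relationship matters more than its sign, max(abs(rY)) could be used in the last line instead.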

6 Comments

Niki on 5 Sep 2011
There is one problem.
First, it only works for the single highest correlation; if several pairs are highly correlated it will not work. So let's say we want to find every pair of columns in X with a correlation higher than 0.5.
Regarding the second question:
R =
1.0000 0.8171 0.6448
0.8171 1.0000 0.1545
0.6448 0.1545 1.0000
This means the highest correlation is between columns 1 and 2,
but the command
[r w] = max([corr(X(:,row),Y) corr(X(:,col),Y)])
only calculates the correlation between column 2 and Y.
What about column 1 and Y?
The command
[r w] = max([corr(X(:,row),Y) corr(X(:,col),Y)])
calculates the correlation between column 2 and Y and between column 1 and Y. Look:
[corr(X(:,row),Y) corr(X(:,col),Y)]
For the first question:
[row,col] = find(triu(R,1)>=0.5)
Then you can use the function corr.
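To illustrate what that find call returns, here is a small sketch using the R matrix posted above. The variable names pairs and cols are mine, added only for illustration.

```matlab
% Correlation matrix posted earlier in this thread
R = [1.0000 0.8171 0.6448
     0.8171 1.0000 0.1545
     0.6448 0.1545 1.0000];

[row, col] = find(triu(R, 1) >= 0.5);   % one entry per pair, always with row < col
pairs = [row, col]                      % each row of 'pairs' is one pair of columns
cols  = unique(pairs(:))'               % every column that appears in at least one pair
```

With this R, pairs should come out as [1 2; 1 3] and cols as [1 2 3], so every column of X is involved in at least one pair above 0.5. With X and Y in the workspace, the columns listed in cols can then be compared against Y.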
Niki on 5 Sep 2011
Please take a look at the code that I used and see if you can solve the error.
Thanks
Niki on 5 Sep 2011
If you post the command, I can accept your answer.
Niki on 5 Sep 2011
corr(X(:,unique([row;col])),Y)


More Answers (2)

the cyclist on 5 Sep 2011
It wasn't perfectly clear to me if you wanted to find correlation with Y for all the columns that had r>0.5, or only the highest. This does all of them. Maybe you could tailor this to what you need.
r = corrcoef(X); % Correlation coefficients
[i j] = find(r>0.5); % Indices of r > 0.5
indexToOffDiagonalElementsWithHighCorrelation = (i~=j); % Only use off-diagonal elements
XColumnsWithHighCorrelation = unique(i(indexToOffDiagonalElementsWithHighCorrelation))
for nx = 1:numel(XColumnsWithHighCorrelation)
rxy{nx} = corrcoef(X(:,XColumnsWithHighCorrelation(nx)),Y);
disp(rxy{nx})
end

3 Comments

Niki on 5 Sep 2011
The first problem: when you run
>>r = corrcoef(X)
you get this:
r =
1.0000 0.8171 0.6448
0.8171 1.0000 0.1545
0.6448 0.1545 1.0000
But I would like to get this:
r =
0 0 0
0.8171 0 0
0.6448 0.1545 0
How can I do that?
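Use tril(r,-1) to zero out the diagonal and the upper triangle, which gives the matrix you show. (triu(r,1) does the opposite: it keeps the upper triangle and zeros out the diagonal and the lower triangle.)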
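A quick sketch of what tril and triu each keep, using the r matrix from the comment above. The names lowerOnly and upperOnly are just for illustration.

```matlab
r = [1.0000 0.8171 0.6448
     0.8171 1.0000 0.1545
     0.6448 0.1545 1.0000];

lowerOnly = tril(r, -1)   % keeps only the entries below the main diagonal
upperOnly = triu(r,  1)   % keeps only the entries above the main diagonal
```

lowerOnly has 0.8171, 0.6448 and 0.1545 below the diagonal and zeros everywhere else, which is the layout asked for. Because r is symmetric, upperOnly holds the same three values mirrored above the diagonal, so either form works for searching the pairs.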
Niki on 5 Sep 2011
Thanks Oleg, I did not know " Triu " :D
Andrei put a command for that, Thanks


[v id]= max(triu(corrcoef([X,Y]),1))
Last variant:
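R = triu(corrcoef([X,Y]),1)          % upper triangle of the correlations of [X,Y]; last column holds X vs Y
Rx = R(1:end-1,1:end-1)              % correlations among the columns of X only
Rx05 = Rx.*(Rx>.5)                   % keep only the correlations above 0.5
[ix,jx] = find(Rx05==max(Rx05(:)))   % pair of X columns with the highest correlation
cYX = R([ix,jx],end)                 % correlation of each column of that pair with Y
[vXY,xi] = max(cYX)                  % xi == 1 -> X(:,ix), xi == 2 -> X(:,jx)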

7 Comments

Niki on 5 Sep 2011
Andrei, when I use your command
[v id]= max(triu(corrcoef([X,Y]),1))
v =
0 0.8171 0.6448 0.5133
what is the output?
first is 0
second is 0.8171 (between column 1 and 2)
third is 0.6448 (between column 1 and 3)
the last is 0.5133 (???)
The last one is between X(:,1) and Y, because id(4) = 1.
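In case it helps to see why the output has that shape: max applied to a matrix works column by column, returning the largest value in each column (v) and the row where it occurs (id). A small sketch with the 3-by-3 R for the X columns posted earlier in the thread (U is my own variable name):

```matlab
R = [1.0000 0.8171 0.6448
     0.8171 1.0000 0.1545
     0.6448 0.1545 1.0000];

U = triu(R, 1);      % strict upper triangle; the diagonal and everything below become 0
[v, id] = max(U)     % column-wise maxima and the row index of each maximum
```

Here v should be [0 0.8171 0.6448] and id should be [1 1 1]. The first entry of v is always 0 because column 1 of a strict upper triangle contains only zeros. Appending Y as a fourth column, as in the command above, simply adds a fourth entry: the largest correlation between any X column and Y, with id(4) telling which X column it is.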
Niki on 5 Sep 2011
It is 1 because the highest was between column 1 and column 2.
So if I also want the correlation between X(:,2) and Y, what should I do?
R = triu(corrcoef([X,Y]),1)
Rx = R(1:end-1,1:end-1)
Rx05 = Rx.*(Rx>.5)
[ix jx] = find(Rx05==max(Rx05(:)))
cYX = R([ix,jx],4)
[vXY xi]= max(cYX)
X(:,2) and Y -> R(2,4)
Niki on 5 Sep 2011
I think you cannot reach the answer with this command; please check this out.
For example, we have
>> X=rand(10);
>> Y=rand(1,10);
then we perform
>>[R]=corrcoef(X);
then if you perform
>>[v id]= max(triu(corrcoef([X,Y]),1))
v is different from R; I can see only one similar value. Could you please tell me what is happening with this command?
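A possible explanation, sketched under one assumption: Y has to be a column with as many rows as X, since [X,Y] cannot be built from a 10-by-10 X and a 1-by-10 Y. So rand(10,1) is used below instead of rand(1,10). RXY is my own variable name.

```matlab
X = rand(10);                  % 10 observations of 10 variables
Y = rand(10, 1);               % column vector (assumption: one Y value per row of X)

R   = corrcoef(X);             % 10-by-10 matrix
RXY = corrcoef([X, Y]);        % 11-by-11 matrix; its top-left 10-by-10 block matches R

[v, id] = max(triu(RXY, 1));   % column-wise maxima of the strict upper triangle

size(v)                                       % 1-by-11 row vector, not a matrix like R
max(max(abs(RXY(1:10, 1:10) - R))) < 1e-12    % true: the X-only block agrees with R
```

So v is not supposed to look like R. Each v(j) is only the largest entry above the diagonal in column j, and id(j) is the row where it was found. One caveat: because the masked entries are zeros, a column whose correlations are all negative would report 0 rather than one of its actual values.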
Niki on 5 Sep 2011
Andrei, I like your comment very much, Thanks

