# The calculated R squared is not equal to the square of the correlation coefficient from the MATLAB function corr


With model predictions and true values, the R2 (coefficient of determination) can be readily calculated using the standard formula:

Rsq = 1 - sum((ytrue - ypred).^2)/sum((ytrue - mean(ytrue)).^2)

Alternatively, R squared can be obtained by squaring the correlation coefficient, using built-in functions such as corr or corrcoef:

Rsq = (corr(ytrue,ypred))^2

However, I find that the latter value is slightly larger than the former. Why does the built-in function give a higher value?


### Answers (2)

Ameer Hamza
on 23 Apr 2020


John D'Errico
on 24 Apr 2020

Edited: John D'Errico
on 24 Apr 2020

What I do not see is the actual model you used. Did you use a linear model? Was there a constant term in the model? The problem is that the claim you make about R^2 and the correlation coefficient is only valid for specific models. For example, here is a quadratic fit from polyfit, which does include a constant term:

>> x = rand(10,1);

>> y = rand(10,1);

>> p2 = polyfit(x,y,2);

>> pred = polyval(p2,x);

>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)

Rsq =

0.140274350649466

>> corr(y,pred).^2

ans =

0.140274350649466

So the square of the correlation coefficient equals the value your formula computes. The two match down to the last digit, which is what I expect for a least-squares fit that includes a constant term.
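A minimal sketch of why the two agree here, using made-up data rather than the random vectors above (the values of `x` and `y` and the names `A`, `coef`, `res` are assumptions for illustration only). When the design matrix has a column of ones, the least-squares residuals are orthogonal to that column, so they sum to zero. That property is what ties the R^2 formula to the squared correlation.

```matlab
% Illustrative data (not from the thread); any non-degenerate data would do.
x = (1:10)'/10;
y = [0.30; 0.90; 0.10; 0.50; 0.70; 0.20; 0.80; 0.40; 0.60; 0.05];

% Quadratic model WITH a constant term, written as an explicit design matrix.
A    = [ones(size(x)), x, x.^2];
coef = A\y;            % least-squares coefficients
pred = A*coef;         % fitted values

res    = y - pred;
sumRes = sum(res)      % zero up to rounding, because of the constant column

RsqFormula = 1 - sum(res.^2)/sum((y - mean(y)).^2);
RsqCorr    = corr(y,pred)^2;
gap        = abs(RsqFormula - RsqCorr)   % zero up to rounding
```

If you only have `ytrue` and `ypred`, checking `sum(ytrue - ypred)` is a quick way to see whether your predictions behave like a least-squares fit with a constant term.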

However, now try the same thing, but using a model that has no constant term in it. In this case, I'll use a cubic polynomial fit, but one that has no constant term. We can do that using backslash, though I could have done the fit using any number of tools.

>> mdl = [x,x.^2,x.^3]\y

mdl =

0.552026949387604

3.2235169295382

-3.50451900695301

>> pred = [x,x.^2,x.^3]*mdl;

>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)

Rsq =

0.195980323024559

>> corr(y,pred).^2

ans =

0.200698709640219

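What went wrong? The error is in the assumption that the two approaches compute the same thing when the model has no estimated constant term. Without a constant term, the residuals need not sum to zero, and the identity between R^2 and the squared correlation coefficient no longer holds.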
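One way to see why the squared correlation comes out larger, sketched with made-up data (the values of `x` and `y` and the names `B`, `C`, `predRefit` are assumptions for illustration only): `corr(y,pred)^2` equals the R^2 you would get after refitting `y` on `pred` with a slope and a constant. The original predictions correspond to slope 1 and constant 0, which is just one candidate in that refit, so the refit can never do worse.

```matlab
% Illustrative data (not from the thread).
x = (1:10)'/10;
y = [0.30; 0.90; 0.10; 0.50; 0.70; 0.20; 0.80; 0.40; 0.60; 0.05];

% Cubic model WITHOUT a constant term, as in the answer above.
B    = [x, x.^2, x.^3];
pred = B*(B\y);

SST        = sum((y - mean(y)).^2);
sumRes     = sum(y - pred)               % generally not zero here
RsqFormula = 1 - sum((y - pred).^2)/SST;
RsqCorr    = corr(y,pred)^2;

% Refit y on pred with a slope and a constant term.
C         = [ones(size(pred)), pred];
predRefit = C*(C\y);
RsqRefit  = 1 - sum((y - predRefit).^2)/SST;

[RsqFormula, RsqCorr, RsqRefit]   % RsqCorr matches RsqRefit and is >= RsqFormula
```

So the built-in function is not wrong. It answers a slightly different question: how well `y` can be explained by the best straight-line rescaling of the predictions.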

There are adjusted R^2 computations that can be more accurate in these cases, but even so, there is no expectation the formulas will give the same result any longer, when the model lacks a constant term.
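As one example of an alternative definition, and this is an assumption about what could be used rather than necessarily what is meant above, models without a constant term are often scored with an uncentered R^2 that divides by `sum(y.^2)` instead of the sum of squares about the mean. A rough sketch with made-up data (`x`, `y`, `B`, `RsqUncentered` and `cosSq` are illustrative names):

```matlab
% Illustrative data (not from the thread).
x = (1:10)'/10;
y = [0.30; 0.90; 0.10; 0.50; 0.70; 0.20; 0.80; 0.40; 0.60; 0.05];

% Cubic model WITHOUT a constant term.
B    = [x, x.^2, x.^3];
pred = B*(B\y);

% Uncentered R^2: total sum of squares taken about zero, not about mean(y).
RsqUncentered = 1 - sum((y - pred).^2)/sum(y.^2);

% Uncentered analogue of the squared correlation (squared cosine of the
% angle between y and pred, with no mean subtraction).
cosSq = (y'*pred)^2/((y'*y)*(pred'*pred));

gap = abs(RsqUncentered - cosSq)   % zero up to rounding
```

In this uncentered form the two computations agree again, but the number is not comparable to the usual R^2 from a model that includes a constant term.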
