Asked by Roberto Suarez-Moreno
on 16 Jul 2019

Hello everyone,

I'm using the stepwisefit function with the predictor matrix X (69-by-8) and the response variable y. Each column of X is standardized beforehand, as is the response variable. After calling the function as B = stepwisefit(X,y), only two predictors appear in the model at the default significance level (0.05). The problem is that I expect these coefficients to lie between -1 and +1 because of the standardization, but the values are 2.0251 and -2.2983. I need them to be standardized so that I can calculate the percentage of variance explained by each one.

Thanks in advance for your help!

Answer by the cyclist
on 16 Jul 2019

Accepted Answer

I'm not sure I would necessarily expect coefficients to be in that range, even for normalized variables. Can you quote a reference for that?

Here is one simple example of a straightforward OLS regression that defies your expectation. Granted, I am using highly negatively correlated explanatory variables.

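N = 100;
% Two strongly negatively correlated explanatory variables
x1 = randn(N,1);
x2 = -x1 + 0.2*randn(N,1);
% Response with known coefficients plus noise
y = 2 + 3*x1 + 4*x2 + 0.7*randn(N,1);
% Standardize each variable to zero mean and unit variance
x1n = (x1 - mean(x1))/std(x1);
x2n = (x2 - mean(x2))/std(x2);
yn = (y - mean(y))/std(y);
Xn = [x1n,x2n];
% Fit OLS on the standardized variables and display the coefficients
mdl = fitlm(Xn,yn)
% Visualize the correlation between the two predictors
figure
scatter(x1n,x2n)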
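
For reference, the reason the standardized coefficients can leave the range [-1, 1] can be seen from the textbook two-predictor formulas. The symbols here are my own notation, not something from the original question: r_{12} is the correlation between the two predictors, and r_{1y}, r_{2y} are their correlations with the response.

```latex
\beta_1 = \frac{r_{1y} - r_{12}\, r_{2y}}{1 - r_{12}^2},
\qquad
\beta_2 = \frac{r_{2y} - r_{12}\, r_{1y}}{1 - r_{12}^2}
```

As r_{12} approaches +1 or -1 the denominator approaches zero, so the coefficients can become arbitrarily large in magnitude. Only when r_{12} = 0 do they reduce to \beta_1 = r_{1y} and \beta_2 = r_{2y}, which are bounded by 1 in absolute value.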

Roberto Suarez-Moreno
on 16 Jul 2019

Thanks again for your quick response.

To make it clearer, I'm trying to apply the same analysis that the authors perform in this paper: https://link.springer.com/article/10.1007/s00382-016-3416-9

The variance is depicted in eq(2) of section 2.3 (Methods). Then, they show the explained variance by each term in Fig. 8a.

I'm trying to do the same but with different predictors for a different response variable. In my case, as in the paper, the predictors and response variable are previously standardized to have unit variance as the authors mention. My predictors are uncorrelated due to an orthogonality constraint.

the cyclist
on 16 Jul 2019

I don't have access to that paper.

If your explanatory variables are uncorrelated with each other, things are simple. The fraction of variance explained by each variable is the square of its correlation coefficient with the response variable. (You don't need a model for that.)
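
A minimal sketch of that claim, using synthetic data rather than your actual predictors. The sizes, the coefficients, and the QR-based construction of uncorrelated columns are all assumptions made for illustration; zscore, corr, and fitlm come from the Statistics and Machine Learning Toolbox.

```matlab
rng(0);                          % reproducible random numbers
N = 200;

% Build three exactly uncorrelated predictors (illustrative construction):
% center random columns, orthogonalize them with QR, then standardize.
Z = randn(N,3);
Z = Z - mean(Z);
[Q,~] = qr(Z,0);
X = zscore(Q);

% Synthetic standardized response (coefficients are arbitrary)
yn = zscore(0.6*X(:,1) - 0.3*X(:,2) + 0.5*randn(N,1));

% Fraction of variance explained by each predictor, no model needed
r = corr(X,yn);
fracEach = r.^2;

% Compare with the R-squared and coefficients of the full OLS model
mdl = fitlm(X,yn);
R2 = mdl.Rsquared.Ordinary;
betas = mdl.Coefficients.Estimate(2:end);

fprintf('Sum of squared correlations: %.6f\n', sum(fracEach));
fprintf('R-squared of full model:     %.6f\n', R2);
```

In this uncorrelated setting the standardized coefficients in betas coincide with the correlations in r, so squaring either one gives the same per-predictor fractions.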

You can also double-check this as follows ...

If you call your function like this

[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(...)

then

history.rmse

will give the progression of RMSE as you add variables.

1 - history.rmse.^2

gives the progression of explained variance, given that your response variable is standardized to unit variance.

Because your explanatory variables are uncorrelated, the additional explained variance is due completely to the variable added to the model at that stage.

(There will be a small discrepancy, because rmse includes a degrees-of-freedom correction.)
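
Here is a rough sketch of that check on synthetic data. The 69-by-8 size mirrors your description, but the data, the coefficients, and the QR-based orthogonalization are assumptions for illustration only.

```matlab
rng(1);                          % reproducible random numbers
N = 69;

% Eight exactly uncorrelated, standardized predictors (illustrative)
Z = randn(N,8);
Z = Z - mean(Z);
[Q,~] = qr(Z,0);
X = zscore(Q);

% Synthetic standardized response driven by predictors 2 and 5
y = zscore(0.7*X(:,2) - 0.5*X(:,5) + 0.5*randn(N,1));

[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(X,y,'display','off');

% Because var(y) = 1, this is the adjusted R-squared after each step
adjR2 = 1 - history.rmse(:).^2;
gain = diff([0; adjR2]);         % approximate contribution of each added term

% Exact per-predictor fractions from squared correlations
r2 = corr(X(:,inmodel),y).^2;
R2final = 1 - stats.SSresid/stats.SStotal;   % ordinary R-squared of final model

disp(gain.');
fprintf('Sum of squared correlations: %.6f\n', sum(r2));
fprintf('Ordinary R-squared:          %.6f\n', R2final);
```

The values in gain are close to the squared correlations but not identical, since rmse divides the residual sum of squares by the error degrees of freedom. The squared correlations add up to the ordinary R-squared of the final model.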

Roberto Suarez-Moreno
on 16 Jul 2019

Yes, that is exactly what I have done. I've run stepwisefit while successively adding terms (columns) to the predictor matrix. Each increase in R-squared (equivalently, each decrease in rmse^2) then gives the explained variance of the term just added, and these contributions sum to the R-squared of the model with all the terms included, because the predictors are uncorrelated.

My initial error was to base the calculation on the regression coefficients. Thanks for your help, it has been crucial. If you are interested in the paper, please let me know and I can share the PDF with you.
