confidence intervals returned by predict()
The predict() function returns confidence intervals (CIs) for values predicted from a model. There are four options available for the CIs. Two of the options do not give the CIs I expect. Can someone explain these unexpected results? Are my expectations wrong or is the function wrong? I will give examples, using a simple linear regression model, and I will explain what values I expect. I'm sorry this is a long post, but I did not have time to make it shorter.
Create some data and make a simple linear regression model:
x=(5:15)';
b0=0; b1=1; sigma=1; %b0=intercept, b1=slope, sigma=s.d. of random noise
y=b0+b1*x+sigma*randn(size(x));
mdl=fitlm(x,y); % model using x, y
Make predictions with confidence intervals (four options for CIs):
xnew=(0:20)';
[~,yci1] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',false);
[~,yci2] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',true);
[~,yci3] =predict(mdl,xnew,'Prediction','observation','Simultaneous',false);
[ypred,yci4]=predict(mdl,xnew,'Prediction','observation','Simultaneous',true);
Plot predictions and confidence intervals:
figure
subplot(211)
plot(x,y,'k*',xnew,ypred,'-k.'); hold on
plot(xnew,yci1(:,1),'-r',xnew,yci2(:,1),'-g',xnew,yci3(:,1),'-b',xnew,yci4(:,1),'-m');
plot(xnew,yci1(:,2),'-r',xnew,yci2(:,2),'-g',xnew,yci3(:,2),'-b',xnew,yci4(:,2),'-m');
legend('Data','Prediction','curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
ylabel('Y'); grid on
subplot(212)
plot(xnew,yci1(:,2)-ypred,'-r',xnew,yci2(:,2)-ypred,'-g',...
xnew,yci3(:,2)-ypred,'-b',xnew,yci4(:,2)-ypred,'-m');
legend('curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
xlabel('X'); ylabel('C.I. Half-width'); grid on
I wish the Matlab help explained the following, which took me some work to figure out: The four different CIs returned by predict() follow the general formula for an interval around the predicted value $\hat{y}$:
$$CI = \hat{y} \pm c \cdot SE$$
where SE varies depending on the 'Prediction' option, and c varies depending on the 'Simultaneous' option.
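When predict() is called with 'Prediction','curve', SE is given by
$$SE = s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}$$
where $s=\sqrt{SSE/(n-2)}$ is the root mean squared error of the fit, $n$ is the number of data points, $x_0$ is the new x value, $\bar{x}$ is the mean of the original x values, and $S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2$.
When predict() is called with 'Prediction','observation', SE is given by
$$SE = s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}$$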
When predict() is called with 'Simultaneous',false, c (for simple linear regression) is given by
$$c = t_{(1+p)/2,\;n-2}$$
where p is the CI probability, 0.95 by default. The critical value of the t statistic can be obtained in Matlab with c=tinv((1+p)/2,n-2). In the example here, p=0.95 and n=11, therefore c=tinv(.975,9)=2.2622. The formulas above produce CIs that agree with the CIs of predict(), when Simultaneous is false. These CIs are plotted in red and blue above.
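The agreement described above can be checked with a short script. This is only a sketch under my own assumptions: the fixed seed rng(0) and the variable names (seCurve, seObs, errCurve, errObs) are not part of the original example, and the SE formulas are the standard simple-linear-regression ones.

```matlab
% Sketch: rebuild the non-simultaneous CIs by hand and compare with predict().
rng(0);                               % fixed seed for repeatability (assumption)
x = (5:15)';
y = x + randn(size(x));               % b0=0, b1=1, sigma=1, as in the example
mdl = fitlm(x, y);
xnew = (0:20)';
p = 0.95;                             % CI probability (predict's default)
n = numel(x);
s = mdl.RMSE;                         % s = sqrt(SSE/(n-2))
Sxx = sum((x - mean(x)).^2);
seCurve = s*sqrt(1/n + (xnew - mean(x)).^2/Sxx);      % 'Prediction','curve'
seObs   = s*sqrt(1 + 1/n + (xnew - mean(x)).^2/Sxx);  % 'Prediction','observation'
c = tinv((1+p)/2, n-2);               % critical t value
[ypred, ciCurve] = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',false);
[~, ciObs] = predict(mdl, xnew, 'Prediction','observation', 'Simultaneous',false);
% Largest absolute difference between hand-computed and returned bounds
dCurve = ciCurve - [ypred - c*seCurve, ypred + c*seCurve];
dObs   = ciObs   - [ypred - c*seObs,   ypred + c*seObs];
errCurve = max(abs(dCurve(:)));
errObs   = max(abs(dObs(:)));
fprintf('c = %.4f, max diff (curve) = %.2g, max diff (obs.) = %.2g\n', c, errCurve, errObs);
```

Both differences should be at the level of floating-point rounding if the formulas match what predict() does when Simultaneous is false.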
When Simultaneous is true, the results are not what I expect. I expect the CIs (which, according to the Matlab Help, are by Scheffe's method) to use the following c (see here and here; these sources use different notation, but they appear to agree):
$$c = \sqrt{d \, F_{p;\;d,\;n-2}}$$
where d is the number of independent new x values for simultaneous prediction. In the examples plotted above, d=21, because length(xnew)=21. Therefore I expect c=sqrt(21*finv(.95,21,9))=7.8391, so the CIs should be wider by a uniform factor of 7.84/2.26=3.47 when Simultaneous is true. But the CIs returned by predict() are wider by a factor of only 1.2898. (The ratio of CI widths is the same when 'Prediction','observation' is used.) Why the discrepancy?
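The comparison above can be reproduced with a sketch like the following. The seed and the names cT, cScheffe, and ratioObserved are my own illustrative choices; the script only measures what predict() returns and makes no claim about how the function computes it internally.

```matlab
% Sketch: expected Scheffe-type ratio versus the ratio predict() actually gives.
rng(0);                               % fixed seed for repeatability (assumption)
x = (5:15)';
y = x + randn(size(x));
mdl = fitlm(x, y);
xnew = (0:20)';
p = 0.95;  n = numel(x);  d = numel(xnew);
cT = tinv((1+p)/2, n-2);              % non-simultaneous critical value
cScheffe = sqrt(d*finv(p, d, n-2));   % the c I expect for d simultaneous predictions
[ypred, ciNon] = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',false);
[~, ciSim]     = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',true);
% Ratio of half-widths at every new x value
ratioObserved = (ciSim(:,2) - ypred)./(ciNon(:,2) - ypred);
fprintf('expected ratio %.4f\n', cScheffe/cT);
fprintf('observed ratio %.4f to %.4f\n', min(ratioObserved), max(ratioObserved));
```

Because SE is the same in both calls, the observed ratio does not depend on the random data; it reflects only the critical value that predict() uses.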
The confidence interval when predicting a single value with 'Simultaneous',true is also not what I expect. When predicting a single value, d=1, and c simplifies to
$$c = \sqrt{F_{p;\;1,\;n-2}}$$
where p is the CI probability. This is identical to the critical value for the non-simultaneous confidence interval,
$$c = t_{(1+p)/2,\;n-2}$$
due to the relationship between the F and t distributions: the square of a t variable with n-2 degrees of freedom follows an F distribution with 1 and n-2 degrees of freedom. It makes sense that the simultaneous and non-simultaneous CIs would be the same when there is only one value being predicted "simultaneously". But the CIs returned by predict() are not the same when one value is being predicted. See the example below.
xnew=10;
[ypred1,yci1]=predict(mdl,xnew,'Prediction','curve','Simultaneous',false);
[ypred2,yci2]=predict(mdl,xnew,'Prediction','curve','Simultaneous',true);
fprintf('CI, non-simultaneous: %.2f to %.2f; half-width %.2f\n',yci1,yci1(2)-ypred1)
fprintf('CI, simultaneous: %.2f to %.2f; half-width %.2f\n',yci2,yci2(2)-ypred2)
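As a check on the F-t identity I relied on for the d=1 case, here is a minimal sketch; nu, cF, and cT are names I introduce for illustration only.

```matlab
% Sketch: for d=1 the Scheffe-type c should equal the two-sided t critical value.
p = 0.95;                  % CI probability
nu = 9;                    % error degrees of freedom, n-2 with n=11
cF = sqrt(finv(p, 1, nu)); % c from the F distribution with d=1
cT = tinv((1+p)/2, nu);    % c from the t distribution
fprintf('sqrt(F) = %.4f, t = %.4f\n', cF, cT);
```

The two values should agree to rounding, which is why I expect identical CIs from both calls above.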
Why are the CIs not the same?
