Broad confidence bound range when fitting in Matlab
Hey,
I fit a function to data in Matlab, and for the fitted parameters I get quite a large range from Matlab. I have attached the picture. The confidence bounds Matlab shows me for some of my parameters are much wider than the lower and upper bounds I set for those parameters. I know the function I am fitting is very sensitive to two of the fitting parameters, and even very small changes in these two parameters make huge changes. I am wondering why I am getting these huge confidence bounds from Matlab, and whether I can trust the fitting result in this situation.
Thanks in advance!
0 Comments
Answers (3)
John D'Errico
on 27 Nov 2022
Edited: John D'Errico
on 27 Nov 2022
Wide confidence limits are typically a symptom, a reflection of uncertainty in some form. And unfortunately, we don't have your data, so it is difficult to be positive where that uncertainty lies. I suppose I could make up an example of each problem I mention below, but that would take a lot of time to build.
It might be that you have insufficient data to fit the curve well. So too many parameters in your model for the information content in the data. This is not uncommon.
It might be that some of your parameters can trade off with each other to some extent. So a change in one parameter can be offset by a similar change in another. Even if there is a global optimum, it might be difficult to resolve. Again, you can call this a variation of the first issue, that your model is too complex for the data available to fit well.
It might be that your model is just not a good fit to your data. In that case, the wide confidence intervals are merely a reflection of the intrinsically wrong model.
Another issue is how the confidence intervals are derived. They are only approximations that ignore correlations between your parameters. Remember those tradeoffs I mentioned above? The confidence intervals you see assume tradeoffs don't exist.
Given some time, I could probably come up with some other scenarios too, but in the end, remember the word uncertainty. Wide confidence intervals suggest uncertainty, but that uncertainty might arise from different sources, in different ways. Can you trust the result? Hard question there, since we don't see anything beyond the confidence intervals. Trust is sort of meaningless in this context, not really a good word. They are numbers - trust that. Only as good as your data and the validity of your model. With more data and less noisy data, the confidence intervals will potentially be tighter. And remember that in a real world context, in the presence of noise and other confounding factors, no model is a perfect description of data.
Honestly, mine is not a very useful answer in my eyes. But we don't have your data. We don't have your model. We don't know why it is that you think that is a good model for your data.
3 Comments
Bjorn Gustavsson
on 29 Nov 2022
In addition to John's points about the uncertainty, it is worth mentioning that you should try to get out at least the parameter covariance matrix from the fit. That would allow you to get the ellipsoid for the parameter uncertainty. This should help.
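As a rough illustration of the covariance idea, here is a minimal sketch in base MATLAB. It assumes a straight-line model and made-up data (none of this comes from the original question). For a nonlinear model, J would be the Jacobian of the model with respect to the parameters, evaluated at the fitted values. If the Statistics and Machine Learning Toolbox is available, nlinfit can also return a covariance estimate among its outputs.

```matlab
% Sketch: parameter covariance and correlation for a least-squares fit.
% Hypothetical data: a straight line sampled far from x = 0, so the
% intercept a and slope b can trade off against each other.
rng(0);                               % reproducible noise
x = linspace(100, 101, 20)';
y = 3 + 0.5*x + 0.1*randn(size(x));

J = [ones(size(x)), x];               % Jacobian of a + b*x w.r.t. [a b]
beta = J \ y;                         % least-squares estimate
res = y - J*beta;                     % residuals
mse = sum(res.^2) / (numel(y) - numel(beta));

[~, R] = qr(J, 0);                    % economy QR, so J'*J = R'*R
Rinv = R \ eye(numel(beta));
covB = mse * (Rinv * Rinv');          % estimated parameter covariance

se = sqrt(diag(covB));                % standard errors
corrB = covB ./ (se * se');           % parameter correlation matrix
disp(corrB)
```

An off-diagonal entry of corrB close to +1 or -1 indicates that the two parameters trade off strongly. The eigenvectors and eigenvalues of covB (from eig, for example) give the directions and relative lengths of the axes of the uncertainty ellipsoid.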
Star Strider
on 27 Nov 2022
The important thing to note here is that the confidence intervals for ‘n’, ‘L’ and ‘A’ include zero (the bounds have opposite signs), so those parameters are not actually needed in the model and contribute nothing significant to the fit to the data. The idea of ‘trust’ is obviously subjective. However, assuming that the model actually describes the process that created the data, and that the data measurements are accurate, may not be appropriate.
I would examine the data to be certain that the process that created them and measured them (specifically that the measuring equipment was appropriately calibrated) conform to the assumptions of the model being used to estimate their parameters. If that is not actually the situation, then the model may not be appropriate to the data, and a different model (specifically one that describes the process that produced the data) may be required.
2 Comments
John D'Errico
on 27 Nov 2022
Edited: John D'Errico
on 27 Nov 2022
If a model is nonlinear, the fact that the confidence band includes zero does not necessarily indicate the parameter is not needed. Even if the model is linear in the parameters, it may simply indicate insufficient data (or too much noise in the data) for the complexity of the model. For example:
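% Five points, with noise far larger than the signal x.^3
x = randn(5,1)/10;
y = x.^3 + randn(size(x));
% Cubic term plus an (unnecessary) constant term
mdl = fittype('a + b*x.^3','independent','x');
fittedmdl = fit(x,y,mdl)   % no semicolon: displays coefficients and confidence bounds
plot(fittedmdl,x,y)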
So we have a model with relatively huge noise. In fact, the model is almost correct, though with the inclusion of an unnecessary constant term. But all necessary terms are included in the model too.
Note that the confidence interval on the constant term certainly includes zero, so your assertion would be correct there. It was unnecessary. But the cubic term would then also be deemed just as unnecessary. In fact, the estimated sign of the cubic term was completely wrong.
Here the width of the confidence intervals is a signal that the data is wildly inadequate to fit that model, given the noise in the data.
As well, that a parameter confidence interval includes zero may simply be evidence of nonlinearity, or possibly lack of fit.
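To make the "inadequate data" point a little more concrete, here is a minimal sketch in base MATLAB of how the uncertainty on b in a + b*x.^3 shrinks as points are added. It assumes a known noise standard deviation of 1 and evenly spaced x values, both chosen only for illustration, and it uses 1.96 standard errors as an approximate 95% half-width.

```matlab
% Sketch: how the standard error of b in a + b*x.^3 depends on the
% number of points, assuming a known noise standard deviation of 1.
sigma = 1;                            % assumed noise level
nPoints = [5 500 50000];
seB = zeros(size(nPoints));
for k = 1:numel(nPoints)
    x = linspace(-0.3, 0.3, nPoints(k))';   % hypothetical design
    J = [ones(size(x)), x.^3];              % columns for a and b
    covB = sigma^2 * ((J'*J) \ eye(2));     % covariance for known sigma
    seB(k) = sqrt(covB(2,2));               % standard error of b
end
halfWidth = 1.96 * seB;               % approximate 95% half-widths
disp([nPoints' halfWidth'])
```

With a true b of 1, any half-width above 1 gives an interval that spans zero. In this sketch the five-point design has a half-width well above 1, even though the cubic term is exactly what generated the data.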
Walter Roberson
on 27 Nov 2022
Edited: Walter Roberson
on 27 Nov 2022
When you see a coefficient shown with bounds that are close to equal in magnitude but opposite in sign, it typically means that the fitting process could not decide whether the coefficient should be positive or negative. Consider, for example, fitting the model A^2*x + B: negative and positive A give the same result, so negative versus positive cannot be resolved.
If an even number of coefficients follow the same pattern, it can mean that the model cannot distinguish between (negative for one coefficient, positive for a second) and (positive for one coefficient, negative for a second). For example, with A*exp(-B*x) + C*exp(-D*x), swapping A with C and B with D simultaneously gives the same equation: 2*exp(-3*x) + (-5)*exp(-7*x) is the same as (-5)*exp(-7*x) + 2*exp(-3*x), so A = 2 versus A = -5 cannot be resolved.
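Both symmetries can be checked numerically. This is a minimal sketch in base MATLAB, and the parameter values and x grid are arbitrary choices for illustration. It evaluates each model at two different parameter sets and reports the largest difference between the resulting curves.

```matlab
% Sketch: two parameter sets that give identical model curves.
x = linspace(0, 2, 50)';

% Sign symmetry in A^2*x + B (hypothetical values A = 2, B = 1)
f1 = @(A, B) A.^2 .* x + B;
signDiff = max(abs(f1(2, 1) - f1(-2, 1)));

% Label-swap symmetry in A*exp(-B*x) + C*exp(-D*x), p = [A B C D]
f2 = @(p) p(1)*exp(-p(2)*x) + p(3)*exp(-p(4)*x);
swapDiff = max(abs(f2([2 3 -5 7]) - f2([-5 7 2 3])));

fprintf('sign: %g, swap: %g\n', signDiff, swapDiff);
```

Since the curves coincide, the residuals coincide too, and no fitting routine can prefer one parameter set over the other from the data alone.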
In such cases it can help to constrain one of the variables to the range 0 to Inf. If you have not analyzed the function to see which signs are important, then add one constraint at a time to see how the other variables react.
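One way to set such a constraint with fit is sketched below. It needs the Curve Fitting Toolbox, and the data, start point, and choice of model A^2*x + B are assumptions made for illustration rather than anything from the original question.

```matlab
% Sketch: restrict A to [0, Inf) when fitting A^2*x + B.
% Requires Curve Fitting Toolbox; data and start values are hypothetical.
rng(0);
x = linspace(0, 2, 50)';
y = 4*x + 1 + 0.1*randn(size(x));     % generated with A = 2, B = 1

ft = fittype('A^2*x + B', 'independent', 'x', ...
             'coefficients', {'A', 'B'});
opts = fitoptions('Method', 'NonlinearLeastSquares', ...
                  'StartPoint', [1 0], ...   % order follows {'A','B'}
                  'Lower', [0 -Inf], ...     % A >= 0, B unbounded
                  'Upper', [Inf Inf]);

fitted = fit(x, y, ft, opts);
c = coeffvalues(fitted);              % fitted [A B]
ci = confint(fitted);                 % row 1 lower, row 2 upper bounds
disp(c); disp(ci)
```

Listing the coefficients explicitly in fittype fixes their order, so the entries of StartPoint, Lower and Upper line up with A and B as intended.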
2 Comments
Walter Roberson
on 29 Nov 2022
How did you set the bounds on the fitting process?
It is possible to have bounds that are strictly positive but for the calculated range to include some negative values. The calculation involves a mean and standard deviation, and when the distribution does not happen to match a normal distribution, it is possible that 3 standard deviations below the mean is negative even though none of the underlying values are negative. However, you can generally tell the two situations apart. When the fitting cannot tell the difference between positive and negative, the range is pretty much equal in the positive and negative directions, whereas in the standard-deviations-predict-negative case the reported range will be distinctly biased to one side.
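The effect is easy to reproduce with a skewed sample. This is a minimal sketch in base MATLAB using made-up, exponential-like values. It is not the calculation that fit performs internally, only an illustration of a mean-and-standard-deviation range dipping below zero for strictly positive data.

```matlab
% Sketch: strictly positive values whose mean minus 3 standard
% deviations is negative (hypothetical skewed sample).
rng(0);
v = -log(rand(10000, 1));             % exponential-like, all > 0
lowEnd = mean(v) - 3*std(v);
highEnd = mean(v) + 3*std(v);
fprintf('min = %g, range = [%g, %g]\n', min(v), lowEnd, highEnd);
```

The resulting range is lopsided, with the positive end much farther from zero than the negative end. That is the "distinctly biased" pattern described above, as opposed to a range that is nearly symmetric about zero.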