How to fit a custom non-linear function with known (and constant) value of uncertainty?
Joshua Hargis
on 12 Oct 2017
Commented: Joshua Hargis
on 14 Oct 2017
Hi all, I've been searching the documentation and the Mathworks Community for a while now and haven't found quite what I need (or maybe I have but haven't realized it yet). Basically, I would like to fit a non-linear model to some data I have.
I have the following non-linear model (analytic approximation to a step function, where "erf" is the error function): y = a * { 1 + erf [ b * ( x + c ) ] } + d . I can fit this model to my data without issue (I've been using the "fit" function in conjunction with a custom fittype ...should I use another?). What I would like to do now, however, is to include a known uncertainty value in my fitting procedure (the uncertainty is a constant...extracted from the 95% error in the noise value prior to the step-change in my data). How can I include this known, constant-valued measurement uncertainty in the fitting and not have the fit only do its regression based on minimizing error between the data and the curve it generates, but rather to consider this measurement uncertainty value as well?
My first thought was to use a weighting matrix, but since the measurement uncertainty is constant, I think weighting isn't used for that (I'm not a stats expert, so please correct me if I'm wrong here).
Accepted Answer
John D'Errico
on 12 Oct 2017
Well, you COULD do this, in a sense. Not using fit, however, because curve-fitting tools do not have anything like that as an option, nor, realistically, should they. And weighting will not solve the problem either.
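To see why a constant weight vector gains you nothing: in weighted least squares only the *relative* weights matter, so scaling every weight by the same constant leaves the fit unchanged. A minimal sketch (the exponential model and values here are just for illustration):

```matlab
% Constant weights do not change a least-squares fit: only relative
% weights matter, and a constant vector has no relative structure.
x = (0:0.5:5)';
y = 2*exp(-0.7*x) + 0.05*randn(size(x));
ft = fittype('a*exp(-b*x)','coeff',{'a','b'});
mdl1 = fit(x, y, ft, 'StartPoint', [1 1]);
mdl2 = fit(x, y, ft, 'StartPoint', [1 1], 'Weights', 7*ones(size(x)));
% The two coefficient sets agree to rounding error:
[coeffvalues(mdl1); coeffvalues(mdl2)]
```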
As I said, you COULD do it. Does it make mathematical, statistical sense? No, not a lot. You would need to use maximum likelihood estimation, so it becomes an optimization problem, using a tool like fminunc or fmincon from the Optimization Toolbox. (No, lsqnonlin, lsqcurvefit, and lsqlin cannot solve the problem you want to solve.) fminsearch could solve it too, since it becomes a simple optimization.
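To make that concrete: with i.i.d. Gaussian noise of *known* constant sigma, the negative log-likelihood is n*log(sigma) + sum((y - f(x)).^2)/(2*sigma^2) plus a constant, so sigma only rescales and shifts the objective and the MLE is exactly the least-squares solution. A sketch with fminsearch (the sigma value and starting point are illustrative assumptions):

```matlab
% MLE for the erf model with a known, constant noise sigma, via fminsearch.
% p = [a b c d]; sigma appears in the objective but not in the minimizer.
x = (-10:10)';
y = erf(0.5*x) + randn(size(x))/25;
sigma = 0.04;                          % the "known" measurement uncertainty
model = @(p,x) p(1)*(1 + erf(p(2)*(x + p(3)))) + p(4);
nll = @(p) numel(x)*log(sigma) + sum((y - model(p,x)).^2)/(2*sigma^2);
sse = @(p) sum((y - model(p,x)).^2);   % plain least squares, for comparison
p0 = [1 1 0 -1];
pMLE = fminsearch(nll, p0);
pLSQ = fminsearch(sse, p0);
% The two estimates agree (to optimizer tolerance): sigma changed nothing.
[pMLE; pLSQ]
```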
Is there any reason you want to do this? Because I don't see any reason to formulate the problem as a maximum likelihood estimation, since it won't give you anything better at the end.
3 Comments
John D'Errico
on 12 Oct 2017
Edited: John D'Errico
on 12 Oct 2017
You have not given any valid reason for doing what you want to do, but there are good reasons for saying it is a silly idea.
The noise IS small. That seems clear from the plot and from your statement that it is roughly 2% of the magnitude of the step itself. So I would call that amount of noise relatively small.
Even if the noise were larger, computing the noise variance from a small sample does not mean you actually know the variance, or that it will improve the fit. All you then have is an estimate of the variance. It won't improve the fit, and the fit will take a fair bit more effort for you to do, including custom code to do that fit.
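One quick way to see how shaky a small-sample variance estimate is: draw repeated small samples from a distribution whose variance you know and look at the spread of the estimates. (The sample size of 10 below is an assumption about the size of the pre-step noise window.)

```matlab
% Variance estimates from small samples scatter widely around the truth.
rng default
trueVar = 0.04^2;                     % true noise variance
est = zeros(1000,1);
for k = 1:1000
    est(k) = var(0.04*randn(10,1));   % variance estimated from 10 points
end
% Ratio of estimate to truth; in a typical run this spans roughly
% a factor of 5-10 from smallest to largest.
[min(est) median(est) max(est)] / trueVar
```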
So why do you think the noise does not show up clearly? Subtract the fitted curve from your data. Plotting the data itself with the model in there is the wrong way to look at the noise. You said:
"As the step change in the data is occurring, it seems as if the data fit a curve quite well and this change in the data doesn't really appear noisy."
You have roughly 4 data points in the region of the curve where it goes through a transition. Since the curve has a relatively high slope there, of course you won't see much noise. The noise gets lost in the signal. Where the curve is essentially constant, then you can see the noise.
It is also true that this model has almost too much flexibility. There are only about 4 data points in the transition region, but your model has two parameters to estimate the shape of the curve there. Why do I say that?
y = a * { 1 + erf [ b * ( x + c ) ] } + d
Look carefully at the model.
What does d do here? When x is -inf, erf is -1. So y(-inf) = d.
Therefore d is an estimate of the lower asymptote for the model.
How about a? erf(inf) is 1. So y(inf) will be 2*a+d. Effectively, those two parameters have almost nothing to do with the shape of the curve in the transition region, since they are effectively determined by the asymptotes of the lower and upper tails.
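You can check those limits numerically; for any parameter values (the ones below are arbitrary), the tails pin down d and 2*a+d:

```matlab
% Tails of the model: y(-inf) -> d, and y(+inf) -> 2*a + d.
a = 1.3; b = 0.5; c = -0.2; d = -0.7;
f = @(x) a*(1 + erf(b*(x + c))) + d;
[f(-1e3), d]          % both -0.7: the lower asymptote is just d
[f(1e3), 2*a + d]     % both 1.9: the upper asymptote is 2*a + d
```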
Now you have two more parameters, b and c. You estimate those two parameters almost entirely from the shape of the curve in the transition region. Trying to estimate 2 parameters from 4 data points is almost overfitting the curve in that area.
So I would indeed expect the curve to fit the data with a little tighter precision in that region. But b and c are estimates only. The confidence intervals around those numbers will actually be relatively wider than those around a and d. This is simply a reflection of the overfitting.
Ok, so IF this is indeed what you perceive to be the problem, will using some custom MLE estimator help here? NO! ABSOLUTELY NOT! You would be wasting your time.
The point is, the 4 parameters in your model use different parts of the data to give them their estimates. Two of those parameters will be well estimated. But the other two parameters have almost too little data where you need it, so the curve will fit just a bit more tightly, following the noise.
x = (-10:10)';
y = erf(x*.5) + randn(size(x))/25;
plot(x,y,'o')
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/185932/image.jpeg)
Ok. So I have a curve where the noise standard deviation is roughly 2% of the transition from -1 to 1. The curve itself has only about 4 data points where it really matters to estimate b and c. A couple of those points lie in the toe and shoulder of the curve, so they do have some impact on the values of b and c. But not a lot.
So let's try it out on this random data.
ft = fittype('a*(1 + erf(b*(x + c))) + d','coeff',{'a','b','c','d'})
ft =
General model:
ft(a,b,c,d,x) = a*(1 + erf(b*(x + c))) + d
[mdl,G] = fit(x,y,ft)
Warning: Start point not provided, choosing random start point.
> In curvefit.attention.Warning/throw (line 30)
In fit>iFit (line 299)
In fit (line 108)
mdl =
General model:
mdl(x) = a*(1 + erf(b*(x + c))) + d
Coefficients (with 95% confidence bounds):
a = 0.9992 (0.9787, 1.02)
b = 0.4818 (0.436, 0.5275)
c = -0.01222 (-0.1147, 0.09022)
d = -0.9988 (-1.018, -0.9792)
G =
struct with fields:
sse: 0.0231432855006534
rsquare: 0.998691255416355
dfe: 17
adjrsquare: 0.998460300489829
rmse: 0.0368967442375686
If you look at the confidence intervals around a and d, they are quite narrow, as I predicted. d is roughly -1, and 2*a+d is roughly 1.
b and c, however, have MUCH wider intervals on them. This is because the information content of my data is simply too small to estimate them well. So they end up chasing the noise a bit. 4 points for two parameters makes for a poor fit.
plot(x,y,'ro',x,mdl(x),'b-')
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/185933/image.jpeg)
Plotting the residuals, we see:
plot(x,y - mdl(x),'ro')
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/185935/image.jpeg)
See that the residuals are just a bit smaller for the points right near zero.
Now, would that have improved had I tried to provide explicit estimates of the noise variance, based on my samples in a tail, using a custom MLE tool? NO! In fact, the overall fit quality would be slightly worse. And it STILL would have overfit the transition region.
Sorry, but you do not have a valid reason to do what you want. There is only one way to get a better fit in that region, and that is to get more data. Period. There is no magic bullet to be found here. And I'd argue, if I have interpreted your comment correctly, that you are worrying about something where there is no valid reason to think you can do better.