# Fitting nonlinear noisy data

31 views (last 30 days)
Samuele Bolotta on 18 Mar 2021
Commented: Bjorn Gustavsson on 22 Mar 2021
I am fitting a function to some simulated data. The procedure works perfectly, but I would like to know if it can be made more robust to noise. When I use this amount of noise:
y = awgn(CPSC,35,'measured');
It still works very well. But if the amount of noise gets increased to:
y = awgn(CPSC,25,'measured');
In 15% of cases the fit is completely wrong.
This is the function that I use to generate the data:
function [EPSC, IPSC, CPSC, t] = generate_current(G_max_chl, G_max_glu, EGlu, EChl, Vm, tau_rise_In, tau_decay_In, tau_rise_Ex, tau_decay_Ex,tmax)
dt = 0.1; % time step duration (ms)
t = 0:dt:tmax-dt;
% Compute compound current
IPSC = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl));
EPSC = ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu));
CPSC = IPSC + EPSC;
end
And this is the fitting procedure:
% Values generated by simulation
[~,~,CPSC,t] = generate_current(60,40,0,-70,-30,0.44,15,0.73,3,120);
% Initial values
gmc = 90;
gmg = 90;
tde = 1;
tdi = 1;
tre = 1;
tri = 1;
% Apply white noise to the CPSC
y = awgn(CPSC,35,'measured');
% Alternatively, without noise
% y = CPSC;
%% Perform fit
[xData, yData] = prepareCurveData(t, y);
% Set up fittype and options.
ft = fittype( '((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))', 'independent', 't', 'dependent', 'y' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.Lower = [-70 0 1 1 -30 0 0 0 0];
opts.StartPoint = [-70 0 gmc gmg -30 tde tdi tre tri]; % Starting values
opts.Upper = [-70 0 150 150 -30 20 20 5 5];
[fitresult1, gof1] = fit(xData, yData, ft, opts)
%% Plot fit with data
figure( 'Name', 'Fit' );
h = plot( fitresult1, xData, yData );
legend( h, 'CPSC at -30mV', 'Fit to CPSC', 'Location', 'NorthEast', 'Interpreter', 'none');
subtitle('Realistic values')
% Label axes
xlabel( 'time', 'Interpreter', 'none' );
ylabel( 'pA', 'Interpreter', 'none' );
grid on
How can I make it more robust to noise?
Thanks!

John D'Errico on 22 Mar 2021
This is something I recall reading about many years ago. Nonlinear least squares in the presence of high noise is a classically bad problem. It tends to converge poorly. It requires very good starting values, else it will likely diverge to some meaningless result.
The fix is simple. Ok, simple is not always truly simple to achieve. It often involves one or more of these ideas, possibly all three:
1. Get better data. Yeah, I know. Not so easy some of the time. More data will not hurt either, especially if it is good .
2. Provide better starting values. Also not easy.
3. Apply intelligent constraints on the parameters to reduce the search space.
4. Use a robust solver, often an iteratively re-weighted solver, that can decrease the penalty on those large residual points to allow the solver to converge.
5. Multi-start methods are a good choice, since they improve the chance you will get one start point in the basin of attraction of the solution.
In the end, if your data is total crapola, nothing else matters but to get better data.
##### 2 CommentsShow 1 older commentHide 1 older comment
Bjorn Gustavsson on 22 Mar 2021
Here's the curves produced by the OPs function and the "very noisy" data and the best fitting weighted lsq-fit with the desired parameters as fitting variables. All the worries above are at least for the OPs example not of major importance. Alan Weiss on 18 Mar 2021
Most likely the issue is that there are multiple local minima, as in this example: Nonlinear Data-Fitting Using Several Problem-Based Approaches, especially the section Split Problem is More Robust to Initial Guess.
In that example a "split problem" approach worked. In general, you might want to use multiple start points, as in MultiStart Using lsqcurvefit or lsqnonlin.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
Samuele Bolotta on 22 Mar 2021
In the meanwhile I have implemented a multi-start method, and suprisingly it does not give any significant advantage over the normal fit function. Do you think it is still worth trying to split the linear and the nonlinear problems?

Bjorn Gustavsson on 18 Mar 2021
When I do this I typically write my own error-function or residual-function and then use fminsearch (or some of the general optimization-contributions on the file-exchange, like fminsearchbnd or optimize). These functions do the same job as fit as far as I understand, but with a more direct hands-on interface for the users. That way I can write my error-functions with the properly weighted contributions when I have measurements with varying standard deviation (I typically get away with the assumption that my noise is generaly normal-distributed, but if it isn't it's also possible to generalize to optimizing the liklelihood/log-likelihood for other noise-distributions). Therefore I'd solve your problem by turning away from fit and use lsqnonline (or fminsearch) instead:
function res = your_fit_residuals(pars,t,y_obs,y_std)
% Your residual-function for use with lsqnonlin
% that also takes the estimated standard deviations of your observations
% giving you a weighted least-square solution for your fit
if nargin == 3
y_std = 1; % if no std given defaults to a constant and get a standard lsq-fit
end
G_max_chl = pars(1);
tau_rise_In = pars(2);
tau_decay_In = pars(3);
Vm = pars(4);
EChl = pars(5);
G_max_glu = pars(6);
tau_rise_Ex = pars(7);
tau_decay_Ex = pars(8);
EGlu = pars(9);
y_mod = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ...
((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))');
res = (y_mod-y_obs)./y_std;
end
Then you call lsqnonlin like this:
Upper = [-70 0 150 150 -30 20 20 5 5];
StartPoint = [-70 0 gmc gmg -30 tde tdi tre tri]; % Starting values
Lower = [-70 0 1 1 -30 0 0 0 0];
% You have fixed the first second and fifth parameter here, I'm also unsure
% about the ordering of parameters the FIT-function uses so that you have
% to check...
sigma_y = % You have to get estimates of the uncertainty of your measurements from somewhere
[pars_lsqnonlin = lsqnonlin(@(pars) your_fit_residuals(pars,xData, yData,sigma_y),...
StartPoint, Lower,Upper);
yModel = your_fit_residuals(pars_lsqnonlin,xData, 0*yData);
With information about the variation of measurement uncertainty the fit will weight the contributions from the different data-points accordingly.
HTH
Samuele Bolotta on 22 Mar 2021
Thanks for the great input. Much appreciated!