Three terms exponential fit without initial guess

Is there a possibility to fit data (enough data points, thousands of data points) with 3-term exponential fit without initial guesses ? If not, is there a way of having an idea about the initial guesses, mathematically, without prior knowledge on the physical model (just to be safe and unbiased by what we want to get).
Many thanks!

2 Comments

Theoretically, the use of global optimization algorithms will not rely on the guessing of initial value to obtain a global optimal solution, such as Matlab's genetic algorithm toolbox, of course, unfortunately, the actual results are far from what was expected,at least so far.
Theoretically, the use of global optimization algorithms will not rely on the guessing of initial value to obtain a global optimal solution, such as Matlab's genetic algorithm toolbox, of course, unfortunately, the actual results are far from what was expected,at least so far.

Sign in to comment.

 Accepted Answer

There are a lot of misconceptions here.
  1. Can you do a fit without ANY initial guesses? No. Any nonlinear fit will use initial guesses. The ones that do not explicitly require a start point, like fit, still use guesses, but they supply random numbers as guesses. And that can be actively a bad thing. (I'll discuss that later.)
  2. You CAN use tools like GA. But even there you will need to have some intelligent set of bounds for the parameters, else the exponentials will just arbitrarily overflow or underflow. Either case is an actively bad thing in a fit.
  3. You can use tools like a partitioned least squares solver. (I'll show an example of that too.) These tools are more robust to bad starting values, they converge more stably,, etc. But they will still require some starting values.
A nonlinear least squares fit is just a search routine. You need to start it looking in some intelligent place. And exponentials are notorious for being nasty things, IF you fail to treat them properly. Again, remember that a fit is JUST a search. Think of it as putting a blind person down on the face of the earth, and tasking them with finding the point of lowest elevation. Give them a cane, to figure out which way is down hill. Also an altimeter (with spoken output), to figure out how high they are. And finally, give the poor fellow some scuba gear, as they will need it.
The point is, a nonlinear fit is little more intelligent that that. Start the fellow out in the vicinity of the Dead Sea, and expect he will get wet, but not that he will find a point in the bottom of the Pacific ocean.
Since you have supplied no data, I'll need to make up some data.
x = 1 + rand(500,1)*100;
y = 200 ./ x + randn(size(x));
plot(x,y,'.')
Yes. I know that data is NOT in fact a sum of three exponentials. But it has the necessary shape. And most of the time people are just guessing how many terms they need in such a model anyway. Much of the time, they don't even really know the true model. Too bad. This is my data. It looks a lot like any other exponential model data.
So let me see if I can fit this data as a sum of three exponentials. We can see some problems you might run into.
mdl = fittype('a*exp(-b*x) + c*exp(-d*x) + e*exp(-f*x)','indep','x')
mdl =
General model: mdl(a,b,c,d,e,f,x) = a*exp(-b*x) + c*exp(-d*x) + e*exp(-f*x)
First, look at the data. remember that exponential data often looks like that. The problem is, the points at the left end will carry the most influence, dragging the rest of the curve around. Any error there will have a huge impact on the fit.
Next, remember that the magnitude of your independent variable will be hugely important. If x is too large, then the exponential rate parameters can easily be large enough, or small enough in comparison, that you will see overflows in the exponential or underflows. And then the fit will become complete crap. For example, the data I gave here has x on the order of 0-100. Say an average value of 50. If, given a model with terms like
a*exp(-b*x)
if the starting estimate for b is on the order of 100, then ALL of your data will underflow at that start point. Your fit will probably fail to converge.
As well, you NEED to know the sign of the rate constants. Is this a positive or negative exponential model? The problem is that the optimization routine will have a great deal of difficulty in switching the signs of the rate constants. So it you provide a model as
a*exp(b*x)
and you give it the wrong sign for b as a starting value, the solver will usually fail to converge to something reasonable. Here, the problem is that exp(-x) and exp(x) have fundamentally different shapes. But since solvers don't really understand mathematics, they just fail to converge.
fittedmdl = fit(x,y,mdl)
Warning: Start point not provided, choosing random start point.
fittedmdl =
General model: fittedmdl(x) = a*exp(-b*x) + c*exp(-d*x) + e*exp(-f*x) Coefficients (with 95% confidence bounds): a = 346.6 (329.2, 364) b = 1.058 (0.9984, 1.118) c = 11.61 (10.58, 12.64) d = 0.01996 (0.01824, 0.02168) e = 69.76 (64.57, 74.95) f = 0.1848 (0.1712, 0.1984)
plot(fittedmdl)
hold on
plot(x,y,'b.')
hold off
I tested out this fit multiple times, allowing fit to choose its own starting values. It actually did surprisingly well on this specific data, even though the data is not at all a sum of exponentials. Sometimes it failed to converge. That happened a little less than half the time here in my play as I was writing this.
Another problem is that sums of exponentials are themselves difficult problems to estimate. I saw that the Curve Fitting Toolbox did not have an 'exp3' model built in as an option. That is a good thing, because sums of exponentials often resut in poorly posed problems in the optimization. One would see that in the form of multiple singular or nearly singular matrix warnings, telling you the solver is having grave numerical problems.
Ok, can you find good starting values for such a model? SOMETIMES this is doable. Doing it automatically is not easy though. You could do things like take the log of your data. Since the log of an exponential model is now linear in the unknown coefficients, you could see if the result is a inear fit. That is if we have
y = a*exp(b*x)
then
log(y) = log(a) + b*x
Essentially, we now would hope to see a straight line fit, and the slope of that line would be a good estimate of your rate constant. Typically, I do this using a semi-logy plot. If the curve looks nice and liner, then an exponential model will be a good idea. Sadly, this fails for a sum of exponentials, since the log of a sum is nothing special.
semilogy(x,y,'.')
Warning: Negative data ignored
What does that tell me? It tells me that a pure exponential fit must fail to fit well, since the curve is not a straight line. But maybe, if we have a sum of several terms it may work. The semi-logy plot also points out the problem with trying to model process with additive noise, and then taking the log.
Finally, you can use a partitioned nonlinear least squares solver. I have one of them posted on the file exchange, in the form of fminspleas. This tool needs to iterate on only the nonlinear parameters in the problem, so given a model like:
a*exp(-b*x) + c*exp(-d*x) + e*exp(-f*x)
the solver works differently on the intrinsically nonlinear parameters b,d, and f, cpompared to the conditionally linear parameters a,c, and e. Essentially, the solver needs to search only in a 3-dimensional parameter space instead of a 6-dimensional parameter space. And this makes the solve more efficient, more robustly convergent, etc. You can find fminspleas on the File Exchange for free download.
In the end though, a nonlinear least squares fit is often no better than the quality of the starting values you can provide.

7 Comments

Thank you very much. It is very helpful.
I have tested it on my data and since it starts from random initial points, I get the fitting correct once in few separate runs.I could constraint the fitting by lower and upper limits once I get a reasonably good fitting, and with the new contraints it helps getting very good (excellent one) fitting, so I am thinking of a better automation of this process.
I am then thinking whether it is possible to implement a Monte Carlo simulation to run the fitting multiple times and estimate the minimum in order to get the best parameters. Can you please help or redirect to some useful reference ?
Many thanks again.
Question: is your sum of exponentials the sum of a*exp(b*x) terms, or is it the sum of a*exp(b*(x-c)^2) terms, also known as the sum of Gaussian terms?
The sum of Gaussians is difficult to fit using normal optimizers, very often getting trapped in a way that assigns most of the weight to one of the terms and making the other terms effectively linear or noise.
Yes, the model is the sum of a*exp(b*x) terms, not Gaussian terms.
Honestly, I think it is a bad idea. It MAY work, until no set of random start points converges. Then you are stuck, and your code will just keep on trying random lottery tickets,hoping to see some starting point work.) And, because fit does not make very intelligent choices for the start points, you may just always get stuck on some problems.
I'd suggest a simple scheme. First, use my fminspleas. If there is a chance to converge, that code will have a better chance, since it requires no starting values at all for the three linear parameters in the model.
But first, I might better your odds. You supply no sample data. But a simple idea is to assume there is one dominant term in the sum. Estimate the coefficients of that term, by use of a linearization. That is, transform a ONE term model to:
log(y) = log(x)*b + log(a)
Now use a simple linear least squares, i.e., polyfit with a linear model, where the variables are now log(x) and log(y). Estimate the rate parameter from that fit. This will get you in the right ball park for the rate parameter, and since it is polyfit, it will always converge to something reasonable. All you really care about is the linear coefficient in that model at this point. (If you have a problem wit nasty outliers, as I mention below, that will fail. And while you might consider using robustfit instead of polyfit, the later nonlinear least squares will still probably fail. Large residual nonlinear least squares is a NASTY problem to solve, and doing so with a sum of exponentials model is asking for trouble. I really do need to see some data to offer better advice.)
Next, assume the other two exponential terms have rate parameters that are similar in magnitude to the first term. Choose some similar numbers to the dominant term, plus or minus 20%. At this point, you are now in the right ball park, so convergence should work, unless your data is complete crapola. If it is that, then nothing will help. Sadly, we still have seen no data at all from you, so it is difficult to know.) Finally, I would now use my fminspleas tool, fitting the model with three exponential terms, now using those starting values.
Note that when I talk about bad data, I mean data that has large residuals. So a couple of large outliers in your data will render any such fit completely impossible almost always. Another source of bad data is just too little data. As I have said multiple times, sums of exponentisl models are a problem always. They converge poorly. They often diverge due to a multitude of problems, so too little data will get you into trouble, and I too often see people think they need only 6 data points to estimate 6 parameters.
Again, if you show some data, I might be able to offer more help.
Many thanks for these ideas. Yes sure please find the data in this excel file: https://docs.google.com/spreadsheets/d/1PoDzJcYLgEiTNHuzjy5fwZdI21XJTNKlI6HNE6NH4HE/edit?usp=sharing
The experimental signal (columns 3 and 4) is a combination (convolution) of an impulse response of the detector (measured experimentaly and given in columns 1 and 2) with the real signal which should have the model of 3-exponential terms to be fitted.
I have attached the detector response (something that I will also need to figure out how to convolute with the fitting, once this latter is handled). If you want to make it simple initially, you can just fit on the data starting after index = 4000 (before is dominated by the detector response seen as a spike).
Many thanks again for your kind follow-up and help.
In that file, I find 4 columns of data.
load thisdata
plot(data(:,1),data(:,2))
plot(data(:,3),data(:,4))
So you want to find a model for that second curve as a sum of three exponential terms, if we ignore the first 4000 points before anything starts to happen? Something like this?
ind = 4000:size(data,1);
plot(data(ind,3),data(ind,4),'-')
Well, I can say almost yes. The model should normally be a 3-exponential terms, but as you can see the real signal is actually a bit more complex where it contains an effect of the impulse response.
If we ignore the that "spike" in the first 4000 data, what would be the best fitting algorithm to that gives the best fit ? I tried using the model you gave me first, but needed to run several times untill I got roughy an acceptable fit in its shape, then I used the outputs to rewrite the script with upper and lower options, so with this I can get immediately the best fit.
% model:
mdl = fittype('a*exp(-b*x) + c*exp(-d*x) + e*exp(-f*x)','indep','x')
% options obtained after several runs till a convergence to an acceptable shape of the fit
options_Exp = fitoptions(mdl);
options_Exp.Lower = [-0.02544 0.2425 -63.85 1.273 0.01355 0.001544];
options_Exp.Upper = [-0.02425 0.2468 -27.06 1.378 0.01356 0.001571];
% It would have been probably better to give start points instead of upper and lower limits ?
fittedmdl = fit(x,y,mdl,options_Exp)
figure
plot(fittedmdl)
hold on
plot(x,y,'b.')
hold off
So I thought whether your algorithm fminspleas or some MonteCarlo simullations can be a better alternative to this process (manual runs for several times).
Then last, if you could refer me to some questions useful for fitting and convolutting with the impulse signal (response of the detector) that would be great and would ve very grateful. (I'm also wondering whether it can be better to try deconvolve the signal first then fit 3-terms exponential, then reconvolve to fit the real signal.

Sign in to comment.

More Answers (1)

Yes, it is possible to get such fit. Here is an initial draft code syntax:
Xdata = ...
Ydata = ...
MODEL = fittype( @(a,b,c,d,f, g, x) (a*exp(b*x)+c*exp(d*x)+f*exp(g*x)), ...
'indep','x', 'coeff', {'a', 'b', 'c', 'd', 'f', 'g'} );
[FITRESULTS, GF] = fit(Xdata, Ydata, MODEL);
plot(FITRESULTS, Xdata, Ydata)

1 Comment

I'm sorry, but that still uses initial guesses, but just that fit supplies comletely random guesses of its own. And that is often a bad idea.

Sign in to comment.

Categories

Products

Asked:

on 17 Aug 2022

Edited:

on 20 Aug 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!