How to obtain curve fitting tool startpoints using code? Replicate Curve Fitter Toolbox

Hello,
I am trying to fit curves to time series.
I have 100 different time series (all of different lengths), and from trial and error on quite a few of them I know that gauss3 is a good fit.
Here is what I have done so far. I take a time series and fit gauss3 using the Curve Fitter app. I like the fits (on all the time series I tried), so I export the fit as code. The generated code is as follows:
[xData, yData] = prepareCurveData( time_temp, state_timeseries_temp );
% Set up fittype and options.
ft = fittype( 'gauss3' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.Lower = [-Inf -Inf 0 -Inf -Inf 0 -Inf -Inf 0];
opts.StartPoint = [2.01679210779712 262.632 7.13033226840792 0.012098277147597 242.358 16.7897888539238 0.0108815814086219 19.848 22.539819613689];
When I put this into my script, run the code, and plot the fit on top of my actual time series, the results (same as in the Curve Fitter) are good.
But, the opts properties are specific to the time series I used in the curve fitter, so I remove the opts as follows and try again:
% Fit gaussian curve of degree 3
[xData, yData] = prepareCurveData( time_temp, state_timeseries_temp );
% Set up fittype and options.
ft = fittype( 'gauss3' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
% Fit model to data.
[fitresult, gof] = fit( xData, yData, ft, opts );
The resultant fit is not as good as before.
How can I generate these opts properties programmatically (the way the Curve Fitter generates them for each time series I feed in)? I want to do this for many different time series, and the specific opts values obtained from the Curve Fitter for one series will not work for the others.
Does the Curve Fitter do this iteratively? If so, how can I replicate that in a script?
Basically, I want to replicate (in a script) exactly what the Curve Fitter app does when I select a particular time series.

Accepted Answer

John D'Errico on 5 Sep 2023
Edited: John D'Errico on 5 Sep 2023
I'm sorry, but there is no magic. If you do not supply start points, then fit uses random numbers. This actually has SOME amount of logic to it in that if you just arbitrarily use a fixed set of numbers, all of which are equal, say all ones, then the curve fitting process for SOME models, can fail due to a singularity.
Again, there is no magical formula to find good start points for a completely general model. There simply is not. For SOME models, and SOME data sets, you can come up with intelligent starting values, but that usually requires understanding both the data AND the model.
A problem is that since the start points are just random numbers, then SOME of the time, they will be good choices. But that is not always the case. I'm sorry, but this is just life. Understanding your model is CRUCIAL. That is, understanding what the model parameters mean, and how they impact the shape of the model.
Can you do something better than just random numbers? Well, yes. But better often involves time. Here that is CPU time. You can use tools to perform multiple starts from randomly selected sets of starting parameters. Fit the model repeatedly. Then choose the best fit that results. This scheme will work acceptably, much of the time. But even then, you will need to provide an intelligent region to choose the random start points from. For example, if you always choose start points randomly from the interval [0,1], then on SOME models, for SOME data, this will virtually always result in failed fits, simply because of poorly scaled numerics where the problem generates underflows or overflows because it is working in double precision arithmetic. (This is an easy example to see, but sorry, I won't spend the time to give an example. That would make this answer far too long.)
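The multi-start scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the synthetic data, the number of starts (nStarts), and the sampling box (lo, hi) are hypothetical choices that do not come from the answer, and the box assumes non-negative peak amplitudes. It requires Curve Fitting Toolbox.

```matlab
% Multi-start sketch: synthetic data stands in for one real time series.
rng(0);                                    % reproducible random starts
x = linspace(0, 300, 400)';
y = 2.0*exp(-((x-260)/7).^2) + 0.8*exp(-((x-240)/17).^2) ...
  + 0.5*exp(-((x-60)/25).^2) + 0.01*randn(size(x));

ft = fittype('gauss3');                    % coefficients: a1 b1 c1 a2 b2 c2 a3 b3 c3

% Assumed sampling box, scaled from the data (amplitude, center, width per peak).
xr = max(x) - min(x);
lo = repmat([0,        min(x), xr/100], 1, 3);
hi = repmat([2*max(y), max(x), xr    ], 1, 3);

nStarts = 20;                              % hypothetical number of random starts
best = struct('sse', Inf, 'fit', [], 'gof', []);
for k = 1:nStarts
    sp = lo + rand(1, 9).*(hi - lo);       % random start inside the box
    opts = fitoptions('Method', 'NonlinearLeastSquares', ...
        'Lower', lo, 'Upper', hi, 'StartPoint', sp, 'Display', 'Off');
    try
        [f, g] = fit(x, y, ft, opts);
    catch
        continue                           % skip starts where the fit errors out
    end
    if g.sse < best.sse                    % keep the fit with the smallest SSE
        best.sse = g.sse;
        best.fit = f;
        best.gof = g;
    end
end
```

Keeping the fit with the smallest gof.sse is one simple selection rule. How often a start lands near a good solution depends heavily on how well the box matches the data.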
Is there anything more that can be done? Well, YES. You can use a better solver, such as my fminspleas code, found on the File Exchange. This improves robustness to poor starting values, since it reduces the dimensionality of the search space. Essentially, no starting values will be needed at all for some of the parameters. Do you need to understand how that process works? It will help to understand the concept of conditionally linear parameters versus intrinsically nonlinear parameters. And even then, you will still need to provide start points for SOME of the parameters. Yes, you could use a random multi-start code on top of that.
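To illustrate the idea of conditionally linear parameters: in gauss3 the three amplitudes enter linearly once the centers and widths are fixed, so they can be solved for by linear least squares inside the objective. The sketch below is not fminspleas and does not reproduce its interface. It is a minimal stand-in using only fminsearch and the backslash operator, with synthetic data and a hypothetical starting guess p0 for the six nonlinear parameters.

```matlab
% Separable least squares sketch for a sum of three Gaussians.
x = linspace(0, 300, 400)';
y = 2.0*exp(-((x-260)/7).^2) + 0.8*exp(-((x-240)/17).^2) + 0.5*exp(-((x-60)/25).^2);

% p = [b1 c1 b2 c2 b3 c3] (row vector) holds the nonlinear parameters only.
% basis(p) is numel(x)-by-3: one Gaussian column per (center, width) pair.
basis = @(p) exp(-((x - p([1 3 5])) ./ p([2 4 6])).^2);

amps = @(p) basis(p) \ y;                      % amplitudes by linear least squares
sse  = @(p) sum((y - basis(p)*amps(p)).^2);    % objective over 6 parameters, not 9

p0    = [250 10 230 20 50 30];                 % hypothetical rough guess
pBest = fminsearch(sse, p0, optimset('Display', 'off'));
aBest = amps(pBest);                           % no start values were needed for these

% Reassemble in gauss3 order [a1 b1 c1 a2 b2 c2 a3 b3 c3], e.g. for opts.StartPoint.
sp = reshape([aBest'; pBest([1 3 5]); abs(pBest([2 4 6]))], 1, []);
```

The search runs over six parameters instead of nine, which is the dimensionality reduction the answer refers to. The result sp could be passed to fit as a StartPoint for a final refinement.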
Again, I'm sorry, but there simply is no magical formula you can use.
  2 Comments
atharva aalok on 5 Sep 2023
I do NOT want the start points through a magic method.
I just wish to replicate the way the curve fitter toolbox arrives at its StartPoints. I am pretty happy with the StartPoints that the curve fitter toolbox arrives at and the subsequent curve that it fits.
How do I replicate in a script the process that the curve fitter toolbox uses?
John D'Errico on 5 Sep 2023
Edited: John D'Errico on 5 Sep 2023
Did I not answer that? YES. It uses RANDOM numbers. How do you replicate that?
help rand
In fact, I recall it uses rand itself. And that means SOME of the time you will be happy. And some of the time you will be back here, complaining that it did poorly and wanting to know how to do better.
I DID explain very clearly how to do better. You can use multi-start methods on top of a tool like fminspleas to almost always gain a reasonable best fit, when that fit does exist. Will even that fail sometimes? Of course. I can easily make up a test case where it will, as long as I know what process you are using. This is the nature of numerical methods. They can always be fooled by someone who understands both the methods employed and the mathematics involved.


More Answers (2)

Mrutyunjaya Hiremath on 5 Sep 2023
% Prepare data
[xData, yData] = prepareCurveData( time_temp, state_timeseries_temp );
% Set up fittype
ft = fittype( 'gauss3' );
% Initialize options
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
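% Data-driven initialization (example: using mean and std).
% gauss3 coefficients are ordered [a1 b1 c1 a2 b2 c2 a3 b3 c3]:
% a = amplitude, b = center (x units), c = width (x units).
a1_init = mean(yData);   % rough amplitude guess for the first peak
c1_init = std(yData);    % crude placeholder; a width is normally estimated from xData
% Set the StartPoint (the other values are placeholders; adapt them to your data)
opts.StartPoint = [a1_init, 200, c1_init, 0.1, 100, 10, 0.1, 20, 20];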
% Fit model to data
[fitresult, gof] = fit( xData, yData, ft, opts );
In the above example, I've used mean and standard deviation as initial points for some parameters.
If you want, you can adjust this based on the specific characteristics of your data and what each parameter in the gauss3 model represents.
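One way to base every start value on the data, rather than on fixed placeholders, is to pick peaks greedily from the residual. The sketch below is a hypothetical heuristic, not the one Curve Fitting Toolbox uses. It assumes positive peaks and x sorted in ascending order, and it uses the gauss3 form a*exp(-((x-b)/c)^2), for which the full width at half maximum is 2*c*sqrt(log(2)). The synthetic data stands in for one time series.

```matlab
% Greedy peak picking to build a gauss3 StartPoint (requires Curve Fitting Toolbox for fit).
x = linspace(0, 300, 400)';
y = 2.0*exp(-((x-260)/7).^2) + 0.8*exp(-((x-240)/17).^2) + 0.5*exp(-((x-60)/25).^2);

r  = y;                      % residual left after removing the peaks found so far
sp = zeros(1, 9);            % [a1 b1 c1 a2 b2 c2 a3 b3 c3]
for k = 1:3
    [a, i] = max(r);         % tallest remaining peak: amplitude and location
    above = r >= a/2;        % samples at or above half height
    left = i;
    while left > 1 && above(left-1)
        left = left - 1;
    end
    right = i;
    while right < numel(x) && above(right+1)
        right = right + 1;
    end
    fwhm = max(x(right) - x(left), x(2) - x(1));   % guard against zero width
    c = fwhm / (2*sqrt(log(2)));                   % convert FWHM to gauss3 width c
    sp(3*k-2:3*k) = [a, x(i), c];
    r = r - a*exp(-((x - x(i))/c).^2);             % remove this peak and continue
end

opts = fitoptions('Method', 'NonlinearLeastSquares', 'StartPoint', sp, ...
    'Lower', [-Inf -Inf 0 -Inf -Inf 0 -Inf -Inf 0], 'Display', 'Off');
[fitresult, gof] = fit(x, y, fittype('gauss3'), opts);
```

On noisy data the half-height search can stop early, so smoothing the series before estimating the widths may be worthwhile.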
  2 Comments
Mrutyunjaya Hiremath on 6 Sep 2023
Replicating that is quite CHALLENGING.
The curve fitting toolbox in software like MATLAB or similar computational environments often provides options to automatically select initial "StartPoints" for curve fitting algorithms. However, the methods used for determining these starting points can be quite varied and are generally heuristic in nature. The algorithm's choice of initial points might depend on several factors including, but not limited to:
  1. Data Range: The range of the data set can be used to set initial parameter estimates. For example, for exponential fits, an initial guess for the rate parameter might be set based on the decay observed in the data.
  2. Data Moments: Statistical moments like mean, variance, etc., can provide good initial estimates for some types of functions.
  3. Random Sampling: Some algorithms might use Monte Carlo methods to sample the parameter space as a way of choosing an initial point.
  4. User-Specified: In many cases, the user has the option to specify initial points based on prior knowledge or intuition about the system being modeled.
  5. Built-in Heuristics: For certain types of commonly-used functions (like Gaussian, Lorentzian, etc.), the software might have built-in heuristics for choosing initial parameters that work well in a general sense for those functions.
  6. Linearization: For some types of nonlinear functions, a linear approximation can be made to estimate initial parameters, which are then refined through the nonlinear fitting process.
  7. Previous Fits: If multiple fits are being conducted on similar types of data, some algorithms may use the parameters from a previous fit as the initial guess for the next fit.
  8. Optimization Algorithms: Some curve fitting tools might run a preliminary optimization routine to find a "good enough" initial guess for the parameters.
  9. Statistical Methods: Methods like least squares estimates, maximum likelihood estimates, or Bayesian methods might be used to arrive at an initial guess.
  10. Analytical Solutions: For simpler models, analytical solutions might exist that can provide exact initial estimates based on the data.
It's worth noting that the quality of the initial guess can significantly impact the performance of the curve fitting algorithm, especially for non-linear models. Poor initial guesses can result in the algorithm converging to a local minimum rather than the global minimum.
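Item 6 (linearization) can be made concrete for a single Gaussian peak. Taking the logarithm of a*exp(-((x-b)/c)^2) gives a quadratic in x, so an ordinary polynomial fit yields estimates of a, b, and c. The sketch below assumes one isolated, positive peak and uses noise-free synthetic data. The 20% threshold is an arbitrary choice, and none of this is claimed to be what the toolbox does.

```matlab
% Linearized start-point estimate for one Gaussian peak using polyfit.
x = linspace(200, 320, 241)';
y = 2.0*exp(-((x - 260)/7).^2);

keep = y > 0.2*max(y);            % use only the upper part of the peak (log needs y > 0)
m    = mean(x(keep));             % center x to keep the polynomial fit well conditioned

% log(y) = p(1)*u^2 + p(2)*u + p(3), with u = x - m
p = polyfit(x(keep) - m, log(y(keep)), 2);

c0 = sqrt(-1/p(1));               % width:     p(1) = -1/c^2 (must be negative)
b0 = m - p(2)/(2*p(1));           % center:    p(2) = 2*(b - m)/c^2
a0 = exp(p(3) - p(2)^2/(4*p(1))); % amplitude: p(3) = log(a) - (b - m)^2/c^2

fprintf('a0 = %.4f, b0 = %.4f, c0 = %.4f\n', a0, b0, c0);
```

With noise, the logarithm amplifies errors in small y values, which is why the estimate is best treated as a start point for the nonlinear fit rather than as a final answer.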



Steven Lord on 5 Sep 2023
Under certain circumstances (when the Method is NonlinearLeastSquares and you're using certain library models) MATLAB uses heuristics to generate starting points as stated in the description of the StartPoint name-value argument for the fit function. I don't know if we expose the functions that compute those heuristics to be called by user code and if we do I don't know the names by which you'd call those heuristic functions. In other circumstances I believe the start points are generated randomly.
You may want to contact Technical Support directly to request, as an enhancement, that Curve Fitting Toolbox expose those heuristic functions.
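Since gauss3 is a library model, one thing to try is a batch loop that leaves StartPoint unset, so that fit chooses the start points itself for each series, as described in the StartPoint documentation mentioned above. The sketch below keeps the lower bounds from the exported code. The synthetic series are hypothetical stand-ins for the real data, and whether the resulting start points match those shown in the Curve Fitter app is an assumption to check, not something confirmed in this thread.

```matlab
% Batch fitting sketch (requires Curve Fitting Toolbox).
% Hypothetical stand-in for the real collection: three series of different lengths.
model = @(t) 2.0*exp(-((t-260)/7).^2) + 0.8*exp(-((t-240)/17).^2) ...
           + 0.5*exp(-((t-60)/25).^2);
lengths = [250 300 400];
series  = cell(1, numel(lengths));
for k = 1:numel(lengths)
    t = linspace(0, 300, lengths(k))';
    series{k} = struct('t', t, 'y', model(t));
end

lowerB = [-Inf -Inf 0 -Inf -Inf 0 -Inf -Inf 0];   % same bounds as the exported code
fits = cell(1, numel(series));
rsq  = nan(1, numel(series));
for k = 1:numel(series)
    [xData, yData] = prepareCurveData(series{k}.t, series{k}.y);
    % StartPoint is deliberately not set here, so fit picks it per series.
    opts = fitoptions('Method', 'NonlinearLeastSquares', ...
        'Lower', lowerB, 'Display', 'Off');
    [fits{k}, gof] = fit(xData, yData, fittype('gauss3'), opts);
    rsq(k) = gof.rsquare;                         % keep a quality measure per series
end
disp(rsq)
```

Comparing rsq across series gives a quick way to flag the ones that need a manual start point or a multi-start pass.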

Release

R2022a
