Multi-dimensional Fitting

I have 3 independent variables in my function. Let's say f(x,y,z)=(a*x+b)*exp(-y/c)*(z^2+d) (or, more generally, n variables).
I constructed the custom function because I know the behavior that I want from it.
I have many samples (around 5000). For example, f(1000,10,2) = 35.
Is there a method to fit these samples to a shape or solid (for the 3-variable case)?
Or is there a method to find the coefficients (a, b, c, d in this case) of my custom function using all my samples?
The answer doesn't have to be specific to the 3-variable case; I actually need a solution for the n-variable case.
(I know MATLAB has curve fitting and surface fitting tools, but nothing for more dimensions.)
Any support will help me, thanks.

 Accepted Answer

Walter Roberson on 25 Dec 2019
You cannot use cftool: it supports at most two independent variables and one dependent variable. You can, however, use your custom equation with fit() from the Curve Fitting Toolbox, or you could use nonlinear least squares https://www.mathworks.com/help/optim/nonlinear-least-squares-curve-fitting.html
fit() is often surprisingly efficient at what it does, and often generates coefficients that are close enough to optimal to be "good enough" for practical purposes.
However, in equations that have two or more major basins of attraction, fit() will typically go with the larger basin even when the smaller basin has a significantly better fit. Adding upper and lower bounds on the "reasonable" values of parameters can help a lot.
I would predict that in your sample equation, the c value would tend to vary a lot. You do not have an additive constant, so the fitter would tend to try to raise or lower the overall height by driving c large. But large values of c lead to small -y/c, which gives exp() values near 1, and then relatively large changes in c have small effects, making it difficult to locate the best c value. This is a common problem for equations with exp() and a multiplicative coefficient inside.
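To make the point about c concrete, here is a small sketch (my own illustration, with an arbitrarily chosen y = 10 and arbitrary trial values of c) showing how little exp(-y/c) moves once c is large:

```matlab
% Sensitivity of exp(-y/c) to c at a fixed y.
% y = 10 and the list of c values are arbitrary choices for illustration.
y = 10;
c = [1e2 1e3 1e4 1e5];
e  = exp(-y ./ c);     % values approach 1 as c grows
de = diff(e);          % change in exp() for each tenfold increase in c
disp([c.' e.']);       % c alongside exp(-y/c)
disp(de);              % the changes shrink rapidly
```

Each tenfold increase in c changes the exponential factor by less than the previous one, so a least-squares fitter sees an almost flat objective in that direction.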

5 Comments

I wrote
%generate scattered points
x = rand(1,50)*10;
y = rand(1,50)*10-5;
z = rand(1,50)*2-1;
xyz = [x.', y.', z.'];
%create a fitting function
fun = @(C,xyz)(C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500).*(1-C(3)./(C(4)+exp(-xyz(:,3)/C(5))))
%now generate points according to the function, adding a little
%noise to reflect uncertainty in measurements and round-off error
C = randn(1,5);
F = fun(C,xyz) + randn(50,1)*1e-10;
%now do some fitting
opt = optimoptions('lsqcurvefit', 'MaxFunctionEvaluations', 2000);
lsqcurvefit(fun,rand(1,5),xyz,F, [],[],opt)
Unfortunately the lsqcurvefit results were all over the place depending on what the initial value rand(1,5) turned out to be. Not reliable results at all.
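One common way to reduce that dependence on the starting point is to run the fit from several random starts and keep the result with the smallest residual. The following is only a sketch under my own assumptions (20 starts, a fixed random seed, the same synthetic data recipe as above written with column vectors), and it is not the custom optimizer discussed in this thread. It needs the Optimization Toolbox for lsqcurvefit.

```matlab
% Multi-start wrapper around lsqcurvefit (sketch; nStarts and the seed are arbitrary).
rng(0);                                   % for repeatability only
x = rand(50,1)*10;  y = rand(50,1)*10-5;  z = rand(50,1)*2-1;
xyz = [x, y, z];
fun = @(C,xyz)(C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500).*(1-C(3)./(C(4)+exp(-xyz(:,3)/C(5))));
Ctrue = randn(1,5);
F = fun(Ctrue,xyz) + randn(50,1)*1e-10;

opt = optimoptions('lsqcurvefit','MaxFunctionEvaluations',2000,'Display','off');
nStarts = 20;  bestRes = Inf;  bestC = nan(1,5);
for k = 1:nStarts
    C0 = rand(1,5);                       % random starting point
    try
        [Ck,resk] = lsqcurvefit(fun,C0,xyz,F,[],[],opt);
    catch
        continue                          % skip starts where the solver errors out
    end
    if resk < bestRes                     % keep the smallest sum of squares
        bestRes = resk;  bestC = Ck;
    end
end
disp(bestRes);  disp(bestC);  disp(Ctrue);
```

This does not guarantee the global minimum, and as the rest of this comment shows, the best residual may belong to a coefficient set quite different from the generating one.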
The generated C that was used to populate the F values was
1.392207870887926 2.473900289376266 0.5039192426908656 -0.8224805538305699 0.2009818706463425
I then constructed
fvec = @(C) sum((fun(C,xyz) - F).^2);
and ran it through a custom optimizer that I wrote. The residue (sum of squared differences) with the known C values was 4.59715449612088e-19 -- any residue better than that would be "overfitting" -- fitting to the random noise.
After a fair bit of processing with my custom optimizer, the three best trial C values I found were
1.87327276773823e-18 @ [2.24518893244395 3.98961510448558 -0.461913715132666 -1.21583421679435 -0.200981870650839]
2.26305801333892e-18 @ [2.24518893250953 3.9896151040961 -0.461913715128887 -1.21583421679739 -0.200981870648803]
2.30133630608853e-18 @ [1.39220787088086 2.47390028942311 0.503919242681713 -0.822480553836453 0.20098187065357]
The third of those points has about 9 digit agreement with the actual values used to generate the points, so we can see that my fitting process was working pretty well. But notice that the two best points differ significantly from those generating values.
If we compare the ideal,
-exp(-y/500) * (391871678034571*x/281474976710656 + 2785364105346679/1125899906842624) * (2269450513607405/(4503599627370496*(exp(-18014398509481984*z/3620567511004373) - 1852061557875417/2251799813685248)) - 1)
to the best point found in my search,
exp(-y/500) * (5055716019765459*x/2251799813685248 + 8983814548956485/2251799813685248) * (1040137217674399/(2251799813685248*(exp(18014398509481984*z/3620567511085367) - 2737815262849665/2251799813685248)) + 1)
we observe an overall sign difference, but that is partly negated by the different sign of the last term: -(termA-1) expands to (1-termA), which compares to (1+termB). We also observe a much more serious sign difference on the z coefficient of the exp(). Substituting in the first of my random points,
xyz = [7.0223663266789 2.55914120907961 -0.411394746307526]
then the formula at the known generation values would predict
-(exp(-576267369779971/112589990684262400) * (2269450513607405/(4503599627370496*(exp(7411028904691008/3620567511004373) - 1852061557875417/2251799813685248)) - 1) * 1941168252945175763659330182021) / 158456325028528675187087900672
and the formula at the best point I found would predict
(exp(-576267369779971/112589990684262400) * (1040137217674399/(2251799813685248*(exp(-7411028904691008/3620567511085367) - 2737815262849665/2251799813685248)) + 1) * 25043900806793194606380187720381) / 1267650600228229401496703205376
For the known generation values, the terms to be multiplied come out as
[ 0.99489479367074606620566129200363, -0.92719579975957747573178766456934, -12.250493961636970947965676803273]
and for the best point found in my search, they come out as
[ 0.99489479367074606620566129200363, 0.57494016275805730759843536485652, 19.756154260712107376891803209107]
which multiply out to nearly exactly the same thing.
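That agreement can be checked in plain double precision with a few lines. This is just a sketch that plugs the two coefficient vectors and the sample point quoted above into the same model; the variable names are mine.

```matlab
% Evaluate the model at one sample point for two different coefficient sets.
model = @(C,p)(C(1)*p(1)+C(2)) * exp(-p(2)/500) * (1 - C(3)/(C(4)+exp(-p(3)/C(5))));
Ctrue = [1.392207870887926 2.473900289376266 0.5039192426908656 -0.8224805538305699 0.2009818706463425];
Cbest = [2.24518893244395 3.98961510448558 -0.461913715132666 -1.21583421679435 -0.200981870650839];
p = [7.0223663266789 2.55914120907961 -0.411394746307526];   % first random point

fTrue = model(Ctrue,p);      % prediction with the generating coefficients
fBest = model(Cbest,p);      % prediction with the best point from the search
fprintf('%.12g\n%.12g\n%.3g\n', fTrue, fBest, abs(fTrue-fBest));
```

The two predictions agree closely even though the linear factor and the last factor are individually quite different.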
This is the sort of problem I referred to before, that when you have exp() involved, it is pretty common for the minimizers to have difficulty resolving between two different configurations.
fmincon with a seed point of [1 1 -1 -1 -1] and 10000 iterations, does a respectable job of getting near-ish to the best point found in the search, and with a seed point of [1 1 1 -1 1] does a respectable job of getting near-ish to the known generating points.
For that matter, lsqcurvefit() does a very good job of getting quite close to those values given those respective seed points -- enough so that despite my earlier exploration, I would recommend lsqcurvefit if you can provide a reasonable starting point -- but it can do a quite bad job if your starting point is in the wrong area.
... Though as I demonstrated above, the coefficients that best fit the points might turn out to be quite different than expected, because of the mathematics of the equation to be fit.
In general, if you have equations with multiple terms multiplied together, and you do not constrain all but one of the terms to be a specific sign, then you are at risk of there being multiple solutions.
(a*x+b) is not constrained as to sign: (-a)*x+(-b) is -(a*x+b) so if one of the other terms can be made negative you will have multiple solutions.
Can (1-c/(d+exp(-z/e))) be made negative through a transformation of values? My checks suggest that there is no direct negative, but that yes, you can flip signs, leading to a range in the negative of the other range. Although it might not be the direct negative, it could potentially be balanced in a least-squares sense by changes to a and b.
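The sign ambiguity is easiest to see on a simpler product than the one in this thread. As a sketch, take a hypothetical two-factor model g = (a*x+b)*(c*z+d), which I made up only for illustration; negating all four coefficients leaves every prediction unchanged, so a fitter cannot tell the two solutions apart:

```matlab
% Hypothetical two-factor model, used only to illustrate the sign symmetry.
g = @(P,x,z)(P(1)*x + P(2)) .* (P(3)*z + P(4));
x = linspace(0,10,25).';
z = linspace(-1,1,25).';
P    = [1.5 2.0 -0.7 3.0];   % arbitrary coefficients
Pneg = -P;                   % flip the sign of both factors
maxDiff = max(abs(g(P,x,z) - g(Pneg,x,z)));
disp(maxDiff);               % the two parameter sets predict the same values
```

Bounding the coefficients so that only one factor may change sign removes this kind of duplicate solution.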
Actually, the function that I want will not give negative results at all. The first term, with the variable x, is an increasing linear function. The second term, with the variable y, is a decaying exponential function (it must go to nearly zero at some point, e.g. y=3000). The last term, with the variable z, is a '-Sigmoid' function: it has the same value from z=0 to z=some_value, then it must decrease over a short interval, and it has another value after that point. All my results are positive, or at least nearly 0 at some point such as y=3000.
I will use my dataset in lsqcurvefit to find the behaviour that I want. Maybe I can fix some of the c, d, e coefficients to make lsqcurvefit's job easier. When I am done I will let you know how it works.
Thank you for your efforts.
If the first term cannot have a negative coefficient then put in bounds on the values.
Following the way that you showed for lsqcurvefit, I created a vector for each variable, then gathered them, and it worked.
xyz = [x.', y.', z.'];
My final equation looks like this:
(c1*x+c2).*exp(-y/500).*(1-c3./(c4+exp(-15*(z-0.5))))
lsqcurvefit gave me
c1=13.5195
c2=162.8643
c3=7.8244e-09
c4=1.0062e-08
These values solve my problem perfectly.
Additionally, I set the lower bound to [0;0;0;0] to avoid possible errors.
Now, I have a function that includes linearity, exponentiality and sigmoid property.
Thank you for everything.
See you later.
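For anyone who wants to reproduce this kind of bounded fit, here is a minimal sketch. The data are synthetic, and the "true" coefficients, variable ranges, starting point, and seed are all assumptions of mine rather than the values from the dataset above. It needs the Optimization Toolbox.

```matlab
% Bounded fit of (c1*x+c2).*exp(-y/500).*(1-c3./(c4+exp(-15*(z-0.5)))) (sketch).
rng(1);                                        % arbitrary seed
n = 200;
x = rand(n,1)*100;  y = rand(n,1)*3000;  z = rand(n,1);
xyz = [x, y, z];                               % one column per independent variable
model = @(c,xyz)(c(1)*xyz(:,1)+c(2)).*exp(-xyz(:,2)/500) ...
        .*(1-c(3)./(c(4)+exp(-15*(xyz(:,3)-0.5))));
cTrue = [13.5 160 0.5 1];                      % hypothetical coefficients
t = model(cTrue,xyz);                          % synthetic "measurements"

lb = [0 0 0 0];  ub = [];                      % lower bounds only
c0 = [1 1 0.1 2];                              % arbitrary starting point
opt = optimoptions('lsqcurvefit','Display','off');
[cFit,resnorm] = lsqcurvefit(model,c0,xyz,t,lb,ub,opt);
disp(cFit);  disp(resnorm);
```

With the lower bounds in place the solver never tries negative coefficients, which removes the sign-flipped solutions discussed earlier in the comments.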


More Answers (2)

Muhammet Dabak on 25 Dec 2019
Edited: Walter Roberson on 26 Dec 2019
You said I cannot use cftool because it allows at most 2 variables; that is OK. But regarding 'use your custom equation with fit() from the Curve Fitting Toolbox': the definition of fit in MATLAB is "Fit a curve or surface to data", which means no more than 2 variables.
lsqcurvefit and lsqnonlin can be used if I have just x as a variable (1 parameter).
I couldn't understand what you mean by these.
And my exact function is
f(x,y,z)=(a*x+b)*exp(-y/500)*(1-c/(d+exp(-z/e))); %[ 3 unknowns,5 coefficients]
I can set some coefficients by trial and error; that is not the issue, actually (thanks for considering the sample equation).
I need a way to find coefficients for at least 3 unknowns.
If you see a way to do it, just give me a very simple example so I can understand you (3 unknowns, 3 coefficients),
for example f(x,y,z)=(x+a)(y+b)(z+c);
Thanks anyway.
//Edit//
Note: I tried the 'polyfitn' and 'regress' functions, but they only handle polynomial and linear expressions, whereas I have exponential terms. One can use these functions if the function does not contain exponential expressions. 'polyfitn' produces an expression in n variables of degree m for you, but you have to download it.
// This note is written for people who have similar issues with this topic. //

1 Comment

You are right, fit() cannot be used for 3 or more independent variables. You will need to use a nonlinear least-squares solver such as https://www.mathworks.com/help/optim/ug/lsqcurvefit.html .
Unfortunately with the random data I generated, lsqcurvefit did not do a good job. I am experimenting further.
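For the very simple case asked about above, f(x,y,z)=(x+a)(y+b)(z+c), a minimal sketch of the lsqcurvefit approach might look like the following. The trick is to pack the independent variables as columns of one matrix. The sample data, the "true" coefficients, and the starting point are assumptions I made up for the illustration; the Optimization Toolbox is required.

```matlab
% Fit f(x,y,z) = (x+a)*(y+b)*(z+c) with three independent variables (sketch).
rng(2);                                   % arbitrary seed
n = 200;
xyz = rand(n,3)*10;                       % columns: x, y, z (synthetic samples)
f = @(P,xyz)(xyz(:,1)+P(1)).*(xyz(:,2)+P(2)).*(xyz(:,3)+P(3));
abcTrue = [2 3 1.5];                      % hypothetical coefficients a, b, c
t = f(abcTrue,xyz);                       % noise-free "measurements"

abc0 = [1 1 1];                           % starting guess
opt = optimoptions('lsqcurvefit','Display','off');
[abcFit,resnorm] = lsqcurvefit(f,abc0,xyz,t,[],[],opt);
disp(abcFit);  disp(resnorm);
```

The same pattern extends to n independent variables: add more columns to the data matrix and index them inside the model function.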


The 'polyfitn' function describes a function as a sum of terms in all the variables, up to the degree that you choose.
For example:
p=polyfitn([x,y,z],t,'constant x^2 y^3 z z.*x')
formula=polyn2sym(p);
Which gives
formula= c1*x^2 + c2*y^3 + c3*z + c4*z*x + c5
This is very useful if you can describe your function as a sum and it only includes polynomial and linear components. If I cannot find a way to build a custom equation for my case, I will try to transform my function to eliminate the exponential terms and try again.
Thank you.
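One standard way to eliminate an exponential term is to take logarithms, which only works when all the measured values are positive. As a sketch under simplified assumptions (a single decay factor A*exp(-y/c) with the other variables held fixed, noise-free synthetic data, and made-up values A = 200 and c = 500), log(f) is a straight line in y and base MATLAB's polyfit recovers c from the slope:

```matlab
% Log-linearisation of a pure exponential decay f = A*exp(-y/c) (sketch).
A = 200;  c = 500;                 % hypothetical values
y = linspace(0,3000,50).';
f = A*exp(-y/c);                   % positive, noise-free samples

p = polyfit(y, log(f), 1);         % log(f) = log(A) - y/c is linear in y
cEst = -1/p(1);                    % slope is -1/c
AEst = exp(p(2));                  % intercept is log(A)
fprintf('c = %.6g, A = %.6g\n', cEst, AEst);
```

With noisy data the log transform changes how the errors are weighted, so estimates obtained this way are probably best used as a starting point for a nonlinear fit rather than as final values.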

Release

R2018b
