Multi-dimensional Fitting

Question

Muhammet Dabak on 25 Dec 2019


https://nl.mathworks.com/matlabcentral/answers/497977-multi-dimensional-fitting

Commented: Muhammet Dabak on 27 Dec 2019

Accepted Answer: Walter Roberson

My function has 3 parameters (independent variables). Let's say f(x,y,z) = (a*x+b)*exp(-y/c)*(z^2+d). (In general I have n parameters.)

I constructed this custom function because I know the behavior I want it to have.

I have many samples (around 5000), for example f(1000,10,2) = 35.

Is there a method to fit these samples to a shape (a solid, in the 3-parameter case)?

Or is there a method to find the coefficients (a, b, c, d in this case) of my custom function using all my samples?

The answer doesn't have to be specific to the 3-parameter case; I actually need a solution for the n-parameter case.

(I know MATLAB has curve fitting and surface fitting tools, but nothing for more dimensions.)

Any support will help me, thanks.


Answer 1

Walter Roberson on 25 Dec 2019


https://nl.mathworks.com/matlabcentral/answers/497977-multi-dimensional-fitting#answer_407670

You cannot use cftool: it supports at most two independent variables and one dependent variable. You can, however, use your custom equation with fit() from the Curve Fitting Toolbox, or you could use nonlinear least squares: https://www.mathworks.com/help/optim/nonlinear-least-squares-curve-fitting.html

fit() is often surprisingly efficient at what it does, and often generates coefficients that are close enough to optimal to be "good enough" for practical purposes.

However, in equations that have two or more major basins of attraction, fit() will typically go with the larger basin even when the smaller basin has a significantly better fit. Adding upper and lower bounds on the "reasonable" values of parameters can help a lot.

I would predict that in your sample equation the c value would tend to vary a lot. You do not have an additive constant, so the fitter would tend to try to raise or lower the overall height by driving c large. But large values of c make -y/c small in magnitude, so exp() is close to 1 and relatively large changes in c have small effects, which makes it difficult to locate the best c value. This is a common problem for equations with exp() and a multiplicative coefficient inside.
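
The flatness in c described above can be checked numerically. The following is only an illustrative sketch: the y range (0 to 10) and the trial c values are assumptions, not taken from the question's data. It prints how close the exponential factor stays to 1, and how small its derivative with respect to c becomes, as c grows.

```matlab
% Illustration only: the y range and the trial c values are assumed.
y  = linspace(0, 10, 101).';
cs = [10 100 1000 10000];
for c = cs
    g    = exp(-y/c);          % the exponential factor of the model
    dgdc = (y./c^2).*g;        % its derivative with respect to c
    fprintf('c = %6g: min factor = %.4f, max |d/dc| = %.2e\n', ...
            c, min(g), max(abs(dgdc)));
end
% Going from c = 1000 to c = 10000 changes the factor by less than 0.01.
gap = max(abs(exp(-y/1000) - exp(-y/10000)));
fprintf('largest change between c = 1000 and c = 10000: %.4f\n', gap);
```

A tenfold change in c that moves the model by less than one percent is hard for any least-squares solver to resolve once the data carry noise.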


Walter Roberson on 26 Dec 2019


I wrote

% generate scattered points
x = rand(1,50)*10;
y = rand(1,50)*10-5;
z = rand(1,50)*2-1;
xyz = [x.', y.', z.'];   % one row per sample, columns are x, y, z
% create a fitting function; C holds the 5 coefficients
fun = @(C,xyz) (C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500).*(1-C(3)./(C(4)+exp(-xyz(:,3)/C(5))));
% now generate points according to the function, adding a little
% noise to reflect uncertainty in measurements and round-off error
C = randn(1,5);
F = fun(C,xyz) + randn(50,1)*1e-10;
% now do some fitting, starting from a random initial guess
opt = optimoptions('lsqcurvefit', 'MaxFunctionEvaluations', 2000);
Cfit = lsqcurvefit(fun, rand(1,5), xyz, F, [], [], opt)

Unfortunately the lsqcurvefit results were all over the place depending on what the initial value rand(1,5) turned out to be. Not reliable results at all.
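
A common way to cope with this sensitivity to the starting point is to run the solver from several random starts and keep the result with the smallest residual. The sketch below is not the custom optimizer mentioned in this thread; it is a plain multi-start loop around lsqcurvefit (Optimization Toolbox). The seed, the number of restarts and the use of randn for the starting points are assumptions, and the sample points are freshly generated, so the numbers will not match those quoted in the thread.

```matlab
% Multi-start sketch (requires Optimization Toolbox for lsqcurvefit).
rng(1);                                   % assumed seed, for repeatability
xyz = [rand(50,1)*10, rand(50,1)*10-5, rand(50,1)*2-1];
fun = @(C,xyz) (C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500) ...
      .*(1-C(3)./(C(4)+exp(-xyz(:,3)/C(5))));
% Generating coefficients as printed in the thread.
Cgen = [1.392207870887926 2.473900289376266 0.5039192426908656 ...
        -0.8224805538305699 0.2009818706463425];
F = fun(Cgen, xyz);
opt = optimoptions('lsqcurvefit', 'Display', 'off', ...
                   'MaxFunctionEvaluations', 2000);
nStarts = 20;                             % assumed number of restarts
bestRes = Inf;
bestC   = [];
for k = 1:nStarts
    C0 = randn(1,5);                      % random starting point
    try
        [Ck, resnorm] = lsqcurvefit(fun, C0, xyz, F, [], [], opt);
    catch
        continue                          % skip starts where the solver errors out
    end
    if resnorm < bestRes                  % keep the best fit seen so far
        bestRes = resnorm;
        bestC   = Ck;
    end
end
fprintf('best resnorm = %.3g at C = %s\n', bestRes, mat2str(bestC, 6));
```

A multi-start loop does not guarantee the global minimum; it only makes a poor local result less likely.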

The generated C that was used to populate the F values was

1.392207870887926  2.473900289376266 0.5039192426908656 -0.8224805538305699 0.2009818706463425

I then constructed

fvec = @(C) sum((fun(C,xyz) - F).^2);

and ran it through a custom optimizer that I wrote. The residue (sum of squared differences) with the known C values was 4.59715449612088e-19 -- any residue better than that would be "overfitting" -- fitting to the random noise.

After a fair bit of processing with my custom optimizer, the three best trial C values I found were

87327276773823e-18 @ [2.24518893244395          3.98961510448558        -0.461913715132666         -1.21583421679435        -0.200981870650839]
26305801333892e-18 @ [2.24518893250953           3.9896151040961        -0.461913715128887         -1.21583421679739        -0.200981870648803]
30133630608853e-18 @ [1.39220787088086          2.47390028942311         0.503919242681713        -0.822480553836453          0.20098187065357]

The third of those points agrees with the actual values used to generate the points to about 9 digits, so we can see that my fitting process was working pretty well. But notice that the two best points differ significantly from those generating values.

If we compare the ideal,

    /    y  \ / 391871678034571 x   2785364105346679 \ /                           2269450513607405                               \
-exp| - --- | | ----------------- + ---------------- | | -------------------------------------------------------------------- - 1 |
    \   500 / \  281474976710656    1125899906842624 / | /    /   18014398509481984 z \   1852061557875417 \                      |
                                                       | | exp| - ------------------- | - ---------------- | 4503599627370496     |
                                                       \ \    \     3620567511004373  /   2251799813685248 /                      /

to the best point found in my search,

    /    y  \ / 5055716019765459 x   8983814548956485 \ /                          1040137217674399                              \
exp| - --- | | ------------------ + ---------------- | | ------------------------------------------------------------------ + 1 |
   \   500 / \  2251799813685248    2251799813685248 / | /    / 18014398509481984 z \   2737815262849665 \                      |
                                                       | | exp| ------------------- | - ---------------- | 2251799813685248     |
                                                       \ \    \   3620567511085367  /   2251799813685248 /                      /
                                                      

We observe an overall sign difference, but that is partly negated by the different sign of the last term: -(termA-1) expands to (1-termA), which compares to (1+termB). We also observe a much more serious sign difference on the z coefficient inside the exp(). Substituting in the first of my random points,

xyz = [7.0223663266789 2.55914120907961 -0.411394746307526]

then the formula at the known generation values would predict

     /     576267369779971  \ /                                2269450513607405                                    \
  exp| - ------------------ | | ------------------------------------------------------------------------------ - 1 | 1941168252945175763659330182021
     \   112589990684262400 / |                  /                                          1852061557875417 \     |
                              | 4503599627370496 | exp(7411028904691008/3620567511004373) - ---------------- |     |
                              \                  \                                          2251799813685248 /     /
- --------------------------------------------------------------------------------------------------------------------------------------------------
                                                            158456325028528675187087900672

and the formula at the best point I found would predict

   /     576267369779971  \ /                          1040137217674399                             \
exp| - ------------------ | | ----------------------------------------------------------------- + 1 | 25043900806793194606380187720381
   \   112589990684262400 / |                  /    /   7411028904691008 \   2737815262849665 \     |
                            | 2251799813685248 | exp| - ---------------- | - ---------------- |     |
                            \                  \    \   3620567511085367 /   2251799813685248 /     /
--------------------------------------------------------------------------------------------------------------------------------------
                                                    1267650600228229401496703205376

For the known generation values, the terms to be multiplied come out as

[ 0.99489479367074606620566129200363, -0.92719579975957747573178766456934, -12.250493961636970947965676803273]

and for the best point found in my search, they come out as

[ 0.99489479367074606620566129200363, 0.57494016275805730759843536485652, 19.756154260712107376891803209107]

which multiply out to nearly exactly the same thing.

This is the sort of problem I referred to before, that when you have exp() involved, it is pretty common for the minimizers to have difficulty resolving between two different configurations.
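
One way to see why the two configurations agree: write u = exp(-z/c5), so the last factor of the model is (u + c4 - c3)/(u + c4). Replacing c5 by -c5 turns u into 1/u, and the factor returns to the same form with c4' = 1/c4 and c3' = c4' - 1/(c4 - c3), up to a constant scale s = (c4' - c3')/c4' that the linear factor absorbs through c1' = c1/s and c2' = c2/s. The sketch below checks this numerically for the generating coefficients quoted above; the test points and the seed are assumptions.

```matlab
% Model from the thread: C = [c1 c2 c3 c4 c5], columns of xyz are x, y, z.
fun = @(C,xyz) (C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500) ...
      .*(1-C(3)./(C(4)+exp(-xyz(:,3)/C(5))));

% Generating coefficients quoted in the thread.
Ca = [1.392207870887926 2.473900289376266 0.5039192426908656 ...
      -0.8224805538305699 0.2009818706463425];

% Equivalent set: flip the sign of c5, invert c4, rescale the rest.
c5 = -Ca(5);
c4 = 1/Ca(4);
c3 = c4 - 1/(Ca(4) - Ca(3));
s  = (c4 - c3)/c4;            % constant scale absorbed by the linear factor
Cb = [Ca(1)/s, Ca(2)/s, c3, c4, c5];

% Compare the two parameterizations on scattered test points (assumed).
rng(0);
xyz = [rand(50,1)*10, rand(50,1)*10-5, rand(50,1)*2-1];
fa  = fun(Ca, xyz);
fb  = fun(Cb, xyz);
maxRelDiff = max(abs(fa - fb)./abs(fa));
fprintf('Cb = %s\nmax relative difference = %.3g\n', mat2str(Cb, 9), maxRelDiff);
```

The computed Cb comes out at roughly [2.245 3.990 -0.462 -1.216 -0.201], which appears to be the alternative solution found by the search. If so, the two coefficient sets describe the same function, and no amount of data can separate them without a constraint such as a sign restriction on c5.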

fmincon with a seed point of [1 1 -1 -1 -1] and 10000 iterations, does a respectable job of getting near-ish to the best point found in the search, and with a seed point of [1 1 1 -1 1] does a respectable job of getting near-ish to the known generating points.

For that matter, lsqcurvefit() does a very good job of getting quite close to those values given those respective seed points -- enough so that despite my earlier exploration, I would recommend lsqcurvefit if you can provide a reasonable starting point -- but it can do a quite bad job if your starting point is in the wrong area.

... Though as I demonstrated above, the coefficients that best fit the points might turn out to be quite different than expected, because of the mathematics of the equation to be fit.

Walter Roberson on 27 Dec 2019

If the first term cannot have a negative coefficient, then put bounds on the values.

Muhammet Dabak on 27 Dec 2019


Following the way you showed for lsqcurvefit, I created a vector for each variable, then gathered them into one matrix, and it worked.

xyz = [x.', y.', z.'];

My final equation looks like this:

(c1*x+c2).*exp(-y/500).*(1-c3./(c4+exp(-15*(z-0.5))))

lsqcurvefit gave me

c1=13.5195
c2=162.8643
c3=7.8244e-09
c4=1.0062e-08

These values solve my problem perfectly.

Additionally, I set the lower bounds to [0;0;0;0] to avoid possible errors.
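
For readers who want to reproduce this setup, a minimal sketch of the bounded fit might look as follows. It uses the final equation above with lower bounds of zero, but everything else is assumed for illustration: the synthetic data ranges, the noise level, the "true" coefficients Cgen and the starting point C0 are hypothetical and are not the poster's data.

```matlab
% Sketch with synthetic data (requires Optimization Toolbox).
rng(2);                                            % assumed seed
n   = 200;
xyz = [rand(n,1)*1000, rand(n,1)*50, rand(n,1)];   % assumed ranges for x, y, z
fun = @(C,xyz) (C(1)*xyz(:,1)+C(2)).*exp(-xyz(:,2)/500) ...
      .*(1-C(3)./(C(4)+exp(-15*(xyz(:,3)-0.5))));
Cgen = [10 150 0.5 1];                             % hypothetical "true" values
F    = fun(Cgen, xyz) + randn(n,1)*0.1;            % assumed noise level
C0 = [1 1 1 1];                                    % assumed starting point
lb = [0 0 0 0];                                    % lower bounds, as in the thread
ub = [];                                           % no upper bounds
opt = optimoptions('lsqcurvefit', 'Display', 'off');
[Cfit, resnorm] = lsqcurvefit(fun, C0, xyz, F, lb, ub, opt);
disp(Cfit)
```

With all coefficients kept non-negative, the denominator c4 + exp(...) stays positive, so the model has no pole inside the search region.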

Now I have a function that combines linear, exponential and sigmoid behavior.

Thank you for everything.

See you later.


Answer 2

Muhammet Dabak on 25 Dec 2019


https://nl.mathworks.com/matlabcentral/answers/497977-multi-dimensional-fitting#answer_407675

Edited: Walter Roberson on 26 Dec 2019


You said I cannot use cftool because it allows at most 2 variables; that is OK. But regarding 'use your custom equation with fit() from the Curve Fitting Toolbox': the definition of fit in MATLAB is "Fit a curve or surface to data", which means no more than 2 variables.

lsqcurvefit and lsqnonlin can be used if I have just x as a variable (1 parameter).

I couldn't understand what you meant by these.

And my exact function is

f(x,y,z)=(a*x+b)*exp(-y/500)*(1-c/(d+exp(-z/e))); %[ 3 unknowns,5 coefficients]

I can set some coefficients by trial; that is not the issue, actually (thanks for the consideration of the sample equation).

I need a way to find coefficients for at least 3 unknowns.

If you see a way to do it, just give me a very simple example so I can understand you (3 unknowns, 3 coefficients),

for example f(x,y,z)=(x+a)(y+b)(z+c);

Thanks anyway.

//Edit//

Note: I tried the 'polyfitn' and 'regress' functions, but they support only polynomial and linear expressions, whereas I have exponential terms. You can use these functions if your function does not contain exponential expressions. 'polyfitn' produces an expression in n variables of degree m for you, but you have to download it.

// This note is written for people who have similar issues with this topic.//


Walter Roberson on 26 Dec 2019

You are right, fit() cannot be used for 3 or more independent variables. You will need to use a nonlinear least-squares solver such as lsqcurvefit: https://www.mathworks.com/help/optim/ug/lsqcurvefit.html

Unfortunately with the random data I generated, lsqcurvefit did not do a good job. I am experimenting further.
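
For the simple case asked about above, f(x,y,z) = (x+a)(y+b)(z+c), a minimal lsqcurvefit sketch might look like the following. The key point is that the independent variables are packed as the columns of one matrix, so the same pattern extends to n variables by adding columns. The sample data, the coefficient values in Cgen and the starting point are hypothetical, chosen only for illustration.

```matlab
% Three independent variables, three coefficients (requires Optimization Toolbox).
rng(3);                                   % assumed seed
n   = 100;
xyz = rand(n,3)*10;                       % columns are x, y, z (assumed ranges)
% C = [a b c]; the model is evaluated row by row on xyz.
fun = @(C,xyz) (xyz(:,1)+C(1)).*(xyz(:,2)+C(2)).*(xyz(:,3)+C(3));
Cgen = [2 -1 3];                          % hypothetical coefficients
t    = fun(Cgen, xyz);                    % synthetic "measurements"
C0   = [1 1 1];                           % assumed starting point
opt  = optimoptions('lsqcurvefit', 'Display', 'off');
[Cfit, resnorm] = lsqcurvefit(fun, C0, xyz, t, [], [], opt);
fprintf('fitted [a b c] = %s, resnorm = %.3g\n', mat2str(Cfit, 6), resnorm);
```

With real samples, xyz would be an n-by-3 matrix of measured inputs and t the n-by-1 vector of measured function values.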


Answer 3

Muhammet Dabak on 26 Dec 2019


https://nl.mathworks.com/matlabcentral/answers/497977-multi-dimensional-fitting#answer_407752


With the 'polyfitn' function you can describe a function as a sum of terms in the variables, with the degrees that you choose.

For example:

p=polyfitn([x,y,z],t,'constant x^2 y^3 z z.*x')
formula=polyn2sym(p);

Which gives

formula= c1*x^2 + c2*y^3 + c3*z + c4*z*x + c5

This is very useful if you can describe your function as a sum and it includes only polynomial and linear components. If I cannot find a way to build a custom equation for my case, I will try to transform my function to eliminate the exponential terms and try again.
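
Because such a model is linear in its coefficients, the same fit can also be sketched in base MATLAB without the download, by building one column per term and solving with the backslash operator. This is not polyfitn itself, only an equivalent least-squares formulation for the terms listed above; the sample data and the coefficient values in cgen are hypothetical.

```matlab
% Linear least squares for t = c1*x^2 + c2*y^3 + c3*z + c4*z*x + c5.
rng(4);                                    % assumed seed
n = 200;
x = rand(n,1);  y = rand(n,1);  z = rand(n,1);   % assumed sample points
cgen = [2; -1; 0.5; 3; 4];                 % hypothetical c1..c5
A = [x.^2, y.^3, z, z.*x, ones(n,1)];      % one column per model term
t = A*cgen;                                % noise-free synthetic samples
c = A\t;                                   % least-squares solution
disp(c.')
```

This only works when every coefficient enters the model linearly; the exponential model discussed in this thread does not have that form, which is why a nonlinear solver is needed there.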

Thank you.
