Fitting a data set by optimising multiple parameters

Hello,
I am developing a model for simulating experimental data sets. Each data set corresponds to two specific variables (P and T). The model is generally represented by the following equation:
V_total = V1 + V2 + A log (a/a1) + B log (a/a2)
where a = 0:1.5:700;
The terms of V1, V2, A and B are calculated through specific equations so you can consider them constants. The coefficients of a1 and a2 are functions of P and T as follows:
a1 = x1 * P^0.5 * exp(16.95 - (5052 / T))
a2 = x2 * P^0.5 * exp(8.47 - (2526 / T))
For each data set, I want to obtain the best fitting by optimising coefficients x1 and x2.
For better understanding, here is a plot of the data set of P = 5 & T = 90. I randomly used x1 = 0.1 and x2 = 0.005.
I would be grateful if someone helps me in solving this issue.
Thank you in advance.

2 Comments

Can you upload one or two example datasets? You can use the paper clip icon in the INSERT section of the toolbar.
Thank you @the cyclist for your answer. Here is the dataset of P = 5 & T = 90.
Please consider T = 363 instead of 90 as it is the temperature and should be used in Kelvin in the equations of a1 and a2.

Sign in to comment.

Answers (1)

Perhaps something like this —
T1 = readtable('Ask_17.05.2024.xlsx')
T1 = 431x2 table
a V_total _______ _______ 0 1.32 0.80556 1.33 2.4444 1.34 4.0833 1.36 5.6944 1.38 5.6944 1.39 8.1389 1.42 9.7778 1.44 10.583 1.46 13.028 1.48 14.639 1.48 16.278 1.5 17.083 1.51 18.722 1.52 21.167 1.53 21.167 1.54
a = T1.a;
a(1) = 1E-4; % Please Avoid Calculating 'log(0)'
V_total = T1.V_total;
P = 5;
T = 90 + 273
T = 363
V1 = rand; % Provide Missing Value
V2 = rand; % Provide Missing Value
A = rand; % Provide Missing Value
B = rand; % Provide Missing Value
a1 = @(x,a) x(1) * sqrt(P) * exp(16.95 - (5052 / T));
a2 = @(x,a) x(2) * sqrt(P) * exp(8.47 - (2526 / T));
V_total_fcn = @(x,a) V1 + V2 + A*log(a/a1(x,a)) + B*log(a/a2(x,a))
V_total_fcn = function_handle with value:
@(x,a)V1+V2+A*log(a/a1(x,a))+B*log(a/a2(x,a))
x0 = rand(2,1);
X = lsqcurvefit(V_total_fcn, x0, a, V_total)
Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance.
X = 2x1
5.4249 0.5849
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
figure
plot(a, V_total, '.', 'DisplayName','Data')
hold on
plot(a, V_total_fcn(X,a), '-r', 'DisplayName','Regression Fit')
hold off
grid
xlabel('a')
ylabel('V\_total')
legend('Location','best')
This may work without further changes with the correct values for ‘V1’, ‘v2’, ‘A’, and ‘B’, however it may be necessary to use the real, imag, or abs functions to deal with the complex results if they persist after that.
.

5 Comments

Thank you very much @Star Strider for your answer. The code already works, but with unstable behaviour. The final code that I have developed using your answer and including the values of V1, V2, A and B is as follows:
clc
clear
close all
Table1 = readtable('Ask_17.05.2024.xlsx');
a = Table1.a(1:15:end,:); % Reduction of the exp. data
V_total = Table1.V_total(1:15:end,:); % Reduction of the exp. data
a(1) = 1E-2; % An arbitrary starting value
P = 5;
T = 90 + 273;
A = 0.0379;
B = 0.0338;
V1 = 1.2148;
V2 = a .* 0.36 .* (6.7399e-04 + R); % where R is a function of 'a' and is defined through a specific equation.
R = ...;
a1 = @(x,a) x(1) * sqrt(4.2575) * 20.8113;
a2 = @(x,a) x(2) * sqrt(4.2575) * 4.5619;
V_total_fcn = @(x,a) V1 + V2 + (A .* log(a./(a1(x,a) .* (1 - G)))) + (B .* log(a./a2(x,a) .* (1 - G))); % Final equation of V_total, where G is also a function of 'a' and is defined through a specific equation.
G = ...;
x0 = rand(2,1);
X = lsqcurvefit(V_total_fcn, x0, a, V_total)
figure
plot(a, V_total_fcn(X,a),'Color','#D95319','LineWidth',1.3)
hold on
plot(a, V_total,'o','MarkerEdgeColor','#D95319','LineWidth',1.3)
hold off
lgd = legend('Simulation','Experiment','Location','southeast');
I ran the code three times with the same value of a(1) = 1E-2 and I had three different fitting values as follows:
%% 1st trial
x(1) = 0.003
x(2) = 0.9482
%% 2nd trial
x(1) = 0.6424 - 0.0038i
x(2) = 0.0001 + 5.2915e-7i
%% 3rd trial
x(1) = 0.0003
x(2) = 0.4734
Are there any ideas to overcome this unstable behaviour?
It's not an unstable behaviour, but the usual dependence of the solution on the initial guess.
If you only know a range where the unknown parameters are located, use "MultiStart":
My pleasure!
I cannot run this because ‘R’ and ‘G’ are missing. (They are required for ‘V2’ and ‘V_total_fcn’.)
The problem with nonlinear parameter estimation is that the results are sensitive (extremely sensitive in some instances) to the initial parameter estimates, especially if there are multiple local minima in the response surface. The genetic algorithm optimisation should work, and should give the global values for the ‘x’ parameter vector that give the best overall fit. (I am also not constrainiing ‘x’ in this instance.)
I will run this when the required paramters appear.
Table1 = readtable('Ask_17.05.2024.xlsx');
a = Table1.a(1:15:end,:); % Reduction of the exp. data
V_total = Table1.V_total(1:15:end,:); % Reduction of the exp. data
a(1) = 1E-2; % An arbitrary starting value
P = 5;
T = 90 + 273;
A = 0.0379;
B = 0.0338;
V1 = 1.2148;
R = 0.02777 .* ((1./(143.5860 .* (1 - ((1.1261e-08 .* a) ./ (3e-5 + (1.1261e-08 .* a)))).^1.5)) + (1./(143.5860 .* (1 - ((5.6303e-09 .* a) ./ (3e-5 + (5.6303e-09 .* a)))).^1.5)));
G = 8.9717e-3 .* (a .* 10).^0.3;
V2 = a .* 0.36 .* (6.7399e-04 + R); % where R is a function of 'a' and is defined through a specific equation.
a1 = @(x,a) x(1) * sqrt(4.2575) * 20.8113;
a2 = @(x,a) x(2) * sqrt(4.2575) * 4.5619;
V_total_fcn = @(x,a) V1 + V2 + (A .* log(a./(a1(x,a) .* (1 - G)))) + (B .* log(a./a2(x,a) .* (1 - G))); % Final equation of V_total, where G is also a function of 'a' and is defined through a specific equation.
% x0 = rand(2,1);
% X = lsqcurvefit(V_total_fcn, x0, a, V_total)
ftns = @(x) norm(V_total - V_total_fcn(x,a));
PopSz = 500;
Parms = 2;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
tic
% [theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
[X,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],[],[],[],[],optsAns);
ga stopped because the average change in the fitness value is less than options.FunctionTolerance.
toc
Elapsed time is 0.482099 seconds.
fprintf('\nParameters —\n\t\tx(1) = %.6f\n\t\tx(2) = %.6f\n\n',X)
Parameters — x(1) = 0.017415 x(2) = 0.004000
NrGenerations = output.generations
NrGenerations = 66
FitnessValue = fval
FitnessValue = 0.3318
V_t_mdl = fitnlm(a,V_total, V_total_fcn, X)
Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.
V_t_mdl =
Nonlinear regression model: y ~ F(x,a) Estimated Coefficients: Estimate SE tStat pValue _________ __________ ______ __________ b1 0.018479 0.00035689 51.778 2.3849e-29 b2 0.0042741 0.0013761 3.106 0.0043143 Number of observations: 29, Error degrees of freedom: 28 Root Mean Squared Error: 0.0625 R-Squared: 0.863, Adjusted R-Squared 0.863 F-statistic vs. zero model: 2.48e+04, p-value = 8.05e-43
X = V_t_mdl.Coefficients.Estimate;
fprintf('\nTweaked Parameters —\n\t\tx(1) = %.6f\n\t\tx(2) = %.6f\n\n',X)
Tweaked Parameters — x(1) = 0.018479 x(2) = 0.004274
figure
plot(a, V_total_fcn(X,a),'Color','#D95319','LineWidth',1.3)
hold on
plot(a, V_total,'o','MarkerEdgeColor','#D95319','LineWidth',1.3)
hold off
lgd = legend('Simulation','Experiment','Location','southeast');
EDIT — (21 May 2024 at 17:01)
Changed code to include definitions for the missing parameters and the fitnlm call to provide statistics and to tweak the parameters returned by ga.
.
I actually could not understand the following part of the code:
ftns = @(theta) norm(V_total - V_total_fcn(x,a));
PopSz = 500;
Parms = 2;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
tic
% [theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],[],[],[],[],optsAns);
toc
NrGenerations = output.generations
Nevertheless, I have added it to my code and have commented the following two lines:
x0 = rand(2,1);
X = lsqcurvefit(V_total_fcn, x0, a, V_total)
In addition, I have defined the equations of R and G. Hence, the error was: Unrecognized function or variable 'x'.
You can try yourself by defining R and G as follows:
R = 0.02777 .* ((1./(143.5860 .* (1 - ((1.1261e-08 .* a) ./ (3e-5 + (1.1261e-08 .* a)))).^1.5)) + (1./(143.5860 .* (1 - ((5.6303e-09 .* a) ./ (3e-5 + (5.6303e-09 .* a)))).^1.5)));
G = 8.9717e-3 .* (a .* 10).^0.3;
My revised code (edited a few minutes ago in my previous Comment) implements the genetic algorithm (ga), and the additional assignments define the fitness function ‘ftns’ and the options structure I use with my ga calls.
I did not previously correct my code because I could not run it, so there are several obvious errors.. The current version runs and gives a reasonable result. I added a fitnlm call both to provide statistics on the parameters and the fit, and to tweak the parameters to give the best result. (The ga function actually has a version of this as part of its options and refers to it as a hybrid parameter estimation.)
There may still be some differences in the estiamted parameters between the runs, however they should be within the confidence intervals for any set of estimated parameters. The fit in general is quite good.
.

Sign in to comment.

Asked:

on 17 May 2024

Commented:

on 21 May 2024

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!