Non-linear multivariate regression using a genetic algorithm

I would like to perform non-linear multivariate regression using a genetic algorithm in MATLAB.
I have four independent variables and one response. How can I do non-linear regression in MATLAB by minimizing the mean squared error with a genetic algorithm?
R = f(x1, x2, x3, x4)
I want to fit the data to an equation similar to the following:
[Equation image 1.PNG: not reproduced]
Thus, my objective is to find the constants C1, C2, C3, ..., C21 with ga in MATLAB, by minimizing the difference between the actual response and the response from the model.

 Accepted Answer

One (very basic) approach:
xd = rand(10,1); % Create Dependent Variable (response)
xi = rand(10,4); % Create Independent Variable Matrix (one column per variable)
R = @(b,x) b(1).*sin(b(2).*x(:,1)) + b(3).*exp(b(4).*x(:,2)) + b(5).*cos(b(6).*x(:,3)) - b(7).*x(:,4); % Regression (model) function
ftnsfcn = @(b) norm(xd - R(b,xi)); % Fitness Function: sqrt(sum of squared residuals)
B = ga(ftnsfcn, 7); % Genetic Algorithm Call (7 parameters)
The idea is to create a matrix of the independent variables, then refer to each one by its column (my preference).
There are many ways to customise the ga population and other parameters, for example to get interim outputs or to create a specific initial population. The ga function searches the parameter space for the best set, so defining a range for the parameters is only necessary if their expected magnitudes vary widely.
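The question asks for the mean squared error, while the fitness function above uses `norm`. A minimal sketch, reusing the answer's toy data and model, suggesting that the two differ only by a monotonic transformation (MSE = norm^2/n), so ga should rank candidate parameter sets the same way with either. The handle names `normFcn` and `mseFcn` and the seed are illustrative choices, not part of the original answer.

```matlab
% Toy problem from the answer above: random data and a 7-parameter model.
rng(0);                      % fixed seed so the sketch is repeatable
xd = rand(10,1);             % dependent variable (response)
xi = rand(10,4);             % independent-variable matrix, one column per variable
R  = @(b,x) b(1).*sin(b(2).*x(:,1)) + b(3).*exp(b(4).*x(:,2)) ...
          + b(5).*cos(b(6).*x(:,3)) - b(7).*x(:,4);

% Two candidate fitness functions for ga:
normFcn = @(b) norm(xd - R(b,xi));        % sqrt(sum of squared residuals)
mseFcn  = @(b) mean((xd - R(b,xi)).^2);   % mean squared error

% For any parameter vector, MSE = norm^2 / n, so both rank candidates identically.
b = rand(1,7);
fprintf('norm^2/n = %.12g\nMSE      = %.12g\n', normFcn(b)^2/numel(xd), mseFcn(b));
```

Either handle can be passed to ga. To restrict the search to a range, lower and upper bounds go in the 7th and 8th arguments, e.g. `ga(mseFcn, 7, [],[],[],[], lb, ub)` with `lb` and `ub` as 1-by-7 vectors of your choosing.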

10 Comments

Many thanks, dear Star Strider. I actually have a dataset with values for each independent variable and a response value for each combination of the independent variables. I want to fit it with a non-linear multivariable regression similar to the following one.
Thus, my objective is to find the constants C1, C2, ..., C21.
[Equation image 1.PNG: not reproduced]
Could you please help me with this?
As always, my pleasure!
“I actually have a dataset for each independent variables and response values for each combinations of the independent variables.”
I am not certain what that means in the context of writing the fitness function.
You must have at least 23 sets of the four columns of ‘X’ (that is, at least 23 rows of data for 21 parameters) to get reliable parameter estimates, and the more, the better. You then need to define your ‘R’ function. Rather than write out all of ‘R’, I will highlight some of the more important coding techniques for it.
I would write it as follows (the ‘...’ marks terms omitted here, so this line is not runnable as written):
R = @(C,X) C(1).*X(:,1).^C(2) + ... + C(9).*X(:,1).^C(10).*X(:,2).^C(11) + ... + C(18).*X(:,1).^C(19).*X(:,2).^C(20).*X(:,3).^C(21);
This will fit one dependent variable, since it returns a vector. It is possible to fit a matrix of dependent variables; however, the regression function then has to be written to return a matrix.
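A small, runnable sketch of the element-wise coding style described above. Because the 21-constant equation itself is not reproduced in this thread, the model here is a hypothetical 7-parameter stand-in built from the same kinds of terms (single-variable powers and a product of powers). It checks that the model returns one value per observation and that there are more observations than parameters.

```matlab
% Hypothetical 7-parameter stand-in (NOT the asker's equation): two
% single-variable power terms plus one two-variable product of powers.
R = @(C,X) C(1).*X(:,1).^C(2) + C(3).*X(:,2).^C(4) ...
         + C(5).*X(:,1).^C(6).*X(:,2).^C(7);

rng(1);                      % repeatable synthetic data
n = 30;                      % observations (rows of X)
X = 0.5 + rand(n,4);         % four positive predictors (columns 3-4 unused by this stand-in)
C = [2 1.5 -1 0.5 3 1 2];    % arbitrary parameter values for the demo

yhat = R(C,X);               % .* and .^ act row by row: one prediction per observation
fprintf('size(yhat) = [%d %d], observations = %d, parameters = %d\n', ...
        size(yhat,1), size(yhat,2), n, numel(C));
```

Keeping the predictors positive avoids complex results from raising negative values to non-integer powers, which would otherwise break a real-valued fitness function.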
It may be necessary to run the ga function a few times to get the best fitness. It searches the entire parameter space and returns the best parameter set it finds; however, it is not entirely immune to returning a poor set of parameters, because it stops when the fitness value no longer changes appreciably between generations.
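One way to act on this is to run ga several times and keep the run with the lowest fitness. A sketch assuming the Global Optimization Toolbox is installed and reusing the toy problem from the accepted answer; the number of runs and the variable names are arbitrary choices.

```matlab
% Toy problem from the accepted answer.
rng(0);
xd = rand(10,1);
xi = rand(10,4);
R  = @(b,x) b(1).*sin(b(2).*x(:,1)) + b(3).*exp(b(4).*x(:,2)) ...
          + b(5).*cos(b(6).*x(:,3)) - b(7).*x(:,4);
ftnsfcn = @(b) norm(xd - R(b,xi));

nRuns  = 5;                                   % independent ga runs (arbitrary)
opts   = optimoptions('ga', 'Display','off');
allB   = zeros(nRuns, 7);                     % parameters from every run
allFit = zeros(nRuns, 1);                     % fitness from every run
for k = 1:nRuns
    % ga(fun, nvars, A, b, Aeq, beq, lb, ub, nonlcon, options)
    [allB(k,:), allFit(k)] = ga(ftnsfcn, 7, [],[],[],[],[],[],[], opts);
end

[bestFit, iBest] = min(allFit);               % keep the best run
bestB = allB(iBest,:);
disp(table((1:nRuns)', allFit, 'VariableNames', {'Run','Fitness'}))
```

Keeping every run's parameters (not just the best) also shows how different the solutions with similar fitness can be.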
A general (and somewhat encyclopedic) ga application (fitting nonlinear differential equations to data) is the answer to “Is it possible to use Simulated Annealing or Genetic Algorithm for parameter estimation of a parametric System of ODEs?”. It illustrates a number of options for using ga, including fitting matrix dependent variables.
Very useful information. Really appreciated!
However, I have one problem: every time I run the code, I get different values for the coefficients.
I tried a simple case of the following form:
[Image 2.JPG: R = C1*x1^C2 + C3*x2^C4, the model implemented as R in the code below]
I used the following code, with the attached spreadsheet file saved in the same folder as the .m file. Each time I run it, the values of C1, C2, C3, and C4 change, ranging from very low values to extremely high values.
Could you please help me fix this problem?
clear all
clc
%% Input
input = xlsread('data','data'); % Import from excel
j=1;
[k,l]=size(input);
while j<=k
%Inputs
f1(j)=input(j,1); % independent variable 1
f2(j)=input(j,2); % independent variable 2
f3(j)=input(j,3); % dependent factor (response)
%%
j=j+1;
end
xi = [f1.' f2.']; % independent variables
xd = f3.'; % Dependent Variable
%% Model and fitness function
R = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4);
ftnsfcn = @(c) norm(xd - R(c,xi)); % Fitness Function
B = ga(ftnsfcn, 4);
%% Export to excel
a=B.';
T=table(a);
T.Properties.VariableNames = {'B'};
fileName='Results.xlsx';
writetable(T,fileName);
fileName_template='Results_template.xlsx';
copyfile(fileName_template,fileName)
writetable(T,fileName)
winopen(fileName)
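A sketch of the same import without the row-by-row `while` loop, assuming MATLAB R2019a or later for `readmatrix`/`writematrix`. A temporary CSV file with synthetic values stands in for the attached spreadsheet so the sketch runs on its own; for the real workbook, `readmatrix('data.xlsx','Sheet','data')` reads the named sheet. Naming the variable something other than `input` also avoids shadowing MATLAB's built-in `input` function.

```matlab
rng(0);
data  = [1 + rand(20,2), rand(20,1)];  % synthetic stand-in: x1, x2, response
fname = [tempname '.csv'];             % temporary file so the sketch is self-contained
writematrix(data, fname);

M  = readmatrix(fname);                % whole table at once, no loop needed
xi = M(:, [1 2]);                      % independent variables, n-by-2
xd = M(:, 3);                          % dependent variable, n-by-1 column
delete(fname);                         % clean up the temporary file

R = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4);
ftnsfcn = @(c) norm(xd - R(c,xi));     % same fitness function as above
fprintf('%d observations, fitness at c = [1 1 1 1]: %g\n', size(xi,1), ftnsfcn([1 1 1 1]));
```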
As always, my pleasure!
Getting different values for the coefficients is typical for ga: many different parameter combinations can produce similar fitness values. See “How to save data from Genetic Algorithm in case MATLAB crashes?” for a way to store the best parameter values and assess them later.
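If the goal is for repeated runs to return the same answer (for example, to compare option settings), resetting the random number generator before each call should make ga repeatable. A sketch assuming the Global Optimization Toolbox, with synthetic data in place of the attached spreadsheet; the seed value 42 is arbitrary. Fixing the seed only makes a result repeatable; it does not make it more accurate.

```matlab
rng(0);
n  = 40;
xi = 1 + rand(n,2);                              % synthetic positive predictors
R  = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4);
xd = R([2 1.5 3 0.5], xi) + 0.05*randn(n,1);     % synthetic noisy response
ftnsfcn = @(c) norm(xd - R(c,xi));
opts = optimoptions('ga', 'Display','off');

rng(42);                                         % same seed ...
B1 = ga(ftnsfcn, 4, [],[],[],[],[],[],[], opts);
rng(42);                                         % ... before each run
B2 = ga(ftnsfcn, 4, [],[],[],[],[],[],[], opts);

fprintf('Identical results: %d\n', isequal(B1, B2));
```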
Many thanks again.
I used the following code with the attached spreadsheet file containing the data. However, I cannot get a good fit: the fitted values are significantly higher than the actual values. Could you please help me fix this once more?
input = xlsread('data','data'); % Import from excel
j=1;
[k,l]=size(input);
while j<=k
%Inputs
f1(j)=input(j,1); % independent variable 1
f2(j)=input(j,2); % independent variable 2
f3(j)=input(j,3); % dependent factor (response)
j=j+1;
end
xi = [f1.' f2.']; % independent variables
xd = f3.'; % Dependent Variable
%%
R = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4)+c(5).*x(:,1).^c(6).*x(:,2).^c(7)+c(8);
ftnsfcn = @(c) norm(xd - R(c,xi)); % Fitness Function, calculates sqrt(sum((xd-R).^2))
B = ga(ftnsfcn, 8);
%% Export to excel
a=B.';
T=table(a);
T.Properties.VariableNames = {'B'};
fileName='Results.xlsx';
writetable(T,fileName);
fileName_template='Results_template.xlsx';
copyfile(fileName_template,fileName)
writetable(T,fileName)
winopen(fileName)
I changed your code slightly to:
input = xlsread('data.xlsx','data');
input = sortrows(input(:,1:3),2); % Easier To See When Plotted
j=1;
[k,l]=size(input);
% while j<=k
% %Inputs
% f1(j)=input(j,1); % independent variable 1
% f2(j)=input(j,2); % independent variable 2
% f3(j)=input(j,3); % dependent factor (response)
% j=j+1;
% end
xi = input(:,[1 2]); % independent variables
xd = input(:,3); % Dependent Variable
% %%
R = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4)+c(5).*x(:,1).^c(6).*x(:,2).^c(7)+c(8);
PopSz = 500;
Parms = 8;
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
ftnsfcn = @(c) norm(xd - R(c,xi)); % Fitness Function, calculates sqrt(sum((xd-R).^2))
B = ga(ftnsfcn, Parms, [],[],[],[],[],[],[],[],opts);
C_Parameters = B(:)
FitnessVal = ftnsfcn(B)
fit_R = R(B,xi);
figure
plot3(input(:,1), input(:,2), input(:,3), 'p')
hold on
plot3(input(:,1), input(:,2), fit_R, '-r', 'LineWidth',1)
hold off
grid on
view(-70,+30)
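The `InitialPopulationMatrix` expression in the options above draws integers from 1 to 10000 and scales them by 1E-3, so every starting member lies between 0.001 and 10. A quick check in base MATLAB (no toolbox needed). As far as I understand ga's defaults, with no bounds passed the mutation step can still move parameters outside this starting range, which is how negative estimates such as those reported below can appear.

```matlab
PopSz = 500;                                % population size used above
Parms = 8;                                  % number of parameters
rng(0);
P0 = randi(1E+4, PopSz, Parms) * 1E-3;      % same expression as in optimoptions above
fprintf('Initial population: %d-by-%d, values from %.3f to %.3f\n', ...
        size(P0,1), size(P0,2), min(P0(:)), max(P0(:)));
```

A related ga option is `InitialPopulationRange`, e.g. `optimoptions('ga','InitialPopulationRange',[0; 10])`, which sets the interval ga draws its initial population from instead of supplying the matrix explicitly.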
The best parameter estimates I was able to get (five runs):
C_Parameters =
-2724.82809644071
11.5903152582184
325.465409266854
0.030128569906985
9.1193510807236
2.64439184741602
5.637
-297.372618842624
FitnessVal =
1302.32568120466
You cannot plot your full data set because this universe only allows 3 large dimensions (although string theory posits 10 in total, and M-theory 11). You have 5 variables in your full regression equation (4 independent and 1 dependent), so you can only plot 3 at a time.
However, the plot illustrates the difficulty of fitting these data: they do not all follow a single clear pattern. (The plot uses the best parameter estimates posted above, with fitness value 1302.3.)
[Figure (2020 01 19): measured data (markers) and the ga fit (red line), plotted against x1 and x2]
It is always difficult to fit an extremely noisy data set.
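Since a 3-D plot can show only three variables, one workaround is to plot predicted against measured response: that is always a 2-D plot however many independent variables the model has, and a perfect fit puts every point on the 1:1 line. A sketch with synthetic four-predictor data; the model and its parameter values are hypothetical stand-ins for a fitted result.

```matlab
rng(0);
n     = 40;
xi    = 1 + rand(n,4);                                  % four synthetic predictors
R     = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4) + c(5).*x(:,3).*x(:,4);
xd    = R([2 1.5 -1 0.5 0.8], xi) + 0.1*randn(n,1);     % synthetic measured response
fit_R = R([2 1.5 -1 0.5 0.8], xi);                      % stand-in for fitted values

fig = figure('Visible','off');                          % off-screen; drop 'Visible' for interactive use
plot(xd, fit_R, 'p')                                    % one marker per observation
hold on
lims = [min([xd; fit_R]), max([xd; fit_R])];
plot(lims, lims, '-r')                                  % 1:1 line: perfect predictions lie on it
hold off
grid on
xlabel('Measured R'), ylabel('Predicted R')
cc = corrcoef(xd, fit_R);
r  = cc(1,2);                                           % correlation of predicted vs measured
close(fig)
```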
Many thanks again.
I have tried using ga, but the fitted response values (dependent variable) are significantly higher than the actual values.
I am wondering whether there is another method in MATLAB for non-linear multivariate curve fitting with the following objective function:
I tried the following, but it didn't work:
input = xlsread('data.xlsx','data');
input = sortrows(input(:,1:3),1);
xi = input(:,[1 2]); % independent variables
xd = input(:,3); % Dependent Variable
%%
R = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4)+c(5).*x(:,1).^c(6).*x(:,2).^c(7)+c(8);
B0 = rand(8,1);
B = nlinfit(xi, xd, R, B0);
fit_R = R(B,xi);
figure
plot3(input(:,1), input(:,2), fit_R, '-r', 'LineWidth',1)
grid on
I will give it a try with ga a bit later, since it requires some time to run.
This is going to be extremely difficult to fit with a gradient-based algorithm such as nlinfit.
Follow-up (21 Jan 2020 at 21:21):
I thought that was a different objective function. It is the same objective function I used to get the result I posted earlier, with ga.
For nlinfit, use the ‘C_Parameters’ I posted as your ‘B0’. nlinfit may be able to refine them beyond what ga found, giving even better parameter estimates and allowing you to estimate confidence intervals on the parameters as well as on the fit. (The ga hybrid-function option, described in “When to Use a Hybrid Function”, lets ga fine-tune its parameter estimates with a local solver after it finishes. I do not routinely use that option.)
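If the Statistics and Machine Learning Toolbox (which provides `nlinfit`) is unavailable, the same "start from the ga estimate and refine locally" idea can be sketched with base MATLAB's `fminsearch` (Nelder-Mead). This is a swapped-in local solver, not the nlinfit approach described above, and it gives no confidence intervals; `nlinfit` additionally returns the residuals and Jacobian that `nlparci` uses for those. The data and the "ga" starting values here are synthetic and hypothetical.

```matlab
rng(0);
n   = 50;
xi  = 1 + rand(n,2);                              % synthetic positive predictors
R   = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4);
xd  = R([2 1.5 3 0.5], xi) + 0.05*randn(n,1);     % synthetic noisy response
sse = @(c) sum((xd - R(c,xi)).^2);                % sum of squared residuals

B_ga   = [1.8 1.2 3.3 0.7];                       % hypothetical ga result used as the start
optsNM = optimset('MaxFunEvals',1e4, 'MaxIter',1e4);
[B_refined, sseRefined] = fminsearch(sse, B_ga, optsNM);

fprintf('SSE at ga estimate:   %g\n', sse(B_ga));
fprintf('SSE after fminsearch: %g\n', sseRefined);
disp(B_refined)
```

Inside ga itself, `optimoptions('ga','HybridFcn',@fminsearch)` runs this kind of refinement automatically when ga finishes.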
Dear Star Strider,
I hope you are doing well.
How can I use ga to minimize two objectives at the same time?
Example: determining the parameters C1, C2, C3, ... by minimizing the mean squared error and, at the same time, the ratio of the predicted to the measured response.
Thanks
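Minimising the mean squared error should do both, since predictions that closely match the measured data also have predicted-to-measured ratios close to 1.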
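If the ratio criterion should still carry explicit weight, one option is a single scalar fitness that adds a penalty on the deviation of predicted/measured from 1. A sketch in which the weight `w` is a hypothetical tuning value and the measured response must be nonzero. For a genuinely two-objective search, the Global Optimization Toolbox also provides `gamultiobj`, which returns a Pareto front of trade-off solutions rather than one answer.

```matlab
rng(0);
n  = 30;
xi = 1 + rand(n,2);
R  = @(c,x) c(1).*x(:,1).^c(2) + c(3).*x(:,2).^c(4);
xd = R([2 1.5 3 0.5], xi) + 0.05*randn(n,1);          % synthetic, strictly positive response

w = 0.5;                                              % hypothetical weight on the ratio term
mseTerm   = @(c) mean((xd - R(c,xi)).^2);             % mean squared error
ratioTerm = @(c) mean((R(c,xi)./xd - 1).^2);          % deviation of predicted/measured from 1
ftnsfcn   = @(c) mseTerm(c) + w*ratioTerm(c);         % scalar fitness usable by ga

c = [2 1.5 3 0.5];
fprintf('MSE = %g, ratio term = %g, combined = %g\n', mseTerm(c), ratioTerm(c), ftnsfcn(c));
```

`ftnsfcn` plugs into `ga(ftnsfcn, 4)` just like the MSE-only version; with `w = 0` it reduces to plain MSE.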


More Answers (1)

The ga solver in MATLAB is not an ideal tool for curve fitting when the goal is a globally optimal result. See the result below; it should be the global solution, which ga may never reach.
Root of Mean Square Error (RMSE): 0.00114090675219561
Sum of Squared Residual: 0.000298082021740069
Correlation Coef. (R): 1
R-Square: 1
Adjusted R-Square: 1
Determination Coef. (DC): 1
Chi-Square: 1.30802089664868E-6
F-Statistic: 1.9563957244377E19
Parameter    Best Estimate
---------    ----------------
c1           1.21488501004067
c2           1.45076972134303
c3           2.07995659264768
c4           1.69518420103036
c5           1.97997575301966
c6           1.97999531406562
c7           6.30000528873042
c8           9.02324638354842
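Statistics like these can be computed in MATLAB for any parameter set, including a ga result, to compare solutions on equal terms. A sketch of the common definitions of SSR, RMSE, R, and R-square using synthetic data; 1stOpt's exact conventions (for example, for adjusted R-square or chi-square) are not given in the thread, so only the standard ones are shown.

```matlab
rng(0);
n     = 30;
xd    = 1 + rand(n,1);               % measured response (synthetic)
fit_R = xd + 0.01*randn(n,1);        % predicted response (synthetic)

res  = xd - fit_R;                   % residuals
SSR  = sum(res.^2);                  % sum of squared residuals
RMSE = sqrt(SSR/n);                  % root of mean square error
SST  = sum((xd - mean(xd)).^2);      % total sum of squares
R2   = 1 - SSR/SST;                  % R-square (coefficient of determination)
cc   = corrcoef(xd, fit_R);
r    = cc(1,2);                      % correlation coefficient R

fprintf('RMSE = %g, SSR = %g, R = %g, R-square = %g\n', RMSE, SSR, r, R2);
```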

3 Comments

Dear Alex Sha,
Many thanks. Could you please let me know which software or method you used, so that I can use it too?
Many thanks again.
Hi Tad2020, the result above was obtained with a software package named "1stOpt". It is a global optimization package, ideal for problems such as equation solving and curve fitting.
Thanks again. I will try to get the software if it is available.


Asked: 19 Jan 2020
Commented: 28 Jan 2020
