How to use the genetic algorithm to optimize a regression model with measured data instead of analytic fitness functions?

I have a measured numerical dataset with approximately 50 input variables and corresponding ground-truth output labels. However, I do not have an analytical relationship between the inputs and the output, so this is a black-box model.
I have a parametrized regression model with a fitness function, and I want to optimize the parameters of the fitness function so that its predictions fit the dataset's output labels.
How can I use the genetic algorithm to optimize this black-box regression problem?

Accepted Answer

MathWorks Support Team on 12 Aug 2025
You can define a loss metric that measures the numerical difference between your fitness function's predictions and the ground-truth labels. Then, you can optimize this loss metric using the genetic algorithm.
Specifically, this can be done as follows:
  1. Instead of defining an analytic fitness function of the optimization variable, pass the measured input variables as column vectors to your fitness function and compute a prediction for each row of the inputs, where each prediction attempts to match the corresponding ground-truth output.
  2. Aggregate the prediction errors over all data points into a scalar loss metric (or another aggregate suited to your workflow).
  3. Define a single optimization vector composed of the parameters of your regression fitness function.
  4. Optimize this vector with respect to the loss using the genetic algorithm. The result is a set of regression parameters with minimal loss, so that the fitness function produces predictions matching the ground truths. A minimal multivariate sketch of these four steps follows this list.
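As a minimal sketch of steps 1-4 for a multivariate dataset like yours (this sketch is not part of the original answer; the linear-in-parameters model, the names "predict" and "lossFcn", and the dummy data are illustrative assumptions):
  % Dummy multivariate data: X is n-by-50 (one row per observation), y is n-by-1
  rng default
  n = 200; d = 50;
  X = randn(n,d);
  y = X*ones(d,1) + 0.1*randn(n,1);
  predict = @(p,X) X*p(:);                            % Step 1: one prediction per row of X
  lossFcn = @(p) sum((predict(p,X) - y).^2);          % Step 2: aggregate into a scalar square loss
  lb = -2*ones(1,d);                                  % Step 3: one parameter per input variable
  ub = 2*ones(1,d);
  [p_opt,min_loss] = ga(lossFcn,d,[],[],[],[],lb,ub); % Step 4: minimize the loss with ga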
If you want to consider other Global Optimization Toolbox solvers beyond the genetic algorithm, weigh the tradeoffs in their solver characteristics; many of them accept the same objective function handle.
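For example, assuming the "lossFcn", "lb", "ub", and dimension "d" from the sketch above, the "particleswarm" solver can be called in an equivalent way (this alternative solver is a suggestion, not part of the original answer):
  options_ps = optimoptions(@particleswarm,'FunctionTolerance',1e-12);
  [p_ps,loss_ps] = particleswarm(lossFcn,d,lb,ub,options_ps);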
The example script below shows a quartic fitness function parametrized by its polynomial coefficients (packaged in the optimization vector "p") for a dummy dataset. The loss metric is the square loss (sum of squared errors). In this example, the genetic algorithm solution closely matches the quartic least-squares regression fit given by the "polyfit" function:
  % Data
  x = -2:0.05:1;
  y = 0.5*x.^4 + x.^3 - 0.6*x + 0.1*randn(1,length(x));

  % Polyfit regression model
  p_polyfit = polyfit(x,y,4) % Parameters found from quartic fit
  predictions_polyfit = p_polyfit(1)*x.^4 + p_polyfit(2)*x.^3 + p_polyfit(3)*x.^2 + p_polyfit(4)*x + p_polyfit(5);
  minimum_loss_polyfit = sum((predictions_polyfit-y).^2) % Compare regression predictions to ground truth with square loss

  % Genetic algorithm
  rng default % For reproducibility
  options = optimoptions(@ga,'FunctionTolerance',1e-12); % Tolerance for solution (default 1e-6)
  FitnessFunction = @(p) loss(p,x,y);
  numberOfVariables = 5; % Optimization vector p holds the 5 quartic coefficients p(1) ... p(5)
  lb = [-2,-2,-2,-2,-2]; % Lower bounds for p(1) ... p(5)
  ub = [2,2,2,2,2]; % Upper bounds for p(1) ... p(5)
  [p_ga,minimum_loss_ga] = ga(FitnessFunction,numberOfVariables,[],[],[],[],lb,ub,[],options) % Parameters and loss from genetic algorithm
  predictions_ga = fitness(p_ga,x);

  % Plot comparison
  plot(x,y,'o')
  hold on
  plot(x,predictions_polyfit,'r-')
  plot(x,predictions_ga,'g*')
  hold off
  legend('data','polyfit','ga')

  % Fitness and loss functions
  function pred = fitness(p,x)
  xT = x'; % Work with a column vector of inputs
  pred = p(1)*xT.^4 + p(2)*xT.^3 + p(3)*xT.^2 + p(4)*xT + p(5);
  end
  function L = loss(p,x,y)
  L = sum((fitness(p,x) - y').^2); % Square loss: sum of squared errors
  end
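A note on the settings above: the bounds "lb" and "ub" must bracket the coefficient values you expect (here the true quartic coefficients all lie within [-2,2]), and tightening 'FunctionTolerance' makes "ga" run longer before declaring convergence, since the solver stops once the average relative change in the best fitness value over 'MaxStallGenerations' generations falls below this tolerance.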
Additionally, the attached file "ga_regressioon_example.mlx" demonstrates a similar application of the genetic algorithm to regression on noisy linear data.

Release

R2024b
