How to use the genetic algorithm to optimize a regression model with measured data instead of analytic fitness functions?
MathWorks Support Team on 20 Jan 2025
Answered: MathWorks Support Team on 28 Jan 2025
I have a measured numerical dataset with approximately 50 input variables and corresponding ground-truth output labels. However, I do not have an analytical relationship between the inputs and the output, so this is a black-box model.
I have a parametrized regression model and its fitness function, and I want to optimize the parameters of the fitness function so that its predictions fit the dataset's output labels.
How can I use the genetic algorithm to optimize this black-box regression problem?
Accepted Answer
MathWorks Support Team on 12 Aug 2025
You can define a loss metric that measures the numerical difference between your fitness function's predictions and the ground-truth labels. Then, you can optimize this loss metric using the genetic algorithm.
Specifically, this can be done as follows:
- Instead of defining an analytic fitness function of the optimization variable, pass the input variables as column vectors to your fitness function and compute a prediction for each row of the inputs, where each prediction attempts to match the corresponding ground-truth output.
- Aggregate these per-row predictions into a scalar loss metric (or another summary measure that suits your workflow).
- Define a single optimization vector composed of the parameters of your regression fitness function.
- Optimize this vector with respect to the loss using the genetic algorithm. The result is the set of regression parameters with minimal loss, that is, the parameters for which the fitness function's predictions best match the ground-truth outputs.
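For a dataset like yours with many input columns, the same pattern can be sketched as follows. This is a minimal illustration only: the matrix "X", the labels "yTrue", and the linear weights-plus-bias model are placeholders standing in for your measured data and your own parametrized regression model.
% Minimal sketch -- replace the dummy data and the placeholder model with your own
nObs = 200;  nIn = 50;                       % hypothetical dataset: 200 observations, 50 input variables
X = randn(nObs,nIn);                         % stand-in for your measured input matrix (one row per observation)
yTrue = X*ones(nIn,1) + 0.1*randn(nObs,1);   % stand-in for your ground-truth output labels
predict = @(p) X*p(1:nIn)' + p(nIn+1);       % placeholder model: one weight per input column, plus a bias
lossFcn = @(p) sum((predict(p) - yTrue).^2); % aggregate the per-row errors into a scalar loss
rng default                                  % for reproducibility
nVars = nIn + 1;                             % length of the optimization vector p
[pOpt,lossOpt] = ga(lossFcn,nVars);          % ga returns the parameters that minimize the loss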
If you want to consider other Global Optimization Toolbox solvers beyond the genetic algorithm, weigh the tradeoffs among the solver characteristics; a short sketch of alternative solver calls follows the example below.
The example script below shows a quartic fitness function parametrized by its polynomial coefficients (packaged in the optimization vector "p") for a dummy dataset. The loss metric is the sum of absolute errors. In this example, the genetic algorithm solution closely matches the quartic regression fit given by the "polyfit" function:
% Data
x = -2:0.05:1;
y = 0.5*x.^4 + x.^3 - 0.6*x + 0.1*randn(1,length(x));
% Polyfit regression model
p_polyfit = polyfit(x,y,4) % Parameters found from Quartic fit
predictions_polyfit = p_polyfit(1)*x.^4 + p_polyfit(2)*x.^3 + p_polyfit(3)*x.^2 + p_polyfit(4)*x + p_polyfit(5);
minimum_loss_polyfit = sum(abs(predictions_polyfit-y)) % Compare regression predictions to ground truth with the sum of absolute errors
% genetic algorithm
rng default % For reproducibility
options = optimoptions(@ga,'FunctionTolerance',1e-12); % Tolerance for solution (default 1e-6)
FitnessFunction = @(p) loss(p,x,y);
numberOfVariables = 5; % Optimization vector p has 5 elements: the quartic coefficients p(1) ... p(5)
lb = [-2,-2,-2,-2,-2]; % Lower bounds for p(1) ... p(5)
ub = [2,2,2,2,2]; % Upper bounds for p(1) ... p(5)
[p_ga,minimum_loss_ga] = ga(FitnessFunction,numberOfVariables,[],[],[],[],lb,ub,[],options) % Parameters and loss from genetic algorithm
predictions_ga = fitness(p_ga,x);
% Plot comparison
plot(x,y,'o')
hold on
plot(x,predictions_polyfit,'r-')
plot(x,predictions_ga,'g*')
hold off
legend('data','polyfit','ga')
% Fitness and Loss functions
function pred = fitness(p,x)
% Evaluate the quartic model for every input sample; returns a column vector of predictions
xT = x';
pred = p(1)*xT.^4 + p(2)*xT.^3 + p(3)*xT.^2 + p(4)*xT + p(5);
end
function L = loss(p,x,y)
% Aggregate the absolute prediction errors over all samples into a scalar loss
L = sum(abs(fitness(p,x) - y'));
end
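As noted above, other Global Optimization Toolbox solvers can minimize the same loss. The lines below are a sketch only: they reuse "FitnessFunction", "numberOfVariables", "lb", and "ub" from the script above (so they would need to be placed before the local function definitions), and the start point "x0" for "patternsearch" is an arbitrary choice:
x0 = zeros(1,numberOfVariables); % arbitrary start point required by patternsearch
[p_ps,loss_ps] = patternsearch(FitnessFunction,x0,[],[],[],[],lb,ub) % direct-search alternative
[p_pso,loss_pso] = particleswarm(FitnessFunction,numberOfVariables,lb,ub) % population-based alternative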
Additionally, the attached file "ga_regressioon_example.mlx" applies the same approach to a linear regression model fit to noisy data.
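Since the attached live script is not reproduced here, the lines below are a rough sketch of that linear case, with dummy data and bounds chosen only for illustration:
rng default % For reproducibility
x_lin = linspace(0,5,100);
y_lin = 1.8*x_lin + 0.7 + 0.2*randn(size(x_lin)); % dummy noisy linear data: slope 1.8, bias 0.7
loss_lin_fcn = @(p) sum(abs(p(1)*x_lin + p(2) - y_lin)); % absolute-error loss, as in the quartic example
[p_lin,minimum_loss_lin] = ga(loss_lin_fcn,2,[],[],[],[],[-5 -5],[5 5]) % p(1) = slope, p(2) = bias, each bounded in [-5,5]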