How to use the genetic algorithm to optimize a regression model with measured data instead of analytic fitness functions?

I have a measured numerical dataset with approximately 50 input variables and corresponding ground-truth output labels. However, I do not have an analytical relationship between the inputs and the output, so this is a black-box model.
I have a parametrized regression model with a fitness function, and I want to optimize the parameters of the fitness function so that its predictions fit the dataset's output labels.
How can I use the genetic algorithm to optimize this black-box regression problem?

Accepted Answer

MathWorks Support Team on 12 Aug 2025
You can define a loss metric that measures the numerical difference between your fitness function's predictions and the ground-truth labels. Then, you can optimize this loss metric using the genetic algorithm.
Specifically, this can be done as follows:
  1. Instead of defining an analytic fitness function of the optimization variable, pass the measured input variables as column vectors to your fitness function and compute a prediction for each row of the inputs, where each prediction attempts to match the corresponding ground-truth output.
  2. Aggregate the prediction errors over all data points into a scalar loss metric (or another aggregate suited to your workflow).
  3. Define a single optimization vector composed of the parameters of your regression fitness function.
  4. Optimize this vector with respect to the loss using the genetic algorithm. The result is a set of regression parameters with minimal loss, so that the fitness function produces predictions matching the ground truths. A minimal multivariate sketch of these four steps follows this list.
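As a minimal sketch of steps 1-4 for a multivariate dataset like yours (this sketch is not part of the original answer; the linear-in-parameters model, the names "predict" and "lossFcn", and the dummy data are illustrative assumptions):
  % Dummy multivariate data: X is n-by-50 (one row per observation), y is n-by-1
  rng default
  n = 200; d = 50;
  X = randn(n,d);
  y = X*ones(d,1) + 0.1*randn(n,1);
  predict = @(p,X) X*p(:);                            % Step 1: one prediction per row of X
  lossFcn = @(p) sum((predict(p,X) - y).^2);          % Step 2: aggregate into a scalar square loss
  lb = -2*ones(1,d);                                  % Step 3: one parameter per input variable
  ub = 2*ones(1,d);
  [p_opt,min_loss] = ga(lossFcn,d,[],[],[],[],lb,ub); % Step 4: minimize the loss with ga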
If you want to consider other Global Optimization Toolbox solvers beyond the genetic algorithm, weigh the tradeoffs in their solver characteristics; many of them accept the same objective function handle.
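For example, assuming the "lossFcn", "lb", "ub", and dimension "d" from the sketch above, the "particleswarm" solver can be called in an equivalent way (this alternative solver is a suggestion, not part of the original answer):
  options_ps = optimoptions(@particleswarm,'FunctionTolerance',1e-12);
  [p_ps,loss_ps] = particleswarm(lossFcn,d,lb,ub,options_ps);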
The example script below shows a quartic fitness function parametrized by its polynomial coefficients (packaged in the optimization vector "p") for a dummy dataset. The loss metric is the square loss (sum of squared errors). In this example, the genetic algorithm solution closely matches the quartic least-squares regression fit given by the "polyfit" function:
  % Data
  x = -2:0.05:1;
  y = 0.5*x.^4 + x.^3 - 0.6*x + 0.1*randn(1,length(x));

  % Polyfit regression model
  p_polyfit = polyfit(x,y,4) % Parameters found from quartic fit
  predictions_polyfit = p_polyfit(1)*x.^4 + p_polyfit(2)*x.^3 + p_polyfit(3)*x.^2 + p_polyfit(4)*x + p_polyfit(5);
  minimum_loss_polyfit = sum((predictions_polyfit-y).^2) % Compare regression predictions to ground truth with square loss

  % Genetic algorithm
  rng default % For reproducibility
  options = optimoptions(@ga,'FunctionTolerance',1e-12); % Tolerance for solution (default 1e-6)
  FitnessFunction = @(p) loss(p,x,y);
  numberOfVariables = 5; % Optimization vector p holds the 5 quartic coefficients p(1) ... p(5)
  lb = [-2,-2,-2,-2,-2]; % Lower bounds for p(1) ... p(5)
  ub = [2,2,2,2,2]; % Upper bounds for p(1) ... p(5)
  [p_ga,minimum_loss_ga] = ga(FitnessFunction,numberOfVariables,[],[],[],[],lb,ub,[],options) % Parameters and loss from genetic algorithm
  predictions_ga = fitness(p_ga,x);

  % Plot comparison
  plot(x,y,'o')
  hold on
  plot(x,predictions_polyfit,'r-')
  plot(x,predictions_ga,'g*')
  hold off
  legend('data','polyfit','ga')

  % Fitness and loss functions
  function pred = fitness(p,x)
  xT = x'; % Work with a column vector of inputs
  pred = p(1)*xT.^4 + p(2)*xT.^3 + p(3)*xT.^2 + p(4)*xT + p(5);
  end
  function L = loss(p,x,y)
  L = sum((fitness(p,x) - y').^2); % Square loss: sum of squared errors
  end
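A note on the settings above: the bounds "lb" and "ub" must bracket the coefficient values you expect (here the true quartic coefficients all lie within [-2,2]), and tightening 'FunctionTolerance' makes "ga" run longer before declaring convergence, since the solver stops once the average relative change in the best fitness value over 'MaxStallGenerations' generations falls below this tolerance.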
Additionally, the attached file "ga_regressioon_example.mlx" demonstrates a similar application of the genetic algorithm to regression on noisy linear data.

Release

R2024b
