Binary GA returns floating point numbers

Hello,
I am trying to find optimal wavelengths in NIR spectra for PLS regression. I have working code, but the solution sometimes includes floating-point numbers. How can I tell ga that only 0 and 1 are allowed gene values?
Or can I simply treat any nonzero value as true?
% Set seed for reproducibility
rng(42);
% Download data and define arrays
url = 'https://figshare.com/ndownloader/files/1649903';
filename = 'data.xlsx';
data = readmatrix(websave(filename, url));
X = data(2:end,9:end);
y = data(2:end,1);
% Average blocks of 4 wavelengths
Xavg = mean(reshape(X, [size(X,1), size(X,2)/4, 4]), 3);
% Define the fitness function
fitness_function = @(solution) 1.0 / sqrt(mean((y - regressor(Xavg(:,logical(solution)), y)).^2));
% Define the initial population
init_pop = generate_initial_population(size(Xavg,2), 50, 25);
% Define the GA instance
options = optimoptions('ga', 'PopulationSize', 50, 'InitialPopulationMatrix', init_pop, ...
'MutationFcn', {@mutationadaptfeasible, 0.3}, 'CrossoverFcn', @crossoverscattered, ...
'EliteCount', 10, 'MaxGenerations', 100, 'UseParallel', true, 'Display', 'iter');
% Run GA
[solution, fval] = ga(fitness_function, size(Xavg,2), [], [], [], [], zeros(size(Xavg,2),1), ones(size(Xavg,2),1), [], options);
function y_pred = regressor(X, y)
    % Candidate numbers of PLS components
    parameters_gs = 1:6;
    best_mse = inf;
    best_n_components = 0;
    for n_components = parameters_gs
        % Fit a PLS model with n_components components
        [~,~,~,~,beta] = plsregress(X, y, n_components);
        % Predict on the training data (first row of beta is the intercept)
        y_pred = [ones(size(X,1),1) X] * beta;
        % Keep the component count with the lowest training MSE
        mse = mean((y - y_pred).^2);
        if mse < best_mse
            best_mse = mse;
            best_n_components = n_components;
        end
    end
    % Refit with the best component count and return its predictions
    [~,~,~,~,beta] = plsregress(X, y, best_n_components);
    y_pred = [ones(size(X,1),1) X] * beta;
end
function init_population = generate_initial_population(array_size, solutions_per_pop, number_of_bands)
    % Starts with a boolean array of zeroes
    init_population = false(solutions_per_pop, array_size);
    % Define an index array the size of the spectral wavelengths
    index_array = 1:array_size;
    for i = 1:solutions_per_pop
        % Randomly shuffle the array in place
        index_array = index_array(randperm(length(index_array)));
        % Select the first number_of_bands of the shuffled array and use it to flip the population array
        init_population(i, index_array(1:number_of_bands)) = ~init_population(i, index_array(1:number_of_bands));
    end
    init_population = double(init_population);
end
Thanks for helping
F

Answers (1)

Walter Roberson on 19 Feb 2024
% Set seed for reproducibility
rng(42);
% Load data and define arrays
data = readmatrix('Data/File_S1.xlsx');
Error using readmatrix
Unable to find or open 'Data/File_S1.xlsx'. Check the path and filename or file permissions.
X = data(2:end,9:end);
y = data(2:end,1);
% Average blocks of 4 wavelengths
Xavg = mean(reshape(X, [size(X,1), size(X,2)/4, 4]), 3);
% Define the fitness function
fitness_function = @(solution) 1.0 / sqrt(mean((y - regressor(Xavg(:,logical(solution)), y)).^2));
% Define the initial population
init_pop = generate_initial_population(size(Xavg,2), 50, 25);
% Define the GA instance
options = optimoptions('ga', 'PopulationSize', 50, 'InitialPopulationMatrix', init_pop, ...
'MutationFcn', {@mutationadaptfeasible, 0.3}, 'CrossoverFcn', @crossoverscattered, ...
'EliteCount', 10, 'MaxGenerations', 100, 'UseParallel', true, 'Display', 'iter', ...
'PopulationType', 'bitstring');
% Run GA
[solution, fval] = ga(fitness_function, size(Xavg,2), [], [], [], [], zeros(size(Xavg,2),1), ones(size(Xavg,2),1), [], options);
function y_pred = regressor(X, y)
    % Candidate numbers of PLS components
    parameters_gs = 1:6;
    best_mse = inf;
    best_n_components = 0;
    for n_components = parameters_gs
        % Fit a PLS model with n_components components
        [~,~,~,~,beta] = plsregress(X, y, n_components);
        % Predict on the training data (first row of beta is the intercept)
        y_pred = [ones(size(X,1),1) X] * beta;
        % Keep the component count with the lowest training MSE
        mse = mean((y - y_pred).^2);
        if mse < best_mse
            best_mse = mse;
            best_n_components = n_components;
        end
    end
    % Refit with the best component count and return its predictions
    [~,~,~,~,beta] = plsregress(X, y, best_n_components);
    y_pred = [ones(size(X,1),1) X] * beta;
end
function init_population = generate_initial_population(array_size, solutions_per_pop, number_of_bands)
    % Starts with a boolean array of zeroes
    init_population = false(solutions_per_pop, array_size);
    % Define an index array the size of the spectral wavelengths
    index_array = 1:array_size;
    for i = 1:solutions_per_pop
        % Randomly shuffle the array in place
        index_array = index_array(randperm(length(index_array)));
        % Select the first number_of_bands of the shuffled array and use it to flip the population array
        init_population(i, index_array(1:number_of_bands)) = ~init_population(i, index_array(1:number_of_bands));
    end
    init_population = double(init_population);
end
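The only functional change in the options above is 'PopulationType', 'bitstring'. To see that setting in isolation, here is a minimal sketch on a made-up toy problem; the target mask, the population size and the explicit creation/mutation/crossover functions are assumptions chosen for illustration, not part of the original script. It requires the Global Optimization Toolbox.

```matlab
% Toy problem (hypothetical): recover a known 0/1 mask of length 12.
rng(42);
nvars  = 12;
target = double(rand(1, nvars) > 0.5);   % made-up "true" band selection

% ga minimizes its fitness function, so count mismatched bits (lower is better).
fitness = @(bits) sum(bits ~= target);

options = optimoptions('ga', ...
    'PopulationType', 'bitstring', ...       % genes are restricted to 0 or 1
    'CreationFcn',    @gacreationuniform, ...
    'MutationFcn',    @mutationuniform, ...  % flips bits for bitstring populations
    'CrossoverFcn',   @crossoverscattered, ...
    'PopulationSize', 40, ...
    'MaxGenerations', 100, ...
    'Display',        'off');

% No bounds or constraints are passed for this toy problem.
[bits, fval] = ga(fitness, nvars, [], [], [], [], [], [], [], options);

% Individuals are stored as doubles holding exactly 0 or 1,
% so converting with logical() cannot pick up fractional genes.
selected = find(logical(bits));
```

Two things are worth checking against the documentation for your release. First, as far as I recall, ga ignores bounds and other constraints for bitstring populations, so the zeros/ones bounds in the call above would have no effect. Second, ga minimizes, so a fitness of 1/RMSE rewards a larger error; returning the RMSE itself is probably what is wanted.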
  1 Comment
Fabian Hofmann on 19 Feb 2024
Thank you for pointing out the error. The script is now running.
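One further point that may be worth checking in the script is the "Average blocks of 4 wavelengths" step. Because MATLAB stores arrays in column-major order, reshape(X, [n, p/4, 4]) followed by mean(..., 3) averages columns that are p/4 apart rather than 4 neighbouring columns. The sketch below uses a small made-up matrix to show the difference; whether adjacent averaging is intended depends on how the spectra are laid out in the file, which I am assuming here is in wavelength order.

```matlab
% Small made-up matrix: 2 samples, 8 "wavelengths".
X = [ 1  2  3  4  5  6  7  8;
     10 20 30 40 50 60 70 80];
[n, p] = size(X);
blk = 4;

% Layout used in the thread: the pages along dimension 3 are p/blk columns
% apart, so column 1 is averaged with columns 3, 5 and 7.
strided = mean(reshape(X, [n, p/blk, blk]), 3);
% strided = [4 5; 40 50]

% Adjacent blocks: place the 4 neighbours along dimension 2, average there,
% then collapse back to an n-by-(p/blk) matrix.
adjacent = reshape(mean(reshape(X, [n, blk, p/blk]), 2), n, []);
% adjacent = [2.5 6.5; 25 65]
```

Both forms need the number of columns to be divisible by the block size.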
