Bayesian Optimization: How should we parameterize hidden units for changing number of layers (depth) of a BiLSTM network using bayesopt?

Hi there,
I have been trying to use Bayesian optimization to tune the hyperparameters in my BiLSTM code. (I hope this code helps some of the community, because I saw unanswered MATLAB questions about Bayesian optimization for LSTM networks, which are similar to BiLSTM.)
One of the parameters I am varying is the depth of the BiLSTM network, but I think I should also find the best number of hidden units for each layer.
As you can see in the code, the maximum number of layers I want to test is 10, so I created HiddenUnits_1 through HiddenUnits_10 under optimVars. However, the number of hidden-unit variables actually needed depends on the number of layers in the network. For example, if a 5-layer network (BiLSTM layers only) is being evaluated, there should be only 5 hidden-unit variables (HiddenUnits_1 through HiddenUnits_5), and the remaining variables (HiddenUnits_6 through HiddenUnits_10) should not exist for that particular "experiment". The code runs successfully, but it tries to optimize all 10 hidden-unit variables even when the network is shallower. Is there a way to avoid optimizing the unnecessary variables (i.e., ignore hidden units 6-10 when the current point being evaluated has only 5 layers)?
Also, slightly off topic but related: is there a way to optimize these hidden units as an array or a cell array? That is, can I define a single cell array to be optimized, with each cell holding one of the hidden-unit variables (HiddenUnits_1 through HiddenUnits_10)? I ask because I could then modify the code to read the hidden units from the cell array automatically, and I would not have to declare each hidden unit separately; I believe I could make that number depend on the number of BiLSTM layers (not tried yet).
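To sketch what I mean by avoiding the repetition (untested): since each optimizableVariable is a scalar, the ten hidden-unit variables could at least be generated in a loop instead of being listed by hand, with the count tied to the maximum depth:

% Sketch (untested): generate the scalar hidden-unit variables in a loop.
% bayesopt optimizes scalar variables only, so a true cell-array variable
% cannot be passed to optimizableVariable; this loop just removes the
% copy-paste, it does not make the variables conditional on depth.
optimVars = [
    optimizableVariable('SectionDepth',[1 10],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')];
maxLayers = 10;
for k = 1:maxLayers
    optimVars(end+1) = optimizableVariable( ...
        sprintf('HiddenUnits_%d',k),[1 200],'Type','integer'); %#ok<AGROW>
end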
Thank you, any help or suggestions are appreciated.
Here is the code I have written for it:
%% Bayesian Optimization
optimVars = [
optimizableVariable('SectionDepth',[1 10],'Type','integer')
optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
optimizableVariable('HiddenUnits_1', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_2', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_3', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_4', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_5', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_6', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_7', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_8', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_9', [1 200], 'Type', 'integer')
optimizableVariable('HiddenUnits_10', [1 200], 'Type', 'integer')];
ObjFcn = makeObjFcn(Noisy_XTrain_PLE,Noisy_YTrain_PLE,PLE_Predictions_40_train,PLE_Predictions_40_test,mu_PLE,std_PLE);
% Perform bayesian optimization by minimizing error on validation set.
% Minimum of 30 runs is suggested for bayesian optimization (more can lead to better results).
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObjectiveEvaluations',30, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false);
% Find the best point in the optimization trace and load the corresponding results file
bestIdx = BayesObject.IndexOfMinimumTrace(end);
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);
% Display the training and validation errors of the best network
TrainError = savedStruct.TotaltrainingError
valError = savedStruct.TotalvalError
%% Define the objective function for optimization
function ObjFcn = makeObjFcn(XTrain,YTrain,PLE_Predictions_training,PLE_Predictions_test,mu_PLE,std_PLE)
ObjFcn = @valErrorFun;
function [TotalvalError,cons,fileName] = valErrorFun(optVars)
% Create cell array of valError to save the validation error values
valError = cell(510,1);
TrainingError = cell(510,1);
% Random seed
seed = 100;
% Input - Output features
numFeatures = 1;
numResponses = 1;
% Hyperparameters
miniBatchSize = 1;
%numHiddenUnits = 50;
x = 0;
y = 1;
maxEpochs = 1;
% Layer structure
layers = [
    bilstmBlock(optVars.SectionDepth,optVars.HiddenUnits_1,optVars.HiddenUnits_2, ...
        optVars.HiddenUnits_3,optVars.HiddenUnits_4,optVars.HiddenUnits_5, ...
        optVars.HiddenUnits_6,optVars.HiddenUnits_7,optVars.HiddenUnits_8, ...
        optVars.HiddenUnits_9,optVars.HiddenUnits_10,x,y) % Helper function defined below
    % Add the fully connected layer and the final regression layer
    fullyConnectedLayer(numResponses,'BiasInitializer','ones','WeightsInitializer',@(sz) normrnd(x,y,sz))
    regressionLayer]; % output layer for this regression task
% Training options
options = trainingOptions('adam', ...
'InitialLearnRate',optVars.InitialLearnRate, ...
'GradientThreshold',1, ...
'MaxEpochs',maxEpochs, ...
'ExecutionEnvironment','gpu', ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropPeriod',125, ...
'LearnRateDropFactor',1, ...
'MiniBatchSize',miniBatchSize, ...
'Shuffle','never', ...
'Verbose',false);
% Train network
net = trainNetwork(XTrain, YTrain, layers, options);
% Forecast future values
for i = 450:510
net = resetState(net); % Testing this reset option
[net,XPred] = predictAndUpdateState(net,XTrain(i,:),'MiniBatchSize', 1);
Ending = cellfun(@(x) x(end), YTrain(i,:), 'UniformOutput', false);
% Then Update the state again on the last point of Ytrain to get the next state update
[net,YPred] = predictAndUpdateState(net,Ending,'MiniBatchSize',1);
% Repeat the predictAndUpdateState in a for loop to get the next time steps (Forecast into the future)
for j = 2:40 % Need to change this to account for remaining months for each well
    [net,YPred(:,j)] = predictAndUpdateState(net,YPred(:,j-1),'MiniBatchSize',1,'ExecutionEnvironment','gpu');
end
% Convert cell to matrix since the amount of predictions is the same (not the total amount for each well but, the next 5 years for example)
YPred_new = cell2mat(YPred);
mu_3 = cell2mat(mu_PLE);
std_3 = cell2mat(std_PLE);
De_normalized_YPred = YPred_new.*std_3(i,:) + mu_3(i,:);
De_normalized_Xpred = cellfun(@(x,y,z) x.*y + z, std_PLE (i,1), XPred, mu_PLE (i,1), 'UniformOutput', false);
% Test PLE
PLE_test = cell2mat(PLE_Predictions_test(i,1));
% Training PLE
PLE_Predictions_train = cellfun(@(x) x(:,end-1), PLE_Predictions_training, 'UniformOutput', false);
PLE_train = cell2mat(PLE_Predictions_train(i,1));
valError{i,1} = mean((PLE_test(1,1:40) - De_normalized_YPred).^2);
TrainingError{i,1} = mean((PLE_train(1,:) - cell2mat(De_normalized_Xpred(:))).^2);
end
% Sum the errors over the forecast wells
TotaltrainingError = sum([TrainingError{:}]);
TotalvalError = sum([valError{:}]);
fileName = num2str(TotaltrainingError) + "_" + num2str(TotalvalError) + ".mat";
% Save the trained network and errors so the best run can be reloaded later
save(fileName,'net','TotaltrainingError','TotalvalError');
% Constraints
cons = [];
end
end
%% Define a function for creating deeper networks
function layersan = bilstmBlock(numBiLSTMLayers,HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10,x,y)
numHiddenUnits = [HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10];
layersan = [];
for i = 1:numBiLSTMLayers
    layers = bilstmLayer(numHiddenUnits(1,i),'BiasInitializer','ones','OutputMode','sequence', ...
        'InputWeightsInitializer',@(sz) normrnd(x,y,sz),'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz));
    layersan = [layersan; layers]; %#ok<AGROW>
end
end

Accepted Answer

Alan Weiss on 3 Nov 2020
I believe that you can perform the optimization the way you want using conditional constraints. If M is the number of layers that you are using, then set the values of all parameters in layers M+1 through 10 to some default value so that they are not optimized.
As for your second question, I am sorry but I do not understand exactly what you are asking. Maybe you are asking whether you can run a subsidiary optimization inside your objective function. The answer is of course yes; you can write anything you want inside your objective function, including another call to bayesopt. But perhaps I misunderstand what you are asking.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
Yildirim Kocoglu on 9 Nov 2020
I think I finally figured it out.
What I missed (or rather what was not very clear to me in the documentation) is that as the number of observed points grows from iteration to iteration, a table in the background also grows by one row per iteration. I will show an example of this table below, but first I want to show how to write the conditional constraints correctly, i.e., the actual changes in the code. Mine was written similarly to the documentation example, but the reasoning behind the way it is written was not clearly explained there.
Here is what I changed in my code to take care of this:
% Look inside the Bayesobject (or whatever you called it) by running without using conditional constraints first to see exactly what happened inside (it holds many details of bayesopt including the table I mentioned)
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObjectiveEvaluations',2, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'ConditionalVariableFcn',@condvariablefcn); % Don't forget this part; the name must match your function (it is passed as a function handle)
function Xnew = condvariablefcn(X)
% X is the table of points being evaluated; start from a copy of it
Xnew = X;
% Columns 1 and 2 of the table are 'SectionDepth' and 'InitialLearnRate';
% columns 3 through 12 are HiddenUnits_1 through HiddenUnits_10.
% For each hidden-unit column, set the value to NaN in every row where
% SectionDepth is smaller than that layer index, so those variables are
% not optimized for shallower networks.
for i = 3:12
    Xnew.(i)(Xnew.SectionDepth < i-2) = NaN;
end
end
Here is an example of the table inside BayesObject (in this case I ran just 2 iterations by limiting the number of objective evaluations to 2); it correctly assigned a default value (NaN, shown as '-') to the hidden units based on the number of layers.
Here is my verbose output (matches the table):
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | SectionDepth | InitialLearn-| HiddenUnits_1| HiddenUnits_2| HiddenUnits_3| HiddenUnits_4| HiddenUnits_5| HiddenUnits_6| HiddenUnits_7| HiddenUnits_8| HiddenUnits_9| HiddenUnits_-|
| | result | | runtime | (observed) | (estim.) | | Rate | | | | | | | | | | 10 |
| 1 | Error | NaN | 56.474 | NaN | NaN | 5 | 0.070007 | 199 | 5 | 45 | 152 | 149 | - | - | - | - | - |
| 2 | Error | NaN | 93.763 | NaN | NaN | 9 | 0.069531 | 168 | 197 | 99 | 174 | 116 | 118 | 9 | 147 | 60 | - |
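One consequence of this: once the conditional variable function runs, the unused hidden-unit values arrive in the objective function as NaN, so the network-building code must not pass them to bilstmLayer. A minimal sketch (untested, and assuming the ten hidden-unit values are first collected into a vector rather than passed as ten separate arguments as in my code above):

% Sketch (untested): NaN-aware version of bilstmBlock.
% numHiddenUnits is a 1-by-10 vector; entries beyond numBiLSTMLayers may
% be NaN after the ConditionalVariableFcn runs, so only the first
% numBiLSTMLayers entries are used, and each is checked before use.
function layersan = bilstmBlock(numBiLSTMLayers,numHiddenUnits,x,y)
layersan = [];
for i = 1:numBiLSTMLayers
    assert(~isnan(numHiddenUnits(i)), ...
        'HiddenUnits_%d is NaN but layer %d is active',i,i);
    layersan = [layersan; bilstmLayer(numHiddenUnits(i), ...
        'BiasInitializer','ones','OutputMode','sequence', ...
        'InputWeightsInitializer',@(sz) normrnd(x,y,sz), ...
        'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz))]; %#ok<AGROW>
end
end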
Hope this helps someone else as well.
Thank you for pointing me in the right direction, Mr. Weiss.
