Bayesian Optimization: How should we parameterize hidden units for changing number of layers (depth) of a BiLSTM network using bayesopt?

Question

Yildirim Kocoglu on 2 Nov 2020

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/634459-bayesian-optimization-how-should-we-parameterize-hidden-units-for-changing-number-of-layers-depth

Edited: Yildirim Kocoglu on 9 Nov 2020

Hi there,

I have been trying to use Bayesian optimization to tune the hyperparameters of my BiLSTM code. (I hope this code helps others in the community, because I have seen unanswered MATLAB questions about Bayesian optimization for LSTM networks, which are similar to BiLSTM.)

One of the parameters I am changing is the depth of the BiLSTM network, but I think I should also find the best number of hidden units for each layer.

As you can see in the code, the maximum number of layers I want to test is 10, so I created HiddenUnits_1 through HiddenUnits_10 under optimVars. However, the number of hidden-unit variables that matter depends on the number of layers in the network. For example, a network with 5 BiLSTM layers needs only 5 hidden-unit variables (HiddenUnits_1 through HiddenUnits_5); the rest (HiddenUnits_6 through HiddenUnits_10) should not exist for that particular "experiment". The code runs successfully, but it optimizes all 10 hidden-unit variables even when the network has fewer layers. Is there a way to avoid optimizing unnecessary variables, for example to ignore HiddenUnits_6 through HiddenUnits_10 when the point being evaluated has only 5 layers?

Also, a little off topic but related: is there a way to optimize these hidden units as an array or a cell? That is, can I define a cell array to be optimized, with each cell holding one of the hidden-unit variables (HiddenUnits_1 through HiddenUnits_10)? I ask because I could then modify the code to read the hidden units from the cell array automatically, without naming each one separately, and make their number depend on the number of BiLSTM layers (I have not tried this yet).

Thank you, any help or suggestions are appreciated.

Here is the code I have written for it:

%% Bayesian Optimization
optimVars = [
    optimizableVariable('SectionDepth',[1 10],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('HiddenUnits_1',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_2',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_3',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_4',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_5',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_6',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_7',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_8',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_9',  [1 200], 'Type', 'integer')
    optimizableVariable('HiddenUnits_10', [1 200], 'Type', 'integer')];
ObjFcn = makeObjFcn(Noisy_XTrain_PLE,Noisy_YTrain_PLE,PLE_Predictions_40_train,PLE_Predictions_40_test,mu_PLE,std_PLE);
% Perform bayesian optimization by minimizing error on validation set.
% Minimum of 30 runs is suggested for bayesian optimization (more can lead to better results).
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObj',30, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'UseParallel',false);
% Load the best network found in optimization and load the filename
bestIdx = BayesObject.IndexOfMinimumTrace(end);
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);
% Print validation error
TrainError = savedStruct.TotaltrainingError
valError = savedStruct.TotalvalError
%% Define the objective function for optimization
function ObjFcn = makeObjFcn(XTrain,YTrain,PLE_Predictions_training,PLE_Predictions_test,mu_PLE,std_PLE)
ObjFcn = @valErrorFun;
    function [TotalvalError,cons,fileName] = valErrorFun(optVars)
        
        % Create cell array of valError to save the validation error values
        valError = cell(510,1);
        TrainingError = cell(510,1);
        
        % Random seed
        seed = 100;
        rng(seed);
        
        % Input - Output features
        numFeatures = 1;
        numResponses = 1;
        
        % Hyperparameters
        miniBatchSize = 1;
        %numHiddenUnits = 50;
        x = 0;
        y = 1;
        maxEpochs = 1;
        
        
        
        % Layer structure
        layers = [
            sequenceInputLayer(numFeatures)
            bilstmBlock(optVars.SectionDepth,optVars.HiddenUnits_1,optVars.HiddenUnits_2,optVars.HiddenUnits_3,optVars.HiddenUnits_4,optVars.HiddenUnits_5,optVars.HiddenUnits_6,optVars.HiddenUnits_7,optVars.HiddenUnits_8,optVars.HiddenUnits_9,optVars.HiddenUnits_10,x,y) % Function
            dropoutLayer(0)
            % Add the fully connected layer and the final regression layer.
            fullyConnectedLayer(numResponses,'BiasInitializer','ones','WeightsInitializer',@(sz) normrnd(x,y,sz))
            regressionLayer];
        
        % Training options
        options = trainingOptions('adam', ...
            'InitialLearnRate',optVars.InitialLearnRate, ...
            'GradientThreshold',1, ...
            'MaxEpochs',maxEpochs, ...
            'ExecutionEnvironment','gpu', ...
            'LearnRateSchedule','piecewise', ...
            'LearnRateDropPeriod',125, ...
            'LearnRateDropFactor',1, ...
            'MiniBatchSize',miniBatchSize, ...
            'Shuffle','never', ...
            'Verbose',false, ...
            'Plots','training-progress');
        
        % Train network
        net = trainNetwork(XTrain, YTrain, layers, options);
        
        % Forecast future values
        
        for i = 450:510 
            net = resetState(net); % Testing this reset option
            [net,XPred] = predictAndUpdateState(net,XTrain(i,:),'MiniBatchSize', 1);
            
            Ending = cellfun(@(x) x(end), YTrain(i,:), 'UniformOutput', false);
            
            % Then Update the state again on the last point of Ytrain to get the next state update
            
            [net,YPred] = predictAndUpdateState(net,Ending,'MiniBatchSize',1);
            
            % Repeat the predictAndUpdateState in a for loop to get the next time steps (Forecast into the future)
            
            for j = 2:40 % Need to change this to account for remaining months for each well
                [net,YPred(:,j)] = predictAndUpdateState(net,YPred(:,j-1),'MiniBatchSize', 1,'ExecutionEnvironment','gpu');
            end
            
            % Convert cell to matrix since the amount of predictions is the same (not the total amount for each well but, the next 5 years for example)
            YPred_new = cell2mat(YPred);
            mu_3 = cell2mat(mu_PLE);
            std_3 = cell2mat(std_PLE);
            
            De_normalized_YPred = YPred_new.*std_3(i,:) + mu_3(i,:);
            De_normalized_Xpred = cellfun(@(x,y,z) x.*y + z, std_PLE (i,1), XPred, mu_PLE (i,1), 'UniformOutput', false);
            
            % Test PLE
            PLE_test = cell2mat(PLE_Predictions_test(i,1));
            
            % Training PLE
            PLE_Predictions_train = cellfun(@(x) x(:,end-1), PLE_Predictions_training, 'UniformOutput', false);
            PLE_train = cell2mat(PLE_Predictions_train(i,1));
            
            valError{i,1} = mean((PLE_test(1,1:40) - De_normalized_YPred).^2);
            TrainingError{i,1} = mean((PLE_train(1,:) - cell2mat(De_normalized_Xpred(:))).^2);
        end
        
        TotaltrainingError = sum([TrainingError{:}]);
        TotalvalError = sum([valError{:}]);
        
        
        fileName = num2str(TotaltrainingError) + "_" + num2str(TotalvalError) + ".mat";
        save(fileName,'net','TotalvalError','TotaltrainingError','options','layers')
        
        % Constraints
        cons = [];
        
    end
end
%% Define a function for creating deeper networks
function layersan = bilstmBlock(numBiLSTMLayers,HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10,x,y)
numHiddenUnits = [HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5,HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10];
layersan = [];
for i = 1:numBiLSTMLayers
    layers = bilstmLayer(numHiddenUnits(1,i),'BiasInitializer','ones','OutputMode','sequence','InputWeightsInitializer',@(sz) normrnd(x,y,sz),'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz));
    layersan = [layersan; layers];
end
end
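
The ten positional HiddenUnits arguments of bilstmBlock could be collapsed into a single vector, so that the network depth simply follows the vector length. The following is a minimal sketch under assumptions, not the code used in this thread: bilstmStack, mu and sigma are hypothetical names, the example vector is made up, and the layer options are copied from bilstmBlock above.

```matlab
% Hypothetical variant of bilstmBlock: one vector instead of ten arguments.
numHiddenUnits = [120 64 32];   % made-up values, one entry per BiLSTM layer
layers = bilstmStack(numHiddenUnits, 0, 1);
disp(numel(layers))             % number of BiLSTM layers that were created

function layers = bilstmStack(numHiddenUnits, mu, sigma)
    % Gaussian weight initializer with mean mu and standard deviation sigma.
    % normrnd needs Statistics and Machine Learning Toolbox, as in the
    % original bilstmBlock.
    init = @(sz) normrnd(mu, sigma, sz);
    layers = [];
    for k = 1:numel(numHiddenUnits)
        layers = [layers; bilstmLayer(numHiddenUnits(k), ...
            'BiasInitializer','ones', ...
            'OutputMode','sequence', ...
            'InputWeightsInitializer',init, ...
            'RecurrentWeightsInitializer',init)]; %#ok<AGROW>
    end
end
```

With this shape, the objective function would only need to assemble the first SectionDepth hidden-unit values into a vector before calling the helper.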

Answer 1

Alan Weiss on 3 Nov 2020

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/634459-bayesian-optimization-how-should-we-parameterize-hidden-units-for-changing-number-of-layers-depth#answer_532759

I believe that you can perform the optimization the way you want using conditional constraints. If M is the number of layers that you are using, then set the values of all parameters in layers M+1 through 10 to some default value so that they are not optimized.

As for your second question, I am sorry but I do not understand exactly what you are asking. Maybe you are asking whether you can run a subsidiary optimization inside your objective function. The answer to that is of course yes: you can write anything you want inside your objective function, including another call to bayesopt. But perhaps I misunderstand what you are asking.

Good luck,

Alan Weiss

MATLAB mathematical toolbox documentation

4 Comments
Yildirim Kocoglu on 8 Nov 2020

Thank you, Mr. Weiss, for your answer. I will look into conditional constraints, try to implement them as you suggested, and post my solution if I get it right.

What I meant by my 2nd question was:

Creating a cell array variable of HiddenUnits to optimize, such as below:

% Hypothetical syntax (I don't know whether bayesopt supports this):
% a cell array holding one range per hidden-unit variable
HiddenUnits = repmat({[1 200]},10,1);
% Layer size range to optimize
Layersize = [1 10];
optimVars = [
    optimizableVariable('HiddenUnits', HiddenUnits, 'Type', 'integer')
    optimizableVariable('Layersize', Layersize, 'Type', 'integer') ];

In this case, the HiddenUnits variable would be a cell array containing the range "[1 200]" for each of the 10 hidden-unit variables, one per layer (10 being the maximum number of layers), but I'm not sure if this is possible.

Also, if it's possible, I'm not certain what would happen when I try to apply conditional constraints to it.

I hope this clarifies my question.

The reason I wanted to do something like this is that it would make the code much easier to write and read: unless I know better, I will most likely keep a similar range for all layers, especially if I need a very deep network (not necessarily BiLSTM).
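
One way to get most of that convenience without a cell-valued variable is to build the named variables in a loop and read them back as a vector inside the objective function. The following is a minimal sketch under assumptions: maxDepth is an illustrative name, and the mock optVars row with its made-up values only stands in for the one-row table that bayesopt would normally pass to the objective function.

```matlab
% Sketch: generate HiddenUnits_1..HiddenUnits_maxDepth instead of typing them.
maxDepth = 10;   % assumed name for the maximum number of BiLSTM layers
optimVars = [
    optimizableVariable('SectionDepth',[1 maxDepth],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')];
for k = 1:maxDepth
    newVar = optimizableVariable(sprintf('HiddenUnits_%d',k),[1 200],'Type','integer');
    optimVars = [optimVars; newVar]; %#ok<AGROW>
end

% Mock of the one-row table the objective function receives (values made up).
values  = num2cell([3, 0.05, 120, 64, 32, NaN(1,7)]);
optVars = cell2table(values,'VariableNames',{optimVars.Name});

% Collect only the hidden units of the layers that exist at this point.
depth = optVars.SectionDepth;
names = arrayfun(@(k) sprintf('HiddenUnits_%d',k), 1:depth, 'UniformOutput', false);
numHiddenUnits = optVars{1,names};
disp(numHiddenUnits)
```

Each hidden-unit count is still a separate scalar variable from the optimizer's point of view; only the bookkeeping around the variables becomes shorter.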

Yildirim Kocoglu on 8 Nov 2020

Mr. Weiss,

I looked into conditional constraints but could not figure out exactly how to write one, either because my problem is a little different or because I did not completely understand what you meant by "If M is the number of layers that you are using, then set the values of all parameters in layers M+1 through 10 to some default value so that they are not optimized."

Could you please clarify your earlier answer? My understanding is this: if a random observation picks a layer size (for example, layersize = 3), then only HiddenUnits_1, HiddenUnits_2, and HiddenUnits_3 should be optimized, and HiddenUnits_4 through HiddenUnits_10 are ignored because they are set to a default value (for example, 200). And since HiddenUnits_4 through HiddenUnits_10 have no effect when layersize = 3, it should work fine?

I believe I should pass the function I write, as in the example, into bayesopt() as an input argument, right?

I still don't understand how it will know which parameters to ignore (that is, not optimize for that specific observation) depending on the number of layers in the random observation.

The example below is given in the conditional constraints documentation, but I can't clearly see how to modify it to fit my problem (perhaps because I'm new to this):

% Syntax: Xnew = condvariablefcn(X)
function Xnew = condvariablefcn(X)
Xnew = X;
Xnew.PolynomialOrder(Xnew.KernelFunction ~= 'polynomial') = NaN;
end

In any case, I'll really appreciate it if you can give me a simple example.

Thank you.

Yildirim Kocoglu on 9 Nov 2020

After more research, I tried changing my code as below, just as a first test (only setting HiddenUnits_10 to a default value of NaN):

BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObj',5, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'ConditionalVariableFcn',@condvariablefcn, ...
    'UseParallel',false);
%% Conditional constraints function
function optVarsnew = condvariablefcn(optVars)
    
    optVarsnew = optVars;    
    
    M = optVarsnew.SectionDepth;
    if M < 10
        optVarsnew.HiddenUnits_10  = NaN;
    end
    
end

The 1st iteration worked and gave me the output below (notice HiddenUnits_10 = NaN):

|

===========================================================================================================================================================================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | SectionDepth | InitialLearn-| HiddenUnits_1| HiddenUnits_2| HiddenUnits_3| HiddenUnits_4| HiddenUnits_5| HiddenUnits_6| HiddenUnits_7| HiddenUnits_8| HiddenUnits_9| HiddenUnits_-|
|      | result |             | runtime     | (observed)  | (estim.)    |              | Rate         |              |              |              |              |              |              |              |              |              | 10           |
|===========================================================================================================================================================================================================================================================|
|    1 | Error  |         NaN |      55.666 |         NaN |         NaN |            5 |     0.070007 |          199 |            5 |           45 |          152 |          149 |          199 |          115 |           81 |          102 |            - |

The 2nd iteration, however, gave me the error below (the full message is long, so I am posting only the start unless asked for more):

Error using Create_Save_Sorted_Sequences>condvariablefcn (line 568)
To assign to or create a variable in a table, the number of rows must match the height of the table.

Does anyone happen to know the reason for this error?

Any help is appreciated.

Thank you.

Yildirim Kocoglu on 9 Nov 2020

Edited: Yildirim Kocoglu on 9 Nov 2020

I think I finally figured it out.

What I missed (it was not very clear to me from the documentation) is that the input to the conditional function is a table with one row per point, and that this table gains rows as more points are observed, so the function cannot treat SectionDepth as a single value. I will first show the changes in the code and then the verbose output. The function is written much like the documentation example, but the reasons for writing it that way were not clearly explained there.
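
The failure of the earlier scalar version can be reproduced outside bayesopt with a small table. The following is a minimal sketch, assuming a hand-made two-row table with made-up values as a stand-in for the table that bayesopt passes in; only the table logic is exercised here.

```matlab
% Two-row stand-in for the table passed to the conditional function.
X = table([5; 9], [80; 60], 'VariableNames', {'SectionDepth','HiddenUnits_10'});

errMsg = '';
try
    % Scalar-style logic from the first attempt. An "if" on a column vector
    % is true only when every element is true, and the assignment then tries
    % to replace a two-row column with a single value.
    if X.SectionDepth < 10
        X.HiddenUnits_10 = NaN;
    end
catch err
    errMsg = err.message;
end
disp(errMsg)

% Row-wise version: logical indexing changes only the matching rows.
Xfixed = X;
Xfixed.HiddenUnits_10(Xfixed.SectionDepth < 10) = NaN;
disp(Xfixed)
```

With a single-row table the scalar version happens to work, which would explain why the first iteration ran before the error appeared.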

Here is what I changed in my code to take care of this:

% Tip: run once without the conditional function and look inside BayesObject (or whatever you named it);
% it holds many details of the bayesopt run, including the table I mentioned.
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObj',2, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'ConditionalVariableFcn',@condvariablefcn, ... % Don't forget this part and make sure its name matches your written function (it is passed as a function handle)
    'UseParallel',false);
function Xnew = condvariablefcn(X)
    % X is the table passed in by bayesopt; start from a copy of it.
    Xnew = X;

    % Loop over the hidden-unit columns of the table. The 12 columns follow the
    % order of optimVars: columns 1 and 2 are SectionDepth and InitialLearnRate,
    % and columns 3 to 12 are HiddenUnits_1 to HiddenUnits_10.
    % Xnew.(i) is column i, so it holds HiddenUnits_(i-2). The logical index
    % (Xnew.SectionDepth < i-2) selects every row whose network has fewer than
    % i-2 layers, and the hidden units in those rows are set to NaN.
    for i = 3:12
        Xnew.(i)(Xnew.SectionDepth < i-2) = NaN;
    end
    
end
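
The row-wise masking can be checked offline before starting a long optimization. The following is a minimal sketch under assumptions: the two-row table uses made-up values, and maskUnusedLayers is a hypothetical variant of condvariablefcn that selects columns by name instead of by position, so it does not depend on the order of optimVars.

```matlab
% Mock two-row table: depths 5 and 9, learning rates and hidden units made up.
X = table([5; 9], [0.07; 0.069], 'VariableNames', {'SectionDepth','InitialLearnRate'});
for k = 1:10
    X.(sprintf('HiddenUnits_%d',k)) = randi(200, 2, 1);
end

Xnew = maskUnusedLayers(X);
disp(Xnew)

function Xnew = maskUnusedLayers(X)
    % Set HiddenUnits_k to NaN in every row whose network has fewer than k layers.
    Xnew = X;
    for k = 1:10
        name = sprintf('HiddenUnits_%d',k);
        Xnew.(name)(Xnew.SectionDepth < k) = NaN;
    end
end
```

Selecting columns by name is a design choice for robustness: adding or reordering entries in optimVars would otherwise shift the column numbers used in the loop.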

I ran just 2 iterations (using 'MaxObj' = 2), and the table inside BayesObject shows that the unused HiddenUnits variables were correctly set to the default value based on the number of layers.

Here is my verbose output (it matches the table):

|===========================================================================================================================================================================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | SectionDepth | InitialLearn-| HiddenUnits_1| HiddenUnits_2| HiddenUnits_3| HiddenUnits_4| HiddenUnits_5| HiddenUnits_6| HiddenUnits_7| HiddenUnits_8| HiddenUnits_9| HiddenUnits_-|
|      | result |             | runtime     | (observed)  | (estim.)    |              | Rate         |              |              |              |              |              |              |              |              |              | 10           |
|===========================================================================================================================================================================================================================================================|
|    1 | Error  |         NaN |      56.474 |         NaN |         NaN |            5 |     0.070007 |          199 |            5 |           45 |          152 |          149 |            - |            - |            - |            - |            - |
|    2 | Error  |         NaN |      93.763 |         NaN |         NaN |            9 |     0.069531 |          168 |          197 |           99 |          174 |          116 |          118 |            9 |          147 |           60 |            - |

Hope this helps someone else as well.

Thank you for pointing me in the right direction Mr. Weiss.
