How to create an initialize function for a custom layer where the learnable parameters have the same size as the input?
I want to write an initialize function inside a custom layer where the learnable parameters have the same size as the (unknown) input. Is it possible? I understood from Define Custom Deep Learning Layer with Learnable Parameters - MATLAB & Simulink - MathWorks India that this can be achieved by using networkDataLayout objects. For instance, while creating the deep learning network, MATLAB will analyze the network using an input with a batch size of 1, and later during training the batch size will change based on what we provide in the training options. Is there any way to initialize the custom layer in that way?
Answers (1)
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
I am assuming you are using MATLAB R2024b.
You can initialize such a layer by implementing the initialize() method of your custom layer. The initialize() method has two arguments, the layer object itself and an instance of networkDataLayout which represents the input layout of the layer (usually referred to as layout in the method's implementation):
function layer = initialize(layer,layout)
    % (Optional) Initialize layer learnable and state parameters.
    %
    % Inputs:
    %     layer  - Layer to initialize
    %     layout - Data layout, specified as a networkDataLayout
    %              object
    %
    % Outputs:
    %     layer - Initialized layer
    %
    % - For layers with multiple inputs, replace layout with
    %   layout1,...,layoutN, where N is the number of inputs.

    % Define layer initialization function here.
end
To access the input size, you can use the Size property of the networkDataLayout object. When you train a network which contains your custom layer, MATLAB will automatically create a networkDataLayout object with the size of the incoming inputs to the layer and pass it to this method for layer initialization.
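For example, you can construct a networkDataLayout object manually and inspect its Size and Format properties (the input size below is just an illustration):

```matlab
% Layout for a 224x224 RGB image input ("SSC" = spatial, spatial, channel)
layout = networkDataLayout([224 224 3], "SSC");

layout.Size   % [224 224 3]
layout.Format % "SSC"
```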
The example you shared also shows an implementation of the initialize() function where the parameters are initialized based on the channel dimension of the input: https://www.mathworks.com/help/deeplearning/ug/define-custom-deep-learning-layer.html#mw_0679ac65-be66-477c-9a76-912c32c1ab27.
You can adapt the example to use all the dimensions of the input. For example:
classdef customLayer < nnet.layer.Layer
    properties (Learnable)
        Parameter
    end

    % Other code

    methods
        function layer = initialize(layer, layout)
            if isempty(layer.Parameter)
                % Assign to the layer property, not a local variable
                layer.Parameter = randn(layout.Size);
            end
        end
    end
end
If you'd like to test the initialization without creating a full-fledged network, you can do something like this:
% Define input size - this will vary based on what your layer does
inputSize = [224 224 3];

% Manually create a networkDataLayout object
layout = networkDataLayout(inputSize, "SSC");

% Create the layer
layer = customLayer();

% Manually initialize the layer
layer = initialize(layer, layout);

% Check that the parameter has the expected size
isequal(size(layer.Parameter), inputSize)
Refer to the following resource for more information:
- networkDataLayout documentation - https://www.mathworks.com/help/deeplearning/ref/networkdatalayout.html
Hope this helps!
13 Comments
BIPIN SAMUEL
on 25 Sep 2024
Edited: BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
Could you share the implementation of the initialize() method and how you're calling it?
BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
The issue doesn't seem to be reproducible on my end. I am able to initialize the layer successfully:
layer = sample1();
inputSize = [224 224 3];
% Pass the input size and the input format to create the layout
layout = networkDataLayout(inputSize, "SSC");
layer = initialize(layer, layout)
Make sure you're creating the networkDataLayout object correctly. You might be using a NaN in the input size to denote the batch dimension. Generally speaking, layer parameters do not depend on the batch dimension (due to vectorization). You can do something like this:
function layer = initialize(layer,layout)
    % All dimensions except the last one, which is NaN (the batch dimension)
    sz = layout.Size(1:end-1);

    if isempty(layer.Wq)
        layer.Wq = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wk)
        layer.Wk = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wv)
        layer.Wv = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wo)
        layer.Wo = dlarray(rand(sz,'single'));
    end
end
For a more robust solution, you can extract the specific dimensions you care about using the finddim function and initialize the parameters based on that: https://www.mathworks.com/help/releases/R2023a/deeplearning/ref/dlarray.finddim.html.
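As a sketch, an initialize() method using finddim on the layout might look like this (assuming the input format contains a "C" dimension and that the layer's parameters are sized by the channel count; Wq and its shape are illustrative):

```matlab
function layer = initialize(layer, layout)
    % Locate the channel dimension regardless of where it appears in the format
    idx = finddim(layout, "C");
    numChannels = layout.Size(idx);

    if isempty(layer.Wq)
        % c-by-c parameter, purely illustrative
        layer.Wq = dlarray(rand(numChannels, numChannels, 'single'));
    end
end
```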
BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
To use pagemtimes, at least one of the matrices needs to have 3 or more dimensions: https://www.mathworks.com/help/releases/R2023a/matlab/ref/pagemtimes.html#mw_b21f0712-c0fb-44a9-b169-cdf4f3e00218.
If your input is in "CB" format, this is not possible and pagemtimes is not going to work.
Is there any reason why you're using pagemtimes? Maybe you can provide more details about the kind of layer you're creating such as the input it expects (images, sequences) and the equation for the predict function?
BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
Actually, I just checked, and pagemtimes does work with 2-D matrices: it simply performs an ordinary matrix multiplication.
X = rand(3, 2);
Y = rand(2, 3);
pagemtimes(X, Y)
This suggests the sizes of X and Wq are not compatible. If you want to do matrix multiplication, the number of columns in the left operand should be the same as the number of rows in the right operand. I assume you are initializing Wq as:
sz = layout.Size(1:end-1);
Wq = randn(sz);
Since your input is in "CB" format, sz is a single number. When randn is passed a single number for the size, it creates a square matrix.
randn(3)
So it's possible that Wq is actually c-by-c. I think what you need is a 1-by-c matrix (row vector). You can do this instead in the initialize() method:
sz = layout.Size(1:end-1);
Wq = randn([1 sz]); % Create a row vector
Then in the predict() method, you can multiply as follows:
Z = layer.Wq * X; % 1xc * cxb
This will output a matrix in "CB" format.
BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
If you are implementing attention, you actually need the channel dimension and not the batch dimension for the multiplication so the current dimensions of Wq are correct. You only need to reverse the order of the multiplication:
Z = layer.Wq * X;
In other frameworks, the batch dimension usually comes first, followed by the channel dimension, which might be causing confusion here. In MATLAB, the channel dimension comes first, followed by the batch dimension.
You can take a look at this resource for understanding how attention works (here, the batch dimension is in the front): https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html.
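As a toy illustration of that ordering, here is a minimal single-sequence sketch of scaled dot-product attention with the channel dimension first (a c-by-t array, channels by time steps; plain arrays and a manual softmax are used purely for illustration, not the exact layer implementation):

```matlab
c = 4; t = 3;                     % channels, time steps
X  = rand(c, t);                  % input sequence, one observation
Wq = rand(c, c); Wk = rand(c, c); Wv = rand(c, c);

Q = Wq * X;                       % c x t queries
K = Wk * X;                       % c x t keys
V = Wv * X;                       % c x t values

scores = (Q' * K) / sqrt(c);      % t x t: query i against key j

% Softmax over the keys (along each row)
E = exp(scores - max(scores, [], 2));
A = E ./ sum(E, 2);

Z = V * A';                       % c x t attended output
```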
BIPIN SAMUEL
on 25 Sep 2024
Malay Agarwal
on 25 Sep 2024
Edited: Malay Agarwal
on 25 Sep 2024
I think if you use the layer in a network, you can access the batch dimension just like any other dimension using finddim by passing "B" to the label argument. The issue with NaN that you were facing earlier was only because you were passing NaN as the last dimension in the input size.
inputSize = [3 NaN];
But if you do use the batch dimension, your implementation might not be correct and you may not get the results you're expecting.
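For example, assuming a "CB" input, the batch dimension can be located like this (the sizes here are hypothetical):

```matlab
% Layout with 3 channels and a batch size of 16
layout = networkDataLayout([3 16], "CB");

idx = finddim(layout, "B");   % index of the batch dimension
batchSize = layout.Size(idx) % 16 here
```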
BIPIN SAMUEL
on 26 Sep 2024