How to create an initialize function for a custom layer where the learnable parameters have the same size as the input?

I want to write an initialize function inside a custom layer where the learnable parameters have the same size as the (unknown) input. Is it possible? I understood from Define Custom Deep Learning Layer with Learnable Parameters - MATLAB & Simulink that this can be achieved by using networkDataLayout objects. For instance, while creating the deep learning network, MATLAB will analyze the network using an input with a batch size of 1, and later, during training, this will change based on the batch size provided in the training options. Is there any way to initialize the custom layer in that way?

Answers (1)

I am assuming you are using MATLAB R2024b.
You can initialize such a layer by implementing the initialize() method of your custom layer. The initialize() method takes two arguments: the layer object itself and a networkDataLayout object that represents the input layout of the layer (usually referred to as layout in the method's implementation):
function layer = initialize(layer,layout)
    % (Optional) Initialize layer learnable and state parameters.
    %
    % Inputs:
    %     layer  - Layer to initialize
    %     layout - Data layout, specified as a networkDataLayout
    %              object
    %
    % Outputs:
    %     layer - Initialized layer
    %
    % - For layers with multiple inputs, replace layout with
    %   layout1,...,layoutN, where N is the number of inputs.

    % Define layer initialization function here.
end
To access the input size, you can use the Size property of the networkDataLayout object. When you train a network which contains your custom layer, MATLAB will automatically create a networkDataLayout object with the size of the incoming inputs to the layer and pass it to this method for layer initialization.
The example you shared also shows an implementation of the initialize() function where the parameters are initialized based on the channel dimension of the input: https://www.mathworks.com/help/deeplearning/ug/define-custom-deep-learning-layer.html#mw_0679ac65-be66-477c-9a76-912c32c1ab27.
You can adapt the example to use all the dimensions of the input. For example:
classdef customLayer < nnet.layer.Layer
    properties (Learnable)
        Parameter
    end

    % Other code

    methods
        function layer = initialize(layer, layout)
            if isempty(layer.Parameter)
                % Assign to the layer property, not a local variable
                layer.Parameter = randn(layout.Size);
            end
        end
    end
end
If you'd like to test the initialization without creating a full-fledged network, you can do something like this:
% Define input size - This will vary based on what your layer does
inputSize = [224 224 3];
% Manually create a networkDataLayout object
layout = networkDataLayout(inputSize, "SSC");
% Create layer
layer = customLayer();
% Manually initialize the layer
layer = initialize(layer, layout);
% Check the size of the parameter
isequal(size(layer.Parameter), inputSize) % returns true (logical 1)
Hope this helps!

13 Comments

Thank you @Malay Agarwal for the response. I have tried Parameter = randn(layout.Size); to initialize the parameter, but this error is shown:
"The function threw an error and could not be executed.
Error using randn
NaN and Inf not allowed."
I am using MATLAB R2023a.
Could you share the implementation of the initialize() method and how you're calling it?
I hereby attach the initialize function file that I have used. Kindly check and thank you for the help.
The issue doesn't seem to be reproducible on my end. I am able to initialize the layer successfully:
layer = sample1();
inputSize = [224 224 3];
% Pass the input size and the input format to create the layout
layout = networkDataLayout(inputSize, "SSC");
layer = initialize(layer, layout)
layer =
  sample1 with properties:

    Name: ''

  Learnable Parameters
    Wq: [224x224x3 dlarray]
    Wk: [224x224x3 dlarray]
    Wv: [224x224x3 dlarray]
    Wo: [224x224x3 dlarray]

  State Parameters
    No properties.

Use properties method to see a list of all properties.
Make sure you're creating the networkDataLayout object correctly. You might be using a NaN in the input size to denote the batch dimension. Generally speaking, layer parameters do not depend on the batch dimension (due to vectorization). You can do something like this:
function layer = initialize(layer,layout)
    % All dimensions except the last one, which is NaN
    sz = layout.Size(1:end-1);
    if isempty(layer.Wq)
        layer.Wq = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wk)
        layer.Wk = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wv)
        layer.Wv = dlarray(rand(sz,'single'));
    end
    if isempty(layer.Wo)
        layer.Wo = dlarray(rand(sz,'single'));
    end
end
For a more robust solution, you can extract the specific dimensions you care about using the finddim function and initialize the parameters based on that: https://www.mathworks.com/help/releases/R2023a/deeplearning/ref/dlarray.finddim.html.
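As a rough sketch (sizes here are assumed for illustration), finddim lets you pick out a dimension by its label instead of relying on its position in the size vector:

```matlab
% Create a layout with an unknown batch size (NaN)
layout = networkDataLayout([64 NaN], "CB");

% Find the index of the channel ("C") dimension by label
cdim = finddim(layout, "C");

% Use that index to read the channel count from the layout
numChannels = layout.Size(cdim);   % 64
```

This way the initialization keeps working even if the dimension order of the incoming layout changes.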
Thank you @Malay Agarwal. Actually, my input data has a dimension format of "CB". Now the initialize function is not throwing the error, but in the predict function
function Z = predict(layer, X) % only part of the predict function is shown
    X = stripdims(X);
    S = pagemtimes(X, layer.Wq);
    Z = dlarray(S, "CB");
end
an error is thrown:
"Error using pagemtimes: Incorrect dimensions for matrix multiplication. Check that the number of columns in the first array matches the number of rows in the second array."
Even when I used the transpose of the first and second variables inside pagemtimes, it did not work. Is there a problem with the layer extracting the batch dimension? Without taking the batch dimension, how is the pagemtimes execution possible, given that stripdims is required to remove the dimension format? Is there anything wrong in how I am dealing with the batch dimension?
Initially, I used the finddim function to extract a specific dimension, but when I checked the DAGNetwork after training, the layers had learnable parameters of size 1-by-1 (singleton); that's why I raised this doubt.
To use pagemtimes, at least one of the matrices needs to have 3 or more dimensions: https://www.mathworks.com/help/releases/R2023a/matlab/ref/pagemtimes.html#mw_b21f0712-c0fb-44a9-b169-cdf4f3e00218.
If your input is in "CB" format, this is not possible and pagemtimes is not going to work.
Is there any reason why you're using pagemtimes? Maybe you can provide more details about the kind of layer you're creating such as the input it expects (images, sequences) and the equation for the predict function?
Inputs to the network are images in "SSCB" format, but the input to the above-mentioned custom layer comes from a fully connected layer (followed by a flatten layer). So the format of the input to the custom layer is "CB". I want to multiply the input of the custom layer (format "CB") with the learnable parameters Wq, Wk, Wv and Wo, i.e. X*Wq, X*Wk, X*Wv and X*Wo. Is it possible to do the multiplication like this inside the predict function? Instead of pagemtimes, * can be used for matrix multiplication, but the error is the same.
Actually, I just checked and pagemtimes does work with 2-D matrices; it simply does a normal multiplication between the matrices.
X = rand(3, 2);
Y = rand(2, 3);
pagemtimes(X, Y)
ans = 3×3
    0.4673    0.5950    0.5225
    0.4376    0.5502    0.4481
    0.4874    0.6189    0.5352
This suggests the sizes of X and Wq are not compatible. If you want to do matrix multiplication, the number of columns in the left operand should be the same as the number of rows in the right operand. I assume you are initializing Wq as:
sz = layout.Size(1:end-1);
Wq = randn(sz);
Since your input is in "CB" format, sz is a single number. When randn is passed a single number for the size, it creates a square matrix.
randn(3)
ans = 3×3
   -0.4397    0.1507   -0.6891
    0.2824   -0.9414    0.6363
   -0.8052   -0.4188    0.1079
So it's possible that Wq is actually c-by-c, where c is the number of channels. I think what you need is a 1-by-c matrix (row vector). You can do this instead in the initialize() method:
sz = layout.Size(1:end-1);
Wq = randn([1 sz]); % Create a row vector
Then in the predict() method, you can multiply as follows:
Z = layer.Wq * X; % 1xc * cxb
This will output a 1-by-b matrix in "CB" format.
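Putting the pieces above together, a sketch of the multiplication in predict() could look like this (parameter and method names taken from the thread; this is not a complete layer implementation):

```matlab
function Z = predict(layer, X)
    % X arrives as a formatted dlarray in "CB" format (c-by-b)
    Xs = stripdims(X);                 % drop the labels for plain matrix math
    Z = dlarray(layer.Wq * Xs, "CB");  % (1-by-c) * (c-by-b) -> 1-by-b, "CB"
end
```

Using stripdims before the multiplication mirrors the pattern already used elsewhere in this thread and avoids any ambiguity about how formatted dlarray dimensions are matched.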
Thank you @Malay Agarwal for your kind patience. Yes, pagemtimes works as normal matrix multiplication for 2-D inputs. I will elaborate my problem again:
  1. X is the input, a formatted dlarray of format "CB" with size M-by-N.
  2. Wq, Wk, Wv and Wo are learnable parameters. Every parameter is of size N-by-N.
  3. Q = X*Wq; K = X*Wk; V = X*Wv % Q, K and V are not formatted dlarray objects, so Q, K and V are obtained with sizes of M-by-N.
  4. A = attention(Q,K,V,NumHeads,AttentionMask="causal",DataFormat="CB") % so A will be of size M-by-N with data format "CB".
  5. A = stripdims(A);
     Z = pagemtimes(A, layer.Wo);
     Z = dlarray(Z, "CB");
So the output has the same size and format as the input. Therefore, if Wq, Wk, Wv and Wo can be defined using dlarray(rand(sz,'single')), where sz is of size N or N-by-N, the problem will be solved. I initially tried finddim to extract the batch size, but it was extracted as a singleton. Is it possible to extract the batch dimension of the input in the custom layer?
If you are implementing attention, you actually need the channel dimension and not the batch dimension for the multiplication so the current dimensions of Wq are correct. You only need to reverse the order of the multiplication:
Z = layer.Wq * X;
In other frameworks, the batch dimension is usually the first dimension, followed by the channel dimension, which might be causing confusion here. In MATLAB, the channel dimension comes first, followed by the batch dimension.
You can take a look at this resource for understanding how attention works (here, the batch dimension is in the front): https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html.
Thank you for spending time on this question. Yes, what you said is right; however, since I am using a flatten layer before the custom layer, the channel dimension is large, so while training the system runs out of memory. This is the reason I tried extracting the batch dimension. Is it possible to extract any dimension other than the channel dimension?
I think if you use the layer in a network, you can access the batch dimension just like any other dimension using finddim by passing "B" to the label argument. The issue with NaN that you were facing earlier was only because you were passing NaN as the last dimension in the input size.
inputSize = [3 NaN];
But if you do use the batch dimension, your implementation might not be correct and you may not get the results you're expecting.
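To illustrate the point above (with an assumed layout), the batch dimension can be located the same way as any other, but its size is typically NaN at analysis time, which is why parameters should not depend on it:

```matlab
% "CB" layout with 10 channels and an unknown batch size
layout = networkDataLayout([10 NaN], "CB");

bdim = finddim(layout, "B");       % index of the batch dimension
batchSize = layout.Size(bdim);     % NaN until real data flows through
```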
Yes, maybe this is right for higher-dimensional input data with formats like "SSCB", "SSCBT", etc. But in this case the input is 2-D, so both rows and columns engage in the multiplication operation, and it may not affect the efficiency of the network. Anyway, I will try that again.


Asked: on 25 Sep 2024
Commented: on 26 Sep 2024
