Can automatic differentiation in a custom deep learning layer keep track of the random numbers generated in the forward function of the layer?

I'm trying to create a gating neural network (NN) to use in a Mixture of Experts (MoE) setting; a schematic similar to what is shown below.
The MoE network will output probabilities of selecting each expert, and the gate network (that I'm building) will stochastically pick one expert based on those probabilities at training time (only).
Since the behavior of the gate network is stochastic at training time, its forward function will generate a random vector every time it is invoked.
My understanding is that I also have to keep track of this random vector and use it in a backward function: if I leave the backward pass to automatic differentiation, another random number will be generated during backpropagation and hence ruin my training. (Right?)
My problem is that I'm not sure how I can keep track of this random vector. In my opinion, there are three possibilities:
  1. Create an ordinary property for the random number so that it can be recalled each time the backward function is called. My attempts to do this have so far failed, as it seems that custom NN layers strangely do not keep the property as the program runs. (Maybe because such objects aren't handle objects?)
  2. Use the memory property of the custom layers. This is not allowed, as it seems using memory in dlnetworks is not permitted for some reason!
  3. Use a state property. In that case, I would also have to provide derivatives of the state. However, I do not want the framework to make any changes to the state, so providing the derivative is meaningless here.
How can I solve this problem?
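For reference, here is a sketch of the kind of stochastic forward pass I mean (names are hypothetical, and the expert-sampling step itself is of course not differentiable; only the pass-through product is, with the sampled mask treated as a constant):

```matlab
function Y = forward(layer,X)
    % X holds gating probabilities, channels-by-batch ('CB' format).
    % Sample one expert per observation via the inverse-CDF (cumsum) trick.
    P = extractdata(X);                          % plain numeric copy for sampling
    u = rand(1,size(P,2));                       % one uniform draw per column
    idx = sum(cumsum(P,1) < u, 1) + 1;           % sampled expert index per column
    mask = zeros(size(P));
    mask(sub2ind(size(P), idx, 1:size(P,2))) = 1; % one-hot selection mask
    Y = X .* mask;  % mask is a constant on the autodiff trace; backward would
                    % need to see this exact mask again, hence my question
end
```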

Answers (1)

Katja Mogalle on 21 Jan 2022
The automatic differentiation framework stores the actual random numbers generated during the forward pass and uses them directly during the backward pass. So you shouldn't have to do anything special and you can make use of automatic differentiation in your custom layer (by not defining your own backward function).
You can also read more about automatic differentiation in the MATLAB documentation. There it says: "In other words, automatic differentiation evaluates derivatives at particular numeric values; it does not construct symbolic expressions for derivatives." Maybe this piece of information helps with understanding the behaviour when using random numbers.
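As a minimal illustration of that statement (a sketch assuming Deep Learning Toolbox; the function name is made up): the gradient below comes out equal to the random values drawn in the forward pass, because autodiff replays the recorded numbers instead of drawing new ones.

```matlab
% The gradient of sum(x .* r) with respect to x is exactly r, the random
% vector drawn during the forward trace -- not a fresh draw.
x = dlarray(ones(3,1));
[y,g] = dlfeval(@scaleByRandom,x);

function [y,g] = scaleByRandom(x)
    r = rand(size(x));       % random numbers generated in the forward pass
    y = sum(x .* r);
    g = dlgradient(y,x);     % g equals r: the recorded values are reused
end
```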
I also put together a small example to illustrate what I mean. It is much simpler than your use case, but hopefully you can use it to better understand or play around with the autodiff framework and transfer the idea to your implementation:
% Construct a simple network with some learnable layers and a custom layer
% in the middle which sets some channels of the data to zero.
layers = [ featureInputLayer(10)
           fullyConnectedLayer(10,Name="fc1")
           randomChannelDropLayer(10,"channelDrop") ];
net = dlnetwork(layers);
in = dlarray(rand(10,3),'CB');
% Now let's compute gradients. Note that the custom layer does not specify
% a backward function and hence automatic differentiation is used.
for i=1:5
disp("Execution #"+i)
% Every time we do a forward pass a different channel is dropped. The
% gradient of the custom layer's output with respect to its input
% contains zeros in the same channel that was randomly dropped during
% forward.
[layerOutput,layerGrad] = dlfeval(@customLayerGradients,net,in)
end
function [layerOutput,grad] = customLayerGradients(net,in)
% Compute gradients of the custom layer output with respect to its input.
% This gradient is used for backpropagation through the whole network.
[layerInput,layerOutput] = net.forward(in,Outputs=["fc1","channelDrop"]);
combinedOutput = sum(layerOutput,'all');
grad = dlgradient(combinedOutput,layerInput);
end
And here is the definition of the custom layer:
classdef randomChannelDropLayer < nnet.layer.Layer
    % randomChannelDropLayer sets one randomly selected input channel to
    % all zeros during training. The data passes through the layer
    % unchanged during prediction.

    properties
        NumChannels
    end

    methods
        function layer = randomChannelDropLayer(numChannels,name)
            layer.NumChannels = numChannels;
            layer.Name = name;
        end

        function Y = forward(layer,X)
            % Pick one channel at random and zero it out during training.
            channelToDrop = randi(layer.NumChannels,1);
            Y = X;
            Y(channelToDrop,:,:) = 0;
        end

        function Y = predict(~,X)
            % Pass the data through unchanged at prediction time.
            Y = X;
        end
    end
end



