
Custom deep learning network - gradient function using dlfeval

I want to create a custom deep learning training function whose output is an array Y. I have two inputs, the arrays X1 and X2, and I want to find the gradient of Y with respect to X1 and X2.
This is my network:
layers1 = [
    sequenceInputLayer(sizeInput,"Name","XTrain1")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_1")
    softplusLayer('Name','s_1')];
layers2 = [
    sequenceInputLayer(sizeInput,"Name","XTrain2")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_2")
    softplusLayer('Name','s_2')];
lgraph = layerGraph(layers1);
lgraph = addLayers(lgraph,layers2); % connect layers -> 2 inputs, 1 output
add = additionLayer(2,'Name','add');
lgraph = addLayers(lgraph,add);
lgraph = connectLayers(lgraph,'s_1','add/in1');
lgraph = connectLayers(lgraph,'s_2','add/in2');
fc = fullyConnectedLayer(sizeInput,"Name","fc_3");
lgraph = addLayers(lgraph,fc);
lgraph = connectLayers(lgraph,'add','fc_3');
dlnet = dlnetwork(lgraph);
This Y should become my output. Then, on every iteration, I do:
dlX1 = dlarray(X1,'CTB');
dlX2 = dlarray(X2,'CTB'); % to differentiate: dlarray/dlgradient
for i = 1:sizeInput
    [gradx1(i), gradx2(i), dlY] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i)); % here is where I get my error
end
and I call my function, which is supposed to compute the derivative of my output with respect to my inputs:
function [gradx1, gradx2, dlY] = modelGradientsX(dlnet,dlX1,dlX2)
    dlY = forward(dlnet,dlX1,dlX2);
    [gradx1, gradx2] = dlgradient(dlY,dlX1,dlX2);
end
The error I get is: "Input data must be formatted dlarray objects". I have seen similar approaches in other examples (like this one: https://www.mathworks.com/matlabcentral/fileexchange/74760-image-classification-using-cnn-with-multi-input-cnn), so I don't understand why this is not the correct type of data.

Accepted Answer

Raunak Gupta on 18 Jul 2020
Hi,
From the code, I only see a syntax error on the following line:
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
Here, modelGradientsX outputs three variables, but you have assigned only gradx1 and gradx2 when calling it. This may be one issue. Other than that, I think a loss should also be returned from the modelGradientsX function so that the weights can be updated for the next iteration.
If the error still persists, you may check that dlX1(i) and dlX2(i) are indeed formatted dlarray objects, because dlgradient only accepts dlarray objects.
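For example, a quick check (a minimal sketch; dims returns the data format labels of a dlarray, and an empty answer means the array is unformatted, which is what the forward call complains about):
dims(dlX1)    % 'CTB' for the full formatted array
dims(dlX1(1)) % indexing out a single element can drop the format, leaving ''
If the format is lost, one option is to pass the whole formatted arrays to dlfeval instead of indexing them element by element.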
  2 Comments
Iris Soa on 19 Jul 2020
Edited: Iris Soa on 26 Jul 2020
Sir,
Thank you very much for your answer. I will reply to each of your ideas in turn:
  • On the line that you have emphasised,
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
unfortunately I had simply pasted the incorrect code. In fact, I am returning three outputs. I have updated my question to reflect this.
  • I see now that I should return a loss from my function; thank you very much for this. I think this is my problem.
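(For reference, a minimal sketch of how the function could return a scalar loss; the sum reduction here is only a placeholder objective, since dlgradient can only differentiate a scalar value:)
function [gradx1, gradx2, loss, dlY] = modelGradientsX(dlnet, dlX1, dlX2)
    dlY = forward(dlnet, dlX1, dlX2);
    loss = sum(dlY, 'all'); % reduce the array output to a scalar before dlgradient
    [gradx1, gradx2] = dlgradient(loss, dlX1, dlX2);
end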
Iris Soa on 23 Jul 2020
Here is the code that I am using to compare with my own, and it works for some reason...
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(YTrain));
    XTrain1 = XTrain1(:,:,:,idx);
    XTrain2 = XTrain2(:,:,:,idx);
    YTrain = YTrain(idx);
    % Loop over mini-batches.
    for i = 1:numIterationsPerEpoch
        iteration = iteration + 1;
        % Read a mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X1 = XTrain1(:,:,:,idx);
        X2 = XTrain2(:,:,:,idx);
        % Convert the labels into one-hot vectors to calculate the loss.
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        % Convert the mini-batch of data to dlarray.
        dlX1 = dlarray(single(X1),'SSCB');
        dlX2 = dlarray(single(X2),'SSCB');
        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX1 = gpuArray(dlX1);
            dlX2 = gpuArray(dlX2);
        end
        % The training loss and the gradients from backpropagation are
        % calculated using the helper function modelGradients_demo.
        % --------------- below: call to my dlfeval function, working -----------------
        [gradients1,gradients2,gradients3,loss] = dlfeval(@modelGradients_demo,dlnet1,dlnet2,dlnet3,dlX1,dlX2,dlarray(Y));
        % -----------------------------------------------------------------------------
        learnRate = initialLearnRate/(1 + decay*iteration);
        % Update the network parameters using the SGDM optimizer.
        % Update the parameters in dlnet1 to dlnet3 sequentially.
        [dlnet3.Learnables, velocity3] = sgdmupdate(dlnet3.Learnables, gradients3, velocity3, learnRate, momentum);
        [dlnet2.Learnables, velocity2] = sgdmupdate(dlnet2.Learnables, gradients2, velocity2, learnRate, momentum);
        [dlnet1.Learnables, velocity1] = sgdmupdate(dlnet1.Learnables, gradients1, velocity1, learnRate, momentum);
        % Display the training progress.
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lineLossTrain,iteration,double(gather(extractdata(loss))))
        title("Epoch: " + epoch + ", Elapsed: " + string(D))
        drawnow
    end
end
function dlnet = createLayer(XTrain,numHiddenDimension)
    layers = [
        imageInputLayer([14 28 1],"Name","imageinput","Mean",mean(XTrain,4))
        convolution2dLayer([3 3],8,"Name","conv_1","Padding","same")
        batchNormalizationLayer("Name","batchnorm_1")
        reluLayer("Name","relu_1")
        maxPooling2dLayer([2 2],"Name","maxpool_1","Stride",[2 2])
        convolution2dLayer([3 3],16,"Name","conv_2","Padding","same")
        batchNormalizationLayer("Name","batchnorm_2")
        reluLayer("Name","relu_2")
        maxPooling2dLayer([2 2],"Name","maxpool_2","Stride",[2 2])
        convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")
        batchNormalizationLayer("Name","batchnorm_3")
        reluLayer("Name","relu_3")
        fullyConnectedLayer(numHiddenDimension,"Name","fc")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
function dlnet = createLayerFullyConnect(numHiddenDimension)
    layers = [
        imageInputLayer([1 numHiddenDimension*2 1],"Name","imageinput","Normalization","none")
        fullyConnectedLayer(20,"Name","fc_1")
        fullyConnectedLayer(10,"Name","fc_2")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
% ----------------- below: the function called by dlfeval, working --------------------
function [gradients1,gradients2,gradients3,loss] = modelGradients_demo(dlnet1,dlnet2,dlnet3,dlX1,dlX2,Y)
    dlYPred1 = forward(dlnet1,dlX1);
    dlYPred2 = forward(dlnet2,dlX2);
    dlX_concat = [dlYPred1; dlYPred2];
    dlX_concat = reshape(dlX_concat,[1 40 1 128]); % the value 128 corresponds to the mini-batch size
    dlX_concat = dlarray(single(dlX_concat),'SSCB');
    dlY_concat = forward(dlnet3,dlX_concat);
    dlYPred_concat = softmax(dlY_concat);
    loss = crossentropy(dlYPred_concat,Y);
    [gradients1,gradients2,gradients3] = dlgradient(loss,dlnet1.Learnables,dlnet2.Learnables,dlnet3.Learnables);
end
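(A likely reason this version works while the original call fails: modelGradients_demo differentiates a scalar loss rather than the array dlY, and the whole formatted 'SSCB' batches are passed to dlfeval instead of indexed elements, so the inputs stay traced, formatted dlarray objects.)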


More Answers (1)

Iris Soa on 27 Jul 2020
Derivative Trace
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
  • Do not introduce a new dlarray inside of an objective function calculation and attempt to differentiate with respect to that object. For example:
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2); % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
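(A possible workaround, sketched under the assumption that x2 can be created outside the objective function: pass it in through dlfeval so that it is traced, and differentiate with respect to both inputs in a single dlgradient call. The name fun2 is hypothetical:)
function [dy, dy1] = fun2(x1, x2)
    y = x1 + x2;
    [dy, dy1] = dlgradient(y, x2, x1); % both inputs are traced by dlfeval
end

x1 = dlarray(1);
x2 = dlarray(0);
[dy, dy1] = dlfeval(@fun2, x1, x2);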
