
Custom deep learning network - gradient function using dlfeval

I want to create a custom deep learning training function whose output is an array Y. I have two inputs, the arrays X1 and X2, and I want to find the gradient of Y with respect to X1 and X2.
This is my network:
layers1 = [
    sequenceInputLayer(sizeInput,"Name","XTrain1")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_1")
    softplusLayer('Name','s_1')];
layers2 = [
    sequenceInputLayer(sizeInput,"Name","XTrain2")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_2")
    softplusLayer('Name','s_2')];
lgraph = layerGraph(layers1);
lgraph = addLayers(lgraph,layers2); % connect layers -> 2 inputs, 1 output
add = additionLayer(2,'Name','add');
lgraph = addLayers(lgraph,add);
lgraph = connectLayers(lgraph,'s_1','add/in1');
lgraph = connectLayers(lgraph,'s_2','add/in2');
fc = fullyConnectedLayer(sizeInput,"Name","fc_3");
lgraph = addLayers(lgraph,fc);
lgraph = connectLayers(lgraph,'add','fc_3');
dlnet = dlnetwork(lgraph);
This Y should become my output. Then, on every iteration, I do:
dlX1 = dlarray(X1,'CTB');
dlX2 = dlarray(X2,'CTB'); % to differentiate: dlarray/dlgradient
for i = 1:sizeInput
    [gradx1(i), gradx2(i), dlY] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i)); % here is where I get my error
end
and I call my function, which is supposed to compute the derivative of my output with respect to my inputs:
function [gradx1, gradx2, dlY] = modelGradientsX(dlnet,dlX1,dlX2)
    dlY = forward(dlnet,dlX1,dlX2);
    [gradx1, gradx2] = dlgradient(dlY,dlX1,dlX2);
end
The error I get is: "Input data must be formatted dlarray objects". I have seen similar approaches in other examples (like this one: https://www.mathworks.com/matlabcentral/fileexchange/74760-image-classification-using-cnn-with-multi-input-cnn), so I don't understand why this is not the correct type of data.

Accepted Answer

Raunak Gupta on 18 Jul 2020
Hi,
From the code, I only see a syntax error on the following line:
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
Here, modelGradientsX outputs three variables, but you have assigned only gradx1 and gradx2 when calling it. This may be one issue. Other than that, I think a loss should also be returned from the modelGradientsX function so that the weights can be updated for the next iteration.
If the error still persists, you may check that dlX1(i) and dlX2(i) are indeed formatted dlarray objects, because dlgradient only accepts dlarray objects.
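For example, a quick check (a minimal sketch; dims returns the data format labels of a dlarray, and an empty answer means the array is unformatted, which is what the forward call complains about):
dims(dlX1)    % 'CTB' for the full formatted array
dims(dlX1(1)) % indexing out a single element can drop the format, leaving ''
If the format is lost, one option is to pass the whole formatted arrays to dlfeval instead of indexing them element by element.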
  2 Comments
Iris Soa on 19 Jul 2020
Edited: Iris Soa on 26 Jul 2020
Sir,
Thank you very much for your answer. I will reply to each of your ideas in turn:
  • On the line that you have emphasised,
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
unfortunately I had simply pasted the incorrect code. In fact, I am returning three outputs. I have updated my question to reflect this.
  • I see now that I should return a loss from my function; thank you very much for this. I think this is my problem.
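(For reference, a minimal sketch of how the function could return a scalar loss; the sum reduction here is only a placeholder objective, since dlgradient can only differentiate a scalar value:)
function [gradx1, gradx2, loss, dlY] = modelGradientsX(dlnet, dlX1, dlX2)
    dlY = forward(dlnet, dlX1, dlX2);
    loss = sum(dlY, 'all'); % reduce the array output to a scalar before dlgradient
    [gradx1, gradx2] = dlgradient(loss, dlX1, dlX2);
end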
Iris Soa on 23 Jul 2020
Here is the code that I am using to compare with my own, and it works for some reason...
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(YTrain));
    XTrain1 = XTrain1(:,:,:,idx);
    XTrain2 = XTrain2(:,:,:,idx);
    YTrain = YTrain(idx);
    % Loop over mini-batches.
    for i = 1:numIterationsPerEpoch
        iteration = iteration + 1;
        % Read a mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X1 = XTrain1(:,:,:,idx);
        X2 = XTrain2(:,:,:,idx);
        % Convert the labels into one-hot vectors to calculate the loss.
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        % Convert the mini-batch of data to dlarray.
        dlX1 = dlarray(single(X1),'SSCB');
        dlX2 = dlarray(single(X2),'SSCB');
        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX1 = gpuArray(dlX1);
            dlX2 = gpuArray(dlX2);
        end
        % The training loss and the gradients from backpropagation are
        % calculated using the helper function modelGradients_demo.
        % --------------- below: call to my dlfeval function, working -----------------
        [gradients1,gradients2,gradients3,loss] = dlfeval(@modelGradients_demo,dlnet1,dlnet2,dlnet3,dlX1,dlX2,dlarray(Y));
        % -----------------------------------------------------------------------------
        learnRate = initialLearnRate/(1 + decay*iteration);
        % Update the network parameters using the SGDM optimizer.
        % Update the parameters in dlnet1 to dlnet3 sequentially.
        [dlnet3.Learnables, velocity3] = sgdmupdate(dlnet3.Learnables, gradients3, velocity3, learnRate, momentum);
        [dlnet2.Learnables, velocity2] = sgdmupdate(dlnet2.Learnables, gradients2, velocity2, learnRate, momentum);
        [dlnet1.Learnables, velocity1] = sgdmupdate(dlnet1.Learnables, gradients1, velocity1, learnRate, momentum);
        % Display the training progress.
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lineLossTrain,iteration,double(gather(extractdata(loss))))
        title("Epoch: " + epoch + ", Elapsed: " + string(D))
        drawnow
    end
end
function dlnet = createLayer(XTrain,numHiddenDimension)
    layers = [
        imageInputLayer([14 28 1],"Name","imageinput","Mean",mean(XTrain,4))
        convolution2dLayer([3 3],8,"Name","conv_1","Padding","same")
        batchNormalizationLayer("Name","batchnorm_1")
        reluLayer("Name","relu_1")
        maxPooling2dLayer([2 2],"Name","maxpool_1","Stride",[2 2])
        convolution2dLayer([3 3],16,"Name","conv_2","Padding","same")
        batchNormalizationLayer("Name","batchnorm_2")
        reluLayer("Name","relu_2")
        maxPooling2dLayer([2 2],"Name","maxpool_2","Stride",[2 2])
        convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")
        batchNormalizationLayer("Name","batchnorm_3")
        reluLayer("Name","relu_3")
        fullyConnectedLayer(numHiddenDimension,"Name","fc")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
function dlnet = createLayerFullyConnect(numHiddenDimension)
    layers = [
        imageInputLayer([1 numHiddenDimension*2 1],"Name","imageinput","Normalization","none")
        fullyConnectedLayer(20,"Name","fc_1")
        fullyConnectedLayer(10,"Name","fc_2")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
% ----------------- below: the function called by dlfeval, working --------------------
function [gradients1,gradients2,gradients3,loss] = modelGradients_demo(dlnet1,dlnet2,dlnet3,dlX1,dlX2,Y)
    dlYPred1 = forward(dlnet1,dlX1);
    dlYPred2 = forward(dlnet2,dlX2);
    dlX_concat = [dlYPred1; dlYPred2];
    dlX_concat = reshape(dlX_concat,[1 40 1 128]); % the value 128 corresponds to the mini-batch size
    dlX_concat = dlarray(single(dlX_concat),'SSCB');
    dlY_concat = forward(dlnet3,dlX_concat);
    dlYPred_concat = softmax(dlY_concat);
    loss = crossentropy(dlYPred_concat,Y);
    [gradients1,gradients2,gradients3] = dlgradient(loss,dlnet1.Learnables,dlnet2.Learnables,dlnet3.Learnables);
end
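(A likely reason this version works while the original call fails: modelGradients_demo differentiates a scalar loss rather than the array dlY, and the whole formatted 'SSCB' batches are passed to dlfeval instead of indexed elements, so the inputs stay traced, formatted dlarray objects.)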


More Answers (1)

Iris Soa on 27 Jul 2020
Derivative Trace
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
  • Do not introduce a new dlarray inside of an objective function calculation and attempt to differentiate with respect to that object. For example:
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2); % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
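(A possible workaround, sketched under the assumption that x2 can be created outside the objective function: pass it in through dlfeval so that it is traced, and differentiate with respect to both inputs in a single dlgradient call. The name fun2 is hypothetical:)
function [dy, dy1] = fun2(x1, x2)
    y = x1 + x2;
    [dy, dy1] = dlgradient(y, x2, x1); % both inputs are traced by dlfeval
end

x1 = dlarray(1);
x2 = dlarray(0);
[dy, dy1] = dlfeval(@fun2, x1, x2);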
