In a custom deep learning training loop, can I use my own custom function for computing the gradients?
I would like to train a CNN using a custom training loop. However, I am wondering whether I can use my own gradient computation function instead of the automatic differentiation provided by dlfeval(), modelGradients(), and dlgradient(). These are used with a dlnetwork() in custom training loops, as opposed to an lgraph.
For example, in place of dlgradient(), can I use my own custom gradient function?
To expand, I currently have a custom MATLAB Fully Convolutional Network (for image-to-image regression), which I train with trainNetwork(), as is typical. However, I would like to control certain convolution layer weights during a few iterations of the training, as in "set and hold them to certain values" for a few iterations. My understanding is that I would have to use a custom training loop for this. (Is this true?) However, my current custom CNN also has a custom output layer that computes my loss and gradients via custom forward and backward loss functions, e.g.,
loss = forwardLoss(layer, Y, T)
dLdY = backwardLoss(layer, Y, T)
I would like to keep the functionality of my custom forwardLoss() and backwardLoss() functions, since I have certain analytical and diagnostic capabilities in them. So again, the questions are:
- In a custom training loop, can I use my own custom gradient function? If so, are there unique conventions I must follow for the custom training loop?
- Also, if my goal is primarily just to dynamically set the weights of certain convolutional layers to certain values for a few iterations of training, do I even need a custom loop using dlnetwork(), or can I do this with a network trained using trainNetwork()? I have checked the various documentation but have not found an answer yet.
Any assistance would be appreciated.
Katja Mogalle on 21 Jan 2022
Re "However, I would like to control certain convolution layer weights during a few iterations of the training, as in "set and hold them to certain values" for a few iterations. My understanding is that I would have to use a custom training loop for this. (Is this true?)"
Yes, it is true. This is the perfect scenario for using custom training loops.
I might be misunderstanding your approach, but I suspect you don't want a custom gradient computation; rather, you want a custom loss function. A few bits and bobs about custom training loops:
- This page helps to understand what can be done with trainNetwork, what can be achieved with custom training loops, and what options there are for defining loss functions: https://www.mathworks.com/help/deeplearning/ug/define-custom-training-loops-loss-functions-and-networks.html#mw_1e1e702a-95f2-411a-a85c-b2060c2a5734
- This page nicely explains the key parts of a custom training loop. I highly recommend it: https://www.mathworks.com/help/deeplearning/ug/define-model-gradients-function-for-custom-training-loop.html
- Here is a basic example of a custom training loop for image classification where you can see how everything comes together: https://www.mathworks.com/help/deeplearning/ug/train-network-using-custom-training-loop.html
- Note that a dlnetwork cannot contain output layers (neither built-in nor custom output layers). Instead, the loss function (forwardLoss in your case) can be written directly in the custom training loop. Have a look at the function modelGradients in the example above. It uses crossentropy, but you can implement whatever you want there (probably just copy over your code from forwardLoss).
- If you really need a custom backwardLoss function, this will be a bit more difficult. Usually it is not needed, since automatic differentiation can compute the derivative of the loss function for you.
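Here is a minimal sketch of the last two bullets under stated assumptions. A tiny dlnetwork without an output layer stands in for your FCN. myForwardLoss and myBackwardLoss are hypothetical stand-ins for the code in your forwardLoss and backwardLoss (here a half mean-squared error). They take no layer argument because a dlnetwork has no output layer.

With useCustomBackward = false, dlgradient differentiates the loss automatically. With true, the sketch uses a surrogate-loss trick that is not taken from the pages above. The gradient of sum(Y.*dLdY) with respect to the learnables, with dLdY held constant, equals your analytic dL/dY back-propagated through the network. That way your backwardLoss still drives the update.

```matlab
% Tiny image-to-image network without an output layer (assumed for illustration).
layers = [
    imageInputLayer([8 8 1], 'Normalization', 'none', 'Name', 'in')
    convolution2dLayer(3, 4, 'Padding', 'same', 'Name', 'conv_1')
    reluLayer('Name', 'relu_1')
    convolution2dLayer(3, 1, 'Padding', 'same', 'Name', 'conv_2')];
net = dlnetwork(layerGraph(layers));

X = dlarray(rand(8, 8, 1, 4, 'single'), 'SSCB');   % placeholder inputs
T = dlarray(rand(8, 8, 1, 4, 'single'), 'SSCB');   % placeholder targets

% Path 1: automatic differentiation of the loss.
[gradAuto, lossAuto] = dlfeval(@modelGradients, net, X, T, false);
% Path 2: your own dL/dY pushed back through the network.
[gradCustom, lossCustom] = dlfeval(@modelGradients, net, X, T, true);

function [gradients, loss] = modelGradients(net, X, T, useCustomBackward)
    Y = forward(net, X);                 % dlnetwork output, no output layer
    loss = myForwardLoss(Y, T);          % same math as your forwardLoss
    if ~useCustomBackward
        gradients = dlgradient(loss, net.Learnables);
    else
        % Analytic dL/dY computed on plain numeric data (not traced).
        dLdY = myBackwardLoss(extractdata(Y), extractdata(T));
        % Gradient of sum(Y .* dLdY) w.r.t. the learnables = dLdY backpropagated.
        surrogate = sum(Y .* dLdY, 'all');
        gradients = dlgradient(surrogate, net.Learnables);
    end
end

function loss = myForwardLoss(Y, T)
    % Hypothetical stand-in: half squared error averaged over the batch.
    N = size(Y, 4);
    loss = sum((Y - T).^2, 'all') / (2 * N);
end

function dLdY = myBackwardLoss(Y, T)
    % Hypothetical stand-in: derivative of myForwardLoss with respect to Y.
    N = size(Y, 4);
    dLdY = (Y - T) / N;
end
```

The two gradient tables should agree to within rounding, which is also a handy check that a hand-written backwardLoss matches its forwardLoss. In both paths the reported loss comes from myForwardLoss, so any diagnostics placed there still run.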
Re "my goal is primarily just to dynamically set the weights of certain convolutional layers to certain values for a few iterations of training"
Indeed, this would need to be done with a custom training loop. You can access and edit the learnable parameters of a specific layer using the Learnables property of the dlnetwork.
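Here is a minimal sketch of "set and hold" in a custom training loop. It assumes a layer named "conv_1", a hypothetical fixed kernel W0, a hold period of numHoldIterations iterations, random placeholder data, and the Adam solver (adamupdate). The fixed values are written into the Learnables table. During the hold period they are written back after every update, so the optimizer cannot move them.

```matlab
% Hypothetical network, fixed kernel and schedule.
layers = [
    imageInputLayer([8 8 1], 'Normalization', 'none', 'Name', 'in')
    convolution2dLayer(3, 4, 'Padding', 'same', 'Name', 'conv_1')
    reluLayer('Name', 'relu_1')
    convolution2dLayer(3, 1, 'Padding', 'same', 'Name', 'conv_2')];
net = dlnetwork(layerGraph(layers));
W0 = ones(3, 3, 1, 4, 'single') / 9;   % values to "set and hold"
numHoldIterations = 5;
numIterations = 10;

% Row of the Learnables table that holds conv_1's weights.
idx = net.Learnables.Layer == "conv_1" & net.Learnables.Parameter == "Weights";
net.Learnables.Value{idx} = dlarray(W0);   % set

X = dlarray(rand(8, 8, 1, 4, 'single'), 'SSCB');   % placeholder data
T = dlarray(rand(8, 8, 1, 4, 'single'), 'SSCB');
averageGrad = [];
averageSqGrad = [];
for iteration = 1:numIterations
    gradients = dlfeval(@modelGradients, net, X, T);
    [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
        averageGrad, averageSqGrad, iteration);
    if iteration <= numHoldIterations
        % Hold: overwrite whatever the optimizer did to conv_1's weights.
        net.Learnables.Value{idx} = dlarray(W0);
    end
    if iteration == numHoldIterations
        heldWeights = extractdata(net.Learnables.Value{idx});
    end
end

function gradients = modelGradients(net, X, T)
    Y = forward(net, X);
    loss = sum((Y - T).^2, 'all') / (2 * size(Y, 4));   % placeholder loss
    gradients = dlgradient(loss, net.Learnables);
end
```

During the hold, adamupdate still accumulates moment estimates for conv_1 from its gradients, so the first steps after the release are influenced by them. If that matters, you could also zero that layer's row of the gradients table while holding.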
I hope this helps. Please provide some more details if I misunderstood your request.
Reece Teramoto on 27 Jan 2022
Great speaking with you the other day. As we discussed, it would be good to post the solution here for others to use. I also spoke with @Katja Mogalle, and she had some additional info to add to this solution.
Here is a summary of our understanding of your workflow: In your neural network, you have some layers that you'd like to set to initial values and have their learnables remain frozen for around half the total epochs, while the other layers learn. Then, you'd like to unfreeze the weights for the remaining epochs of training.
Here is how you can approach this using trainNetwork without needing a custom training loop:
- Set the initial weights of the desired layers.
- Freeze the weights of the desired layers.
- Call trainNetwork for half the total epochs (or whatever the desired amount is).
- Unfreeze the weights.
- Retrain for the remainder of the epochs by calling trainNetwork again, using the unfrozen layers of the previously trained network.
Specifically, here are some references on what you can use for each step:
Set the initial weights of the desired layers.
- Set either the "Weights" or "WeightsInitializer" property of the desired convolution layers. https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.convolution2dlayer.html?searchHighlight=convolution2dLayer&s_tid=doc_srchtitle#mw_2d97b6cd-f8aa-4fad-88d6-d34875484820_sep_mw_bec5cf10-a8e8-4560-be5a-c4ccb6594b02
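For example, assuming hypothetical 3-by-3 Laplacian kernels for a layer with 1 input channel and 8 filters. If Weights is nonempty, trainNetwork uses it as the initial value. Otherwise it calls the WeightsInitializer, which can be a function handle of the form weights = func(sz).

```matlab
% Hypothetical fixed values: 3x3 kernels, 1 input channel, 8 filters.
filterSize = 3;
numChannels = 1;
numFilters = 8;
kernel = single([0 -1 0; -1 4 -1; 0 -1 0]);
W0 = repmat(kernel, 1, 1, numChannels, numFilters);   % 3x3x1x8
b0 = zeros(1, 1, numFilters, 'single');                % 1x1x8

% Option 1: set the Weights and Bias properties directly.
conv1 = convolution2dLayer(filterSize, numFilters, 'Padding', 'same', 'Name', 'conv_1');
conv1.Weights = W0;
conv1.Bias = b0;

% Option 2: a custom initializer, sz = [filterHeight filterWidth numChannels numFilters].
initFcn = @(sz) repmat(kernel, 1, 1, sz(3), sz(4));
conv2 = convolution2dLayer(filterSize, numFilters, 'Padding', 'same', ...
    'Name', 'conv_2', 'WeightsInitializer', initFcn);
```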
Freeze the weights of the desired layers.
- Freeze the weights using the "freezeWeights" helper function. This function ships with MATLAB but is not on the default path. It sets the learn-rate factors of the given layers (such as "WeightLearnRateFactor" and "BiasLearnRateFactor") to 0.
- Here is an example of using the function to freeze the first 5 layers of a network:
layers(1:5) = freezeWeights(layers(1:5));
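If freezeWeights is not on your path, setting the learn-rate factors directly has the same effect for layers with weights and biases. Here is a minimal sketch on a hypothetical five-layer regression network. isprop skips layers that have no learnable parameters, such as the input and ReLU layers.

```matlab
% Hypothetical network; freeze whatever is learnable among the first three layers.
layers = [
    imageInputLayer([8 8 1], 'Name', 'in')
    convolution2dLayer(3, 4, 'Padding', 'same', 'Name', 'conv_1')
    reluLayer('Name', 'relu_1')
    convolution2dLayer(3, 1, 'Padding', 'same', 'Name', 'conv_2')
    regressionLayer('Name', 'out')];

for i = 1:3
    if isprop(layers(i), 'WeightLearnRateFactor')
        layers(i).WeightLearnRateFactor = 0;   % weights will not be updated
        layers(i).BiasLearnRateFactor = 0;     % nor will the biases
    end
end
```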
Call trainNetwork for half the total epochs (or whatever the desired amount is).
- Specify the epochs in the trainingOptions.
net = trainNetwork(data, layers, opts)
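For example, with an assumed budget of 20 epochs, half of them with the layers frozen. The solver, learning rate, and mini-batch size are placeholders.

```matlab
% Assumed values: 20 epochs in total, the first half with frozen layers.
totalEpochs = 20;
stage1Epochs = totalEpochs / 2;

opts = trainingOptions('sgdm', ...
    'MaxEpochs', stage1Epochs, ...
    'InitialLearnRate', 1e-3, ...
    'MiniBatchSize', 16, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);
```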
Unfreeze the weights.
- I've attached a function that does this. It just sets the "WeightLearnRateFactor" of the desired layers to 1.
% net.Layers is read-only, so modify a copy of the trained layers
layers = net.Layers;
layers(1:5) = unfreezeWeights(layers(1:5));
Retrain for the remainder of the epochs by calling trainNetwork again, using the unfrozen layers of the previously trained network.
net = trainNetwork(data, layers, opts)
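The attached unfreezeWeights file is not reproduced in this thread. The sketch below is a hypothetical stand-in that mirrors freezeWeights by resetting every *LearnRateFactor property (WeightLearnRateFactor, BiasLearnRateFactor, and so on) to 1.

It also shows a variant for networks with branches or skip connections (DAGNetwork). Passing a layer array to trainNetwork connects the layers in series, so instead the layers are replaced inside layerGraph(net) with replaceLayer, which keeps their connections. A small layer graph with a frozen conv_1 stands in for layerGraph(net).

```matlab
% Stand-in for lgraph = layerGraph(net) of a trained DAGNetwork.
layers = [
    imageInputLayer([8 8 1], 'Name', 'in')
    convolution2dLayer(3, 4, 'Padding', 'same', 'Name', 'conv_1', ...
        'WeightLearnRateFactor', 0, 'BiasLearnRateFactor', 0)
    reluLayer('Name', 'relu_1')
    convolution2dLayer(3, 1, 'Padding', 'same', 'Name', 'conv_2')
    regressionLayer('Name', 'out')];
lgraph = layerGraph(layers);

% Hypothetical list of layers to unfreeze, by name.
namesToUnfreeze = ["conv_1"];
allLayers = lgraph.Layers;
for i = 1:numel(allLayers)
    if any(strcmp(allLayers(i).Name, namesToUnfreeze))
        % replaceLayer keeps the layer's connections in the graph.
        lgraph = replaceLayer(lgraph, allLayers(i).Name, unfreezeWeights(allLayers(i)));
    end
end
% Second stage (not run here): net = trainNetwork(data, lgraph, opts);

function layers = unfreezeWeights(layers)
    % Hypothetical helper: reset every learn-rate factor of the given layers to 1.
    for i = 1:numel(layers)
        props = properties(layers(i));
        for p = 1:numel(props)
            if endsWith(props{p}, 'LearnRateFactor')
                layers(i).(props{p}) = 1;
            end
        end
    end
end
```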
Now, a small disclaimer about this proposed method that @Katja Mogalle mentioned when I spoke to her about this:
- When calling trainNetwork for the second time, it's not 100% the same as if we'd been able to train continuously and unfreeze some weights after a few epochs (e.g. via a custom training loop). The optimization algorithm (e.g. SGDM) keeps internal state, such as the momentum (velocity) of each parameter, and this state is reset on the second call to trainNetwork. Also be careful if you use a learning-rate drop schedule, because the second call starts again from InitialLearnRate. This might not be a problem at all, but we wanted to mention it in case you expected training of the other weights (the ones that weren't frozen) to continue precisely where it left off after the first call to trainNetwork.
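If you use a piecewise schedule, one way to reduce the learning-rate mismatch is to start the second call where the schedule would have been. The values below are assumptions. This reproduces the original schedule exactly only when stage1Epochs is a multiple of LearnRateDropPeriod, and it does not restore the solver's momentum, which only a custom training loop can carry over.

```matlab
% Assumed piecewise schedule for the whole run.
initialLearnRate = 1e-2;
dropFactor = 0.1;
dropPeriod = 5;        % epochs between drops
stage1Epochs = 10;
totalEpochs = 20;

% Rate the schedule would use in epoch stage1Epochs+1 of a single continuous run.
stage2LearnRate = initialLearnRate * dropFactor^floor(stage1Epochs / dropPeriod);

opts2 = trainingOptions('sgdm', ...
    'MaxEpochs', totalEpochs - stage1Epochs, ...
    'InitialLearnRate', stage2LearnRate, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', dropFactor, ...
    'LearnRateDropPeriod', dropPeriod, ...
    'Verbose', false);
```

With these numbers the second call starts at 1e-2 * 0.1^2 = 1e-4 and drops every 5 epochs, as the uninterrupted run would have.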