
rmspropupdate

Update parameters using root mean squared propagation (RMSProp)

Since R2019b

Description

Update the network learnable parameters in a custom training loop using the root mean squared propagation (RMSProp) algorithm.

Note

This function applies the RMSProp optimization algorithm to update network parameters in custom training loops that use networks defined as dlnetwork objects or model functions. If you want to train a network defined as a Layer array or as a LayerGraph, use the following functions:

  • Create a TrainingOptionsRMSProp object using the trainingOptions function.

  • Use the TrainingOptionsRMSProp object with the trainNetwork function.
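
For example, a minimal sketch of that workflow. The hyperparameter values are illustrative only, and XTrain, TTrain, and layers are placeholders; for trainNetwork, layers must end with an output layer such as classificationLayer:

% Sketch: built-in RMSProp training instead of a custom loop.
options = trainingOptions("rmsprop", ...
    InitialLearnRate=0.001, ...
    SquaredGradientDecayFactor=0.9);
net = trainNetwork(XTrain,TTrain,layers,options);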


[netUpdated,averageSqGrad] = rmspropupdate(net,grad,averageSqGrad) updates the learnable parameters of the network net using the RMSProp algorithm. Use this syntax in a training loop to iteratively update a network defined as a dlnetwork object.


[params,averageSqGrad] = rmspropupdate(params,grad,averageSqGrad) updates the learnable parameters in params using the RMSProp algorithm. Use this syntax in a training loop to iteratively update the learnable parameters of a network defined using functions.


[___] = rmspropupdate(___,learnRate,sqGradDecay,epsilon) also specifies values to use for the global learning rate, squared gradient decay factor, and small constant epsilon, in addition to the input arguments in previous syntaxes.
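
For example, the following call (with illustrative hyperparameter values, and net, grad, and averageSqGrad as in the first syntax) uses a learning rate of 0.01, a squared gradient decay factor of 0.95, and an epsilon of 1e-7:

[net,averageSqGrad] = rmspropupdate(net,grad,averageSqGrad,0.01,0.95,1e-7);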

Examples


Perform a single root mean squared propagation update step with a global learning rate of 0.05 and squared gradient decay factor of 0.95.

Create the parameters and parameter gradients as numeric arrays.

params = rand(3,3,4);
grad = ones(3,3,4);

Initialize the average squared gradient for the first iteration.

averageSqGrad = [];

Specify custom values for the global learning rate and squared gradient decay factor.

learnRate = 0.05;
sqGradDecay = 0.95;

Update the learnable parameters using rmspropupdate.

[params,averageSqGrad] = rmspropupdate(params,grad,averageSqGrad,learnRate,sqGradDecay);
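
In an iterative workflow, pass the returned averageSqGrad back into the next call so that the moving average accumulates across updates, for example:

% Next iteration: reuse the returned moving average of squared gradients.
[params,averageSqGrad] = rmspropupdate(params,grad,averageSqGrad,learnRate,sqGradDecay);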

Use rmspropupdate to train a network using the root mean squared propagation (RMSProp) algorithm.

Load Training Data

Load the digits training data.

[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);

Define the Network

Define the network architecture and specify the average image value using the Mean option in the image input layer.

layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];

Create a dlnetwork object from the layer array.

net = dlnetwork(layers);

Define Model Loss Function

Create the helper function modelLoss, listed at the end of the example. The function takes a dlnetwork object and a mini-batch of input data with corresponding labels, and returns the loss and the gradients of the loss with respect to the learnable parameters.

Specify Training Options

Specify the options to use during training.

miniBatchSize = 128;
numEpochs = 20;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);

Train Network

Initialize the squared average gradients.

averageSqGrad = [];

Calculate the total number of iterations for the training progress monitor.

numIterations = numEpochs * numIterationsPerEpoch;

Initialize the TrainingProgressMonitor object. Because the timer starts when you create the monitor object, make sure that you create the object close to the training loop.

monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");

Train the model using a custom training loop. For each epoch, shuffle the data and loop over mini-batches of data. Update the network parameters using the rmspropupdate function. At the end of each iteration, display the training progress.

Train on a GPU, if one is available. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).

Train the network.

iteration = 0;
epoch = 0;

while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);

        T = zeros(numClasses,miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end
        
        % Convert mini-batch of data to a dlarray.
        X = dlarray(single(X),"SSCB");
        
        % If training on a GPU, then convert data to a gpuArray.
        if  canUseGPU
            X = gpuArray(X);
        end
        
        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);
        
        % Update the network parameters using the RMSProp optimizer.
        [net,averageSqGrad] = rmspropupdate(net,gradients,averageSqGrad);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end

Test the Network

Test the classification accuracy of the model by comparing the predictions on a test set with the true labels.

[XTest,TTest] = digitTest4DArrayData;

Convert the data to a dlarray with dimension format "SSCB". For GPU prediction, also convert the data to a gpuArray.

XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end

To classify images using a dlnetwork object, use the predict function and find the classes with the highest scores.

YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);

Evaluate the classification accuracy.

accuracy = mean(YTest==TTest)
accuracy = 0.9926

Model Loss Function

The helper function modelLoss takes a dlnetwork object net and a mini-batch of input data X with corresponding labels T, and returns the loss and the gradients of the loss with respect to the learnable parameters in net. To compute the gradients automatically, use the dlgradient function.

function [loss,gradients] = modelLoss(net,X,T)

% Forward the input data through the network.
Y = forward(net,X);

% Compute the cross-entropy loss between the predictions and the targets.
loss = crossentropy(Y,T);

% Compute the gradients of the loss with respect to the learnable parameters.
gradients = dlgradient(loss,net.Learnables);

end

Input Arguments


Network, specified as a dlnetwork object.

The function updates the Learnables property of the dlnetwork object. net.Learnables is a table with three variables:

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Value of parameter, specified as a cell array containing a dlarray.

The input argument grad must be a table of the same form as net.Learnables.
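
For example, you can display the first few rows of the learnables table with the head function (assuming net is an initialized dlnetwork object):

head(net.Learnables)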

Network learnable parameters, specified as a dlarray, a numeric array, a cell array, a structure, or a table.

If you specify params as a table, it must contain the following three variables.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Value of parameter, specified as a cell array containing a dlarray.

You can specify params as a container of learnable parameters for your network using a cell array, structure, or table, or nested cell arrays or structures. The learnable parameters inside the cell array, structure, or table must be dlarray or numeric values of data type double or single.

The input argument grad must be provided with exactly the same data type, ordering, and fields (for structures) or variables (for tables) as params.

Data Types: single | double | struct | table | cell
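
For example, a minimal sketch of a structure-based parameter container for a model-function workflow. The field names fc1, Weights, and Bias and the sizes are hypothetical:

params = struct;
params.fc1.Weights = dlarray(0.01*randn(10,784));   % hypothetical layer weights
params.fc1.Bias = dlarray(zeros(10,1));             % hypothetical layer bias

The corresponding grad input must then be a structure with the same fields, where each field contains the gradient of the matching parameter.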

Gradients of the loss, specified as a dlarray, a numeric array, a cell array, a structure, or a table.

The exact form of grad depends on the input network or learnable parameters. The following shows the required format of grad for each possible input to rmspropupdate.

If the input is net, the learnable parameters are the table net.Learnables, which contains Layer, Parameter, and Value variables. The Value variable consists of cell arrays that contain each learnable parameter as a dlarray. In this case, grad must be a table with the same data type, variables, and ordering as net.Learnables, and it must have a Value variable consisting of cell arrays that contain the gradient of each learnable parameter.

If the input is params, the required format of grad depends on how you specify params:

  • dlarray: a dlarray with the same data type and ordering as params.

  • Numeric array: a numeric array with the same data type and ordering as params.

  • Cell array: a cell array with the same data types, structure, and ordering as params.

  • Structure: a structure with the same data types, fields, and ordering as params.

  • Table with Layer, Parameter, and Value variables, where the Value variable consists of cell arrays that contain each learnable parameter as a dlarray: a table with the same data types, variables, and ordering as params. grad must have a Value variable consisting of cell arrays that contain the gradient of each learnable parameter.

You can obtain grad from a call to dlfeval that evaluates a function that contains a call to dlgradient. For more information, see Use Automatic Differentiation In Deep Learning Toolbox.

Moving average of squared parameter gradients, specified as an empty array, a dlarray, a numeric array, a cell array, a structure, or a table.

The exact form of averageSqGrad depends on the input network or learnable parameters. The following shows the required format of averageSqGrad for each possible input to rmspropupdate.

If the input is net, the learnable parameters are the table net.Learnables, which contains Layer, Parameter, and Value variables. The Value variable consists of cell arrays that contain each learnable parameter as a dlarray. In this case, averageSqGrad must be a table with the same data type, variables, and ordering as net.Learnables, and it must have a Value variable consisting of cell arrays that contain the average squared gradient of each learnable parameter.

If the input is params, the required format of averageSqGrad depends on how you specify params:

  • dlarray: a dlarray with the same data type and ordering as params.

  • Numeric array: a numeric array with the same data type and ordering as params.

  • Cell array: a cell array with the same data types, structure, and ordering as params.

  • Structure: a structure with the same data types, fields, and ordering as params.

  • Table with Layer, Parameter, and Value variables, where the Value variable consists of cell arrays that contain each learnable parameter as a dlarray: a table with the same data types, variables, and ordering as params. averageSqGrad must have a Value variable consisting of cell arrays that contain the average squared gradient of each learnable parameter.

If you specify averageSqGrad as an empty array, the function assumes no previous gradients and runs in the same way as for the first update in a series of iterations. To update the learnable parameters iteratively, use the averageSqGrad output of a previous call to rmspropupdate as the averageSqGrad input.
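
For example, the iterative pattern looks like this (a sketch; numIterations, the modelLoss helper, and the data X and T are assumed, as in the training example above):

averageSqGrad = [];   % no previous squared gradients before the first update
for iteration = 1:numIterations
    [loss,gradients] = dlfeval(@modelLoss,net,X,T);
    [net,averageSqGrad] = rmspropupdate(net,gradients,averageSqGrad);
end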

Global learning rate, specified as a positive scalar. The default value of learnRate is 0.001.

If you specify the network as a dlnetwork object, the learning rate for each parameter is the global learning rate multiplied by the corresponding learning rate factor property defined in the network layers.
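
For example, with a layer definition such as the following (illustrative), rmspropupdate updates that layer's weights with twice the global learning rate:

% The weights of this layer use a learning rate factor of 2.
layer = convolution2dLayer(3,16,WeightLearnRateFactor=2);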

Squared gradient decay factor, specified as a positive scalar between 0 and 1. The default value of sqGradDecay is 0.9.

Small constant for preventing divide-by-zero errors, specified as a positive scalar. The default value of epsilon is 1e-8.

Output Arguments


Updated network, returned as a dlnetwork object.

The function updates the Learnables property of the dlnetwork object.

Updated network learnable parameters, returned as a dlarray, a numeric array, a cell array, a structure, or a table with a Value variable containing the updated learnable parameters of the network.

Updated moving average of squared parameter gradients, returned as a dlarray, a numeric array, a cell array, a structure, or a table.

Algorithms


Root Mean Square Propagation

Stochastic gradient descent with momentum uses a single learning rate for all the parameters. Other optimization algorithms seek to improve network training by using learning rates that differ by parameter and can automatically adapt to the loss function being optimized. Root mean square propagation (RMSProp) is one such algorithm. It keeps a moving average of the element-wise squares of the parameter gradients,

$$v_{\ell} = \beta_2 v_{\ell-1} + (1 - \beta_2)\left[\nabla E(\theta_{\ell})\right]^2$$

where $\beta_2$ is the squared gradient decay factor of the moving average. Common values of the decay rate are 0.9, 0.99, and 0.999. The corresponding averaging lengths of the squared gradients equal $1/(1-\beta_2)$, that is, 10, 100, and 1000 parameter updates, respectively. The RMSProp algorithm uses this moving average to normalize the updates of each parameter individually,

$$\theta_{\ell+1} = \theta_{\ell} - \frac{\alpha \nabla E(\theta_{\ell})}{\sqrt{v_{\ell}} + \epsilon}$$

where the division is performed element-wise. Using RMSProp effectively decreases the learning rates of parameters with large gradients and increases the learning rates of parameters with small gradients. $\epsilon$ is a small constant added to avoid division by zero.
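
As a sketch only (not the internal implementation of rmspropupdate), one update step for a numeric parameter array theta with gradient grad looks like this, where learnRate, sqGradDecay, and epsilon are the scalars described above:

v = zeros(size(theta));                                % moving average, initialized to zero
v = sqGradDecay*v + (1 - sqGradDecay)*grad.^2;         % update moving average of squared gradients
theta = theta - learnRate*grad./(sqrt(v) + epsilon);   % element-wise normalized parameter update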

Extended Capabilities

Version History

Introduced in R2019b
