rlRepresentationOptions

Create options for reinforcement learning agent representations

Syntax

repOpts = rlRepresentationOptions
repOpts = rlRepresentationOptions(Name,Value)

Description


repOpts = rlRepresentationOptions returns the default options for defining a representation for a reinforcement learning agent.


repOpts = rlRepresentationOptions(Name,Value) creates an options set using the specified name-value pairs to override default option values.

Examples


Create an options set for creating a critic or actor representation for a reinforcement learning agent. Set the learning rate for the representation to 0.05, and set the gradient threshold to 1. You can set the options using Name,Value pairs when you create the options set. Any options that you do not explicitly set have their default values.

repOpts = rlRepresentationOptions('LearnRate',5e-2,...
                                  'GradientThreshold',1)
repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
                  Optimizer: "adam"
        OptimizerParameters: [1×1 rl.option.OptimizerParameters]
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
              MiniBatchSize: Inf

Alternatively, create a default options set and use dot notation to change some of the values.

repOpts = rlRepresentationOptions;
repOpts.LearnRate = 5e-2;
repOpts.GradientThreshold = 1
repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
                  Optimizer: "adam"
        OptimizerParameters: [1×1 rl.option.OptimizerParameters]
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
              MiniBatchSize: Inf

If you want to change the properties of the OptimizerParameters option, use dot notation to access them.

repOpts.OptimizerParameters.Epsilon = 1e-7;
repOpts.OptimizerParameters
ans = 
  OptimizerParameters with properties:

                      Momentum: "Not applicable"
                       Epsilon: 1.0000e-07
           GradientDecayFactor: 0.9000
    SquaredGradientDecayFactor: 0.9990

Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Optimizer',"rmsprop"

Learning rate for the representation, specified as the comma-separated pair consisting of 'LearnRate' and a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.

Example: 'LearnRate',0.025

Optimizer for training the network of the representation, specified as the comma-separated pair consisting of 'Optimizer' and one of the following strings:

  • "adam" — Use the Adam optimizer. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.

  • "sgdm" — Use the stochastic gradient descent with momentum (SGDM) optimizer. You can specify the momentum value using the Momentum field of the OptimizerParameters option.

  • "rmsprop" — Use the RMSProp optimizer. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor fields of the OptimizerParameters option.

For more information about these optimizers, see Stochastic Gradient Descent (Deep Learning Toolbox) in the Algorithms section of trainingOptions in Deep Learning Toolbox™.

Example: 'Optimizer',"sgdm"

Applicable parameters for the optimizer, specified as the comma-separated pair consisting of 'OptimizerParameters' and an OptimizerParameters object.

The OptimizerParameters object has the following properties.

  
Momentum

Contribution of previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution.

This parameter applies only when Optimizer is "sgdm". In that case, the default value is 0.9. This default value works well for most problems.

Epsilon

Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero.

This parameter applies only when Optimizer is "adam" or rmsprop. In that case, the default value is 10–8. This default value works well for most problems.

GradientDecayFactor

Decay rate of gradient moving average, specified as a positive scalar from 0 to 1.

This parameter applies only when Optimizer is "adam". In that case, the default value is 0.9. This default value works well for most problems.

SquaredGradientDecayFactor

Decay rate of squared gradient moving average, specified as a positive scalar from 0 to 1.

This parameter applies only when Optimizer is "adam" or "rmsprop". In that case, the default value is 0.999. This default value works well for most problems.

When a particular property of OptimizerParameters is not applicable to the optimizer type specified in the Optimizer option, that property is set to "Not applicable".

To change the default values, create an rlRepresentationOptions set and use dot notation to access and change the properties of OptimizerParameters.

repOpts = rlRepresentationOptions;
repOpts.OptimizerParameters.Epsilon = 1e-7;

Threshold value for the representation gradient, specified as the comma-separated pair consisting of 'GradientThreshold' and Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option. Clipping the gradient limits how much the network parameters change in a training iteration.

Example: 'GradientThreshold',1

Gradient threshold method used to clip gradient values that exceed the gradient threshold, specified as the comma-separated pair consisting of 'GradientThresholdMethod' and one of the following strings:

  • "l2norm" — If the L2 norm of the gradient of a learnable parameter is larger than GradientThreshold, then scale the gradient so that the L2 norm equals GradientThreshold.

  • "global-l2norm" — If the global L2 norm, L, is larger than GradientThreshold, then scale all gradients by a factor of GradientThreshold/L. The global L2 norm considers all learnable parameters.

  • "absolute-value" — If the absolute value of an individual partial derivative in the gradient of a learnable parameter is larger than GradientThreshold, then scale the partial derivative to have magnitude equal to GradientThreshold and retain the sign of the partial derivative.

For more information, see Gradient Clipping (Deep Learning Toolbox) in the Algorithms section of trainingOptions in Deep Learning Toolbox.

Example: 'GradientThresholdMethod',"absolute-value"
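To make the difference between these methods concrete, the following is illustrative arithmetic only (not toolbox code): it shows how the "l2norm" and "absolute-value" methods would rescale a sample gradient with GradientThreshold set to 1. The "global-l2norm" method works the same way as "l2norm" but uses one norm computed over all learnable parameters.

```matlab
g = [3; -4];      % sample gradient of one learnable parameter; L2 norm = 5
thr = 1;          % GradientThreshold

% "l2norm": scale the whole gradient so its L2 norm equals thr
g_l2 = g * (thr / norm(g));          % [0.6; -0.8], L2 norm is now 1

% "absolute-value": clip each partial derivative to thr, keeping its sign
g_abs = sign(g) .* min(abs(g), thr); % [1; -1]
```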

Factor for L2 regularization (weight decay), specified as the comma-separated pair consisting of 'L2RegularizationFactor' and a nonnegative scalar. For more information, see L2 Regularization (Deep Learning Toolbox) in the Algorithms section of trainingOptions in Deep Learning Toolbox.

To avoid overfitting when using a representation with many parameters, consider increasing the L2RegularizationFactor option.

Example: 'L2RegularizationFactor',0.0005
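For instance, for a representation with many parameters you might raise the regularization factor when you create the options set; the value below is illustrative.

```matlab
% Stronger weight decay for a large, overfitting-prone representation.
repOpts = rlRepresentationOptions('L2RegularizationFactor',1e-3);
```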

Computation device for training an agent that uses the representation, specified as the comma-separated pair consisting of 'UseDevice' and either "cpu" or "gpu".

The "gpu" option requires Parallel Computing Toolbox™. To use a GPU for training a network, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher.

Example: 'UseDevice',"gpu"

Output Arguments


Option set for defining a representation for a reinforcement learning agent, returned as an rlRepresentationOptions object. The property values of repOpts are initialized to the default values or to the values you specify with Name,Value pairs. You can further modify the property values using dot notation. Use the options set as an input argument with rlRepresentation when you create reinforcement learning representations.

See Also

Functions

rlRepresentation
Introduced in R2019a