rlMBPOAgentOptions
Description
Use an rlMBPOAgentOptions object to specify options for model-based policy optimization (MBPO) agents. To create an MBPO agent, use rlMBPOAgent.
For more information, see Model-Based Policy Optimization (MBPO) Agent.
Creation
Description
opt = rlMBPOAgentOptions creates an options object for use as an argument when creating an MBPO agent using all default options. You can modify the object properties using dot notation.
opt = rlMBPOAgentOptions(Name=Value) creates the options set opt and sets its properties using one or more name-value arguments. For example, rlMBPOAgentOptions(MiniBatchSize=256) creates an option set with a mini-batch size of 256. You can specify multiple name-value arguments.
Properties
NumEpochForTrainingModel
— Number of epochs for training the environment model
1
(default) | positive integer
Number of epochs for training the environment model, specified as a positive integer.
Example: NumEpochForTrainingModel=2
NumMiniBatches
— Number of mini-batches
10
(default) | positive integer | "all"
Number of mini-batches used in each environment model training epoch, specified as a positive integer or "all". When you set NumMiniBatches to "all", the agent selects the number of mini-batches such that all samples in the base agent's experience buffer are used to train the model.
Example: NumMiniBatches=20
MiniBatchSize
— Size of random experience mini-batch
128
(default) | positive integer
Size of random experience mini-batch for training the environment model, specified as a positive integer. During each model training epoch, the agent randomly samples experiences from the experience buffer when computing gradients for updating the environment model properties. Large mini-batches reduce the variance when computing gradients but increase the computational effort.
Example: MiniBatchSize=256
TransitionOptimizerOptions
— Transition function optimizer options
rlOptimizerOptions
object | array of rlOptimizerOptions
objects
Transition function optimizer options, specified as one of the following:
- rlOptimizerOptions object — When your neural network environment has a single transition function, or if you want to use the same options for multiple transition functions, specify a single options object.
- Array of rlOptimizerOptions objects — When your neural network environment has multiple transition functions and you want to use different optimizer options for each transition function, specify an array of options objects with length equal to the number of transition functions.
Using these objects, you can specify training parameters for the transition deep neural network approximators as well as the optimizer algorithms and parameters.
If you have previously trained transition models and do not want the MBPO agent to modify these models during training, set TransitionOptimizerOptions.LearnRate to 0.
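For example, the following sketch (assuming, for illustration only, a neural network environment with two transition functions, the first of which is pretrained) specifies a separate optimizer options object for each transition model and freezes the pretrained one by setting its learn rate to zero.
% Assumption for illustration: the environment model has two transition functions.
% Keep the first (pretrained) transition model fixed and train only the second one.
transitionOpts = [rlOptimizerOptions(LearnRate=0), rlOptimizerOptions(LearnRate=1e-3)];
opt = rlMBPOAgentOptions;
opt.TransitionOptimizerOptions = transitionOpts;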
RewardOptimizerOptions
— Reward function optimizer options
rlOptimizerOptions
object
Reward function optimizer options, specified as an rlOptimizerOptions
object. Using this object, you can specify training
parameters for the reward deep neural network approximator as well as the optimizer
algorithm and its parameters.
If you specify a ground-truth reward function using a custom function, the MBPO agent ignores these options.
If you have a previously trained reward model and do not want the MBPO agent to modify the model during training, set RewardOptimizerOptions.LearnRate to 0.
IsDoneOptimizerOptions
— Is-done function optimizer options
rlOptimizerOptions
object
Is-done function optimizer options, specified as an rlOptimizerOptions
object. Using this object, you can specify training
parameters for the is-done deep neural network approximator as well as the optimizer
algorithm and its parameters.
If you specify a ground-truth is-done function using a custom function, the MBPO agent ignores these options.
If you have a previously trained is-done model and do not want the MBPO agent to modify the model during training, set IsDoneOptimizerOptions.LearnRate to 0.
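As a sketch, if your neural network environment already contains trained reward and is-done models that you do not want the MBPO agent to update, you could freeze both as follows.
opt = rlMBPOAgentOptions;
% Zero learn rates keep previously trained reward and is-done models fixed during training.
opt.RewardOptimizerOptions.LearnRate = 0;
opt.IsDoneOptimizerOptions.LearnRate = 0;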
ModelExperienceBufferLength
— Generated experience buffer size
100000
(default) | positive integer
Generated experience buffer size, specified as a positive integer. When the agent generates experiences, they are added to the model experience buffer.
Example: ModelExperienceBufferLength=50000
ModelRolloutOptions
— Model roll-out options
rlModelRolloutOptions
object
Model roll-out options for controlling the number and length of generated experience
trajectories, specified as an rlModelRolloutOptions
object with the
following fields. At the start of each epoch, the agent generates the roll-out
trajectories and adds them to the model experience buffer. To modify the roll-out
options, use dot notation.
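For example, the following sketch (values chosen only for illustration) uses dot notation to generate longer roll-outs over time with a piecewise horizon schedule; the individual fields are described next.
opt = rlMBPOAgentOptions;
% Illustrative values only: start with short roll-outs and lengthen them every 200 epochs.
opt.ModelRolloutOptions.NumRollout = 4000;
opt.ModelRolloutOptions.HorizonUpdateSchedule = "piecewise";
opt.ModelRolloutOptions.RolloutHorizonUpdateFrequency = 200;
opt.ModelRolloutOptions.HorizonMax = 5;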
NumRollout
— Number of trajectories
2000
(default) | positive integer
Number of trajectories for generating samples, specified as a positive integer.
Example: NumRollout=4000
Horizon
— Initial trajectory horizon
1
(default) | positive integer
Initial trajectory horizon, specified as a positive integer.
Example: Horizon=2
HorizonUpdateSchedule
— Option for increasing horizon length
"none"
(default) | "piecewise"
Option for increasing the horizon length, specified as one of the following values.
- "none" — Do not increase the horizon length.
- "piecewise" — Increase the horizon length by one after every N model training epochs, where N is equal to RolloutHorizonUpdateFrequency.
Example: HorizonUpdateSchedule="piecewise"
RolloutHorizonUpdateFrequency
— Number of epochs after which the horizon increases
100
(default) | positive integer
Number of epochs after which the horizon increases, specified as a positive integer. When HorizonUpdateSchedule is "none", this option is ignored.
Example: RolloutHorizonUpdateFrequency=200
HorizonMax
— Maximum horizon length
20
(default) | positive integer
Maximum horizon length, specified as a positive integer greater than or equal to Horizon. When HorizonUpdateSchedule is "none", this option is ignored.
Example: HorizonMax=5
HorizonUpdateStartEpoch
— Training epoch at which to start generating trajectories
1
(default) | positive integer
Training epoch at which to start generating trajectories, specified as a positive integer.
Example: HorizonUpdateStartEpoch=100
NoiseOptions
— Exploration model options
[]
(default) | EpsilonGreedyExploration
object | GaussianActionNoise
object
Exploration model options for generating experiences using the internal environment model, specified as one of the following:
- [] — Use the exploration policy of the base agent. You must use this option when training a SAC base agent.
- EpsilonGreedyExploration object — You can use this option when training a DQN base agent.
- GaussianActionNoise object — You can use this option when training a DDPG or TD3 base agent.
The exploration model uses only the initial noise option values and does not update the values during training.
To specify NoiseOptions, create a default model object. Then, specify any nondefault model properties using dot notation.
Specify epsilon greedy exploration options.
opt = rlMBPOAgentOptions;
opt.ModelRolloutOptions.NoiseOptions = ...
    rl.option.EpsilonGreedyExploration;
opt.ModelRolloutOptions.NoiseOptions.EpsilonMin = 0.03;
Specify Gaussian action noise options.
opt = rlMBPOAgentOptions;
opt.ModelRolloutOptions.NoiseOptions = ...
    rl.option.GaussianActionNoise;
opt.ModelRolloutOptions.NoiseOptions.StandardDeviation = sqrt(0.15);
For more information on noise models, see Noise Models.
RealSampleRatio
— Ratio of real experiences in a mini-batch
0.2
(default) | nonnegative scalar less than or equal to 1
Ratio of real experiences in a mini-batch for agent training, specified as a nonnegative scalar less than or equal to 1.
Example: RealSampleRatio=0.1
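As an illustration of what this ratio implies (assuming the remainder of each agent-training mini-batch is filled with model-generated experiences, and ignoring any rounding details of the actual implementation), the approximate composition of a mini-batch can be computed as follows.
miniBatchSize   = 128;   % base agent mini-batch size (illustrative value)
realSampleRatio = 0.1;
numReal      = round(realSampleRatio*miniBatchSize);   % roughly 13 real experiences
numGenerated = miniBatchSize - numReal;                % roughly 115 model-generated experiences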
InfoToSave
— Options to save additional agent data
structure (default)
Options to save additional agent data, specified as a structure containing the
Optimizer
field.
You can save an agent object in several ways, for example:
- Using the save command
- Specifying saveAgentCriteria and saveAgentValue in an rlTrainingOptions object
- Specifying an appropriate logging function within a FileLogger object
When you save an agent using any method, the fields in the InfoToSave structure determine whether the corresponding data is saved with the agent. For example, if you set the Optimizer field to true, then the optimizers for the transition, reward, and is-done functions are saved along with the agent.
You can modify the InfoToSave
property only after the agent
options object is created.
Example: options.InfoToSave.Optimizer=true
Optimizer
— Option to save agent optimizer
false
(default) | true
Option to save the agent optimizer, specified as a logical value. If the Optimizer field is set to false, then the optimizers for the transition, reward, and is-done functions (which are hidden properties of the agent and can contain internal states) are not saved along with the agent, thereby saving disk space and memory. However, when the optimizers contain internal states, the state of the saved agent is not identical to the state of the original agent.
Example: true
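For example, the following sketch (where agent is a hypothetical, previously created rlMBPOAgent) enables saving the optimizers and then saves the agent with the save command.
opt = rlMBPOAgentOptions;
opt.InfoToSave.Optimizer = true;   % save transition, reward, and is-done function optimizers
% agent is assumed to be an rlMBPOAgent created earlier using opt.
save("mbpoAgent.mat","agent")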
Object Functions
rlMBPOAgent | Model-based policy optimization (MBPO) reinforcement learning agent |
Examples
Create MBPO Agent Options Object
Create an MBPO agent options object, specifying the ratio of real experiences to use for training the agent as 30%.
opt = rlMBPOAgentOptions(RealSampleRatio=0.3)
opt =
  rlMBPOAgentOptions with properties:

       NumEpochForTrainingModel: 1
                 NumMiniBatches: 10
                  MiniBatchSize: 128
     TransitionOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
         RewardOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
         IsDoneOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
    ModelExperienceBufferLength: 100000
            ModelRolloutOptions: [1x1 rl.option.rlModelRolloutOptions]
                RealSampleRatio: 0.3000
                     InfoToSave: [1x1 struct]
You can modify options using dot notation. For example, set the mini-batch size to 64.
opt.MiniBatchSize = 64;
Algorithms
Noise Models
A GaussianActionNoise
object has the following numeric value
properties. When generating experiences, MBPO agents do not update their exploration model
parameters.
Property | Description | Default Value |
---|---|---|
Mean | Noise model mean | 0 |
StandardDeviation | Noise model standard deviation | sqrt(0.2) |
StandardDeviationDecayRate | Decay rate of the standard deviation (not used for generating samples) | 0 |
StandardDeviationMin | Minimum standard deviation, which must be less than StandardDeviation (not used for generating samples) | 0.1 |
LowerLimit | Noise sample lower limit | -Inf |
UpperLimit | Noise sample upper limit | Inf |
At each time step k, the Gaussian noise v is sampled as shown in the following code.
w = Mean + randn(ActionSize).*StandardDeviation(k);
v(k+1) = min(max(w,LowerLimit),UpperLimit);
An EpsilonGreedyExploration
object has the following numeric value
properties. When generating experiences, MBPO agents do not update their exploration model
parameters.
Property | Description | Default Value |
---|---|---|
Epsilon | Probability threshold to either randomly select an action or select the action that maximizes the state-action value function. A larger value of Epsilon means that the agent randomly explores the action space at a higher rate. | 1 |
EpsilonMin | Minimum value of Epsilon (not used for generating samples) | 0.01 |
EpsilonDecay | Decay rate (not used for generating samples) | 0.005 |
Version History
Introduced in R2022a