rlTrainingOptions

Options for training reinforcement learning agents

Syntax

trainOpts = rlTrainingOptions
trainOpts = rlTrainingOptions(Name,Value)

Description

trainOpts = rlTrainingOptions returns the default options for training a reinforcement learning agent. You use training options to specify parameters about the training session such as the maximum number of episodes to train, criteria for stopping training, criteria for saving agents, and how to use parallel computing. After you configure the options, use trainOpts as an input argument for train.

trainOpts = rlTrainingOptions(Name,Value) creates an option set for training using the specified name-value pairs to override default option values.

Examples

Create an options set for training a reinforcement learning agent. Set the maximum number of episodes and the maximum steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and the Reinforcement Learning Episode Manager for displaying training results. You can set the options using Name,Value pairs when you create the options set. Any options that you do not explicitly set have their default values.

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480,...
    'Verbose',true,...
    'Plots',"training-progress")
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "None"
                SaveAgentValue: "None"
            SaveAgentDirectory: "savedAgents"
               Parallelization: "none"
        ParallelizationOptions: []
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"

Alternatively, create a default options set and use dot notation to change some of the values.

trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 480;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";

trainOpts
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "None"
                SaveAgentValue: "None"
            SaveAgentDirectory: "savedAgents"
               Parallelization: "none"
        ParallelizationOptions: []
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"

You can now use trainOpts as an input argument to the train command.

To turn on parallel computing for training a reinforcement learning agent, you first create an options set with the Parallelization option set to a value other than "none". For this example, configure an options set for asynchronous parallel training.

trainOpts = rlTrainingOptions('Parallelization',"async");

When you set Parallelization to this value, the software populates the ParallelizationOptions option with a default ParallelTraining object.

trainOpts.ParallelizationOptions
ans = 
  ParallelTraining with properties:

             DataToSendFromWorkers: "Experiences"
              StepsUntilDataIsSent: -1
                 WorkerRandomSeeds: -1
    TransferBaseWorkspaceVariables: "on"
                     AttachedFiles: []
                          SetupFcn: []
                        CleanupFcn: []

You can further configure the parallel computing options using dot notation. For instance, configure the workers to send data to the host every 100 steps within a training episode. Further configure the workers to send gradient data, rather than experience data.

trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 100;
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "Gradients";
trainOpts.ParallelizationOptions
ans = 
  ParallelTraining with properties:

             DataToSendFromWorkers: "Gradients"
              StepsUntilDataIsSent: 100
                 WorkerRandomSeeds: -1
    TransferBaseWorkspaceVariables: "on"
                     AttachedFiles: []
                          SetupFcn: []
                        CleanupFcn: []

You can now use trainOpts as an input argument to the train command to perform training with parallel computing.

Input Arguments

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'StopTrainingCriteria',"AverageReward",'StopTrainingValue',100

Maximum number of episodes to train the agent, specified as the comma-separated pair consisting of 'MaxEpisodes' and a positive integer. Regardless of other criteria for termination, training terminates after this many episodes.

Example: 'MaxEpisodes',1000

Maximum number of steps to run per episode, specified as the comma-separated pair consisting of 'MaxStepsPerEpisode' and a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if those termination conditions are not met.

Example: 'MaxStepsPerEpisode',1000

Window length for averaging scores, rewards, and numbers of steps, specified as the comma-separated pair consisting of 'ScoreAveragingWindowLength' and a positive integer. For options expressed in terms of averages, this is the number of episodes included in the average. For instance, suppose that StopTrainingCriteria is "AverageReward" and StopTrainingValue is 500. Training terminates when the reward averaged over the number of episodes specified by this parameter is 500 or greater.

Example: 'ScoreAveragingWindowLength',10
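
The windowed average can be pictured with plain MATLAB, independent of the toolbox. The following is a minimal sketch, assuming the average covers the current episode and the episodes immediately before it. The reward values, the variable names, and the shorter window used for the first few episodes are illustrative assumptions, not documented toolbox behavior.

```matlab
% Hypothetical episode rewards (illustrative data, not toolbox output)
episodeReward = [100 300 450 520 560 600];
windowLength  = 3;    % plays the role of ScoreAveragingWindowLength
stopValue     = 500;  % plays the role of StopTrainingValue

% Trailing average over the current episode and the windowLength-1
% episodes before it. movmean shrinks the window at the start of the data.
avgReward = movmean(episodeReward,[windowLength-1 0]);

% First episode at which the averaged reward reaches the threshold
stopEpisode = find(avgReward >= stopValue,1);
```

With these numbers, the single-episode reward first reaches 500 at episode 4, but the three-episode average does not reach it until a later episode, which is why a longer window delays termination.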

Training termination condition, specified as the comma-separated pair consisting of 'StopTrainingCriteria' and one of the following strings:

  • "AverageSteps" — Stop training when the running average number of steps per episode equals or exceeds the critical value specified by the option StopTrainingValue. The average is computed using the window 'ScoreAveragingWindowLength'.

  • "AverageReward" — Stop training when the running average reward equals or exceeds the critical value.

  • "EpisodeReward" — Stop training when the reward in the current episode equals or exceeds the critical value.

  • "GlobalStepCount" — Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • "EpisodeCount" — Stop training when the number of training episodes equals or exceeds the critical value.

Example: 'StopTrainingCriteria',"AverageReward"

Critical value of training termination condition, specified as the comma-separated pair consisting of 'StopTrainingValue' and a scalar. Training terminates when the termination condition specified by the StopTrainingCriteria option equals or exceeds this value. For instance, if StopTrainingCriteria is "AverageReward", and StopTrainingValue is 100, then training terminates when the average reward over the number of episodes specified in 'ScoreAveragingWindowLength' equals or exceeds 100.

Example: 'StopTrainingValue',100
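
As an illustration of pairing StopTrainingCriteria with StopTrainingValue, the following sketch stops training on a total step budget rather than on reward. The numeric values are arbitrary and chosen only for illustration. MaxEpisodes still caps training regardless of the criterion, as described above.

```matlab
% Stop after 100,000 agent steps in total, or after 5000 episodes,
% whichever comes first (both numbers are arbitrary for illustration).
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'StopTrainingCriteria',"GlobalStepCount",...
    'StopTrainingValue',1e5);
```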

Condition for saving agent during training, specified as the comma-separated pair consisting of 'SaveAgentCriteria' and one of the following strings:

  • "none" — Do not save any agents during training.

  • "EpisodeReward" — Save agent when the reward in the current episode equals or exceeds the critical value.

  • "AverageSteps" — Save agent when the running average number of steps per episode equals or exceeds the critical value specified by the option SaveAgentValue. The average is computed using the window 'ScoreAveragingWindowLength'.

  • "AverageReward" — Save agent when the running average reward over all episodes equals or exceeds the critical value.

  • "GlobalStepCount" — Save agent when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • "EpisodeCount" — Save agent when the number of training episodes equals or exceeds the critical value.

Set this option to store candidate agents that perform well according to the criteria you specify. When you set this option to a value other than "none", the software sets the SaveAgentValue option to 500. You can change that value to specify the condition for saving the agent.

For instance, suppose you want to store for further testing any agent that yields an episode reward that equals or exceeds 100. To do so, set SaveAgentCriteria to "EpisodeReward" and set the SaveAgentValue option to 100. When an episode reward equals or exceeds 100, train saves the corresponding agent in a MAT-file in the folder specified by the SaveAgentDirectory option. The MAT-file is called AgentK.mat where K is the number of the corresponding episode. The agent is stored within that MAT-file as saved_agent.

Example: 'SaveAgentCriteria',"EpisodeReward"

Critical value of condition for saving agent, specified as the comma-separated pair consisting of 'SaveAgentValue' and "none" or a numeric scalar.

When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.

Example: 'SaveAgentValue',100

Folder for saved agents, specified as the comma-separated pair consisting of 'SaveAgentDirectory' and a string or character vector. The folder name can contain a full or relative path. When an episode occurs that satisfies the condition specified by the SaveAgentCriteria and SaveAgentValue options, the software saves the agent in a MAT-file in this folder. If the folder doesn't exist, train creates it. When SaveAgentCriteria is "none", this option is ignored and train does not create a folder.

Example: 'SaveAgentDirectory', pwd + "\run1\Agents"
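
The three save-related options work together. The following is a minimal sketch, assuming a save folder under the system temporary directory and a candidate saved at episode 12. Both the folder name and the episode number are hypothetical. The file name pattern AgentK.mat and the variable name saved_agent follow the description above.

```matlab
% Hypothetical folder for candidate agents
saveDir = fullfile(tempdir,"rlSavedAgents");

% Save every agent whose episode reward is 100 or more
trainOpts = rlTrainingOptions(...
    'SaveAgentCriteria',"EpisodeReward",...
    'SaveAgentValue',100,...
    'SaveAgentDirectory',saveDir);

% After training with these options, a candidate from (hypothetical)
% episode 12 would be stored as Agent12.mat in the save folder.
agentFile = fullfile(saveDir,"Agent12.mat");
if isfile(agentFile)
    data = load(agentFile,"saved_agent");
    candidate = data.saved_agent;   % agent object for further testing
end
```

The isfile guard keeps the snippet safe to run before any training has produced a file.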

Flag for using parallel training, specified as the comma-separated pair consisting of 'UseParallel' and either true or false. Setting this option to true configures training to use parallel computing. To specify options for parallel training, use the ParallelizationOptions property.

For more information about training using parallel computing, see Train Reinforcement Learning Agents.

Using parallel computing requires Parallel Computing Toolbox™ software.

Example: 'UseParallel',true

Parallelization options to control parallel training, specified as the comma-separated pair consisting of 'ParallelizationOptions' and a ParallelTraining object. For more information about training using parallel computing, see Train Reinforcement Learning Agents.

The ParallelTraining object has the following properties, which you can modify using dot notation after creating the rlTrainingOptions object.

Parallel computing mode, specified as one of the following:

  • "sync" — Use parpool to run synchronous training on the available workers. In this case, workers pause execution until all workers are finished. The host updates the actor and critic parameters based on the results from all the workers and sends the updated parameters to all workers.

  • "async" — Use parpool to run asynchronous training on the available workers. In this case, workers send their data back to the host as soon as they finish, and then they receive updated parameters from the host. The workers then continue with their task.

Type of data that workers send to the host (DataToSendFromWorkers), specified as one of the following strings:

  • "experiences" — Send experience data (observation, action, reward, next observation, is done) to the host. For agents with gradients, the host computes gradients from the experiences.

  • "gradients" — Compute and send gradients to the host. The host applies the gradients to update the network parameters.

Note

AC and PG agents accept only DataToSendFromWorkers = "gradients". DQN and DDPG agents accept only DataToSendFromWorkers = "experiences".

Number of steps after which workers send data to the host and receive updated parameters (StepsUntilDataIsSent), specified as -1 or a positive integer. When this option is -1, each worker waits until the end of the episode and then sends all step data to the host. Otherwise, the worker sends data after the specified number of steps within the episode.

Note

  • AC agents do not accept StepsUntilDataIsSent = -1. To mimic A3C training, set StepsUntilDataIsSent equal to the NumStepsToLookAhead AC agent option.

  • PG agents accept only StepsUntilDataIsSent = -1.
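
The A3C-style setup mentioned in the note can be sketched as follows. This is a sketch under stated assumptions, not a definitive configuration: it assumes the release in use exposes the 'UseParallel' flag described above and names the parallel computing mode property Mode, whereas the earlier example on this page sets a top-level 'Parallelization' option instead. The look-ahead value of 32 is arbitrary.

```matlab
% AC agent options with an arbitrary look-ahead horizon
agentOpts = rlACAgentOptions('NumStepsToLookAhead',32);

% Assumption: parallel training is enabled with the 'UseParallel' flag
trainOpts = rlTrainingOptions('UseParallel',true);

% Assumption: the parallel computing mode property is named Mode
trainOpts.ParallelizationOptions.Mode = "async";

% AC agents send gradients, and for A3C-like behavior the send interval
% matches the agent's look-ahead horizon
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "gradients";
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = ...
    agentOpts.NumStepsToLookAhead;
```

Reading the interval from agentOpts rather than repeating the literal keeps the two settings from drifting apart.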

Randomizer initialization for workers (WorkerRandomSeeds), specified as one of the following:

  • –1 — Assign a unique random seed to each worker. The value of the seed is the worker ID.

  • –2 — Do not assign a random seed to the workers.

  • Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.
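
For reproducible parallel runs, the vector form can be sketched as below. The pool size of four and the seed values are assumptions chosen for illustration, and the snippet assumes the 'UseParallel' flag described above is available in the release in use.

```matlab
numWorkers = 4;   % assumed pool size; must match the actual number of workers

trainOpts = rlTrainingOptions('UseParallel',true);

% One explicit seed per worker (values are arbitrary)
trainOpts.ParallelizationOptions.WorkerRandomSeeds = 100 + (1:numWorkers);
```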

Send model and workspace variables to parallel workers (TransferBaseWorkspaceVariables), specified as "on" or "off". When the option is "on", the host sends variables used in models and defined in the base MATLAB® workspace to the workers.

Additional files to attach to the parallel pool (AttachedFiles), specified as a string or string array.

Function to run before training starts (SetupFcn), specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.

Function to run after training ends (CleanupFcn), specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.
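
The setup and cleanup hooks are ordinary function handles that take no inputs. The following is a minimal sketch, assuming the 'UseParallel' flag described above. The displayed messages are placeholders for whatever real setup and cleanup work a project needs.

```matlab
trainOpts = rlTrainingOptions('UseParallel',true);

% Handles to functions with no input arguments.
% The messages are placeholders for real setup and cleanup work.
trainOpts.ParallelizationOptions.SetupFcn   = @() disp("Worker setup: starting");
trainOpts.ParallelizationOptions.CleanupFcn = @() disp("Worker cleanup: done");
```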

Display training progress on the command line (Verbose), specified as the logical value false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.

Stop training when an error occurs during an episode (StopOnError), specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.

Display training progress with the Episode Manager (Plots), specified as "training-progress" or "none". By default, calling train opens the Reinforcement Learning Episode Manager, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. (For more information, see train.) To turn off this display, set this option to "none".
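
For unattended runs, such as a batch job without a display, the three options above might be combined as in this sketch. The particular combination is a suggestion for illustration, not a recommended default.

```matlab
% Text-only progress, no Episode Manager window, and keep training
% even if an individual episode raises an error.
trainOpts = rlTrainingOptions(...
    'Verbose',true,...
    'Plots',"none",...
    'StopOnError',"off");
```

With StopOnError set to "off", inspect the SimulationInfo output of train afterward, since errors are recorded there instead of halting training.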

Output Arguments

Option set for training reinforcement learning agents, returned as an rlTrainingOptions object. The property values of trainOpts are initialized to the default values or to the values you specify with Name,Value pairs. You can further modify the property values using dot notation. Use the options set as an input argument with train when you train reinforcement learning agents.

Introduced in R2019a