train
Syntax
trainStats = train(env,agents)
trainStats = train(agents,env)
trainStats = train(___,trainOpts)
trainStats = train(agents,env,prevTrainStats)
trainStats = train(___,logger=lgr)
Description
trainStats = train(env,agents) trains one or more reinforcement learning agents within a specified environment, using default training options. Although agents is an input argument, after each training episode, train updates the parameters of each agent specified in agents to maximize their expected long-term reward from the environment. This is possible because each agent is a handle object. When training terminates, agents reflects the state of each agent at the end of the final training episode.
Note
To train an off-policy agent offline using existing data, use trainFromData.
trainStats = train(agents,env) performs the same training as the previous syntax.
trainStats = train(___,trainOpts) trains agents within env, using the training options object trainOpts. Use training options to specify training parameters such as the criteria for terminating training, when to save agents, the maximum number of episodes to train, and the maximum number of steps per episode. Use this syntax after any of the input arguments in the previous syntaxes.
trainStats = train(agents,env,prevTrainStats) resumes training from the last values of the agent parameters and training results contained in prevTrainStats, obtained after the previous call to train.
trainStats = train(___,logger=lgr) logs training data using the FileLogger or MonitorLogger object lgr.
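As an illustrative sketch of how resuming and logging fit together (assuming agent and env already exist, and that rlDataLogger is available to create the FileLogger object):

prevTrainStats = train(agent,env);              % initial training run with default options
trainStats = train(agent,env,prevTrainStats);   % resume from the previous results
lgr = rlDataLogger();                           % FileLogger object that writes logged data to disk
trainStats = train(agent,env,logger=lgr);       % log training data using lgr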
Examples
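A minimal training sketch, assuming a predefined cart-pole environment and a default DQN agent (the environment keyword, agent type, and option values are illustrative choices, not requirements of train):

env = rlPredefinedEnv("CartPole-Discrete");    % predefined environment with a built-in visualization
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDQNAgent(obsInfo,actInfo);           % agent with default networks and options

trainOpts = rlTrainingOptions( ...
    MaxEpisodes=500, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=480);

trainStats = train(agent,env,trainOpts);       % agent is a handle object and is updated in place

Because train updates agent in place, a second call to train with the same arguments continues improving the same parameters.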
Input Arguments
Output Arguments
Tips
train updates the agents as training progresses. To preserve the original agent parameters for later use, save the agents to a MAT-file.
By default, calling train opens the Reinforcement Learning Episode Manager, which lets you visualize the progress of the training. The Episode Manager plot shows the reward for each episode, a running average reward value, and the critic estimate Q0 (for agents that have critics). The Episode Manager also displays various episode and training statistics. To turn off the Reinforcement Learning Episode Manager, set the Plots option of trainOpts to "none".
If you use a predefined environment for which there is a visualization, you can use plot(env) to visualize the environment. If you call plot(env) before training, then the visualization updates during training to allow you to visualize the progress of each episode. (For custom environments, you must implement your own plot method.)
Training terminates when the conditions specified in trainOpts are satisfied. To terminate training in progress, in the Reinforcement Learning Episode Manager, click Stop Training. Because train updates the agent at each episode, you can resume training by calling train(agent,env,trainOpts) again, without losing the trained parameters learned during the first call to train.
During training, you can save candidate agents that meet conditions you specify with trainOpts. For instance, you can save any agent whose episode reward exceeds a certain value, even if the overall condition for terminating training is not yet satisfied. train stores saved agents in a MAT-file in the folder you specify with trainOpts. Saved agents can be useful, for instance, to allow you to test candidate agents generated during a long-running training process. For details about saving criteria and saving location, see rlTrainingOptions.
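The options mentioned in these tips can be combined in a single rlTrainingOptions call. The sketch below is illustrative; the file name, reward threshold, and folder name are assumptions:

save("initialAgent.mat","agent")             % preserve the original agent parameters
plot(env)                                    % visualize a predefined environment during training
trainOpts = rlTrainingOptions( ...
    Plots="none", ...                        % turn off the Episode Manager
    SaveAgentCriteria="EpisodeReward", ...   % save candidate agents that reach a reward threshold
    SaveAgentValue=100, ...
    SaveAgentDirectory="savedAgents");
trainStats = train(agent,env,trainOpts);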
Algorithms
In general, train performs the following iterative steps:
Initialize agent.
For each episode:
    Reset the environment.
    Get the initial observation s0 from the environment.
    Compute the initial action a0 = μ(s0).
    Set the current action to the initial action (a←a0) and set the current observation to the initial observation (s←s0).
    While the episode is not finished or terminated:
        Step the environment with action a to obtain the next observation s' and the reward r.
        Learn from the experience set (s,a,r,s').
        Compute the next action a' = μ(s').
        Update the current action with the next action (a←a') and update the current observation with the next observation (s←s').
        Break if the episode termination conditions defined in the environment are met.
If the training termination condition defined by trainOpts is met, terminate training. Otherwise, begin the next episode.
The specifics of how train performs these computations depend on your configuration of the agent and environment. For instance, resetting the environment at the start of each episode can include randomizing initial state values, if you configure your environment to do so.
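As a rough MATLAB sketch of this loop: reset and step are the standard environment methods, while selectAction and learnFromExperience are hypothetical stand-ins for the agent's internal action selection and learning updates, which train handles for you.

for episode = 1:maxEpisodes
    obs = reset(env);                                   % reset the environment, get s0
    action = selectAction(agent,obs);                   % a0 = mu(s0), hypothetical helper
    isDone = false;
    while ~isDone
        [nextObs,reward,isDone] = step(env,action);     % step the environment to get s' and r
        learnFromExperience(agent,obs,action,reward,nextObs);  % learn from (s,a,r,s'), hypothetical helper
        action = selectAction(agent,nextObs);           % a' = mu(s')
        obs = nextObs;                                  % s <- s'
    end
    % termination criteria defined in trainOpts are checked here
end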