5 views (last 30 days)
Akira Agata on 21 Oct 2021
"Window length for averaging the scores, rewards, and number of steps for each agent, specified as a scalar or vector.
If the training environment contains a single agent, specify ScoreAveragingWindowLength as a scalar.
If the training environment is a multi-agent Simulink® environment, specify a scalar to apply the same window length to all agents.
To use a different window length for each agent, specify ScoreAveragingWindowLength as a vector. In this case, the order of the elements in the vector correspond to the order of the agents used during environment creation.
For options expressed in terms of averages, ScoreAveragingWindowLength is the number of episodes included in the average. For instance, if StopTrainingCriteria is "AverageReward", and StopTrainingValue is 500 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 500."