太雅 藤森
太雅 藤森 on 21 Oct 2021
Answered: Akira Agata on 21 Oct 2021

Accepted Answer

Akira Agata
Akira Agata on 21 Oct 2021
"Window length for averaging the scores, rewards, and number of steps for each agent, specified as a scalar or vector.
If the training environment contains a single agent, specify ScoreAveragingWindowLength as a scalar.
If the training environment is a multi-agent Simulink® environment, specify a scalar to apply the same window length to all agents.
To use a different window length for each agent, specify ScoreAveragingWindowLength as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.
For options expressed in terms of averages, ScoreAveragingWindowLength is the number of episodes included in the average. For instance, if StopTrainingCriteria is "AverageReward", and StopTrainingValue is 500 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 500."
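
As a rough illustration of the passage quoted above, the options could be set along these lines for a single agent. This is only a sketch: the window length of 20, the stop value of 500, and the episode cap of 1000 are example values I picked, not values from the question, and it assumes Reinforcement Learning Toolbox is installed.

```matlab
% Sketch only: the numeric values below are illustrative assumptions.
% Training stops once the reward averaged over the most recent 20 episodes
% equals or exceeds 500, or when MaxEpisodes is reached.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'ScoreAveragingWindowLength', 20, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 500);

% For a multi-agent Simulink environment, a vector would give each agent its
% own window, in the order the agents were passed at environment creation:
% trainOpts.ScoreAveragingWindowLength = [20 50];
```

A larger window gives a smoother average, so a single lucky episode is less likely to end training early, but the stop condition also reacts more slowly.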

