PPO algorithm training problem in Reinforcement Learning Toolbox

The documentation for the PPO training algorithm states: “For each experience sequence that does not contain a terminal state, N is equal to the ExperienceHorizon option value. Otherwise, N is less than ExperienceHorizon and S_N is the terminal state.”
My first question: suppose N is smaller than ExperienceHorizon and also smaller than the mini-batch size, and this happens for several consecutive episodes. When does the algorithm update the parameters in that case?
My second question: when are the PPO parameters updated with the following settings?
agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',10000,...
    'MiniBatchSize',64,...
    'NumEpoch',3);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',10000,...
    'MaxStepsPerEpisode',30);
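To make the mismatch in these settings concrete, here is a small sketch of the arithmetic only. It uses plain MATLAB with no toolbox calls, the variable names are my own, and it does not claim anything about when the toolbox actually performs an update; it only counts how many experiences a single episode can contribute compared with the two thresholds.

```matlab
% Values copied from the options above.
experienceHorizon  = 10000;
miniBatchSize      = 64;
maxStepsPerEpisode = 30;

% An episode cannot contribute more experiences than MaxStepsPerEpisode,
% so this is an upper bound on N for any single episode.
maxN = min(experienceHorizon, maxStepsPerEpisode);

% Hypothetical bookkeeping (an assumption for illustration, not toolbox
% behavior): number of full-length episodes needed to accumulate each count.
episodesPerMiniBatch = ceil(miniBatchSize / maxStepsPerEpisode);
episodesPerHorizon   = ceil(experienceHorizon / maxStepsPerEpisode);

fprintf('Max N per episode: %d\n', maxN);
fprintf('Full episodes to reach MiniBatchSize: %d\n', episodesPerMiniBatch);
fprintf('Full episodes to reach ExperienceHorizon: %d\n', episodesPerHorizon);
```

With these settings every episode is shorter than both MiniBatchSize and ExperienceHorizon, which is exactly the case I am asking about.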