Can episode Q0 (DDPG agent) be used as an indicator of training quality?
I am using the Reinforcement Learning Toolbox to design an engine emission controller, with a DDPG agent generating the actions. I am training the agent for 3000 episodes and would like to understand the training termination criteria.
- In my case, the episode reward varies considerably for almost the entire training run (probably because I set the 'IsDone' signal to false)
- The episode Q0, however, is unstable at the beginning and nearly saturates after around 1700 episodes
Hence, I would like to know whether a stable episode Q0 can be used as an indicator of the learning quality of the RL agent.
PS: I am using a DDPG agent for my problem.
Answers (1)
Ayush Modi
on 17 Jan 2024
Hi Pradyumna,
I found the following answer in the community regarding Episode Q0. Episode Q0 does not have to be an indicator of the learning quality of the RL agent for actor-critic methods:
"In general, it is not required for this to happen for actor-critic methods. The actor may converge first, and at that point it would be totally fine to stop training."
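Since the question is about termination criteria, note that training can be stopped on a reward-based criterion via `rlTrainingOptions`, and Episode Q0 is still logged per episode for inspection afterwards. A minimal sketch, assuming `agent`, `env`, and a problem-specific `targetAvgReward` are already defined (these names are illustrative):

```matlab
% Sketch: stop training on an average-reward criterion rather than
% relying on Episode Q0 stability alone.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=3000, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=targetAvgReward, ...     % problem-specific target
    ScoreAveragingWindowLength=50);            % window for the average

trainingStats = train(agent, env, trainOpts);

% Episode Q0 is recorded in the training statistics and can be
% plotted to check whether it has saturated:
plot(trainingStats.EpisodeQ0)
```

Comparing the Episode Q0 curve against the (noisy) episode reward is a reasonable sanity check, but per the answer above, a saturated Q0 by itself is not a sufficient stopping condition.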