Can Episode Q0 (DDPG agent) be used as an indicator of training quality?

I am trying to use the RL Toolbox to obtain an engine emission controller, with a DDPG agent generating the actions. I am training the agent for 3000 episodes and want to understand the training termination criteria.
  • In my case, the episode reward varies a lot over almost the entire training run (probably because I set the 'ISDONE' signal to false).
  • The Episode Q0, however, is unstable at the beginning and then nearly saturates after around 1700 episodes.
Hence, I would like to understand whether a stable Episode Q0 can be used as an indicator of the learning quality of the RL agent?
PS- I am using the DDPG agent for my problem statement.
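For context, a minimal sketch of how such a training run might be configured (the agent and env variables, episode length, and averaging window are placeholders, not my actual settings):

% Minimal sketch of the training setup (placeholder values, not the actual
% engine-emission model). agent and env are assumed to be defined already.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 3000, ...
    'MaxStepsPerEpisode', 500, ...           % placeholder episode length
    'ScoreAveragingWindowLength', 50, ...    % window used for the averaged reward
    'StopTrainingCriteria', 'EpisodeCount', ...
    'StopTrainingValue', 3000, ...           % run the full 3000 episodes
    'Plots', 'training-progress');

trainingStats = train(agent, env, trainOpts);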

Answers (1)

Ayush Modi on 17 Jan 2024
Hi Pradyumna,
I found the following answer in the community regarding Episode Q0. It is not necessary for Episode Q0 to be an indicator of the learning quality of the RL agent for actor-critic methods.
"In general, it is not required for this to happen for actor-critic methods. The actor may converge first and at that point it would be totally fine to stop training."

Release: R2021b
