Custom Action Space DDPG Reinforcement Learning Agent

After running into a challenge with my reinforcement learning agent, I hope you can help me with at least a little hint.
My DDPG agent has a continuous action space, which works fine in simulation. Unfortunately, it cannot be transferred to a real-life system this way. While searching for optimal action values in different situations, the agent should avoid certain combinations.
The action space is defined like this:
actionInfo = rlNumericSpec([4 1], ...
'LowerLimit', [0; 0; 0; 0], ...
'UpperLimit', [maxA1; maxA2; maxA3; maxA4]);
Due to restrictions in the real-life system, however, it should look more like
A1 ∈ {0} ∪ [minA1, maxA1]
to avoid actions in the open interval
A1 ∈ (0, minA1)
Is it possible to define my action space this way?
Note:
I have already tried to steer the agent away from actions in this range by penalizing them via the reward, but it doesn't seem to work out. Instead of steadily improving over the episodes, the training now plateaus after reaching a certain (undesirable) level.
Thanks in advance!

Accepted Answer

Emmanouil Tzorakoleftherakis
To my knowledge, you cannot implement a custom action space with rlNumericSpec. What you could do instead (since adding penalty terms to the reward does not help) is add some additional logic that manipulates the agent's actions, i.e. the output of the RL Agent block. Your policy would then be the combination of the neural network and the new logic. Just an idea.
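
For illustration, a minimal sketch of what such post-processing logic could look like in MATLAB (the function name snapAction and the threshold minA1 are just placeholders, not part of the toolbox): raw actions that fall inside the forbidden open interval (0, minA1) are snapped to whichever allowed value, 0 or minA1, is closer.

function a = snapAction(a, minA1)
% Post-process the raw agent action so that values inside the
% forbidden open interval (0, minA1) are mapped to an allowed value.
idx = a > 0 & a < minA1;                % elements inside the forbidden range
a(idx) = minA1 * (a(idx) >= minA1/2);   % snap to 0 or minA1, whichever is nearer
end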
  3 Comments
Emmanouil Tzorakoleftherakis
This will change the data stored in the experience buffer/mini-batches during training, as well as the data logged when you run simulations after training. For the latter, you can simply log the respective signal after the action transformation. For the former, I don't think it will cause issues. You can think of the additional logic as an extra layer in your neural network that only performs algebraic manipulations (like a scaling layer, for instance); there are no weights/parameters to be learned.
The three candidate places you mentioned should lead to the same results. For visualization purposes (I am assuming you use Simulink, since you mentioned 'AgentWrapper'), I would add the logic right after the agent block and put both under a separate subsystem, so that you can treat the agent plus logic as your new decision-making system.
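
Outside Simulink, a rough sketch of the combined "agent + logic" policy could look like this (assuming the hypothetical snapAction helper above; getAction is the Reinforcement Learning Toolbox function for querying a trained agent):

obs = {currentObservation};       % observation in the format the agent expects
a = getAction(agent, obs);        % raw action from the trained DDPG agent
if iscell(a), a = a{1}; end       % getAction may return the action wrapped in a cell
a(1) = snapAction(a(1), minA1);   % apply the restriction to A1 before it reaches the plant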
Hans-Joachim Steinort on 6 Mar 2020
Edited: Hans-Joachim Steinort on 12 Mar 2020
Thank you for your explanation!
This actually helped me to wrap my head around the issue. I will definitely try out your suggestion with the additional logic and come back to you afterwards.
EDIT:
It worked the way you suggested, thanks a lot!



Release

R2019a
