DDPG: Actor clips outputs to zero, thus keeping exploration minimal

I'm training a DDPG agent from the Reinforcement Learning Toolbox to tune a PI controller, so the agent should output the gains P and I. After some initial learning episodes (about 10 to 50) with high values for both P and I, both outputs decrease to zero.
This is followed by one of the following two cases, which alternate from time to time:
  1. Both output values stay at zero. (marked green in the following picture)
  2. Output I stays at zero while P takes a very low value. (marked purple in the following picture)
The actor is structured as follows:
[
    featureInputLayer(20, 'Normalization', 'none', 'Name', 'state vector')
    fullyConnectedLayer(20, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(256, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(2, 'Name', 'fc3')
    tanhLayer('Name', 'output')];
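Since the tanhLayer output lies in [-1, 1], the mapping from that interval onto the action limits determines where the outputs saturate. The following is only a sketch of such a mapping, assuming both actions are limited to [0, 89.43]. The upper limit is inferred from the "1% of the action range" figure given in the list further down and the lower limit of 0 is a guess; neither value is taken from the actual script.

```matlab
% Assumed action limits (hypothetical; not taken from the original script)
lowerLimit = [0; 0];
upperLimit = [89.43; 89.43];

% Affine map from the tanh range [-1, 1] onto [lowerLimit, upperLimit]
scale = (upperLimit - lowerLimit) / 2;
bias  = (upperLimit + lowerLimit) / 2;

tanhOut = [-1; 1];                    % the two extreme tanh outputs
action  = scale .* tanhOut + bias;    % -1 maps to the lower limit, +1 to the upper limit
disp(action)
```

In a layer array, the same values could be passed to scalingLayer('Name', 'scale', 'Scale', scale, 'Bias', bias) placed after the tanh layer. Without such a mapping, any tanh output at or below zero would sit at or under an assumed lower limit of 0.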
The PI controller controls a plant described by a transfer function while a timed disturbance occurs. The disturbance is identical in every episode.
The fitness function is the IAE (integral of absolute error) value of the speed error:
IAE = ∫ |n_ref(t) - n_act(t)| dt
The reward is then calculated with this formula:
r = r1*(2*exp(r2*I/In)-r3) + p;
where r1, r2 and r3 are constants, I is the IAE value of the DDPG agent, In is the IAE value of the reference system, and p is a punishment that is capped to [-15, 0]:
p = -max(|n_ref-n_act|²) * p1;
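To make the two formulas concrete, here is a minimal sketch of how the reward for one episode could be evaluated. All constants and signals are placeholders chosen for illustration, and computing the IAE with trapz is an assumption; the attached scripts may do this differently.

```matlab
% Placeholder constants and signals, for illustration only
% (not the values used in the original scripts)
r1 = 1;  r2 = -1;  r3 = 1;  p1 = 1;
IAEref = 0.5;                          % In: IAE of the reference system

t    = (0:0.01:1)';                    % time vector of one episode
nRef = ones(size(t));                  % speed reference n_ref
nAct = 1 - exp(-5*t);                  % made-up speed response n_act

err = nRef - nAct;                     % speed error
IAE = trapz(t, abs(err));              % I: integral of the absolute error
p   = -max(abs(err).^2) * p1;          % punishment from the largest squared error
p   = min(max(p, -15), 0);             % cap the punishment to [-15, 0]
r   = r1 * (2*exp(r2*IAE/IAEref) - r3) + p;
```

With a negative r2, as assumed here, the exponential term shrinks as the agent's IAE grows relative to the reference, so a larger error gives a smaller reward.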
What I have done so far:
  • tried to recreate a paper's solution
      - the agent should take one action per episode, when the disturbance is detected
      - copied the transfer function, network sizes, observations and all options (critic, actor, DDPG agent, training)
      - added a flexible punishment (so that the system does not oscillate)
  • adjusted the range of the punishment to the range of the reward
  • changed the gradient threshold from Inf to 1
  • set the lower and upper limits within actionInfo
  • set the standard deviation to different values
      - currently 0.1
      - for comparison, 1% of the action range would be 0.8943 and 10% would be 8.943
      - with a standard deviation of 0.8943: I stays at zero; P explores a bit and then stays at its maximum value
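To put the standard deviation values in relation to the action range, the arithmetic can be sketched as follows. The action range of 89.43 is inferred from the 1% figure above, and the property name NoiseOptions.StandardDeviation assumes release R2021a or later; both are assumptions rather than values read from the scripts. Under them, the current value of 0.1 is about 0.11% of the action range.

```matlab
% Action range inferred from "1% of the action range = 0.8943" (assumption)
actionRange = 0.8943 / 0.01;           % 89.43

sigmaTried = [0.1, 0.8943];            % standard deviations mentioned above
percentOfRange = 100 * sigmaTried / actionRange;
fprintf('sigma = %.4f is %.2f%% of the action range\n', [sigmaTried; percentOfRange]);

% Where the value would be set (requires Reinforcement Learning Toolbox;
% property name assumed for R2021a or later)
agentOpts = rlDDPGAgentOptions;
agentOpts.NoiseOptions.StandardDeviation = sigmaTried(1);
```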
Many thanks in advance!
(discreteSys_Script_05.m is the main script.)
