DDPG: Actor clips outputs to zero, keeping exploration minimal
I'm training a DDPG agent from the Reinforcement Learning Toolbox to tune a PI controller, so the agent should output the P and I gains. After some initial learning episodes (~10 to 50) with high values for both P and I, both outputs decrease to zero.
This is followed by one of two cases, switching from time to time:
- Both output values stay at zero (marked green in the picture below).
- Output I stays at zero while P takes a very small value (marked purple in the picture below).

The actor is structured as follows (the layer array needs an opening bracket to match the closing one):
actorNetwork = [
    featureInputLayer(20, 'Normalization', 'none', 'Name', 'state vector')
    fullyConnectedLayer(20, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(256, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(2, 'Name', 'fc3')
    tanhLayer('Name', 'output')];
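For context, a minimal sketch of how this layer array would typically be wrapped into an actor representation; the variable names obsInfo and actionInfo are assumptions, while the layer names 'state vector' and 'output' come from the network above:

```matlab
% Sketch only: obsInfo and actionInfo are assumed to be the
% rlNumericSpec objects of the environment (hypothetical names).
actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actionInfo, ...
    'Observation', {'state vector'}, ...  % input layer name from the network
    'Action', {'output'});                % tanh output layer name
```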
The PI controller is used to control a transfer function while a timed disturbance occurs. The disturbance is always identical.
The fitness function used is the IAE value of the speed error, i.e. I = ∫ |n_ref(t) - n_act(t)| dt.
The reward is then calculated by this formula:
r = r1*(2*exp(r2*I/In)-r3) + p;
where r1, r2, r3 are constants; I is the DDPG agent's IAE value and In the IAE value of the reference system; and p is a punishment capped to [-15, 0]:
p = -max(|n_ref-n_act|²) * p1;
What I have done so far:
- tried to recreate a paper's solution
- - the agent takes one action per episode, as soon as the disturbance is detected
- - copied the transfer function, network sizes, observations, and all options (critic, actor, DDPG agent, training)
- - added a flexible punishment (so the system does not oscillate)
- adjusted the range of the punishment to the range of the reward
- changed the gradient threshold from Inf to 1
- set lower and upper limits within actionInfo
- tried different values for the noise standard deviation
- - currently 0.1
- - for comparison, 1% of the action range corresponds to 0.8943 and 10% to 8.943
- - with a standard deviation of 0.8943: I stays at zero; P explores a bit and then stays at its maximum value
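For reference, a sketch of how the action limits and exploration noise mentioned above might be configured; the action limits and Ts are placeholders, while the standard deviation values come from the post:

```matlab
% Sketch only: limits and sample time are assumptions, not values
% from the post.
actionInfo = rlNumericSpec([2 1], ...
    'LowerLimit', [0; 0], ...        % placeholder lower limits for P and I
    'UpperLimit', [100; 50]);        % placeholder upper limits
agentOpts = rlDDPGAgentOptions('SampleTime', Ts);
agentOpts.NoiseOptions.StandardDeviation = 0.8943;  % ~1% of the action range
agentOpts.NoiseOptions.StandardDeviationDecayRate = 1e-5;  % assumed decay
```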
Many thanks in advance!
(discreteSys_Script_05.m is the main script.)