Hello everyone,
I am a beginner in reinforcement learning, and I am trying to design an RL controller to improve on the response of a traditional PID controller.
I chose the MATLAB built-in example "Rankine Cycle (Steam Turbine)", which is originally controlled by a PI controller.
As far as I can tell, the model's parameters are finely balanced: even a small variance in the initial values, or too fast a rate of change, makes the model diverge easily.
At first I chose a TD3 agent, but at the beginning of training it always diverged (maybe caused by the initial random actions).
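Would shrinking the exploration noise be the right way to keep those early random actions small? Something like this sketch, assuming a recent Reinforcement Learning Toolbox release (older releases expose Variance instead of StandardDeviation); the numbers are placeholders, not tuned values:

```matlab
% Sketch: make early TD3 exploration gentler so random actions stay
% close to the actor's output instead of kicking the plant into divergence.
agentOpts = rlTD3AgentOptions;
agentOpts.ExplorationModel.StandardDeviation = 0.05;          % small initial noise
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-4; % decay per step
agentOpts.ExplorationModel.StandardDeviationMin = 0.01;       % noise floor
```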
So I took another direction: I force the TD3 agent to imitate the moves of the original PI controller over a finite number of episodes.
My idea was to at least achieve a drop-in replacement first, and then refine it further. The reward function is the sum of squared errors between each agent action and the PI controller's move, multiplied by -1, so that the smaller the error, the higher the reward.
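Concretely, the reward computation looks like the MATLAB Function block below (the function and signal names here are mine, for illustration):

```matlab
function reward = computeReward(agentAction, piAction)
% Imitation reward: negative sum of squared differences between the
% agent's action and the PI controller's move. Smaller error means a
% higher (less negative) reward, with a maximum of 0 at perfect imitation.
reward = -sum((agentAction - piAction).^2);
end
```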
For the IsDone condition, I used the original PI controller's upper and lower output limits, so that the system cannot leave the designed operating range.
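Expressed the same way (uMin and uMax stand in for the PI block's saturation limits; the names are mine):

```matlab
function isdone = checkLimits(u, uMin, uMax)
% Terminate the episode as soon as the control signal leaves the range
% the original PI controller was designed for.
isdone = (u < uMin) || (u > uMax);
end
```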
The result: training finishes, but when I replace the PI controller with the agent, the Simulink model compiles and yet produces no output. I don't know how to continue, and it is driving me crazy.
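Is there a command-line check I should run before debugging inside Simulink? For example, something like this sketch, assuming agent and obsInfo are the objects from my training script:

```matlab
% Feed the trained agent one dummy observation of the correct size and
% inspect the returned action; NaN or empty output here would mean the
% problem is in the agent itself rather than in the Simulink wiring.
obs = {zeros(obsInfo.Dimension)};
act = getAction(agent, obs);
disp(act{1})
```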
I have uploaded all my code and the Simulink model, and I am asking for help. Thank you for reading.
P.S. Because the system diverges easily, the solver is set to variable-step; I don't know whether this affects the agent's learning.
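Should I be pinning the agent's sample time explicitly, so that it fires at a fixed rate while the variable-step solver integrates the plant in between? A sketch (Ts = 0.1 is an assumed value, not taken from the model):

```matlab
Ts = 0.1;  % assumed agent sample time in seconds, for illustration only
agentOpts = rlTD3AgentOptions('SampleTime', Ts);
% The RL Agent block then executes at this fixed discrete rate even
% though the underlying solver takes variable continuous steps.
```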