How to train an RL agent on an Organic Rankine cycle to replace the PI controller in Simulink

Hello everyone:
I am a beginner in reinforcement learning, and I am trying to design an RL controller to improve the response of the traditional PI controller.
I chose the MATLAB built-in example "Rankine Cycle (Steam Turbine)", which is originally controlled by a PI controller.
As far as I know, the model parameters are finely balanced: with even a small change in the initial values or a too-fast rate of change, the model diverges easily.
At first I chose a TD3 agent, and at the beginning of training I found the simulation always diverged (maybe caused by the initial random actions).
So I took another direction: I force the TD3 agent to imitate the moves of the original PI controller over a finite number of steps.
My idea was to at least achieve the replacement first, and then refine it further. My reward function is the sum of the squared errors between each agent action and the PI controller's output, multiplied by -1, so that the smaller the error, the higher the reward.
For the IsDone condition, I used the original PI controller's upper and lower limits, so that the system would not leave the designed operating range.
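Roughly, the logic is equivalent to the following (a simplified sketch; in the model these signals are built from Simulink blocks, and the names here are just placeholders):

```matlab
function [reward, isdone] = rewardAndIsDone(agentAction, piAction, upperLimit, lowerLimit)
% Imitation reward: negative sum of squared errors between the agent's
% action and the PI controller's output, so a smaller error gives a higher reward.
reward = -sum((agentAction - piAction).^2);

% Terminate the episode if the action leaves the PI controller's designed range.
isdone = any(agentAction > upperLimit) || any(agentAction < lowerLimit);
end
```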
The result is: I can finish the training, but when I replace the PI controller with the agent, the Simulink model compiles fine, yet I get no output. I don't know how to continue; this is driving me crazy.
I have uploaded all my code and the Simulink model, and I am asking for help. Thank you for reading.
Also, because the system diverges easily, the solver is set to variable-step; I don't know whether this affects the agent's learning.

Answers (1)

Prathamesh on 29 Jul 2025
Hi @JHEWEI,
I understand that you want to replace the traditional PI controller in the Rankine Cycle (Steam Turbine) model with a Reinforcement Learning (RL) controller using the TD3 (Twin-Delayed Deep Deterministic Policy Gradient) algorithm. Because the model diverges easily under random exploration, you constrained the TD3 agent to imitate the moves of the original PI controller.
Your Rankine cycle model is very sensitive. When the RL agent starts training, it takes random actions, which can easily destabilize the system and cause the simulation to crash or produce no output, especially since your reward function does not directly reward stability.
As I do not have a GPU to train the agent, I have made some assumptions:
  1. To avoid the "Unable to find a supported GPU device" error, I removed the UseDevice="gpu" option from the code (see the sketch after this list).
  2. I assume that the Simulink model file named "RL_RC_2024b.slx" (or .mdl) is available on your MATLAB path.
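For example (a minimal sketch, assuming the critic and actor are created with rlQValueFunction and rlContinuousDeterministicActor; criticNet, actorNet, obsInfo, and actInfo stand for the objects defined in your own script):

```matlab
% Create the critic and actor on the CPU instead of the GPU.
critic = rlQValueFunction(criticNet, obsInfo, actInfo, UseDevice="cpu");
actor  = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo, UseDevice="cpu");
```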
Below are the steps to solve the issue:
  • In your "RL_RC_2024b" Simulink model, locate the "RL Agent" block.
  • Connect your reward calculation logic (combining PI imitation and performance goals) to the block's reward input port.
  • Connect your safety/termination logic (e.g., stop the episode if the levels go too high or too low) to the block's isdone input port.
  • Before training, let the agent learn a bit from how your original PI controller works, for example by warm-starting it with the imitation reward described above.
  • Make sure the "RL Agent" block actually receives valid Reward and IsDone signals during simulation.
  • Run your training script and keep an eye on the training plot; you want to see the average reward steadily increasing and the episodes lasting longer (a minimal script sketch follows this list).
  • If the agent isn't learning well, you may need to lower its learning rates or reduce how much it explores (see the tuning sketch after this list).
  • Place Scope blocks on the important signals (observations, actions, reward, and IsDone) to visually check what is going on if things are not working.
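A minimal training script might look like this (a sketch, assuming the model is RL_RC_2024b, the RL Agent block is named "RL Agent", and obsInfo, actInfo, and agent are already defined in your workspace; adapt the names and values to your model):

```matlab
% Create the Simulink environment around the RL Agent block.
mdl = "RL_RC_2024b";
env = rlSimulinkEnv(mdl, mdl + "/RL Agent", obsInfo, actInfo);

% Training options: stop early once the imitation reward is close to zero
% (the stopping value below is an assumption, not a recommendation).
trainOpts = rlTrainingOptions( ...
    MaxEpisodes = 500, ...
    MaxStepsPerEpisode = 1000, ...
    ScoreAveragingWindowLength = 20, ...
    StopTrainingCriteria = "AverageReward", ...
    StopTrainingValue = -0.01, ...
    Verbose = true, ...
    Plots = "training-progress");

trainingStats = train(agent, env, trainOpts);
```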
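If learning stalls, the exploration noise and the learning rates of a TD3 agent can be adjusted through its options, roughly as follows (a sketch; the values are starting-point assumptions only):

```matlab
% Reduce the exploration noise so random actions do not destabilize the plant,
% and let it decay slowly over training.
agent.AgentOptions.ExplorationModel.StandardDeviation = 0.05;
agent.AgentOptions.ExplorationModel.StandardDeviationDecayRate = 1e-4;

% Lower the actor/critic learning rates if training is unstable
% (TD3 has two critics, hence the two entries).
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-4;
agent.AgentOptions.CriticOptimizerOptions(1).LearnRate = 1e-3;
agent.AgentOptions.CriticOptimizerOptions(2).LearnRate = 1e-3;
```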
Refer to the documentation page "Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agents" in the Reinforcement Learning Toolbox for more details.
