Receiving only one joint angle instead of a cycle of values necessary for walking during simulation?

I am currently working on an Imitation Learning algorithm that teaches a humanoid robot how to walk by controlling its six joint angles. After training the agent with both the TD3 and DDPG algorithms for 2000 episodes, each lasting 10 seconds, I was able to achieve a reward of approximately 900 by ensuring the robot used the correct angles to walk.
However, when I try to simulate the robot with the trained agent, each joint receives only a single value rather than the continuous cycle of values needed for walking. As a result, the robot only raises its knee and ankle slightly and then stands still until the simulation time is up. I currently use six actions and 23 observations to control the following joints:
  • Left Hip Pitch
  • Right Hip Pitch
  • Left Knee Pitch
  • Right Knee Pitch
  • Left Ankle Pitch
  • Right Ankle Pitch
I read on the Reinforcement Learning Toolbox page that external action signals are used for Imitation Learning applications. I'm wondering why I am only receiving one joint angle instead of the cycle of values necessary for walking, or if there is anything I may have overlooked.
Ultimately, my goal is to train the agent with control angles and apply this policy in a new environment with similar observations and joint angles to avoid having to start training from scratch each time. Thank you in advance for any guidance or advice you may have.
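Roughly, the reuse I have in mind looks like the sketch below (the file name, model name, and block path are placeholders, and I am assuming the new environment exposes the same 23 observations and 6 joint-angle actions):

% Keep the agent trained in the original environment
save('trainedWalkerAgent.mat', 'agent');

% Later: load it and attach it to a new Simulink environment with the same specs
load('trainedWalkerAgent.mat', 'agent');
obsInfo = rlNumericSpec([23 1]);   % same 23 observations
actInfo = rlNumericSpec([6 1]);    % same 6 joint angles
env = rlSimulinkEnv('newWalkerModel', 'newWalkerModel/RL Agent', obsInfo, actInfo);

% Evaluate the existing policy in the new environment instead of retraining from scratch
simOpts = rlSimulationOptions('MaxSteps', 1000);
experience = sim(env, agent, simOpts);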

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
There are several open questions here:
1) If you want to use imitation learning, you need input-output data. In the model you are showing above, you are feeding input data and collecting the output data from the simulation. The question is: how do you know that the input data you provide actually leads to walking behavior? If that's not the case, then imitation learning will learn something, but not what you might expect. What I mean is, there should be another feedback loop around the controller you are trying to imitate.
2) When you test the trained policy, the "use external action" signal should be zero. The model above shows it's one.
3) The control input you pass as external actions seems to come from a Constant block? That's not how it should be; maybe that's why you are seeing a single value. There should be some logic that selects the appropriate action at each time step and feeds it to the RL Agent block. Related to #1, you would also most likely need a feedback loop here. Maybe you can use a MATLAB Function block to accomplish this; see the sketch below.
Hope this is helpful
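As an illustration of point 3, here is a minimal sketch of what such a MATLAB Function block could contain. The names expertAngles and Ts are placeholders, and it assumes the recorded expert trajectory is an N-by-6 matrix sampled at a fixed rate; only the row for the current simulation time is forwarded to the external action port:

function action = selectExpertAction(t, expertAngles, Ts)
%#codegen
% t            - current simulation time (e.g. from a Clock block)
% expertAngles - N-by-6 matrix of recorded expert joint angles (placeholder name)
% Ts           - sample time at which the expert data was recorded
idx = min(floor(t/Ts) + 1, size(expertAngles, 1));  % clamp to the last sample
action = expertAngles(idx, :).';                     % 6-by-1 action for this step
end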
  3 Comments
Emmanouil Tzorakoleftherakis
Think of a case where the initial position of the robot is different from what you have now (e.g., the robot is falling). In that case, if you still use the same control angles you are using now, the robot won't be able to recover. That's why you would need your "expert" to use feedback, so that your policy is also robust. Regardless of that, I think the issue is that you provide the whole "data" (the values for every time step until the simulation is finished), whereas you should only provide the value for the current time step.
Lokeshwaran Manohar on 6 Mar 2023
If the robot is about to fall (i.e., there is a set of joint limits it should not exceed) and it exceeds them, the episode finishes and a new episode starts from the same initial position. This helps the robot recover. The same data is fed at every time step, and since I use a reset function, the robot always recovers. Regardless of the issue, if I train one robot with imitation learning for 2000 episodes and another for 10000 episodes, will the robot trained for more episodes behave better? Because in reinforcement learning, the robot is trained for as many episodes as possible.
In my case, I trained the robot for 2000 episodes with the control angles and then used that agent to train with the DDPG algorithm for another 5000 episodes. The robot performed better after the additional 5000 episodes of DDPG training.
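A rough sketch of what this warm start looks like in code (the file name, model name, block path, and training options are placeholders); train() updates the agent's existing parameters, so the DDPG run continues from the imitation-learned weights rather than from a random initialization:

load('imitationPretrainedAgent.mat', 'agent');   % agent pretrained for 2000 episodes
obsInfo = getObservationInfo(agent);
actInfo = getActionInfo(agent);
env = rlSimulinkEnv('walkerModel', 'walkerModel/RL Agent', obsInfo, actInfo);

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 5000, ...
    'MaxStepsPerEpisode', 1000, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 900);

trainingStats = train(agent, env, trainOpts);     % resumes from the pretrained weights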


