Receiving only one joint angle instead of a cycle of values necessary for walking during simulation?

I am currently working on an Imitation Learning algorithm that teaches a humanoid robot how to walk by controlling its six joint angles. After training the agent with both the TD3 and DDPG algorithms for 2000 episodes, each lasting 10 seconds, I was able to achieve a reward of approximately 900 by ensuring the robot used the correct angles to walk.
However, when I try to simulate the robot with the trained agent, each joint receives only a single value rather than the continuous cycle of values needed for walking. As a result, the robot only raises its knee and ankle slightly and then stands still until the simulation time is up. I currently use six actions and 23 observations to control the following joints:
  • Left Hip Pitch
  • Right Hip Pitch
  • Left Knee Pitch
  • Right Knee Pitch
  • Left Ankle Pitch
  • Right Ankle Pitch
I read on the Reinforcement Learning Toolbox page that external action signals are used for Imitation Learning applications. I'm wondering why I am only receiving one joint angle instead of the cycle of values necessary for walking, or if there is anything I may have overlooked.
Ultimately, my goal is to train the agent with control angles and apply this policy in a new environment with similar observations and joint angles to avoid having to start training from scratch each time. Thank you in advance for any guidance or advice you may have.
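Roughly, the reuse I have in mind looks like the sketch below (the file name, model name, and block path are placeholders, and I am assuming the new environment exposes the same 23 observations and 6 joint-angle actions):

% Keep the agent trained in the original environment
save('trainedWalkerAgent.mat', 'agent');

% Later: load it and attach it to a new Simulink environment with the same specs
load('trainedWalkerAgent.mat', 'agent');
obsInfo = rlNumericSpec([23 1]);   % same 23 observations
actInfo = rlNumericSpec([6 1]);    % same 6 joint angles
env = rlSimulinkEnv('newWalkerModel', 'newWalkerModel/RL Agent', obsInfo, actInfo);

% Evaluate the existing policy in the new environment instead of retraining from scratch
simOpts = rlSimulationOptions('MaxSteps', 1000);
experience = sim(env, agent, simOpts);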

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
There are several open questions here:
1) If you want to use imitation learning, you need input-output data. In the model you are showing above, you are feeding input data and collecting the output data from the simulation. The question is: how do you know that the input data you provide actually leads to walking behavior? If that's not the case, then imitation learning will learn something, but not what you might expect. What I mean is, there should be another feedback loop around the controller you are trying to imitate.
2) When you test the trained policy, the "use external action" signal should be zero. The model above shows it's one.
3) The control input you pass as external actions seems to come from a Constant block? That's not how it should be; maybe that's why you are seeing a single value. There should be some logic that selects the appropriate action at each time step and feeds it to the RL Agent block. Related to #1, you would also most likely need a feedback loop here. Maybe you can use a MATLAB Function block to accomplish this; see the sketch below.
Hope this is helpful
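As an illustration of point 3, here is a minimal sketch of what such a MATLAB Function block could contain. The names expertAngles and Ts are placeholders, and it assumes the recorded expert trajectory is an N-by-6 matrix sampled at a fixed rate; only the row for the current simulation time is forwarded to the external action port:

function action = selectExpertAction(t, expertAngles, Ts)
%#codegen
% t            - current simulation time (e.g. from a Clock block)
% expertAngles - N-by-6 matrix of recorded expert joint angles (placeholder name)
% Ts           - sample time at which the expert data was recorded
idx = min(floor(t/Ts) + 1, size(expertAngles, 1));  % clamp to the last sample
action = expertAngles(idx, :).';                     % 6-by-1 action for this step
end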
  3 Comments
Emmanouil Tzorakoleftherakis
Think of a case where the initial position of the robot is different from what you have now (e.g., the robot is falling). In that case, if you still use the same control angles you are using now, the robot won't be able to recover. That's why you would need your "expert" to use feedback, so that your policy is also robust. Regardless of that, I think the issue is that you provide the whole "data" (the values for every time step until the simulation is finished), whereas you should only provide the value for the current time step.
Lokeshwaran Manohar on 6 Mar 2023
If the robot is about to fall (i.e., there is a set of joint limits it should not exceed) and it exceeds them, the episode finishes and a new episode starts from the same initial position. This helps the robot recover. The same data is fed at every time step, and since I use a reset function, the robot always recovers. Regardless of the issue, if I train one robot with imitation learning for 2000 episodes and another for 10000 episodes, will the robot trained for more episodes behave better? Because in reinforcement learning, the robot is trained for as many episodes as possible.
In my case, I trained the robot for 2000 episodes with the control angles and then used that agent to train with the DDPG algorithm for another 5000 episodes. The robot performed better after the additional 5000 episodes of DDPG training.
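A rough sketch of what this warm start looks like in code (the file name, model name, block path, and training options are placeholders); train() updates the agent's existing parameters, so the DDPG run continues from the imitation-learned weights rather than from a random initialization:

load('imitationPretrainedAgent.mat', 'agent');   % agent pretrained for 2000 episodes
obsInfo = getObservationInfo(agent);
actInfo = getActionInfo(agent);
env = rlSimulinkEnv('walkerModel', 'walkerModel/RL Agent', obsInfo, actInfo);

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 5000, ...
    'MaxStepsPerEpisode', 1000, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 900);

trainingStats = train(agent, env, trainOpts);     % resumes from the pretrained weights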


