Reinforcement Learning Random Action Generator

Greetings. I'm Jason, a robotics student, and I would really appreciate your help with the questions below.
Consider that we have an RL environment described as follows:
numObs = 10;
ObservationInfo = rlNumericSpec([numObs 1]);
ObservationInfo.Name = 'Robot Observations';
numAct = 15;
ActionInfo = rlNumericSpec([numAct 1]);
ActionInfo.UpperLimit = [5; 5; 2; 2; 1; 3; 6; 5; 6; 5; 1; 1; 1; 1; 1];
ActionInfo.LowerLimit = [1; 1; -2; -2; -2; -6; -12; -5; -6; -3; -1; -1; -1; -1; -1];
ActionInfo.Name = 'Robot Actions';
1) The function step(env, Action) takes the Action and the environment as inputs and implements the robot dynamics. In which part of the code should I define the Action parameter?
2) Does the random action generator in the RL toolbox generate random actions within the upper and lower limits of ActionInfo? How does the random action generator work?
3) Is there a way to define our own random action generator for an RL agent?
Thanks in advance.
Regards
Jason

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 15 Sep 2020
Hi Jason,
1) I am not really sure what you mean. There are two ways to create custom environments in MATLAB - one is using custom functions, and the other using a custom class template. If the links don't have the answer you are looking for please let me know.
2) Which algorithm are you referring to? I assume you mean a continuous-action method like DDPG, since your question is about respecting bounds. DDPG adds a random value, drawn from a noise model, to the action generated by the policy. You are responsible for choosing the parameters of the noise model so that exploration happens within your desired range; otherwise, any noisy action that falls outside your upper and lower limits will be clipped to them. Also make sure you use a tanh layer followed by a scaling layer at the end of your actor so that the action outputs of the policy are shaped to your desired range (the noise is added on top of that).
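To make the tanh-plus-scaling idea concrete, here is a minimal sketch in plain MATLAB (no toolbox calls) of how the scale and bias could be derived from the limits in the question, and of what clipping does to a noisy action. The variable names, the random stand-in for the actor output, and the noise magnitude of 0.5 are assumptions for illustration only.

```matlab
% Action limits copied from the question.
upperLimit = [5; 5; 2; 2; 1; 3; 6; 5; 6; 5; 1; 1; 1; 1; 1];
lowerLimit = [1; 1; -2; -2; -2; -6; -12; -5; -6; -3; -1; -1; -1; -1; -1];

% tanh outputs values in (-1, 1); an element-wise affine map takes that
% interval to (lowerLimit, upperLimit).
actionScale = (upperLimit - lowerLimit) / 2;
actionBias  = (upperLimit + lowerLimit) / 2;

% Stand-in for the output of the actor's final tanh layer (hypothetical).
rawOutput = tanh(randn(15, 1));
policyAction = actionScale .* rawOutput + actionBias;

% Exploration noise is added on top of the policy action (assumed magnitude).
noise = 0.5 * randn(15, 1);
noisyAction = policyAction + noise;

% Anything that lands outside the limits is clipped back onto them.
appliedAction = min(max(noisyAction, lowerLimit), upperLimit);
```

In an actual actor network, the same two vectors would typically be passed to a scaling layer placed after the tanh layer, for example scalingLayer('Scale', actionScale, 'Bias', actionBias).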
3) Again, for DDPG, you can find details of the implemented noise model here. There are many parameters you can change to customize this model, but it is not possible to use a custom one yet (we are working on it).
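As a rough illustration of the kind of noise model involved, the sketch below simulates an Ornstein-Uhlenbeck-style process, following the update rule described in the DDPG agent options documentation as far as I can tell. All parameter values and the sample time Ts are illustrative assumptions rather than recommended settings, and the toolbox's internal implementation may differ in detail.

```matlab
numAct = 15;
Ts = 0.1;                                % assumed agent sample time

% Hypothetical values for the tunable noise parameters.
noiseMean      = zeros(numAct, 1);       % plays the role of Mean
meanAttraction = 0.15;                   % plays the role of MeanAttractionConstant
noiseVariance  = 0.3 * ones(numAct, 1);  % plays the role of Variance
varianceDecay  = 1e-5;                   % plays the role of VarianceDecayRate

x = zeros(numAct, 1);                    % initial noise state
numSteps = 200;
history = zeros(numAct, numSteps);

for k = 1:numSteps
    % Pull the state toward the mean, then add scaled Gaussian noise.
    x = x + meanAttraction .* (noiseMean - x) .* Ts ...
          + noiseVariance .* randn(numAct, 1) .* sqrt(Ts);
    % Shrink the noise scale slightly at every step.
    noiseVariance = noiseVariance .* (1 - varianceDecay);
    history(:, k) = x;
end
```

Each column of history is one noise vector that would be added to the policy action at that step. Changing the mean, the attraction constant, or the variance changes how far and how persistently exploration drifts from the policy output.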
  3 Comments
Emmanouil Tzorakoleftherakis
Random actions are not always between -1 and 1. The values depend on what you select for Mean and Variance. You do not need to use interpolation; I believe you can generate a noise vector from the provided model that matches your desired range (so select different Mean/Variance values for each action).
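One possible way to pick per-action values is to size the noise relative to each action's range. The sketch below is plain MATLAB under stated assumptions: the 5% fraction, the sample time Ts, and the variable names are all hypothetical, and whether vector-valued Mean and Variance are accepted should be checked against the documentation for your release.

```matlab
% Action limits copied from the question.
upperLimit = [5; 5; 2; 2; 1; 3; 6; 5; 6; 5; 1; 1; 1; 1; 1];
lowerLimit = [1; 1; -2; -2; -2; -6; -12; -5; -6; -3; -1; -1; -1; -1; -1];

Ts = 0.1;                       % assumed agent sample time
actionRange = upperLimit - lowerLimit;

% Assumed rule of thumb: per-step noise of about 5% of each action's range.
noiseFraction = 0.05;
noiseVariance = noiseFraction * actionRange / sqrt(Ts);
noiseMean = zeros(size(actionRange));   % zero-mean noise for every action

% One noise sample drawn with these per-action scales.
noiseSample = noiseMean + noiseVariance .* randn(size(noiseMean)) .* sqrt(Ts);
```

With this choice, actions with a wide range (such as the seventh one, which spans -12 to 6) get proportionally larger exploration noise than actions confined to -1 to 1. The resulting vectors would then be assigned to the Mean and Variance properties of the agent's noise options.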
Jason Smith on 16 Sep 2020
Thank you very much. This really helped.

Release

R2020a
