Main Content

rlStochasticActorRepresentation

Stochastic actor representation for reinforcement learning agents

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent. A stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. After you create an rlStochasticActorRepresentation object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating representations, see Create Policy and Value Function Representations.

Creation

Description

Discrete Action Space Stochastic Actor

example

discActor = rlStochasticActorRepresentation(net,observationInfo,discActionInfo,'Observation',obsName) creates a stochastic actor with a discrete action space, using the deep neural network net as function approximator. Here, the output layer of net must have as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of discActor to the inputs observationInfo and discActionInfo, respectively. obsName must contain the names of the input layers of net.

example

discActor = rlStochasticActorRepresentation({basisFcn,W0},observationInfo,actionInfo) creates a discrete space stochastic actor using a custom basis function as underlying approximator. The first input argument is a two-elements cell in which the first element contains the handle basisFcn to a custom basis function, and the second element contains the initial weight matrix W0. This syntax sets the ObservationInfo and ActionInfo properties of discActor to the inputs observationInfo and actionInfo, respectively.

discActor = rlStochasticActorRepresentation(___,options) creates the discrete action space, stochastic actor discActor using the additional options set options, which is an rlRepresentationOptions object. This syntax sets the Options property of discActor to the options input argument. You can use this syntax with any of the previous input-argument combinations.

Continuous Action Space Gaussian Actor

example

contActor = rlStochasticActorRepresentation(net,observationInfo,contActionInfo,'Observation',obsName) creates a Gaussian stochastic actor with a continuous action space using the deep neural network net as function approximator. Here, the output layer of net must have twice as many elements as the number of dimensions of the continuous action space. This syntax sets the ObservationInfo and ActionInfo properties of contActor to the inputs observationInfo and contActionInfo respectively. obsName must contain the names of the input layers of net.

Note

contActor does not enforce constraints set by the action specification, therefore, when using this actor, you must enforce action space constraints within the environment.

contActor = rlStochasticActorRepresentation(___,options) creates the continuous action space, Gaussian actor contActor using the additional options option set, which is an rlRepresentationOptions object. This syntax sets the Options property of contActor to the options input argument. You can use this syntax with any of the previous input-argument combinations.

Input Arguments

expand all

Deep neural network used as the underlying approximator within the actor, specified as one of the following:

For a discrete action space stochastic actor, net must have the observations as input and a single output layer having as many elements as the number of possible discrete actions. Each element represents the probability (which must be nonnegative) of executing the corresponding action.

For a continuous action space stochastic actor, net must have the observations as input and a single output layer having twice as many elements as the number of dimensions of the continuous action space. The elements of the output vector represent all the mean values followed by all the standard deviations (which must be nonnegative) of the Gaussian distributions for the dimensions of the action space.

Note

The fact that standard deviations must be nonnegative while mean values must fall within the output range means that the network must have two separate paths. The first path must produce an estimation for the mean values, so any output nonlinearity must be scaled so that its output falls in the desired range. The second path must produce an estimation for the standard deviations, so you must use a softplus or ReLU layer to enforce nonnegativity.

The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names specified in obsName. The network output layer must have the same data type and dimension as the signal defined in ActionInfo.

rlStochasticActorRepresentation objects support recurrent deep neural networks.

For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policy and Value Function Representations.

Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net.

Example: {'my_obs'}

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user defined function can either be an anonymous function or a function on the MATLAB path. The output of the actor is the vector a = softmax(W'*B), where W is a weight matrix and B is the column vector returned by the custom basis function. Each element of a represents the probability of taking the corresponding action. The learnable parameters of the actor are the elements of W.

When creating a stochastic actor representation, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in observationInfo

Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]

Initial value of the basis function weights, W, specified as a matrix. It must have as many rows as the length of the basis function output, and as many columns as the number of possible actions.

Properties

expand all

Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.

Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.

rlStochasticActorRepresentation sets the ObservationInfo property of contActor or discActor to the input observationInfo.

You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Action specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data type and name of the action signals.

For a discrete action space actor, rlStochasticActorRepresentation sets ActionInfo to the input discActionInfo, which must be an rlFiniteSetSpec object.

For a continuous action space actor, rlStochasticActorRepresentation sets ActionInfo to the input contActionInfo, which must be an rlNumericSpec object.

You can extract actionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.

For custom basis function representations, the action signal must be a scalar, a column vector, or a discrete action.

Object Functions

rlACAgentActor-critic reinforcement learning agent
rlPGAgentPolicy gradient reinforcement learning agent
rlPPOAgentProximal policy optimization reinforcement learning agent
rlSACAgentSoft actor-critic reinforcement learning agent
getActionObtain action from agent or actor representation given environment observations

Examples

collapse all

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as consisting of three values, -10, 0, and 10.

actInfo = rlFiniteSetSpec([-10 0 10]);

Create a deep neural network approximator for the actor. The input of the network (here called state) must accept a four-element vector (the observation vector just defined by obsInfo), and its output (here called actionProb) must be a three-element vector. Each element of the output vector must be between 0 and 1 since it represents the probability of executing each of the three possible actions (as defined by actInfo). Using softmax as the output layer enforces this requirement.

net = [  featureInputLayer(4,'Normalization','none','Name','state')
         fullyConnectedLayer(3,'Name','fc')
         softmaxLayer('Name','actionProb')  ];

Create the actor with rlStochasticActorRepresentation, using the network, the observations and action specification objects, as well as the names of the network input layer.

actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation','state')
actor = 
  rlStochasticActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To validate your actor, use getAction to return a random action from the observation vector [1 1 1 1], using the current network weights.

act = getAction(actor,{[1 1 1 1]}); 
act{1}
ans = 10

You can now use the actor to create a suitable agent, such as an rlACAgent, or rlPGAgent agent.

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous six-dimensional space, so that a single observation is a column vector containing 6 doubles.

obsInfo = rlNumericSpec([6 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing 2 doubles both between -10 and 10.

actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10);

Create a deep neural network approximator for the actor. The observation input (here called myobs) must accept a six-dimensional vector (the observation vector just defined by obsInfo). The output (here called myact) must be a four-dimensional vector (twice the number of dimensions defined by actInfo). The elements of the output vector represent, in sequence, all the means and all the standard deviations of every action.

The fact that standard deviations must be non-negative while mean values must fall within the output range means that the network must have two separate paths. The first path is for the mean values, and any output nonlinearity must be scaled so that it can produce values in the output range. The second path is for the standard deviations, and you can use a softplus or relu layer to enforce non-negativity.

% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [ imageInputLayer([6 1 1], 'Normalization','none','Name','myobs') 
           fullyConnectedLayer(2,'Name','infc') ]; % 2 by 1 output

% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [ tanhLayer('Name','tanh'); % output range: (-1,1)
             scalingLayer('Name','scale','Scale',actInfo.UpperLimit) ]; % output range: (-10,10)

% path layers for standard deviations (2 by 1 input and output)
% using softplus layer to make it non negative
sdevPath =  softplusLayer('Name', 'splus');

% conctatenate two inputs (along dimension #3) to form a single (4 by 1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');

% add layers to network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);
net = addLayers(net,outLayer);

% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
net = connectLayers(net,'infc','tanh/in');              % connect output of inPath to meanPath input
net = connectLayers(net,'infc','splus/in');             % connect output of inPath to sdevPath input
net = connectLayers(net,'scale','mean&sdev/in1');       % connect output of meanPath to gaussPars input #1
net = connectLayers(net,'splus','mean&sdev/in2');       % connect output of sdevPath to gaussPars input #2

% plot network
plot(net)

Set some training options for the actor.

actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

Create the actor with rlStochasticActorRepresentation, using the network, the observations and action specification objects, the names of the network input layer and the options object.

actor = rlStochasticActorRepresentation(net, obsInfo, actInfo, 'Observation','myobs',actorOpts)
actor = 
  rlStochasticActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlNumericSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor, use getAction to return a random action from the observation vector ones(6,1), using the current network weights.

act = getAction(actor,{ones(6,1)}); 
act{1}
ans = 2x1 single column vector

   -0.0763
    9.6860

You can now use the actor to create a suitable agent (such as an rlACAgent, rlPGAgent, or rlPPOAgent agent).

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing 2 doubles.

obsInfo = rlNumericSpec([2 1]);

The stochastic actor based on a custom basis function does not support continuous action spaces. Therefore, create a discrete action space specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of 3 possible values (named 7, 5, and 3 in this case).

actInfo = rlFiniteSetSpec([7 5 3]);

Create a custom basis function. Each element is a function of the observations defined by obsInfo.

myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); exp(myobs(2)); abs(myobs(1))]
myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(1);exp(myobs(2));abs(myobs(1))]

The output of the actor is the action, among the ones defined in actInfo, corresponding to the element of softmax(W'*myBasisFcn(myobs)) which has the highest value. W is a weight matrix, containing the learnable parameters, which must have as many rows as the length of the basis function output, and as many columns as the number of possible actions.

Define an initial parameter matrix.

W0 = rand(4,3);

Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial parameter matrix. The second and third arguments are, respectively, the observation and action specification objects.

actor = rlStochasticActorRepresentation({myBasisFcn,W0},obsInfo,actInfo)
actor = 
  rlStochasticActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor use the getAction function to return one of the three possible actions, depending on a given random observation and on the current parameter matrix.

v = getAction(actor,{rand(2,1)})
v = 1x1 cell array
    {[3]}

You can now use the actor (along with an critic) to create a suitable discrete action space agent.

For this example, you create a stochastic actor with a discrete action space using a recurrent neural network. You can also use a recurrent neural network for a continuous stochastic actor using the same method.

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for the actor. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.

actorNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numDiscreteAct,'Name','output')
    softmaxLayer('Name','actionProb')];

Create a stochastic actor representation for the network.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation','state', actorOptions);
Introduced in R2020a