rlRepresentation

Model representation for reinforcement learning agents

Description

Use rlRepresentation to create a function approximator representation for the actor or critic of a reinforcement learning agent. To do so, you specify the observation and action signals for the training environment and options that affect the training of an agent that uses the representation. For more information on creating representations, see Create Policy and Value Function Representations.

rep = rlRepresentation(net,obsInfo,'Observation',obsNames) creates a representation for the deep neural network net. The observation names obsNames are the network input layer names. obsInfo contains the corresponding observation specifications for the training environment. Use this syntax to create a representation for a critic that does not require action inputs, such as a critic for an rlACAgent or rlPGAgent agent.

rep = rlRepresentation(net,obsInfo,actInfo,'Observation',obsNames,'Action',actNames) creates a representation with action signals specified by the names actNames and specification actInfo. Use this syntax to create a representation for any actor, or for a critic that takes both observation and action as input, such as a critic for an rlDQNAgent or rlDDPGAgent agent.

tableCritic = rlRepresentation(tab) creates a critic representation for the value table or Q table tab. When you create a table representation, you specify the observation and action specifications when you create tab.

critic = rlRepresentation(basisFcn,W0,obsInfo) creates a linear basis function representation using the handle to a custom basis function basisFcn and initial weight vector W0. obsInfo contains the corresponding observation specifications for the training environment. Use this syntax to create a representation for a critic that does not require action inputs, such as a critic for an rlACAgent or rlPGAgent agent.

critic = rlRepresentation(basisFcn,W0,oaInfo) creates a linear basis function representation using the specification cell array oaInfo, where oaInfo = {obsInfo,actInfo}. Use this syntax to create a representation for a critic that takes both observations and actions as inputs, such as a critic for an rlDQNAgent or rlDDPGAgent agent.

actor = rlRepresentation(basisFcn,W0,obsInfo,actInfo) creates a linear basis function representation using the specified observation and action specifications, obsInfo and actInfo, respectively. Use this syntax to create a representation for an actor that takes observations as inputs and generates actions.
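
For instance, a minimal sketch of this actor syntax, assuming obsInfo describes a 4-element observation vector and actInfo a scalar continuous action (the basis function and dimensions here are hypothetical, not part of any shipped example):

```matlab
% Hypothetical basis: raw observations augmented with their squares.
actorBasisFcn = @(obs) [obs(:); obs(:).^2];
W0 = zeros(8,1);  % 8 basis elements; for a scalar action, W is a column vector
actor = rlRepresentation(actorBasisFcn,W0,obsInfo,actInfo);
```

The actor output is f = W'B, which is scalar here because W is a column vector with as many elements as the basis output.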

rep = rlRepresentation(___,repOpts) creates a representation using additional options that specify learning parameters for the representation when you train an agent. Available options include the optimizer used for training and the learning rate. Use rlRepresentationOptions to create the options set repOpts. You can use this syntax with any of the previous input-argument combinations.

Examples

Create an actor representation and a critic representation that you can use to define a reinforcement learning agent such as an Actor Critic (AC) agent.

For this example, create actor and critic representations for an agent that can be trained against the cart-pole environment described in Train AC Agent to Balance Cart-Pole System. First, create the environment. Then, extract the observation and action specifications from the environment. You need these specifications to define the actor and critic representations.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

For a state-value-function critic such as those used for AC or PG agents, the inputs are the observations and the output should be a scalar value, the state value. For this example, create the critic representation using a deep neural network with one output, and with observation signals corresponding to x,xdot,theta,thetadot as described in Train AC Agent to Balance Cart-Pole System. You can obtain the number of observations from the obsInfo specification. Name the network layer input 'observation'.

numObservation = obsInfo.Dimension(1);
criticNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(1,'Name','CriticFC')];

Specify options for the critic representation using rlRepresentationOptions. These options control parameters of critic network learning, when you train an agent that incorporates the critic representation. For this example, set the learning rate to 0.05 and the gradient threshold to 1.

repOpts = rlRepresentationOptions('LearnRate',5e-2,'GradientThreshold',1);

Create the critic representation using the specified neural network and options. Also, specify the action and observation information for the critic. Set the observation name to 'observation', which is the name you used when you created the network input layer for criticNetwork.

critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'observation'},repOpts)
critic = 
  rlLayerRepresentation with properties:

    Options: [1x1 rl.option.rlRepresentationOptions]

Similarly, create a network for the actor. An AC agent decides which action to take given observations using an actor representation. For an actor, the inputs are the observations, and the output depends on whether the action space is discrete or continuous. The actor in this example has two possible discrete actions, –10 or 10. To create the actor, use a deep neural network that takes the same observation input as the critic and can output these two values. You can obtain the number of actions from the actInfo specification. Name the output layer 'action'.

numAction = numel(actInfo.Elements); 
actorNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(numAction,'Name','action')];

Create the actor representation using the observation name and specification and the action name and specification. Use the same representation options.

actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},repOpts)
actor = 
  rlLayerRepresentation with properties:

    Options: [1x1 rl.option.rlRepresentationOptions]

You can now use the actor and critic representations to create an AC agent.

agentOpts = rlACAgentOptions(...
    'NumStepsToLookAhead',32,...
    'DiscountFactor',0.99);
agent = rlACAgent(actor,critic,agentOpts)
agent = 
  rlACAgent with properties:

    AgentOptions: [1x1 rl.option.rlACAgentOptions]

For additional examples showing how to create actor and critic representations for different agent types, see the reference pages for the corresponding agent objects.

Create an environment interface.

env = rlPredefinedEnv("BasicGridWorld");

Create a Q table using the action and observation specifications from the environment.

qTable = rlTable(getObservationInfo(env),getActionInfo(env));

Create a representation for the Q table.

tableRep = rlRepresentation(qTable);

Assume that you have an environment, env. Obtain the observation and action specifications from the environment.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a custom basis function. In this case, use the quadratic basis function from Train Custom LQR Agent.

function B = computeQuadraticBasis(x,u,n)
z = cat(1,x,u);
idx = 1;
for r = 1:n
    for c = r:n
        if idx == 1
            B = z(r)*z(c);
        else
            B = cat(1,B,z(r)*z(c));
        end
        idx = idx + 1;
    end
end

Compute any dimensions and parameters required for your basis function.

nQ = size(obj.Q,1); % Q and R are cost matrices from the Train Custom LQR Agent example
nR = size(obj.R,1);
n = nQ+nR;

Set an initial weight vector.

w0 = 0.1*ones(0.5*(n+1)*n,1);

Create a representation using a handle to the custom basis function.

critic = rlRepresentation(@(x,u) computeQuadraticBasis(x,u,n),w0,obsInfo,actInfo);

Input Arguments

Deep neural network for actor or critic, specified as an array of Layer objects or as a layerGraph, SeriesNetwork, or DAGNetwork object (Deep Learning Toolbox).

For a list of deep neural network layers, see List of Deep Learning Layers (Deep Learning Toolbox). For more information on creating deep neural networks for reinforcement learning, see Create Policy and Value Function Representations.

Observation names, specified as a cell array of character vectors. The observation names are the network input layer names you specify when you create net. The names in obsNames must be in the same order as the observation specifications in obsInfo.

Example: {'observation'}

Observation specification, specified as a reinforcement learning spec object or an array of spec objects. You can extract obsInfo from an existing environment using getObservationInfo. Or, you can construct the specs manually using a spec command such as rlFiniteSetSpec or rlNumericSpec. This specification defines such information about the observations as the dimensions and names of the observation signals.

Action name, specified as a single-element cell array that contains a character vector. The action name is the network layer name you specify when you create net. For critic networks, this layer is the first layer of the action input path. For actors, this layer is the last layer of the action output path.

Example: {'action'}

Action specification, specified as a reinforcement learning spec object. You can extract actInfo from an existing environment using getActionInfo. Or, you can construct the spec manually using a spec command such as rlFiniteSetSpec or rlNumericSpec. This specification defines such information about the action as the dimensions and name of the action signal.

For linear basis function representations, the action signal must be a scalar, a column vector, or a discrete action.

Value table or Q table for critic, specified as an rlTable object. The learnable parameters of a table representation are the elements of tab.

Custom basis function, specified as a function handle to a user-defined function. For a linear basis function representation, the output of the representation is f = W'B, where W is a weight array and B is the column vector returned by the custom basis function. The learnable parameters of a linear basis function representation are the elements of W.

When creating:

  • A critic representation with observation inputs only, your basis function must have the following signature.

    B = myBasisFunction(obs1,obs2,...,obsN)

    Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the observation specifications in obsInfo.

  • A critic representation with observation and action inputs, your basis function must have the following signature.

    B = myBasisFunction(obs1,obs2,...,obsN,act)

    Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the observation specifications in the first element of oaInfo, and act has the same data type and dimensions as the action specification in the second element of oaInfo.

  • An actor representation, your basis function must have the following signature.

    B = myBasisFunction(obs1,obs2,...,obsN)

    Here, obs1 to obsN are observations in the same order and with the same data type and dimensions as the observation specifications in obsInfo. The data types and dimensions of the action specification in actInfo affect the data type and dimensions of f.

Example: @(x,u) myBasisFunction(x,u)
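
As a concrete sketch, an observation-only basis function matching the first signature above might look like the following (the function name and feature choice are illustrative, not prescribed):

```matlab
% Illustrative observation-only basis function for a single observation
% channel: stacks the observation with its elementwise square.
function B = myObsBasis(obs)
B = [obs(:); obs(:).^2];
end
```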

Initial value for linear basis function weight array, W, specified as one of the following:

  • Column vector — When creating a critic representation or an actor representation with a continuous scalar action signal.

  • Array — When creating an actor representation with a column vector continuous action signal or a discrete action space.
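
The required size of W0 follows from f = W'B. In this sketch (the dimensions are assumptions for illustration), the basis function returns a 6-element column vector and the actor has 3 discrete actions, so W must be 6-by-3:

```matlab
numBasis = 6;    % length of the column vector B returned by the basis function
numActions = 3;  % number of elements in the discrete action space
W0 = 0.1*ones(numBasis,numActions);  % f = W'*B is then 3-by-1, one value per action
```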

Observation and action specifications for creating linear basis function critic representations, specified as the cell array {obsInfo,actInfo}.

Representation options, specified as an option set that you create with rlRepresentationOptions. Available options include the optimizer used for training and the learning rate. See rlRepresentationOptions for details.

Output Arguments

Deep neural network representation, returned as an rlLayerRepresentation object. Use this representation to create an agent for reinforcement learning. For more information, see Reinforcement Learning Agents.

Value or Q table critic representation, returned as an rlTableRepresentation object. Use this representation to create an agent for reinforcement learning. For more information, see Reinforcement Learning Agents.

Linear basis function critic representation, returned as an rlLinearBasisRepresentation object. Use this representation to create an agent for reinforcement learning. For more information, see Reinforcement Learning Agents.

Linear basis function actor representation, returned as an rlLinearBasisRepresentation object. Use this representation to create an agent for reinforcement learning. For more information, see Reinforcement Learning Agents.

Introduced in R2019a