rlStochasticActorPolicy

Policy object to generate stochastic actions for custom training loops and application deployment

Since R2022a

    Description

    This object implements a stochastic policy, which returns stochastic actions given an input observation, according to a probability distribution. You can create an rlStochasticActorPolicy object from an rlDiscreteCategoricalActor or rlContinuousGaussianActor, or extract one from an rlPGAgent, rlACAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent. You can then train the policy object using a custom training loop or deploy it for your application using generatePolicyBlock or generatePolicyFunction. When UseMaxLikelihoodAction is set to true, the policy is deterministic and therefore does not explore. For more information on policies and value functions, see Create Policies and Value Functions.
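    For example, the following sketch (not part of this reference page's syntax) builds a policy from the actor of a previously trained agent and then generates a deployable policy function. The variable trainedAgent is an assumed, already trained agent of one of the types listed above.

    actor = getActor(trainedAgent);            % extract the actor approximator from the agent
    policy = rlStochasticActorPolicy(actor);   % wrap the actor in a stochastic policy object
    policy.UseMaxLikelihoodAction = true;      % optional: make the policy deterministic (no exploration)
    generatePolicyFunction(policy);            % generate a standalone policy evaluation function for deployment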

    Creation

    Description


    policy = rlStochasticActorPolicy(actor) creates the stochastic policy object policy from the continuous Gaussian or discrete categorical actor actor. It also sets the Actor property of policy to the input argument actor.

    Properties


    Actor

    Actor, specified as an rlContinuousGaussianActor or rlDiscreteCategoricalActor object.

    UseMaxLikelihoodAction

    Option to always use the maximum likelihood action, specified as a logical value: either false (default), in which case the action is sampled from the probability distribution, which helps exploration, or true, in which case the maximum likelihood action is always used. When this option is enabled, the policy is deterministic and therefore does not explore.

    Example: false

    ObservationInfo

    Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation channels.

    ActionInfo

    Action specifications, specified as an rlFiniteSetSpec or rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

    Note

    Only one action channel is allowed.

    SampleTime

    Sample time of the policy, specified as a positive scalar or as -1 (default). Setting this parameter to -1 allows for event-based simulations.

    Within a Simulink® environment, the RL Agent block in which the policy is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

    Within a MATLAB® environment, the policy is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience. If SampleTime is -1, the sample time is treated as being equal to 1.

    Example: 0.2
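    As a brief sketch (the 0.2 s value is illustrative), you can set the sample time using dot notation before generating a Simulink block from the policy.

    policy.SampleTime = 0.2;       % the policy block then executes every 0.2 s of simulation time
    generatePolicyBlock(policy);   % generate a Simulink block that evaluates the policy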

    Object Functions

    generatePolicyBlock - Generate Simulink block that evaluates policy of an agent or policy object
    generatePolicyFunction - Generate MATLAB function that evaluates policy of an agent or policy object
    getAction - Obtain action from agent, actor, or policy object given environment observations
    getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
    reset - Reset environment, agent, experience buffer, or policy object
    setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
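    The following outline is a hypothetical sketch of how these functions combine in one step of a custom training loop. The variables policy and obsInfo are assumed to exist, and the gradient computation and optimizer update are omitted.

    params = getLearnableParameters(policy);             % current learnable parameter values
    % ... compute gradients of your training loss and update params here (not shown) ...
    policy = setLearnableParameters(policy,params);      % write the (updated) parameters back
    act = getAction(policy,{rand(obsInfo.Dimension)});   % act with the updated policy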

    Examples


    Create observation and action specification objects. For this example, define a continuous four-dimensional observation space and a discrete action space having two possible actions.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
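    For example, assuming you use one of the predefined environments shipped with the toolbox (the environment choice here is illustrative):

    env = rlPredefinedEnv("CartPole-Discrete");   % predefined discrete cart-pole environment
    obsInfo = getObservationInfo(env);            % observation channel specifications
    actInfo = getActionInfo(env);                 % action channel specification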

    Create a discrete categorical actor. This actor must accept an observation as input and return an output vector in which each element represents the probability of taking the corresponding action.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation space and the number of possible actions from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(numel(actInfo.Elements)) 
        ];

    Convert the network to a dlnetwork object and display the number of weights.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 114
    
       Inputs:
          1   'input'   4 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlDiscreteCategoricalActor(model,obsInfo,actInfo)
    actor = 
      rlDiscreteCategoricalActor with properties:
    
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {4x1 cell}
                  State: {0x1 cell}
    
    

    To return the probability distribution of the possible actions as a function of a random observation, and given the current network weights, use evaluate.

    prb = evaluate(actor,{rand(obsInfo.Dimension)});
    prb{1}
    ans = 2x1 single column vector
    
        0.5850
        0.4150
    
    

    Create a policy object from actor.

    policy = rlStochasticActorPolicy(actor)
    policy = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 0
                 Normalization: "none"
               ObservationInfo: [1x1 rl.util.rlNumericSpec]
                    ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                    SampleTime: -1
    
    

    You can access the policy options using dot notation. For example, set the option to always use the maximum likelihood action, thereby making the policy deterministic.

    policy.UseMaxLikelihoodAction = true
    policy = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 1
                 Normalization: "none"
               ObservationInfo: [1x1 rl.util.rlNumericSpec]
                    ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                    SampleTime: -1
    
    

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = -1
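    As a further check, not part of the original example, you can re-enable sampling and verify that the empirical action frequencies approximately match the probabilities returned by evaluate. The observation and the number of samples below are arbitrary.

    policy.UseMaxLikelihoodAction = false;     % sample actions again instead of taking the most likely one
    obs = {rand(obsInfo.Dimension)};
    sampled = zeros(1,1000);
    for k = 1:1000
        a = getAction(policy,obs);
        sampled(k) = a{1};
    end
    freq = [mean(sampled==-1) mean(sampled==1)]   % compare with evaluate(actor,obs)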
    

    You can now train the policy with a custom training loop and then deploy it to your application.
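    The example above uses a discrete categorical actor. The following is a minimal sketch of the continuous case; the network architecture, layer names, and specification dimensions are illustrative rather than prescribed.

    obsInfoC = rlNumericSpec([3 1]);
    actInfoC = rlNumericSpec([1 1]);

    % Common input path plus separate heads for the action mean and standard deviation.
    commonPath = [ featureInputLayer(obsInfoC.Dimension(1),Name="obsIn")
                   fullyConnectedLayer(16)
                   reluLayer(Name="common") ];
    meanPath   = fullyConnectedLayer(actInfoC.Dimension(1),Name="meanOut");
    stdPath    = [ fullyConnectedLayer(actInfoC.Dimension(1),Name="fcStd")
                   softplusLayer(Name="stdOut") ];   % standard deviations must be nonnegative

    lgraph = layerGraph(commonPath);
    lgraph = addLayers(lgraph,meanPath);
    lgraph = addLayers(lgraph,stdPath);
    lgraph = connectLayers(lgraph,"common","meanOut");
    lgraph = connectLayers(lgraph,"common","fcStd");
    modelC = dlnetwork(lgraph);

    actorC = rlContinuousGaussianActor(modelC,obsInfoC,actInfoC, ...
        ActionMeanOutputNames="meanOut", ...
        ActionStandardDeviationOutputNames="stdOut");

    policyC = rlStochasticActorPolicy(actorC);
    actC = getAction(policyC,{rand(obsInfoC.Dimension)});
    actC{1}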

    Version History

    Introduced in R2022a