rlContinuousDeterministicTransitionFunction

Deterministic transition function approximator object for neural network-based environment

    Description

    When creating a neural network-based environment using rlNeuralNetworkEnvironment, you can specify deterministic transition function approximators using rlContinuousDeterministicTransitionFunction objects.

    A transition function approximator object uses a deep neural network to predict the next observations based on the current observations and actions.

    To specify stochastic transition function approximators, use rlContinuousGaussianTransitionFunction objects.

    Creation

    Description

    tsnFcnAppx = rlContinuousDeterministicTransitionFunction(net,observationInfo,actionInfo,Name=Value) creates a deterministic transition function approximator object using the deep neural network net and sets the ObservationInfo and ActionInfo properties.

    When creating a deterministic transition function approximator, you must specify the names of the deep neural network inputs and outputs using the ObservationInputNames, ActionInputNames, and NextObservationOutputNames name-value arguments.

    You can also specify the PredictDiff and UseDevice properties using optional name-value pair arguments. For example, to use a GPU for prediction, specify UseDevice="gpu".

    Input Arguments

    net - Deep neural network, specified as a dlnetwork object.

    The input layer names for this network must match the input names specified using ObservationInputNames and ActionInputNames. The dimensions of the input layers must match the dimensions of the corresponding observation and action specifications in ObservationInfo and ActionInfo, respectively.

    The output layer names for this network must match the output names specified using NextObservationOutputNames. The dimensions of the output layers must match the dimensions of the corresponding observation specifications in ObservationInfo.
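
    One way to check these naming requirements before creating the approximator is to inspect the InputNames and OutputNames properties of the dlnetwork object. The following is a minimal sketch; the layer sizes (4 observation channels, 1 action channel) and the layer names are illustrative assumptions, not requirements.

```matlab
% Illustrative network: sizes and layer names are assumptions.
obsPath = featureInputLayer(4,Name="state");
actPath = featureInputLayer(1,Name="action");
commonPath = [concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(4,Name="nextObservation")];

lg = layerGraph(obsPath);
lg = addLayers(lg,actPath);
lg = addLayers(lg,commonPath);
lg = connectLayers(lg,"state","concat/in1");
lg = connectLayers(lg,"action","concat/in2");
net = dlnetwork(lg);

% These are the names to pass as ObservationInputNames,
% ActionInputNames, and NextObservationOutputNames.
disp(net.InputNames)
disp(net.OutputNames)
```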

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: ObservationInputNames="velocity"

    ObservationInputNames - Observation input layer names, specified as a string or string array.

    The number of observation input names must match the length of ObservationInfo and the order of the names must match the order of the specifications in ObservationInfo.

    ActionInputNames - Action input layer names, specified as a string or string array.

    The number of action input names must match the length of ActionInfo and the order of the names must match the order of the specifications in ActionInfo.

    NextObservationOutputNames - Next observation output layer names, specified as a string or string array.

    The number of next observation output names must match the length of ObservationInfo and the order of the names must match the order of the specifications in ObservationInfo.

    Properties

    ObservationInfo (read-only) - Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.

    You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

    ActionInfo (read-only) - Action specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the action signals.

    You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
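
    As a minimal sketch of constructing the specifications manually, the following code creates continuous observation and action specifications with rlNumericSpec. The dimensions, limits, and signal names are illustrative assumptions chosen to resemble a cart-pole system.

```matlab
% Observation: 4-element continuous column vector (assumed size).
obsInfo = rlNumericSpec([4 1]);
obsInfo.Name = "observations";

% Action: scalar continuous signal with assumed limits.
actInfo = rlNumericSpec([1 1],LowerLimit=-10,UpperLimit=10);
actInfo.Name = "force";
```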

    PredictDiff - Option to predict the difference between the current observation and the next observation, specified as one of the following logical values.

    • false — Select this option if net outputs the value of the next observation.

    • true — Select this option if net outputs the difference between the next observation and the current observation. In this case, the predict function computes the next observation by adding the current observation to the output of net.
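
    The two settings can be illustrated with plain arithmetic. In this sketch, obs and netOut are made-up values standing in for the current observation and the raw network output; no approximator object is involved.

```matlab
% Hypothetical values for illustration only.
obs    = [0.10; -0.20; 0.05; 0.00];   % current observation
netOut = [0.01;  0.03; -0.02; 0.04];  % raw output of the network

% PredictDiff = false: the network output is the next observation.
nextObsDirect = netOut;

% PredictDiff = true: the network output is a difference, so the
% next observation is the current observation plus that output.
nextObsDiff = obs + netOut;
```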

    UseDevice - Computation device used to perform operations such as gradient computation, parameter updates, and prediction during training and simulation, specified as either "cpu" or "gpu".

    The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Support by Release (Parallel Computing Toolbox).

    You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

    Note

    Training or simulating a network on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.
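
    A possible way to choose the device programmatically is to fall back to the CPU when no supported GPU is available. This sketch uses the canUseGPU function; the variable name useDevice is an assumption made for illustration.

```matlab
% Use the GPU only if MATLAB reports that one is available and supported.
if canUseGPU
    useDevice = "gpu";
else
    useDevice = "cpu";
end
disp(useDevice)
```

    You can then pass the result to the constructor as UseDevice=useDevice.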

    Object Functions

    rlNeuralNetworkEnvironment - Environment model with deep neural network transition models

    Examples

    Create an environment interface and extract observation and action specifications. Alternatively, you can create specifications using rlNumericSpec and rlFiniteSetSpec.

    env = rlPredefinedEnv("CartPole-Continuous");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a deep neural network. The network has two input channels, one for the current observations and one for the current actions. The single output channel is for the predicted next observation.

    % Input paths: one for the observation and one for the action.
    statePath = featureInputLayer(obsInfo.Dimension(1),...
        Normalization="none",Name="state");
    actionPath = featureInputLayer(actInfo.Dimension(1),...
        Normalization="none",Name="action");
    % Common path: concatenate both inputs along dimension 1, then map
    % them to a vector with the same size as the observation.
    commonPath = [concatenationLayer(1,2,Name="concat")
        fullyConnectedLayer(64,Name="FC1")
        reluLayer(Name="Relu1")
        fullyConnectedLayer(64,Name="FC2")
        reluLayer(Name="Relu2")
        fullyConnectedLayer(obsInfo.Dimension(1),Name="nextObservation")];
    
    tsnNet = layerGraph(statePath);
    tsnNet = addLayers(tsnNet,actionPath);
    tsnNet = addLayers(tsnNet,commonPath);
    
    tsnNet = connectLayers(tsnNet,"state","concat/in1");
    tsnNet = connectLayers(tsnNet,"action","concat/in2");
    
    plot(tsnNet)

    Create a dlnetwork object.

    tsnNet = dlnetwork(tsnNet);

    Create a deterministic transition function object.

    tsnFcnAppx = rlContinuousDeterministicTransitionFunction(...
        tsnNet,obsInfo,actInfo,...
        ObservationInputNames="state", ...
        ActionInputNames="action", ...
        NextObservationOutputNames="nextObservation");

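    Using this transition function object, you can predict the next observation based on the current observation and action. For example, predict the next observation for a random observation and action. Because the network weights and the inputs are random, the values you obtain will differ from those shown here.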

    obs = rand(obsInfo.Dimension);
    act = rand(actInfo.Dimension);
    nextObsP = predict(tsnFcnAppx,{obs},{act})
    nextObsP = 1x1 cell array
        {4x1 single}
    
    
    nextObsP{1}
    ans = 4x1 single column vector
    
       -0.1172
        0.1168
        0.0493
       -0.0155
    
    

    You can also obtain the same result using evaluate.

    nextObsE = evaluate(tsnFcnAppx,{obs,act})
    nextObsE = 1x1 cell array
        {4x1 single}
    
    
    nextObsE{1}
    ans = 4x1 single column vector
    
       -0.1172
        0.1168
        0.0493
       -0.0155
    
    

    Version History

    Introduced in R2022a