

Simulate a trained reinforcement learning agent within a specified environment


experience = sim(env,agent,simOpts)
experience = sim(agent,env,simOpts)



experience = sim(env,agent,simOpts) simulates a reinforcement learning environment against an agent configured for that environment.

experience = sim(agent,env,simOpts) performs the same simulation as the previous syntax.



Simulate a reinforcement learning environment with an agent configured for that environment. For this example, load an environment and agent that are already configured. The environment is a discrete cart-pole environment created with rlPredefinedEnv. The agent is a Policy Gradient (rlPGAgent) agent. For more information about the environment and agent used in this example, see Train PG Agent to Balance Cart-Pole System.

rng(0); % for reproducibility
load RLSimExample.mat
env

env = 
  CartPoleDiscreteAction with properties:

                  Gravity: 9.8000
                 MassCart: 1
                 MassPole: 0.1000
                   Length: 0.5000
                 MaxForce: 10
                       Ts: 0.0200
    ThetaThresholdRadians: 0.2094
               XThreshold: 2.4000
      RewardForNotFalling: 1
        PenaltyForFalling: -5
                    State: [4×1 double]

agent

agent = 
  rlPGAgent with properties:

    AgentOptions: [1×1 rl.option.rlPGAgentOptions]

Typically, you train the agent using train and then simulate the environment to test the performance of the trained agent. For this example, simulate the environment using the agent you loaded. Configure simulation options, specifying that the simulation run for at most 100 steps.

simOpts = rlSimulationOptions('MaxSteps',100);

For the predefined cart-pole environment used in this example, you can use plot to generate a visualization of the cart-pole system. When you simulate the environment, this plot updates automatically so that you can watch the system evolve during the simulation.

plot(env)


Simulate the environment.

experience = sim(env,agent,simOpts)

experience = struct with fields:
       Observation: [1×1 struct]
            Action: [1×1 struct]
            Reward: [1×1 timeseries]
            IsDone: [1×1 timeseries]
    SimulationInfo: [1×1 struct]

The output structure experience records the observations collected from the environment, the actions and rewards, and other data collected during the simulation. Each field is a timeseries or a structure of timeseries data. For instance, experience.Action is a structure containing a timeseries of the action imposed on the cart-pole system by the agent at each step of the simulation.

experience.Action

ans = struct with fields:
    CartPoleAction: [1×1 timeseries]

Input Arguments


Environment in which the agent acts, specified as a reinforcement learning environment object, such as an environment created using rlPredefinedEnv or rlFunctionEnv.

For more information about creating and configuring environments, see Create MATLAB Reinforcement Learning Environments and Create Simulink Reinforcement Learning Environments.

When env is a Simulink environment, calling sim compiles and simulates the model associated with the environment.

Agent to simulate, specified as a reinforcement learning agent object, such as an rlACAgent or rlDDPGAgent object, or a custom agent. Before simulation, you must configure the actor and critic representations of the agent. For more information about how to create and configure agents for reinforcement learning, see Reinforcement Learning Agents.

Simulation options, specified as an rlSimulationOptions object. Use this argument to specify such parameters and options as:

  • Number of steps per simulation

  • Number of simulations to run

For details, see rlSimulationOptions.
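As a sketch, you might configure longer runs and multiple independent simulations as follows (the values shown here are illustrative, not defaults):

```matlab
% Illustrative values; adjust MaxSteps and NumSimulations for your task.
simOpts = rlSimulationOptions('MaxSteps',500,'NumSimulations',5);
```

With NumSimulations greater than 1, sim returns one set of results per simulation.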

Output Arguments


Simulation results, returned as a structure or structure array. The number of elements in the array is equal to the number of simulations specified by the NumSimulations option of rlSimulationOptions. The fields of the experience structure are as follows.
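For example, assuming experience is a 1-by-5 structure array returned by a sim call with NumSimulations set to 5, you can index into the array to inspect an individual run (a sketch, not toolbox-specific API beyond standard struct indexing):

```matlab
numRuns = numel(experience);                      % number of simulations run
thirdRunReward = sum(experience(3).Reward.Data);  % cumulative reward of run 3
```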

Observations collected from the environment, returned as a structure with fields corresponding to the observations specified in the environment. Each field contains a timeseries of length N + 1, where N is the number of simulation steps.

To obtain the current observation and the next observation for a given simulation step, use code such as the following, assuming one of the fields of Observation is obs1.

Obs = getSamples(experience.Observation.obs1,1:N);
NextObs = getSamples(experience.Observation.obs1,2:N+1);

These values can be useful if you are writing your own training algorithm, using sim to generate experiences for training.
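For instance, a minimal sketch of assembling (observation, action, reward, next observation) tuples from a sim result might look like the following. The channel names obs1 and act1 are hypothetical and depend on how your environment defines its observation and action signals:

```matlab
N = experience.Reward.Length;                            % number of simulation steps
Obs     = getSamples(experience.Observation.obs1,1:N);   % observations at steps 1..N
NextObs = getSamples(experience.Observation.obs1,2:N+1); % observations at steps 2..N+1
Act     = experience.Action.act1.Data;                    % actions taken at each step
Rwd     = experience.Reward.Data;                         % rewards received at each step
```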

Actions computed by the agent, returned as a structure with fields corresponding to the action signals specified in the environment. Each field contains a timeseries of length N, where N is the number of simulation steps.

Reward at each step in the simulation, returned as a timeseries of length N, where N is the number of simulation steps.

Flag indicating termination of episode, returned as a timeseries of a scalar logical signal. This flag is set at each step by the environment, according to conditions you specify for episode termination when you configure the environment. When the environment sets this flag to 1, simulation terminates.
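For example, to locate the step at which an episode terminated (a sketch; find returns empty if the episode ran out of steps without the environment setting the flag):

```matlab
doneStep = find(experience.IsDone.Data == 1, 1);  % first step where IsDone is set
```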

Information collected during simulation, returned as:

  • For MATLAB environments, a structure containing the field SimulationError. This structure contains any errors that occurred during simulation.

  • For Simulink environments, a Simulink.SimulationOutput object containing simulation data. Recorded data includes any signals and states that the model is configured to log, simulation metadata, and any errors that occurred.
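As a sketch for a MATLAB environment, you can check the SimulationError field to confirm the run completed cleanly (assuming the field is empty when no error occurred):

```matlab
if isempty(experience.SimulationInfo.SimulationError)
    disp('Simulation completed without error.')
end
```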

Introduced in R2019a