
Create MATLAB Environments for Reinforcement Learning

In a reinforcement learning scenario, where you are training an agent to complete a task, the environment models the dynamics with which the agent interacts. The environment:

  1. Receives actions from the agent

  2. Outputs observations in response to the actions

  3. Generates a reward measuring how well the action contributes to achieving the task

Creating an environment model includes defining the following:

  • Action and observation signals that the agent uses to interact with the environment.

  • Reward signal that the agent uses to measure its success. For more information, see Define Reward Signals.

  • Environment dynamic behavior.

Action and Observation Signals

When you create an environment object, you must specify the action and observation signals that the agent uses to interact with the environment. You can create both discrete and continuous action spaces. For more information, see rlFiniteSetSpec and rlNumericSpec, respectively.
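
For example, the following is a minimal sketch that defines a continuous four-element observation space and a discrete action space with three possible values (the dimensions, values, and names here are illustrative):

    obsInfo = rlNumericSpec([4 1]);       % continuous observation vector
    obsInfo.Name = "observations";

    actInfo = rlFiniteSetSpec([-1 0 1]);  % discrete set of three action values
    actInfo.Name = "torque";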

What signals you select as actions and observations depends on your application. For example, for control system applications, the integrals (and sometimes derivatives) of error signals are often useful observations. Also, for reference-tracking applications, having a time-varying reference signal as an observation is helpful.

When you define your observation signals, ensure that all the system states are observable through the observations. For example, an image observation of a swinging pendulum has position information but does not have enough information to determine the pendulum velocity. In this case, you can specify the pendulum velocity as a separate observation.

Predefined MATLAB Environments

Reinforcement Learning Toolbox™ software provides predefined MATLAB® environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:

  • Learn reinforcement learning concepts

  • Gain familiarity with Reinforcement Learning Toolbox software features

  • Test your own reinforcement learning agents

For more information, see Load Predefined Grid World Environments and Load Predefined Control System Environments.
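
For example, you can load a predefined environment by passing its keyword to the rlPredefinedEnv function; the "BasicGridWorld" keyword loads one of the predefined grid world environments:

    env = rlPredefinedEnv("BasicGridWorld");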

Custom MATLAB Environments

You can create the following types of custom MATLAB environments for your own applications:

  • Grid worlds with specified size, rewards, and obstacles

  • Environments with dynamics specified using custom functions

  • Environments specified by creating and modifying a template environment object

Once you create a custom environment object, you can train an agent in the same manner as in a predefined environment. For more information on training agents, see Train Reinforcement Learning Agents.

Custom Grid Worlds

You can create custom grid worlds of any size with your own custom reward, state transition, and obstacle configurations. To create a custom grid world environment:

  1. Create a grid world model using the createGridWorld function. For example, create a grid world with ten rows and nine columns.

    gw = createGridWorld(10,9);
  2. Configure the grid world by modifying the properties of the model. For example, specify the terminal state as location [7,9].

    gw.TerminalStates = "[7,9]";
  3. Create an MDP environment for this grid world, which the agent uses to interact with the grid world model.

    env = rlMDPEnv(gw);
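
Putting these steps together, the following sketch also adds an obstacle and a simple reward structure before creating the environment (the locations and reward values here are illustrative):

    gw = createGridWorld(10,9);
    gw.TerminalStates = "[7,9]";
    gw.ObstacleStates = "[3,4]";               % block one grid cell

    % Make the obstacle cell unreachable (function name is spelled
    % as it appears in the toolbox).
    updateStateTranstionForObstacles(gw);

    % Penalize every step, and reward reaching the terminal state.
    nS = numel(gw.States);
    nA = numel(gw.Actions);
    gw.R = -1*ones(nS,nS,nA);
    gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;

    env = rlMDPEnv(gw);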

Specify Custom Functions

For simple environments, you can define a custom environment object by creating an rlFunctionEnv object and specifying your own custom reset and step functions.

  • At the beginning of each training episode, the training process uses the reset function to set the initial conditions of the environment. For example, you can specify known initial state values or place the environment into a random initial state.

  • The step function defines the dynamics of the environment; that is, how the state changes in response to agent actions. At each training time step, the state of the model is updated using the step function.

For more information, see Create MATLAB Environment using Custom Functions.
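
A minimal sketch, assuming a hypothetical one-dimensional system in which the action nudges a scalar state (all names and dynamics here are illustrative):

    % Define the observation and action spaces.
    obsInfo = rlNumericSpec([1 1]);
    actInfo = rlFiniteSetSpec([-1 1]);

    % Create the environment from custom step and reset functions.
    env = rlFunctionEnv(obsInfo,actInfo,@myStepFcn,@myResetFcn);

    function [initialObs,loggedSignals] = myResetFcn()
        % Start each episode from a random state in [-1,1].
        loggedSignals.State = 2*rand - 1;
        initialObs = loggedSignals.State;
    end

    function [obs,reward,isDone,loggedSignals] = myStepFcn(action,loggedSignals)
        % Simple dynamics: the action nudges the state.
        loggedSignals.State = loggedSignals.State + 0.1*action;
        obs = loggedSignals.State;
        reward = -abs(obs);      % reward driving the state toward zero
        isDone = abs(obs) > 2;   % terminate if the state leaves the range
    end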

Create and Modify Template Environment

For more complex environments, you can define a custom environment by creating and modifying a template environment. To create a custom environment:

  1. Create an environment template class using the rlCreateEnvTemplate function.

  2. Modify the template environment, specifying environment properties, required environment functions, and optional environment functions.

  3. Validate your custom environment using validateEnvironment.

For more information, see Create Custom MATLAB Environment from Template.
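
A minimal sketch of this workflow ("MyEnvironment" is a hypothetical class name):

    % Generate a template class file, MyEnvironment.m, and open it for editing.
    rlCreateEnvTemplate("MyEnvironment");

    % After defining the class properties and its step and reset methods,
    % construct and validate the environment.
    env = MyEnvironment;
    validateEnvironment(env)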
