Deploy Trained Reinforcement Learning Policies

Once you train a reinforcement learning agent, you can generate code to deploy the optimal policy. You can generate:

  • CUDA® code for deep neural network policies using GPU Coder™

  • C/C++ code for table, deep neural network, or linear basis function policies using MATLAB® Coder™

Note

Code generation for deep neural network policies supports only networks with a single input layer.

For more information on training reinforcement learning agents, see Train Reinforcement Learning Agents.

Create Policy Evaluation Function

To generate code for the trained optimal policy of a reinforcement learning agent, you must first create a policy evaluation function from the agent. You can generate a policy function for an agent with any type of policy representation object:

  • Value and Q tables (rlTableRepresentation)

  • Deep neural networks (rlLayerRepresentation)

  • Linear basis functions (rlLinearBasisRepresentation)

For more information on the different types of policies, see Create Policy and Value Function Representations.

To create a policy evaluation function that selects an action based on a given observation, use the generatePolicyFunction command. This command generates a MATLAB function file, which contains the policy evaluation function, and a MAT-file, which contains the optimal policy data.
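As a minimal sketch, the output file names can be chosen when calling generatePolicyFunction. The snippet assumes a trained agent already exists in the workspace as agent, and that your release supports the 'FunctionName' and 'DataFileName' name-value options; the names myPolicy and myPolicyData are hypothetical and used only for illustration.

```matlab
% Assumption: 'agent' is a trained reinforcement learning agent in the workspace.
% 'myPolicy' and 'myPolicyData' are hypothetical names chosen for this sketch.
generatePolicyFunction(agent, ...
    'FunctionName','myPolicy', ...      % would create myPolicy.m
    'DataFileName','myPolicyData');     % would create myPolicyData.mat
```

The examples in the rest of this topic use the default names, evaluatePolicy.m and agentData.mat.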

You can generate code to deploy this policy function using GPU Coder or MATLAB Coder.

Generate Code Using GPU Coder

If your trained optimal policy uses a deep neural network, you can generate CUDA code for the policy using GPU Coder. There are several required and recommended prerequisite products for generating CUDA code for deep neural networks. For more information, see Installing Prerequisite Products (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).

Not all deep neural network layers support GPU code generation. For a list of supported layers, see Supported Networks and Layers (GPU Coder). For more information and examples on GPU code generation, see Deep Learning with GPU Coder (GPU Coder).

Generate CUDA Code for Deep Neural Network Policy

As an example, generate GPU code for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System.

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
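The random selection step can be illustrated with a short sketch. This is not the generated evaluatePolicy code; the action values and probabilities below are hypothetical stand-ins for the actor network output.

```matlab
% Hypothetical values for illustration; not taken from the trained agent.
actions = [-10 10];     % candidate actions
probs   = [0.3 0.7];    % one probability per action, as an actor would output

% Inverse-CDF sampling: pick the first action whose cumulative
% probability is at least as large as a uniform random draw.
cdf = cumsum(probs);
cdf(end) = 1;           % guard against floating-point round-off
idx = find(rand <= cdf, 1, 'first');
action = actions(idx);
```

Over many calls, each action is selected with a frequency close to its probability.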

Since the actor network for this PG agent has a single input layer and single output layer, you can generate code for this network using GPU Coder. For example, you can generate a CUDA-compatible MEX function.

Configure the codegen function to create a CUDA-compatible C++ MEX function.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

Set the dimensions of the policy evaluation input argument, which correspond to the observation specification dimensions for the agent. To find the observation dimensions, use the getObservationInfo function. In this case, each observation is a four-element vector.

argstr = '{ones(4,1)}';
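Instead of hard-coding the size, the argument string can be built from the observation specification. The following is a sketch that assumes the agent has a single observation channel with a two-dimensional Dimension property, such as [4 1] for a four-element column vector.

```matlab
% Assumption: 'agent' is the trained agent loaded earlier and has
% exactly one observation channel with a 2-D Dimension property.
obsInfo = getObservationInfo(agent);   % observation specification
obsDim  = obsInfo.Dimension;           % [4 1] for a four-element column vector

% Build the same kind of string as the hard-coded '{ones(4,1)}'.
argstr = sprintf('{ones(%d,%d)}', obsDim(1), obsDim(2));
```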

Generate code using the codegen function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the MEX function evaluatePolicy_mex.
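As a quick check, the generated MEX function can be called like the original MATLAB function. This sketch assumes code generation succeeded, a supported GPU is available, and evaluatePolicy_mex is on the MATLAB path; the observation is a placeholder, not data from the environment.

```matlab
% Placeholder observation with the same size used for '-args'.
obs = ones(4,1);

% Call the generated MEX function to select an action.
action = evaluatePolicy_mex(obs);
```

Because this policy samples actions randomly, repeated calls with the same observation can return different actions.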

Generate Code Using MATLAB Coder

You can generate C/C++ code for table, deep neural network, or linear basis function policies using MATLAB Coder.

Using MATLAB Coder, you can generate:

  • C/C++ code for policies that use Q tables, value tables, or linear basis functions. For more information on general C/C++ code generation, see Generating Code (MATLAB Coder).

  • C++ code for policies that use deep neural networks. For more information, see Deep Learning with MATLAB Coder (MATLAB Coder).

Generate C Code for Q Table Policy

As an example, generate C code for the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World.

Load the trained agent.

load('basicGWQAgent.mat','qAgent')

Create a policy evaluation function for this agent.

generatePolicyFunction(qAgent)

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained Q table value function. For a given observation, the policy function looks up the value of each potential action in the Q table. Then, the policy function selects the action with the greatest value.
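The greedy lookup can be sketched as follows. This is not the generated evaluatePolicy code; the table contents, its layout (rows as states, columns as actions), and the action labels are hypothetical.

```matlab
% Hypothetical Q table: 3 states (rows) by 2 actions (columns).
Q = [0.1 0.5;
     0.7 0.3;
     0.2 0.2];
actions = [1 2];        % hypothetical action labels
state   = 2;            % observed state index

% Greedy selection: choose the action with the largest Q value.
% For ties, max returns the first maximal index.
[~, idx] = max(Q(state,:));
action = actions(idx);  % here 0.7 > 0.3, so action is 1
```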

Set the dimensions of the policy evaluation input argument, which correspond to the observation specification dimensions for the agent. To find the observation dimensions, use the getObservationInfo function. In this case, the observation is a single scalar value from a finite set.

argstr = '{[1]}';

Configure the codegen function to generate embeddable C code suitable for targeting a static library, and set the output folder to buildFolder.

cfg = coder.config('lib');
outFolder = 'buildFolder';

Generate C code using the codegen function.

codegen('-c','-d',outFolder,'-config','cfg',...
    'evaluatePolicy','-args',argstr,'-report');
