
rlBehaviorCloningRegularizerOptions

Regularizer options object to train DDPG, TD3, and SAC agents

Since R2023a

    Description

    Use an rlBehaviorCloningRegularizerOptions object to specify behavioral cloning regularizer options for training a DDPG, TD3, or SAC agent. The only option you can specify is the regularizer weight, which balances the actor loss against the behavioral cloning penalty. The regularizer is mostly useful when training agents offline, because it helps deal with possible differences between the probability distribution of the dataset and the one generated by the environment. To enable the behavioral cloning regularizer when training an agent, set the BatchDataRegularizerOptions property of the agent options object to an rlBehaviorCloningRegularizerOptions object with your preferred regularizer weight.
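    For example, the following sketch enables the regularizer on a SAC agent options object (the same pattern applies to DDPG and TD3 agent options). The weight value of 2.5 is only an illustrative choice.

    % Create regularizer options with an illustrative weight value.
    bcOpts = rlBehaviorCloningRegularizerOptions( ...
        BehaviorCloningRegularizerWeight=2.5);

    % Enable behavioral cloning regularization for a SAC agent.
    sacOpts = rlSACAgentOptions;
    sacOpts.BatchDataRegularizerOptions = bcOpts;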

    Creation

    Description

    bcOpts = rlBehaviorCloningRegularizerOptions returns a default behavioral cloning regularizer options set.


    bcOpts = rlBehaviorCloningRegularizerOptions(Name=Value) creates the behavioral cloning regularizer option set bcOpts and sets its properties using one or more name-value arguments.

    Properties

    BehaviorCloningRegularizerWeight

    Behavioral cloning regularizer weight, specified as a positive scalar. This weight controls the trade-off between the actor loss and the behavioral cloning penalty.

    Specifically, the behavioral cloning regularizer $k^2\big(\pi(s_i)-a_i\big)^2$ is added to the actor loss $L_{actor}$. Here, $a_i$ is an action from the mini-batch (which stores $N$ experiences) and $\pi(s_i)$ is the action returned by the current actor for the observation $s_i$ (also taken from the mini-batch). The actor is therefore updated to minimize the loss function $L'_{actor}$:

    $$L'_{actor}=\frac{1}{N}\sum_{i=1}^{N}\Big(\lambda\,L_{actor}\big(s_i,\pi(s_i)\big)+k^2\big(\pi(s_i)-a_i\big)^2\Big)$$

    Here, the normalization term $\lambda$ depends on the behavioral cloning weight $W_{bc}$, which regulates the importance of the standard $L_{actor}$ term:

    $$\lambda=\frac{W_{bc}}{\frac{1}{N}\sum_{i=1}^{N}\big|Q(s_i,a_i)\big|}$$

    The scaling factor $k$ scales the regularization term to the appropriate action range:

    $$k=\frac{2}{A_{mx}-A_{mn}}$$

    Here, $A_{mx}$ and $A_{mn}$ are the upper and lower limits of the action range. These limits are taken from the action specifications (or are otherwise estimated if unavailable).

    To set $W_{bc}$, assign a value to the BehaviorCloningRegularizerWeight property of the rlBehaviorCloningRegularizerOptions object. For more information, see [1].

    Example: BehaviorCloningRegularizerWeight=5
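    As a numerical illustration of how these quantities combine, the following sketch computes the regularized actor loss for a small mini-batch of scalar actions. It is a simplified example, not the agent's internal implementation, and all values are made up for illustration.

    % Illustrative mini-batch values (all numbers are hypothetical).
    Wbc  = 2.5;                 % BehaviorCloningRegularizerWeight
    Amx  =  1;                  % action upper limit
    Amn  = -1;                  % action lower limit
    ai   = [ 0.1; -0.5;  0.9];  % actions stored in the mini-batch
    piSi = [ 0.3; -0.2;  0.6];  % actions from the current actor, pi(si)
    Qai  = [ 1.2;  0.8;  1.5];  % critic values Q(si,ai), used for normalization
    Lact = [-1.1; -0.7; -1.4];  % per-sample standard actor loss Lactor(si,pi(si))

    k      = 2/(Amx - Amn);                          % scaling factor k
    lambda = Wbc/mean(abs(Qai));                     % normalization term lambda
    Lprime = mean(lambda*Lact + k^2*(piSi - ai).^2)  % regularized loss L'actor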


    Examples


    Create an rlBehaviorCloningRegularizerOptions object specifying the BehaviorCloningRegularizerWeight.

    opt = rlBehaviorCloningRegularizerOptions( ...
        BehaviorCloningRegularizerWeight=5)
    opt = 
      rlBehaviorCloningRegularizerOptions with properties:
    
        BehaviorCloningRegularizerWeight: 5
    
    

    You can modify options using dot notation. For example, set BehaviorCloningRegularizerWeight to 3.

    opt.BehaviorCloningRegularizerWeight = 3;

    To specify this behavioral cloning option set for an agent, first create the agent options object. For this example, create a default rlTD3AgentOptions object for a TD3 agent.

    agentOpts = rlTD3AgentOptions;

    Then, assign the rlBehaviorCloningRegularizerOptions object to the BatchDataRegularizerOptions property.

    agentOpts.BatchDataRegularizerOptions = opt;

    When you create the agent, use agentOpts as the last input argument for the agent constructor function rlTD3Agent.
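    A minimal sketch of this last step is shown below. The observation and action specifications are placeholders chosen only for illustration; in practice, use the specifications from your own environment.

    % Hypothetical specifications, for illustration only.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1],LowerLimit=-1,UpperLimit=1);

    % Create a default TD3 agent that uses the behavioral cloning regularizer.
    agent = rlTD3Agent(obsInfo,actInfo,agentOpts);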

    References

    [1] Fujimoto, Scott, and Shixiang Shane Gu. "A minimalist approach to offline reinforcement learning." Advances in Neural Information Processing Systems 34 (2021): 20132-20145.

    Version History

    Introduced in R2023a