Confusion in agent and trainFromData options when using RNN/LSTM

My dataset contains numTraj trajectories, each with numSteps time-steps. For off-policy training, I filled the experience buffer with my data in a manner similar to the following, which sets "IsDone" to 1 on the final time-step of every trajectory.
numStates = 5;
numActions = 1;
numSteps = 600;
numTraj = 100;
obsInfo = rlNumericSpec([numStates 1]);
actInfo = rlNumericSpec([numActions 1]);
buffer = rlReplayMemory(obsInfo,actInfo,numTraj*numSteps);
expBatch = struct;
for j = 1:numTraj % Generate random training data
    for i = 1:numSteps
        n = (j-1)*numSteps + i; % Linear index of step i in trajectory j
        expBatch(n).Observation = {rand(numStates, 1)};
        expBatch(n).Action = {rand(numActions, 1)};
        expBatch(n).Reward = rand(1, 1);
        expBatch(n).NextObservation = {rand(numStates, 1)};
        expBatch(n).IsDone = 0;
    end
    expBatch(n).IsDone = 1; % Mark the last step of trajectory j as terminal
end
append(buffer,expBatch);
Since I have a fixed number of trajectories and time-steps per trajectory, how should I be setting the following agent and trainFromData options?
rlSACAgentOptions (or options for other agents that can use RNNs):
  1. SequenceLength: Since all numTraj trajectories have numSteps time-steps, should SequenceLength = numSteps?
  2. MiniBatchSize: From this answer, it seems that MiniBatchSize should be set to numSteps as well. Is this correct?
  3. MaxMiniBatchPerEpoch: If MiniBatchSize = numSteps, then should MaxMiniBatchPerEpoch = numTraj if I want to use the whole dataset for training every epoch?
trainFromData options:
  1. NumStepsPerEpoch: Does this refer to the number of time-steps that are used for training in an epoch? If so, should it be set to numTraj*numSteps to use the whole dataset every epoch?

Answers (1)

Shivansh on 26 Jun 2024
Hi Kundan!
I think the values you propose for the agent and trainFromData options are consistent with your dataset.
SequenceLength: You can set "SequenceLength" to "numSteps", since all "numTraj" trajectories have "numSteps" time-steps.
MiniBatchSize: "MiniBatchSize" should also be set to the number of time-steps in each trajectory ("numSteps"), since you want each mini-batch to cover an entire trajectory.
MaxMiniBatchPerEpoch: Since "MiniBatchSize" is set to "numSteps", you should set this parameter to "numTraj".
NumStepsPerEpoch: This should be set to the total number of time-steps in your dataset, "numTraj * numSteps", so that the whole dataset is used in each epoch.
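As a rough sketch, the settings above could be written as follows. The property names are the ones used in the question; the variable names agentOpts and tfdOpts are only illustrative, and I am assuming that the trainFromData options object is created with rlTrainingFromDataOptions in R2024a with Reinforcement Learning Toolbox installed. The values simply mirror the suggestions above, so it is worth checking them against the option descriptions in the documentation (in particular how "MiniBatchSize" is interpreted for recurrent networks) before relying on them.

```matlab
% Dataset dimensions taken from the question
numSteps = 600;   % time-steps per trajectory
numTraj  = 100;   % number of trajectories

% Agent options, set as suggested in this answer
% (agentOpts is an illustrative variable name)
agentOpts = rlSACAgentOptions;
agentOpts.SequenceLength       = numSteps;  % one full trajectory per sequence
agentOpts.MiniBatchSize        = numSteps;  % value suggested above
agentOpts.MaxMiniBatchPerEpoch = numTraj;   % value suggested above

% Options for training from data (assumed constructor name;
% tfdOpts is an illustrative variable name)
tfdOpts = rlTrainingFromDataOptions;
tfdOpts.NumStepsPerEpoch = numTraj*numSteps; % total time-steps in the dataset
```

These objects would then be passed to the agent constructor and to trainFromData, respectively.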
I hope it helps!

Release

R2024a
