PPOAgentの実装について

3 views (last 30 days)
shoki kobayashi
shoki kobayashi on 11 Sep 2020
Commented: shoki kobayashi on 13 Sep 2020
現在、simlinkの自作環境をPPOAgentを用いて制御しようとしています。
しかし、下のようなエラーが発生し上手くいかない状況が続いています。
どのように改善すればよろしいでしょうか。
エラー: rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
エラー: rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
自分のコード
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;
Tf = 20;
mdl = 'fivelinkrl';
open_system(mdl)
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
%createPPOAgent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ...
'Weights',weights.criticFC1, ...
'Bias',bias.criticFC1)
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
'Weights',weights.criticFC2, ...
'Bias',bias.criticFC2)
reluLayer('Name','CriticRelu2')
fullyConnectedLayer(1,'Name','CriticOutput',...
'Weights',weights.criticOut,...
'Bias',bias.criticOut)];
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
'Observation',{'observations'},criticOpts);
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1',...
'Weights',weights.actorFC1,...
'Bias',bias.actorFC1)
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2',...
'Weights',weights.actorFC2,...
'Bias',bias.actorFC2)
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(numAct,'Name','Action',...
'Weights',weights.actorOut,...
'Bias',bias.actorOut)
softmaxLayer('Name','actionProbability')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%% ↓error %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observations'}, actorOptions);
%%%% ↑error %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.02,...
'MiniBatchSize',64,...
'NumEpoch',3,...
'AdvantageEstimateMethod','gae',...
'GAEFactor',0.95,...
'SampleTime',0.05,...
'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);
%TrainAgent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxEpisodes,...
'MaxStepsPerEpisode',maxSteps,...
'ScoreAveragingWindowLength',250,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',maxEpisodes,...
'SaveAgentCriteria','EpisodeCount',...
'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
Result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);

Answers (1)

Toshinobu Shintai
Toshinobu Shintai on 11 Sep 2020
PPOエージェントのactionは離散でなければなりませんので、actionInfoは、例えば以下のように定義します。
actInfo = rlFiniteSetSpec({[-1, -1, -1], [1, 1, 1]});
上記の場合は、numActは2となります。numActにはアクションのパターン数を入力します。
制御器の出力としての次元数は、上記の[-1, -1, -1]のベクトルの次元数として指定します。このとき、PPOエージェントは [-1, -1, -1] か [1, 1, 1] を、その時のobservationに応じて選択して出力します。
修正したコードを添付しましたのでご確認ください。
  1 Comment
shoki kobayashi
shoki kobayashi on 13 Sep 2020
返信ありがとうございます。
添付したコードを用いると上手くいきました。
本当に助かりました。
1つ質問なのですが、PPOエージェントのactionを連続系で用いる方法はありますか?

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!