Deep reinforcement learning (PPO): how do you fix this error?

Hi everyone,
I am using the PPO algorithm to train an agent in a custom Simulink environment, but I am getting an error. I think it may be related to obsInfo, but I don't know how to resolve it. My code and the error log are below.
Any help would be greatly appreciated.
slx = 'RLcontrolstrategy0312';
open_system(slx);
agentblk = slx +"/agent";
%obsinfo actinfo
%Is that the problem?
obsInfo=rlNumericSpec([49,1], ...
'LowerLimit',0, ...
'UpperLimit',1);
actInfo = rlNumericSpec([6,1], 'LowerLimit',[0 0 0 -1 -1 -1]','UpperLimit',[1 1 1 1 1 1]');
numAct = prod(actInfo.Dimension); % number of actions (6); used below to size the actor's mean/std layers
scale = [0.5 0.5 0.5 1 1 1]';
bias = [0.5 0.5 0.5 0 0 0]';
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
Ts = 0.001;
Tf = 4;
rng(0)
%critic
cnet = [
featureInputLayer(9,"Normalization","none","Name","observation1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")
fullyConnectedLayer(32,"Name","fc5")
tanhLayer("Name","tanh5")
fullyConnectedLayer(1,"Name","CriticOutput")];
cnetMCT=[
featureInputLayer(20,"Normalization","none","Name","observation2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
cnetMCR=[
featureInputLayer(20,"Normalization","none","Name","observation3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, cnetMCT);
criticNetwork = connectLayers(criticNetwork,"fc15","concat/in2");
criticNetwork = addLayers(criticNetwork, cnetMCR);
criticNetwork = connectLayers(criticNetwork,"fc25","concat/in3");
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
criticdlnet1 = initialize(criticdlnet);
%Is that the problem?
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
%actor
anet = [
featureInputLayer(9,"Normalization","none","Name","ain1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")];
anetMCT=[
featureInputLayer(20,"Normalization","none","Name","ain2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
anetMCR=[
featureInputLayer(20,"Normalization","none","Name","ain3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
meanPath = [
fullyConnectedLayer(32,"Name","meanFC")
tanhLayer("Name","tanh5")
fullyConnectedLayer(numAct,"Name","mean")
tanhLayer("Name","tanh6")
scalingLayer(Name="meanPathOut",Scale=scale,Bias=bias)];
stdPath = [
fullyConnectedLayer(32,"Name","stdFC")
tanhLayer("Name","tanh7")
fullyConnectedLayer(numAct,"Name","fc5")
softplusLayer("Name","std")];
actorNetwork = layerGraph(anet);
actorNetwork = addLayers(actorNetwork,anetMCT);
actorNetwork = addLayers(actorNetwork,anetMCR);
actorNetwork = connectLayers(actorNetwork,"fc15","concat/in2");
actorNetwork = connectLayers(actorNetwork,"fc25","concat/in3");
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"tanh4","meanFC/in");
actorNetwork = connectLayers(actorNetwork,"tanh4","stdFC/in");
actordlnet = dlnetwork(actorNetwork);
%Is that the problem?
actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
"ActionMeanOutputNames","meanPathOut", ...
"ActionStandardDeviationOutputNames","std", ...
ObservationInputNames= ["ain1","ain2","ain3"]);
%agent
agentOptions=rlPPOAgentOptions("SampleTime",Ts,"DiscountFactor",0.995,"ExperienceHorizon",1024,"MiniBatchSize",512,"ClipFactor",0.2, ...
"EntropyLossWeight",0.01,"NumEpoch",8,"AdvantageEstimateMethod","gae","GAEFactor",0.98, ...
"NormalizedAdvantageMethod","current");
agent=rlPPOAgent(actor,critic,agentOptions);
%training
trainOptions=rlTrainingOptions("StopOnError","on", "MaxEpisodes",2000,"MaxStepsPerEpisode",floor(Tf/Ts), ...
"ScoreAveragingWindowLength",10,"StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",100000,"SaveAgentCriteria","None", ...
"SaveAgentDirectory","D:\car\jianmo\zhangxiang\agent","Verbose",false, ...
"Plots","training-progress");
trainingStats = train(agent,env,trainOptions);
The error log is as follows:
Error using rl.internal.validate.mapFunctionObservationInput
Number of input layers for deep neural network must equal to number of observation specifications.

Error in rlValueFunction (line 92)
modelInputMap = rl.internal.validate.mapFunctionObservationInput(model,observationInfo,nameValueArgs.ObservationInputNames);

Error in ppo (line 187)
critic= rlValueFunction(criticdlnet1,obsInfo, ...

Answers (1)

Ronit on 27 Mar 2024
Hi,
Based on the error log you've provided, the issue is a mismatch between the number of observation inputs expected by your neural network and the number of observation specifications you have defined. The error is thrown by rlValueFunction when initializing the critic, indicating that the critic network does not match the observation information obsInfo you specified.
You have defined obsInfo as a single specification object, yet when initializing the critic with rlValueFunction you specified three observation input names:
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
This discrepancy between the number of obsInfo objects (1) and the number of observation input names (3) is the cause of the error.
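You can confirm the mismatch directly: a dlnetwork object lists its input layers in its InputNames property, and rlValueFunction requires that count to equal the number of observation specifications. A quick check, using the variable names from your script:
numel(criticdlnet1.InputNames) % 3 input layers: observation1/2/3
numel(obsInfo)                 % 1 observation specification -> mismatch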
To resolve this issue, ensure that the number of obsInfo objects matches the number of observation input names you've specified for your network. If your environment produces three distinct observations, define an obsInfo object for each and pass them to rlValueFunction as a vector, as in the sketch below.
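A minimal sketch, assuming the 49 observations split as 9 + 20 + 20 to match the three input layers of your networks (observation1/ain1, observation2/ain2, observation3/ain3), and that the [0, 1] limits from your original single spec apply to each channel; adjust the dimensions and limits to whatever your Simulink model actually outputs:
obsInfo1 = rlNumericSpec([9,1], 'LowerLimit',0,'UpperLimit',1); % -> observation1 / ain1
obsInfo2 = rlNumericSpec([20,1],'LowerLimit',0,'UpperLimit',1); % -> observation2 / ain2
obsInfo3 = rlNumericSpec([20,1],'LowerLimit',0,'UpperLimit',1); % -> observation3 / ain3
obsInfo  = [obsInfo1 obsInfo2 obsInfo3]; % one spec per network input layer
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
critic = rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);
Pass the same three-element obsInfo vector to rlContinuousGaussianActor as well, and make sure the agent block in your Simulink model provides three separate observation signals (for multichannel observations the RL Agent block expects a bus) rather than a single 49-by-1 vector.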
For more information regarding the rlValueFunction function, please refer to this documentation: https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlvaluefunction.html#responsive_offcanvas
Hope this helps!
