Deep reinforcement learning (PPO): How do you fix this error?
Hi everyone,
I am using the PPO algorithm to train an agent in a custom Simulink environment, but training fails with an error.
I think it may be related to obsInfo, but I don't know how to fix it. My code and the error log are below.
Any help would be greatly appreciated.
slx = 'RLcontrolstrategy0312';
open_system(slx);
agentblk = slx +"/agent";
% obsInfo and actInfo -- is the problem here?
obsInfo=rlNumericSpec([49,1], ...
'LowerLimit',0, ...
'UpperLimit',1);
actInfo = rlNumericSpec([6,1], 'LowerLimit',[0 0 0 -1 -1 -1]','UpperLimit',[1 1 1 1 1 1]');
numAct = 6; % action dimension, used below by the actor's mean/std paths
scale = [0.5 0.5 0.5 1 1 1]';
bias = [0.5 0.5 0.5 0 0 0]';
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
Ts = 0.001;
Tf = 4;
rng(0)
%critic
cnet = [
featureInputLayer(9,"Normalization","none","Name","observation1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")
fullyConnectedLayer(32,"Name","fc5")
tanhLayer("Name","tanh5")
fullyConnectedLayer(1,"Name","CriticOutput")];
cnetMCT=[
featureInputLayer(20,"Normalization","none","Name","observation2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
cnetMCR=[
featureInputLayer(20,"Normalization","none","Name","observation3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, cnetMCT);
criticNetwork = connectLayers(criticNetwork,"fc15","concat/in2");
criticNetwork = addLayers(criticNetwork, cnetMCR);
criticNetwork = connectLayers(criticNetwork,"fc25","concat/in3");
criticdlnet = dlnetwork(criticNetwork,'Initialize',false);
criticdlnet1 = initialize(criticdlnet);
%Is that the problem?
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
%actor
anet = [
featureInputLayer(9,"Normalization","none","Name","ain1")
fullyConnectedLayer(256,"Name","fc1")
concatenationLayer(1,3,"Name","concat")
tanhLayer("Name","tanh1")
fullyConnectedLayer(256,"Name","fc2")
tanhLayer("Name","tanh2")
fullyConnectedLayer(128,"Name","fc3")
tanhLayer("Name","tanh3")
fullyConnectedLayer(64,"Name","fc4")
tanhLayer("Name","tanh4")];
anetMCT=[
featureInputLayer(20,"Normalization","none","Name","ain2")
fullyConnectedLayer(256,"Name","fc11")
tanhLayer("Name","tanh13")
fullyConnectedLayer(64,"Name","fc14")
tanhLayer("Name","tanh14")
fullyConnectedLayer(32,"Name","fc15")];
anetMCR=[
featureInputLayer(20,"Normalization","none","Name","ain3")
fullyConnectedLayer(256,"Name","fc21")
tanhLayer("Name","tanh23")
fullyConnectedLayer(64,"Name","fc24")
tanhLayer("Name","tanh24")
fullyConnectedLayer(32,"Name","fc25")];
meanPath = [
fullyConnectedLayer(32,"Name","meanFC")
tanhLayer("Name","tanh5")
fullyConnectedLayer(numAct,"Name","mean")
tanhLayer("Name","tanh6")
scalingLayer(Name="meanPathOut",Scale=scale,Bias=bias)];
stdPath = [
fullyConnectedLayer(32,"Name","stdFC")
tanhLayer("Name","tanh7")
fullyConnectedLayer(numAct,"Name","fc5")
softplusLayer("Name","std")];
actorNetwork = layerGraph(anet);
actorNetwork = addLayers(actorNetwork,anetMCT);
actorNetwork = addLayers(actorNetwork,anetMCR);
actorNetwork = connectLayers(actorNetwork,"fc15","concat/in2");
actorNetwork = connectLayers(actorNetwork,"fc25","concat/in3");
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"tanh4","meanFC/in");
actorNetwork = connectLayers(actorNetwork,"tanh4","stdFC/in");
actordlnet = dlnetwork(actorNetwork);
%Is that the problem?
actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
"ActionMeanOutputNames","meanPathOut", ...
"ActionStandardDeviationOutputNames","std", ...
ObservationInputNames= ["ain1","ain2","ain3"]);
%agent
agentOptions=rlPPOAgentOptions("SampleTime",Ts,"DiscountFactor",0.995,"ExperienceHorizon",1024,"MiniBatchSize",512,"ClipFactor",0.2, ...
"EntropyLossWeight",0.01,"NumEpoch",8,"AdvantageEstimateMethod","gae","GAEFactor",0.98, ...
"NormalizedAdvantageMethod","current");
agent=rlPPOAgent(actor,critic,agentOptions);
%training
trainOptions=rlTrainingOptions("StopOnError","on", "MaxEpisodes",2000,"MaxStepsPerEpisode",floor(Tf/Ts), ...
"ScoreAveragingWindowLength",10,"StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",100000,"SaveAgentCriteria","None", ...
"SaveAgentDirectory","D:\car\jianmo\zhangxiang\agent","Verbose",false, ...
"Plots","training-progress");
trainingStats = train(agent,env,trainOptions);
The error log is as follows:
Error using rl.internal.validate.mapFunctionObservationInput
Number of input layers for deep neural network must equal to number of observation specifications.
Error in rlValueFunction (line 92)
modelInputMap = rl.internal.validate.mapFunctionObservationInput(model,observationInfo,nameValueArgs.ObservationInputNames);
Error in ppo (line 187)
critic= rlValueFunction(criticdlnet1,obsInfo, ...
Answers (1)
Ronit
on 27 Mar 2024
Hi,
Based on the error log you've provided, the issue is a mismatch between the number of observation inputs your neural network expects and the number of observation specifications you've defined. The error is thrown by ‘rlValueFunction’ when constructing the critic, indicating that the critic's network does not match the observation information ‘obsInfo’ you've specified.
You have defined ‘obsInfo’ as a single rlNumericSpec object, but when constructing the critic with ‘rlValueFunction’ you specify three observation input names:
critic= rlValueFunction(criticdlnet1,obsInfo, ...
ObservationInputNames=["observation1","observation2","observation3"]);
This discrepancy between the number of ‘obsInfo’ objects (1) and the number of observation input names (3) is the cause of the error.
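You can confirm the mismatch from the workspace (a quick diagnostic sketch; ‘InputNames’ is a standard dlnetwork property):
criticdlnet1.InputNames % three input layers: {'observation1','observation2','observation3'}
numel(obsInfo)          % returns 1 -- only one observation spec was defined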
To resolve this, make sure the number of ‘obsInfo’ objects matches the number of observation input names of your network. Since your environment produces three distinct observations, define an ‘obsInfo’ object for each and pass them as a vector to ‘rlValueFunction’ (and likewise to ‘rlSimulinkEnv’ and the actor), as sketched below.
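For example, your critic's input layers expect 9, 20, and 20 elements (9 + 20 + 20 = 49, consistent with your original [49,1] spec), so a minimal sketch of the fix could look like this. The limits below are placeholders carried over from your single spec, and your Simulink model must also output three separate observation signals for ‘rlSimulinkEnv’ to match:
% One spec per observation channel, sized to match the network's input layers
obsInfo1 = rlNumericSpec([9 1], 'LowerLimit',0,'UpperLimit',1);
obsInfo2 = rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1);
obsInfo3 = rlNumericSpec([20 1],'LowerLimit',0,'UpperLimit',1);
obsInfo = [obsInfo1 obsInfo2 obsInfo3]; % vector of three specs
% Recreate the environment so it also exposes three observation channels
env = rlSimulinkEnv(slx,agentblk,obsInfo,actInfo);
% Now the three input names map one-to-one onto the three specs
critic = rlValueFunction(criticdlnet1,obsInfo, ...
    ObservationInputNames=["observation1","observation2","observation3"]);
The same vector of specs should also be passed to ‘rlContinuousGaussianActor’ when you build the actor.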
For more information on the ‘rlValueFunction’ function, please refer to this documentation: https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlvaluefunction.html
Hope this helps!