In MATLAB, I am using reinforcement learning for control. The last layer of the actor network is a tanhLayer, so the output should be in the range -1 to 1, but the outputs are not within that range

guiyang on 5 Jun 2024
Commented: guiyang on 13 Jun 2024
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    tanhLayer(Name="ActionOutLyr")
    ];
The attached image shows the actor's output. It has 6 dimensions, and none of the dimensions stay within this range.

Answers (1)

Krishna on 6 Jun 2024
Hi Guiyang,
If the output of a tanh layer in your network is not within the expected range of -1 to 1, consider the following points:
  1. Minor deviations from the expected range might be due to floating-point precision limits. These are typically negligible.
  2. Check if there's any scaling or modification applied after the tanh output that might alter its range.
  3. Ensure that the tanh layer is indeed the final layer in your network, with no additional operations post-tanh.
  4. Verify that the method used for logging or visualizing the outputs is accurate and is not introducing errors or rescaling the tanh output.
You can also follow these troubleshooting steps:
  1. Test the tanh function with known inputs to confirm its correct behavior (see the sketch below this list).
  2. Double-check the network architecture for unintended layers or operations after the tanh.
These steps should help identify and resolve the issue with the tanh layer output.
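For example, here is a minimal sketch of such a check, using the layer stack from your question and assuming numObs = 10 and numAct = 6 as in your script (variable names here are illustrative). It builds the actor layers into a dlnetwork and evaluates it on random observations; every tanh output should lie strictly inside [-1, 1], up to floating-point precision. Note that this evaluates the bare network, before any scaling or exploration noise the agent may apply to the action.
numObs = 10;   % assumed, as in the posted script
numAct = 6;    % assumed, as in the posted script
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    tanhLayer(Name="ActionOutLyr")
    ];
actorDLNet = dlnetwork(actorNet);
% Push a batch of random observations through the network
obsBatch = dlarray(randn(numObs, 1000), "CB");
actOut = extractdata(predict(actorDLNet, obsBatch));
% Both numbers printed here should be inside [-1, 1]
fprintf("min = %g, max = %g\n", min(actOut, [], "all"), max(actOut, [], "all"))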
Also, please follow this documentation on how to ask questions better and get quick answers.
Hope this helps.
  1 Comment
guiyang on 13 Jun 2024
I still cannot find the cause of the error. I have posted my code below; could you help me check it?
clc
clear
%%
% Parameter settings
dataType = 'double';
%%
% Model parameters
Ts=1e-5;
T=0.001;
T1=0;
w=2*pi*50;
Un=1770;
Rn=0.145;
Ln=5.4e-3;
Cd=9e-3;
Udc=3600;
Rd=25;
Kesogi=1;
Kisogi=1;
Kppll=0.7;
Kipll=25;
Kpv=0.5;
Kiv=5;
Kpi=2;
Kii=50;
fp=1:0.01:85;
%%
% Create the environment interface
mdl = "deepl_rectifier_model1";
open_system(mdl)
numObs = 10;
obsInfo = rlNumericSpec( ...
    [numObs 1], ...
    DataType=dataType);
obsInfo.Name = "observations";
obsInfo.Description = "Error and reference signal";
% Create the action specification
numAct = 6;
actInfo = rlNumericSpec([numAct 1], "DataType", dataType);
actInfo.Name = "vqdRef";
agentblk = "deepl_rectifier_model1/RL Agent";
env = rlSimulinkEnv(mdl, agentblk, obsInfo, actInfo);
actInfo=getActionInfo(env);
env.ResetFcn = @resetReCT;
%%
% Build the agent
% State input path
statePath = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64, Name="fc1")
    ];
% Action input path
actionPath = [
    featureInputLayer(numAct, Name="ActionInLyr")
    fullyConnectedLayer(64, Name="fc2")
    ];
% Common output path
commonPath = [
    additionLayer(2, Name="add")
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(16)
    fullyConnectedLayer(1, Name="QValueOutLyr")
    ];
% Add the layers to a layer graph object
criticNet = layerGraph();
criticNet = addLayers(criticNet, statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
% Connect the layers
criticNet = connectLayers(criticNet, "fc1", "add/in1");
criticNet = connectLayers(criticNet, "fc2", "add/in2");
% Build the critic dlnetwork (not yet initialized)
criticDLNet = dlnetwork(criticNet, Initialize=false);
% Fix the random seed
rng(0)
% Build the critics
critic1 = rlQValueFunction(initialize(criticDLNet),obsInfo, actInfo);
critic2 = rlQValueFunction(initialize(criticDLNet),obsInfo, actInfo);
% Build the actor network
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    sigmoidLayer(Name="ActionOutLyr")
    ];
% Create the actor dlnetwork
actordlNet = dlnetwork(actorNet);
% summary(actordlNet)
% plot(actordlNet)
% Construct the actor
actor = rlContinuousDeterministicActor(actordlNet,obsInfo,actInfo);
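% Optional check (illustrative addition, not part of the original script):
% query the untrained actor directly to see the range of its final layer,
% before the TD3 agent adds exploration noise. With sigmoidLayer the values
% lie in [0, 1]; with tanhLayer they would lie in [-1, 1].
% sampleAct = getAction(actor, {rand(numObs, 1)});
% disp(sampleAct{1}.')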
% Set the agent options
Ts_agent = 0.001;
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts_agent, ...
    DiscountFactor=0.995, ...
    ExperienceBufferLength=2e6, ...
    MiniBatchSize=256, ...
    NumStepsToLookAhead=1, ...
    TargetSmoothFactor=0.005, ...
    TargetUpdateFrequency=10);
for idx = 1:2
    agentOpts.CriticOptimizerOptions(idx).LearnRate = 1e-4;
    agentOpts.CriticOptimizerOptions(idx).GradientThreshold = 1;
    agentOpts.CriticOptimizerOptions(idx).L2RegularizationFactor = 1e-3;
end
% Actor optimizer options
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.ActorOptimizerOptions.L2RegularizationFactor = 1e-3;
% Exploration noise settings
% Set the noise variance and its decay rate
agentOpts.ExplorationModel.Variance = 0.05;
agentOpts.ExplorationModel.VarianceDecayRate = 2e-4;
agentOpts.ExplorationModel.VarianceMin = 0.001;
% Gaussian action noise model used to smooth the target policy updates
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
% Create the agent using the specified actor, critics, and options
agent = rlTD3Agent(actor, [critic1,critic2], agentOpts);
%%
% Train the agent
T2 = 2;
maxepisodes = 1000;
maxsteps = ceil(T2/Ts_agent);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps, ...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=-190,...
    ScoreAveragingWindowLength=100);
doTraining = true;
if doTraining
    trainResult = train(agent, env, trainOpts);
else
    load("rlPMSMAgent.mat","agent")
end
%%
% Simulate the agent
sim(mdl);


Release

R2024a
