How does Q-learning update the qTable when using Reinforcement Learning Toolbox?

Question

Tracy Shang on 1 May 2021


https://nl.mathworks.com/matlabcentral/answers/818725-how-does-the-q-learning-update-the-qtable-by-using-the-reinforcement-learning-toolbox

Commented: Adi Firdaus on 10 Dec 2021


'MaxEpisodes' and 'MaxStepsPerEpisode' are both set to 1.

I ran the following code. After the first episode, Q(4,1) is set to -1.

However, when I ran the "train section" a second time, both Q(4,1) and Q(4,2) were updated (the figure from the original post is not included here).

In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, to -1.

Why is Q(4,2) set to 0.7441?

Why is Q(4,1) also updated, to -1.67?

clear
GW = createGridWorld(4,4);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW);
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic, agentOpt);
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
rng(0)
opt = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'Plots', "none",...
    'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%%
aa = getLearnableParameters(getCritic(agent));
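The expectation stated in the question can be sketched as a textbook tabular Q-learning update written in plain MATLAB, without any toolbox calls. This is only an illustration of the update the poster has in mind, not the toolbox's internal implementation. The zero-initialized table and the next states (3 and 8) are assumptions made for the example.

```matlab
% Textbook tabular Q-learning, written without any toolbox calls.
nS = 16; nA = 4;           % 4-by-4 grid world with four actions
Q = zeros(nS, nA);         % assumption: the table starts at zero
alpha = 1;                 % learning rate, as in critic.Options.LearnRate
gamma = 1;                 % discount factor, as in agentOpt.DiscountFactor

% Two single-step episodes, both starting in state 4.
% The next states are hypothetical; any state whose row max is 0 gives the same result.
steps = [4 1 -1 3;         % [state, action, reward, nextState]
         4 2 -1 8];

for k = 1:size(steps, 1)
    s = steps(k, 1); a = steps(k, 2); r = steps(k, 3); sNext = steps(k, 4);
    tdTarget = r + gamma * max(Q(sNext, :));           % bootstrapped target
    Q(s, a) = Q(s, a) + alpha * (tdTarget - Q(s, a));  % only entry (s,a) changes
end

disp(Q(4, :))              % row for state 4 after both steps
```

Under these assumptions only Q(4,1) and Q(4,2) become nonzero, and both equal -1, which is the result the question expects from the toolbox.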


Answer 1

Emmanouil Tzorakoleftherakis on 3 May 2021


https://nl.mathworks.com/matlabcentral/answers/818725-how-does-the-q-learning-update-the-qtable-by-using-the-reinforcement-learning-toolbox#answer_691070


Can you try the following?

critic.Options.L2RegularizationFactor=0;

This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.

2 Comments

Tracy Shang on 4 May 2021

Edited: Tracy Shang on 4 May 2021


Thanks for your answer!

I tried the code you suggested. The result showed no difference.

But you inspired me!

I then tried another parameter, as follows. The qTable was updated as shown in the following figure.

critic.Options.OptimizerParameters.GradientDecayFactor =0;

I then set both parameters by adding the following code, and the qTable was updated as shown in the following figure. At least the question about Q(4,1) is solved.

According to the parameters I set, the equation for calculating the Q-value simplifies as follows.

[equation images not available]
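Assuming the comment refers to the standard tabular Q-learning update, setting the learning rate α = 1 and the discount factor γ = 1 (as in the question's code) would give the simplification below. This is a reconstruction, not the poster's original equation.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
\quad\Longrightarrow\quad
Q(s,a) \leftarrow r + \max_{a'} Q(s',a') \qquad (\alpha = 1,\ \gamma = 1)
```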

Why is Q(4,2) set to -1.4139?

critic.Options.OptimizerParameters.GradientDecayFactor =0;  
critic.Options.L2RegularizationFactor=0;

Looking forward to your further answer. Thank you very much!
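One possible explanation, which is not confirmed anywhere in this thread, is that the critic's table is updated by an Adam-style optimizer rather than by the plain tabular rule. The sketch below is plain arithmetic under that assumption: a gradient decay factor of 0.9 and a squared-gradient decay factor of 0.999 (assumed defaults), a single iteration counter shared by all table entries, and the small Adam epsilon term ignored. The helper `adamStep` is hypothetical and written only for this example.

```matlab
% Adam-style step sizes for single table entries (an assumption, see text).
lr    = 1;        % critic.Options.LearnRate from the question
beta2 = 0.999;    % assumed squared-gradient decay factor
g     = 1;        % gradient magnitude; it cancels out of every result below

% Bias-corrected Adam step for first moment m and second moment v at iteration t.
adamStep = @(m, v, t, beta1) lr * sqrt(1 - beta2^t) / (1 - beta1^t) * m / sqrt(v);

% Case A: assumed default GradientDecayFactor = 0.9
beta1 = 0.9;
m1 = (1 - beta1) * g;   v1 = (1 - beta2) * g^2;   % Q(4,1) receives a gradient at t = 1
step41_t1 = adamStep(m1, v1, 1, beta1);
m2 = beta1 * m1;        v2 = beta2 * v1;          % zero gradient at t = 2, moments decay
step41_t2 = adamStep(m2, v2, 2, beta1);           % leftover momentum still moves Q(4,1)
step42_t2 = adamStep((1 - beta1) * g, (1 - beta2) * g^2, 2, beta1);  % Q(4,2) at t = 2

% Case B: GradientDecayFactor = 0, as set in the comment above
step42_noMomentum = adamStep(g, (1 - beta2) * g^2, 2, 0);

fprintf('%.4f %.4f %.4f %.4f\n', step41_t1, step41_t2, step42_t2, step42_noMomentum);
% prints: 1.0000 0.6701 0.7441 1.4139
```

Under these assumptions the magnitudes line up with the values reported in the thread: Q(4,1) moves by 1 in the first episode and by a further 0.6701 in the second (giving about -1.67), Q(4,2) moves by 0.7441 with the assumed default decay factor, and by 1.4139 once GradientDecayFactor is 0. The sketch reproduces magnitudes only; the question reports 0.7441 without a minus sign, which this arithmetic does not explain.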

Adi Firdaus on 10 Dec 2021

I need an answer too.
