Problems using LSTM with PPO Agent - Error: Invalid input argument type or size such as observation, reward, isdone or loggedSignals.

Question

Stephan on 24 Jul 2020


Commented: katuysha on 5 Jun 2023

Hi,

I implemented several RL agents (DQN, AC, PPO, ...) successfully with my custom environment function, using a feedforward network as shown in the documentation. Everything ran properly, but the model did not converge, so I tried an LSTM network to see whether it would work better in this case. I adjusted my code following the corresponding part of the documentation. The functions run without any problems and the Episode Manager starts properly. If I call the reset and step functions manually, everything also looks as it should. But when I run the script, after a short moment I get this error message:

>> RL_PPO_LSTM
Error using rl.agent.AbstractPolicy/step (line 116)
Invalid input argument type or size such as observation, reward, isdone or
loggedSignals.
Error in rl.env.MATLABEnvironment/simLoop (line 241)
                    action = step(policy,observation,reward,isdone);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
                    [expcell{simCount},epinfo,siminfos{simCount}] =
                    simLoop(env,policy,opts,simCount,usePCT);
Error in rl.env.AbstractEnv/simWithPolicy (line 70)
            [experiences,varargout{1:(nargout-1)}] =
            simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
            [varargout{1},varargout{2}] =
            simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
            [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 159)
            [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 163)
            [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 187)
                runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
                runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
            run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 291)
            run(trainer);
Error in rl.train.TrainingManager/run (line 160)
            train(this);
Error in rl.agent.AbstractAgent/train (line 54)
TrainingStatistics = run(trainMgr);
Error in RL_PPO_LSTM (line 83)
trainingStats = train(agent,env,trainOpts);
Caused by:
    Expected one output from a curly brace or dot indexing expression, but
    there were 2 results.

I saw a similar question here on Answers:

https://de.mathworks.com/matlabcentral/answers/471256-how-to-solve-invalid-input-argument-type-or-size-such-as-observation-reward-isdone-or-loggedsigna

and I changed my functions to return row vectors for the logged signals, but that did not change anything. I tried to debug this by setting "pause on error", but I'm really lost here.

Thanks for your help!

Stephan


Answer 1

Stephan on 25 Jul 2020


I was finally able to solve the issue. The problem was that the network contained two LSTM layers, which led to this error:

Caused by:
    Expected one output from a curly brace or dot indexing expression, but
    there were 2 results.

Removing the second LSTM layer solved the problem.
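To make the fix more concrete, here is a minimal sketch of an actor network that contains only one LSTM layer. The original network is not shown in this thread, so the layer sizes, layer names, and the values of numObs and numAct are assumptions chosen for illustration only. The sketch just builds the layer array and counts the LSTM layers in it; it does not create the agent.

```matlab
% Hypothetical dimensions; the original post does not state them.
numObs = 4;   % assumed length of the observation vector
numAct = 2;   % assumed number of discrete actions

% Actor layers with exactly ONE lstmLayer.
% All names and sizes below are illustrative assumptions.
actorLayers = [
    sequenceInputLayer(numObs, 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(32, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    lstmLayer(16, 'OutputMode', 'sequence', 'Name', 'lstm1')
    % Adding a second lstmLayer at this point is the configuration
    % that produced the error described in this thread.
    fullyConnectedLayer(numAct, 'Name', 'fcOut')
    softmaxLayer('Name', 'actionProb')];

% Sanity check: count the LSTM layers before building the agent.
numLstm = 0;
for k = 1:numel(actorLayers)
    if isa(actorLayers(k), 'nnet.cnn.layer.LSTMLayer')
        numLstm = numLstm + 1;
    end
end
assert(numLstm == 1, 'Expected exactly one LSTM layer in the network.');
```

Running a check like this on both the actor and the critic network before calling train may help catch a second recurrent layer early, instead of having to read it out of the long stack trace.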

1 Comment

katuysha on 5 Jun 2023

I cannot understand your description. Please make it clearer.
