createMDP

Create Markov decision process model

Description

example

MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.

Examples

collapse all

Create an MDP model with eight states and two possible actions.

MDP = createMDP(8,["up";"down"]);

Specify the state transitions and their associated rewards.

% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;

Specify the terminal states of the model.

MDP.TerminalStates = ["s7";"s8"];

Input Arguments

collapse all

Model states, specified as one of the following:

• Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.

• String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

Model actions, specified as one of the following:

• Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.

• String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.

Output Arguments

collapse all

MDP model, returned as a GenericMDP object with the following properties.

Name of the current state, specified as a string.

State names, specified as a string vector with length equal to the number of states.

Action names, specified as a string vector with length equal to the number of actions.

State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. State transition matrix T is a probability matrix that indicates how likely the agent will move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

The sum of the transition probabilities out from a nonterminal state s following a given action must sum up to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.

For example, to indicate that in state 1 following action 4 there is an equal probability of moving to states 2 or 3, use the following:

MDP.T(1,[2 3],4) = [0.5 0.5];

You can also specify that, following an action, there is some probability of remaining in the same state. For example:

MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];

Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

Terminal state names in the grid world, specified as a string vector of state names.