Often, the first step in creating a multiple time series model is to obtain data. There are two types of multiple time series data:

**Response data**. Response data corresponds to *y*_{t} in the multiple time series models defined in Types of Multivariate Time Series Models.

**Exogenous data**. Exogenous data corresponds to *x*_{t} in the multiple time series models defined in Types of Multivariate Time Series Models. Each variable in the exogenous data appears in all response equations.

Before using Econometrics Toolbox™ functions with the data, put the data into the required form. Use standard MATLAB® commands, or preprocess the data with a spreadsheet program, database program, Perl, or other utilities.

There are several freely available sources of data sets, such as the St. Louis Federal Reserve Economics Database (known as FRED): `https://research.stlouisfed.org/fred2/`. If you have a license, you can use Datafeed Toolbox™ functions to access data from various sources.

For multiple time series models, response and predictor data sets must be in separate arrays. Each row of a matrix represents one time or observation, and each column represents one time series or variable. The earliest data is in the first row, and the latest data is in the last row. The response and predictor data represent *y*_{t} and *x*_{t}, respectively.

The problem context determines the response data structure. Specifically, response data can be any of the following:

Multiple paths of presample observations, a three-dimensional array with pages corresponding to separate paths.

Sample path to which the VAR model is fit, a matrix.

Multiple paths of future observations for conditional forecasting or simulation, a three-dimensional array with pages corresponding to separate paths.

The predictor data structure is a matrix representing one path of observations.

Suppose there are *T* sample times and *n* time
series. To create a variable representing one path of response data,
put the data in the form of a *T*-by-*n* matrix:

$$\left[\begin{array}{cccc}{Y}_{1,1}& {Y}_{2,1}& \cdots & {Y}_{n,1}\\ {Y}_{1,2}& {Y}_{2,2}& \cdots & {Y}_{n,2}\\ \vdots & \vdots & \ddots & \vdots \\ {Y}_{1,T}& {Y}_{2,T}& \cdots & {Y}_{n,T}\end{array}\right]$$

*Y*_{j,t} represents the observation of response series *j* at time *t*, for *j* = 1,..., *n* and *t* = 1,..., *T*. You must structure the predictor data similarly.
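As a minimal sketch of the layout described above, the following builds a *T*-by-*n* response matrix and a separate predictor array from column vectors. The values and the variable names `y1`, `y2`, `y3`, and `X` are hypothetical and chosen only for illustration. Note that the MATLAB index order is `Y(t,j)`, that is, row *t* and column *j*.

```matlab
% Hypothetical data: n = 3 series observed at T = 5 times.
T  = 5;
y1 = (1:T)';          % series 1, earliest observation first
y2 = 10*(1:T)';       % series 2
y3 = 100*(1:T)';      % series 3

% One column per series, one row per time: a T-by-n response matrix.
Y = [y1 y2 y3];
[numobs, numseries] = size(Y);

% Predictor data goes in its own array with the same row (time) layout.
X = linspace(0, 1, T)';   % a single hypothetical predictor series

% The observation of series 3 at time 2 is Y(2,3).
obs = Y(2,3);
```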

Depending on the context, response data can have an extra dimension corresponding to separate, independent paths. For this type of data, use a three-dimensional array `Y(t,j,p)`, where:

`t` is the time index of an observation, 1 ≤ `t` ≤ `numobs`.

`j` is the index of a time series, 1 ≤ `j` ≤ `numseries`.

`p` is the path index, 1 ≤ `p` ≤ `numpaths`.

For any path `p`, `Y(t,j,p)` is a time series.
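The three-dimensional layout can be sketched as follows. The sizes and fill values here are assumptions made for illustration; each page of the array holds one path.

```matlab
% Hypothetical sizes for illustration.
numobs    = 4;   % observations per path
numseries = 2;   % time series per path
numpaths  = 3;   % independent paths

% Preallocate, then fill each page (path) with its own data.
Y = zeros(numobs, numseries, numpaths);
for p = 1:numpaths
    Y(:,:,p) = p * ones(numobs, numseries);   % placeholder values
end

% Extract one path as a numobs-by-numseries matrix.
path2 = Y(:,:,2);

% Extract series 1 across all paths as a numobs-by-numpaths matrix.
series1 = squeeze(Y(:,1,:));
```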

The file `Data_USEconModel` ships with Econometrics Toolbox software. It contains time series from the St. Louis Federal Reserve Economics Database (known as FRED).

Enter `load Data_USEconModel` to load the data into your MATLAB workspace. The following items load into the workspace:

`Data`, a 249-by-14 matrix containing the 14 time series,

`DataTable`, a 249-by-14 `timetable` that packages the data,

`dates`, a 249-element vector containing the dates for `Data`,

`Description`, a character array containing a description of the data series and the key to the labels for each series,

`series`, a 1-by-14 cell array of labels for the time series.

Examine the data structures:

firstPeriod = dates(1)

firstPeriod = 711217

lastPeriod = dates(end)

lastPeriod = 733863

`dates` is a vector containing MATLAB serial date numbers, the number of days since the putative date January 1, 0000. (This “date” is not a real date, but is convenient for making date calculations; for more information, see Date Formats (Financial Toolbox) in the Financial Toolbox™ User's Guide.)

The `Data` matrix contains 14 columns. These represent the time series labeled by the cell vector of strings `series`.
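To read serial date numbers as calendar dates, one option is to convert them to `datetime` values. The sketch below uses the two values displayed above rather than loading the data set, so it runs without the toolbox.

```matlab
% Serial date numbers as displayed for dates(1) and dates(end).
firstPeriod = 711217;
lastPeriod  = 733863;

% Convert serial date numbers to datetime values.
firstDate = datetime(firstPeriod, 'ConvertFrom', 'datenum');
lastDate  = datetime(lastPeriod,  'ConvertFrom', 'datenum');

disp(firstDate)
disp(lastDate)
```

The two values correspond to March 31, 1947 and March 31, 2009, which is consistent with 249 quarterly observations.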

FRED Series | Description |
---|---|
COE | Paid compensation of employees in $ billions |
CPIAUCSL | Consumer Price Index |
FEDFUNDS | Effective federal funds rate |
GCE | Government consumption expenditures and investment in $ billions |
GDP | Gross domestic product in $ billions |
GDPDEF | Gross domestic product price deflator |
GPDI | Gross private domestic investment in $ billions |
GS10 | Ten-year treasury bond yield |
HOANBS | Non-farm business sector index of hours worked |
M1SL | M1 money supply (narrow money) |
M2SL | M2 money supply (broad money) |
PCEC | Personal consumption expenditures in $ billions |
TB3MS | Three-month treasury bill yield |
UNRATE | Unemployment rate |

`DataTable` is a `timetable` array containing the same data as in `Data`. However, as with tables, you can use dot notation to access a variable. For example, `DataTable.UNRATE` returns the unemployment rate time series. Also, all timetables contain the variable `Time`, which is a `datetime` vector. For more details, see Create Timetables (MATLAB) and Represent Dates and Times in MATLAB (MATLAB).
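The dot-notation access described above can be sketched on a small timetable. The numbers below are made up for illustration and are not taken from `Data_USEconModel`; only the variable names mirror the FRED labels.

```matlab
% Hypothetical quarterly values (not from Data_USEconModel).
Time     = datetime(2000, 1, 1) + calmonths(0:3:9)';
UNRATE   = [4.0; 3.9; 4.0; 3.9];
FEDFUNDS = [5.7; 6.3; 6.5; 6.5];

% Package the series in a timetable; variable names come from the inputs.
TT = timetable(Time, UNRATE, FEDFUNDS);

% Dot notation returns one variable as a column vector.
u = TT.UNRATE;

% The row times are available as a datetime vector.
t = TT.Time;

% Curly-brace indexing extracts selected variables as a numeric matrix,
% which is the form needed for a response data matrix.
A = TT{:, {'UNRATE', 'FEDFUNDS'}};
```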

Your data might have characteristics that violate assumptions for linear multiple time series models. For example, you can have data with exponential growth, or data from multiple sources at different periodicities. You must preprocess your data to convert it into an acceptable form for analysis.

For time series with exponential growth, you can preprocess the data by taking the logarithm of the growing series. In some cases you then difference the result. For an example, see VAR Model Case Study.
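As a small illustration of why the logarithm helps, the following sketch uses a synthetic series that grows at a constant continuously compounded rate. The series and the rate of 0.05 are assumptions for illustration only.

```matlab
% Synthetic series with exponential growth (hypothetical values).
t = (0:9)';
y = 100 * exp(0.05 * t);

% Taking logs turns exponential growth into a linear trend.
logy = log(y);

% Differencing the logs gives the per-period growth rate,
% which is constant (0.05) for this synthetic series.
dlogy = diff(logy);
```

For real data the differenced log series is not constant, but it typically no longer exhibits exponential growth.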

For data from multiple sources, you must decide how best to fill in missing values. Commonly, you take the missing values as unchanged from the previous value, or as interpolated from neighboring values.
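The two filling strategies mentioned above can be sketched with the base MATLAB function `fillmissing`, assuming missing values are coded as `NaN`. The series is hypothetical.

```matlab
% Hypothetical series with missing values coded as NaN.
y = [1; NaN; 3; NaN; NaN; 6];

% Carry the previous observed value forward.
yPrev = fillmissing(y, 'previous');   % [1; 1; 3; 3; 3; 6]

% Interpolate linearly between neighboring observed values.
yLin = fillmissing(y, 'linear');      % [1; 2; 3; 4; 5; 6]
```

Which strategy is appropriate depends on the data. Carrying values forward uses only past information, whereas interpolation uses later observations as well.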

If you take a difference of a series, the series becomes one observation shorter. If you difference only some of the time series in a data set, truncate the other series so that all have the same length, or pad the differenced series with initial values.
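Both ways of aligning lengths after differencing can be sketched as follows. The data are hypothetical, and padding with `NaN` is an assumption made here as a placeholder for an initial value.

```matlab
% Hypothetical data: column 1 grows exponentially, column 2 is a rate.
Y = [100.0 5; 110.0 6; 121.0 4; 133.1 5];

% Log-difference only the growing series; the result has 3 rows, not 4.
dlog1 = diff(log(Y(:,1)));

% Option 1: truncate the undifferenced series by dropping its first row.
Ytrunc = [dlog1, Y(2:end,2)];

% Option 2: pad the differenced series with an initial value
% (NaN is used here as a placeholder; this choice is an assumption).
Ypad = [[NaN; dlog1], Y(:,2)];
```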

You can test each time series data column for stationarity using unit root tests. For details, see Unit Root Nonstationarity.

To fit a lagged model to data, partition the response data into up to three sections:

Presample data

Estimation data

Forecast data

The purpose of presample data is to provide initial values for lagged variables. When trying to fit a model to the estimation data, you need to access earlier times. For example, if the maximum lag in a model is 4, and if the earliest time in the estimation data is 50, then the model needs to access data at time 46 when fitting the observations at time 50. By default, `estimate` removes the required number of observations from the response data to use as presample data. This reduces the effective sample size.

The estimation data contains the observations *y*_{t}, and `estimate` fits the model to this data explicitly. The number of observations in the estimation data is the effective sample size.

Use the forecast data for comparing fitted model predictions against data. You do not have to have a forecast period. Use one to validate the predictive power of a fitted model.
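The three-way partition described above can be sketched with plain row indexing. The sizes, the placeholder data, and the names `Y0`, `YEst`, and `YF` are assumptions for illustration. The commented-out line at the end indicates one way the pieces might be passed to `estimate` when Econometrics Toolbox is available.

```matlab
% Hypothetical response matrix: T = 100 observations of n = 2 series.
T = 100;
n = 2;
Y = reshape(1:T*n, T, n);     % placeholder data

maxLag  = 4;    % number of presample rows (the maximum lag in the model)
numFcst = 10;   % number of rows held back for forecast comparison

% Presample: the first maxLag rows supply initial lagged values.
Y0 = Y(1:maxLag, :);

% Estimation sample: the rows the model is fit to.
YEst = Y(maxLag+1:end-numFcst, :);

% Forecast sample: the last numFcst rows, used only for validation.
YF = Y(end-numFcst+1:end, :);

% Effective sample size is the number of estimation rows.
effectiveT = size(YEst, 1);

% With Econometrics Toolbox, the pieces could be used like this:
% EstMdl = estimate(varm(n, maxLag), YEst, 'Y0', Y0);
```

Supplying the presample explicitly keeps the estimation sample at its full length, instead of letting `estimate` take presample rows from the start of the estimation data.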

The following figure shows how to arrange the data in the data
matrix, with *j* presample rows and *k* forecast
rows.