## Multivariate Time Series Data Structures

### Multivariate Time Series Data

Often, the first step in creating a multiple time series model is to obtain data. There are two types of multiple time series data: response data and predictor data.

Before using Econometrics Toolbox™ functions with the data, put the data into the required form. Use standard MATLAB® commands, or preprocess the data with a spreadsheet program, database program, Perl, or other utilities.

There are several freely available sources of data sets, such as the St. Louis Federal Reserve Economics Database (known as FRED): `https://research.stlouisfed.org/fred2/`. If you have a license, you can use Datafeed Toolbox™ functions to access data from various sources.

### Response and Predictor Data Structures

For multiple time series models, store the response and predictor data in separate arrays. In each array, a row represents one time (observation) and a column represents one time series (variable). The earliest observation is in the first row and the latest observation is in the last row. The response and predictor data represent y_t and x_t, respectively, in the notation of Types of Multivariate Time Series Models.

The problem context determines the response data structure. Specifically, response data can be any of the following:

• Multiple paths of presample observations, a three-dimensional array with pages corresponding to separate paths.

• Sample path to which the VAR model is fit, a matrix.

• Multiple paths of future observations for conditional forecasting or simulation, a three-dimensional array with pages corresponding to separate paths.

The predictor data structure is a matrix representing one path of observations.

Suppose there are T sample times and n time series. To create a variable representing one path of response data, put the data in the form of a T-by-n matrix:

`$\left[\begin{array}{cccc}{Y}_{1,1}& {Y}_{2,1}& \cdots & {Y}_{n,1}\\ {Y}_{1,2}& {Y}_{2,2}& \cdots & {Y}_{n,2}\\ \vdots & \vdots & \ddots & \vdots \\ {Y}_{1,T}& {Y}_{2,T}& \cdots & {Y}_{n,T}\end{array}\right]$`

Y_{j,t} is the observation of response series j at time t, for j = 1,..., n and 1 ≤ t ≤ T. You must structure the predictor data similarly.
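As a minimal sketch of this layout, the following builds a T-by-n response matrix from synthetic column vectors. The variable names (`y1`, `y2`, `y3`, `obs`) and the random-walk data are assumptions for illustration only.

```matlab
% Illustrative only: build a T-by-n response matrix from synthetic series.
rng(0);                          % fix the seed so the sketch is reproducible
T = 100;                         % number of sample times
n = 3;                           % number of time series
y1 = cumsum(randn(T,1));         % each series is a T-by-1 column vector
y2 = cumsum(randn(T,1));
y3 = cumsum(randn(T,1));
Y = [y1 y2 y3];                  % rows are times (earliest first), columns are series
[numObs, numSeries] = size(Y);   % numObs equals T, numSeries equals n
obs = Y(10,2);                   % observation of series j = 2 at time t = 10
```

Note that the MATLAB index order is `Y(t,j)` (row, then column), whereas the subscript order in the matrix above is Y_{j,t}.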

#### Multiple Paths

Depending on the context, response data can have an extra dimension corresponding to separate, independent paths. For this type of data, use a three-dimensional array `Y(t,j,p)`, where:

• `t` is the time index of an observation, 1 ≤ `t` ≤ `numobs`.

• `j` is the index of a time series, 1 ≤ `j` ≤ `numseries`.

• `p` is the path index, 1 ≤ `p` ≤ `numpaths`.

For any fixed path `p`, `Y(:,:,p)` is a `numobs`-by-`numseries` matrix containing one path of the multivariate time series.
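The indexing convention can be sketched with synthetic data as follows. The sizes and the names `path3` and `series1` are hypothetical, chosen only to show how pages, rows, and columns map to paths, times, and series.

```matlab
% Illustrative only: stack independent paths along the third dimension.
rng(1);
numobs = 20; numseries = 2; numpaths = 5;
Y = zeros(numobs, numseries, numpaths);
for p = 1:numpaths
    % One numobs-by-numseries path per page
    Y(:,:,p) = cumsum(randn(numobs, numseries));
end
path3 = Y(:,:,3);                % all series on path 3 (numobs-by-numseries)
series1 = squeeze(Y(:,1,:));     % series 1 on every path (numobs-by-numpaths)
```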

### Example: Response Data Structure

The file `Data_USEconModel` ships with Econometrics Toolbox software. It contains time series from the St. Louis Federal Reserve Economics Database (known as FRED).

Enter

`load Data_USEconModel`

to load the data into your MATLAB workspace. The following items load into the workspace:

• `Data`, a 249-by-14 matrix containing the 14 time series,

• `DataTable`, a 249-by-14 `timetable` that packages the data,

• `dates`, a 249-element vector containing the dates for `Data`,

• `Description`, a character array containing a description of the data series and the key to the labels for each series,

• `series`, a 1-by-14 cell array of labels for the time series.

Examine the data structures:

`firstPeriod = dates(1)`
```firstPeriod = 711217 ```
`lastPeriod = dates(end)`
```lastPeriod = 733863 ```
• `dates` is a vector containing MATLAB serial date numbers, the number of days from the putative date January 0, 0000, so that serial date number 1 corresponds to January 1, 0000. (This “date” is not a real date, but is convenient for making date calculations; for more information, see Date Formats (Financial Toolbox) in the Financial Toolbox™ User's Guide.)

• The `Data` matrix contains 14 columns, one for each time series. The cell array `series` contains the corresponding labels.

| FRED Series | Description |
| --- | --- |
| COE | Paid compensation of employees in \$ billions |
| CPIAUCSL | Consumer Price Index |
| FEDFUNDS | Effective federal funds rate |
| GCE | Government consumption expenditures and investment in \$ billions |
| GDP | Gross domestic product in \$ billions |
| GDPDEF | Gross domestic product price deflator |
| GPDI | Gross private domestic investment in \$ billions |
| GS10 | Ten-year treasury bond yield |
| HOANBS | Non-farm business sector index of hours worked |
| M1SL | M1 money supply (narrow money) |
| M2SL | M2 money supply (broad money) |
| PCEC | Personal consumption expenditures in \$ billions |
| TB3MS | Three-month treasury bill yield |
| UNRATE | Unemployment rate |

`DataTable` is a `timetable` array containing the same data as in `Data`. As with tables, you can use dot notation to access a variable. For example, `DataTable.UNRATE` returns the unemployment rate time series. Also, every timetable contains the row times variable `Time`, which here is a `datetime` vector. For more details, see Create Timetables (MATLAB) and Represent Dates and Times in MATLAB (MATLAB).
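The following sketch mimics this layout with a small synthetic timetable, so it runs without loading `Data_USEconModel`. The numeric values are made up and are not FRED data; only the serial date number 711217 is taken from the output shown earlier. The names `TT`, `u`, `t`, and `X` are hypothetical.

```matlab
% Illustrative only: a small synthetic timetable that mimics the DataTable layout.
serialDates = [711217; 711308; 711400];                  % serial date numbers
Time = datetime(serialDates, 'ConvertFrom', 'datenum');  % convert to datetime
UNRATE = [5.0; 5.5; 6.0];                                % made-up values, not FRED data
GDP = [100; 110; 120];                                   % made-up values, not FRED data
TT = timetable(Time, UNRATE, GDP);
u = TT.UNRATE;                  % dot notation returns the column as a vector
t = TT.Time;                    % row times as a datetime vector
X = TT{:, {'UNRATE','GDP'}};    % numeric matrix for functions that expect arrays
```

Converting serial date numbers to `datetime` values in this way makes the sample period easier to read than the raw numbers.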

### Data Preprocessing

Your data might have characteristics that violate assumptions for linear multiple time series models. For example, you can have data with exponential growth, or data from multiple sources at different periodicities. You must preprocess your data to convert it into an acceptable form for analysis.

• For time series with exponential growth, you can preprocess the data by taking the logarithm of the growing series. In some cases you then difference the result. For an example, see VAR Model Case Study.

• For data from multiple sources, you must decide how best to fill in missing values. Commonly, you take the missing values as unchanged from the previous value, or as interpolated from neighboring values.
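The two preprocessing steps above can be sketched on synthetic data as follows. The series and variable names are assumptions for illustration. `fillmissing` is a base MATLAB function available in R2016b and later.

```matlab
% Illustrative only: synthetic series with exponential growth and with gaps.
t = (1:8)';
y = 100*exp(0.05*t);                  % exponentially growing series
logY = log(y);                        % the logarithm grows linearly in t
x = [1; NaN; 3; NaN; NaN; 6; 7; 8];   % series with missing values
xPrev = fillmissing(x, 'previous');   % carry the last observed value forward
xLin  = fillmissing(x, 'linear');     % interpolate between neighboring values
```

Which fill method is appropriate depends on the data. Carrying values forward suits series that are reported infrequently, while interpolation assumes a smooth change between observations.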

### Note

If you take a difference of a series, the series becomes one observation shorter. If you difference only some of the time series in a data set, truncate the other series so that all have the same length, or pad the differenced series with initial values.
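A minimal sketch of both alignment options follows, using synthetic data. The variable names are hypothetical, and padding with `NaN` is an assumption made here for illustration. Choose a pad value that suits your analysis.

```matlab
% Illustrative only: difference one series and keep all columns the same length.
rng(2);
T = 50;
level = exp(cumsum(0.01 + 0.02*randn(T,1)));   % growing series, to be log-differenced
rate  = 5 + randn(T,1);                        % series left in levels
growth = diff(log(level));                     % length T-1
% Option 1: truncate the undifferenced series by dropping its first observation.
Ytrunc = [growth rate(2:end)];                 % (T-1)-by-2
% Option 2: pad the differenced series so that it keeps length T.
Ypad = [[NaN; growth] rate];                   % T-by-2, first row has a missing value
```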

#### Testing Data for Stationarity

You can test each column of time series data for stationarity using unit root tests. For details, see Unit Root Nonstationarity.
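As one possible sketch, the following applies the augmented Dickey-Fuller test (`adftest`, which requires Econometrics Toolbox) to each column of a synthetic data matrix with default settings. The data and variable names are assumptions. Other unit root tests and test options might suit your data better.

```matlab
% Illustrative only: requires Econometrics Toolbox for adftest.
rng(3);
T = 200;
Y = [cumsum(randn(T,1)) randn(T,1)];   % column 1: random walk, column 2: white noise
numSeries = size(Y,2);
h = false(1,numSeries);
pValue = zeros(1,numSeries);
for j = 1:numSeries
    % h(j) = 1 means the test rejects the unit-root null for series j
    [h(j), pValue(j)] = adftest(Y(:,j));
end
```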

### Partitioning Response Data

To fit a lagged model to data, partition the response data into up to three sections:

• Presample data

• Estimation data

• Forecast data

The purpose of presample data is to provide initial values for lagged variables. When fitting a model to the estimation data, you need access to earlier times. For example, if the maximum lag in a model is 4, and the earliest time in the estimation data is 50, then the model needs the data at times 46 through 49 to fit the observation at time 50. By default, `estimate` removes the required number of observations from the beginning of the response data to use as presample data. This reduces the effective sample size.

The estimation data contains the observations y_t, and `estimate` fits the model to this data explicitly. The number of observations in the estimation data is the effective sample size.

Use the forecast data to compare the predictions of the fitted model against observed data. A forecast period is optional. Use one to validate the predictive power of a fitted model. The following figure shows how to arrange the data in the data matrix, with j presample rows and k forecast rows.
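The partition can be sketched with plain index arithmetic, as below. The data are synthetic, and `maxLag` and `numForecast` are hypothetical names that play the roles of j and k. The commented lines show one way the blocks might then be passed to `estimate` and `forecast` for a `varm` model, assuming Econometrics Toolbox is available.

```matlab
% Illustrative only: split response data into presample, estimation, and forecast blocks.
rng(4);
Y = cumsum(randn(120,2));                  % synthetic 120-by-2 response data
maxLag = 4;                                % presample rows (j in the text above)
numForecast = 10;                          % forecast rows (k in the text above)
T = size(Y,1);
Ypre  = Y(1:maxLag, :);                    % initial values for the lagged terms
Yest  = Y(maxLag+1 : T-numForecast, :);    % data the model is fit to
Yfore = Y(T-numForecast+1 : end, :);       % held out to compare with forecasts
effectiveSampleSize = size(Yest,1);        % 120 - 4 - 10 = 106
% With Econometrics Toolbox, the blocks could be used along these lines:
%   EstMdl = estimate(varm(2,maxLag), Yest, 'Y0', Ypre);
%   Yhat   = forecast(EstMdl, numForecast, Yest);
```

Supplying the presample block explicitly keeps the estimation sample fixed when you compare models with different lag lengths.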