Often, the first step in creating a multiple time series model is to obtain data. There are two types of multiple time series data:
Response data. Response data corresponds to yt in the multiple time series models defined in Types of Multivariate Time Series Models.
Exogenous data. Exogenous data corresponds to xt in the multiple time series models defined in Types of Multivariate Time Series Models. Each variable in the exogenous data appears in all response equations.
Before using Econometrics Toolbox™ functions with the data, put the data into the required form. Use standard MATLAB® commands, or preprocess the data with a spreadsheet program, database program, Perl, or other utilities.
There are several freely available sources of data sets, such as the St. Louis Federal Reserve Economics Database (known as FRED): https://research.stlouisfed.org/fred2/. If you have a license, you can use Datafeed Toolbox™ functions to access data from various sources.
For multiple time series models, response and predictor data sets must be in separate arrays. Each row of a matrix represents one time or observation, and each column represents one time series or variable. The earliest data is in the first row, and the latest data is in the last row. The response and predictor data represent yt and xt, respectively, in the notation of Types of Multivariate Time Series Models.
The problem context determines the response data structure. Specifically, response data can be any of the following:
Multiple paths of presample observations, a three-dimensional array with pages corresponding to separate paths.
Sample path to which the VAR model is fit, a matrix.
Multiple paths of future observations for conditional forecasting or simulation, a three-dimensional array with pages corresponding to separate paths.
The predictor data structure is a matrix representing one path of observations.
Suppose there are T sample times and n time series. To create a variable representing one path of response data, put the data in the form of a T-by-n matrix, in which row t holds the values of all n series at time t and column j holds series j.
Yj,t represents observation t of response series j, for j = 1,..., n and 1 ≤ t ≤ T. You must structure the predictor data similarly.
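As a minimal sketch of this layout, the following code builds a small response matrix from two column vectors. The variable names and values are made up for illustration. Note that the MATLAB index order is Y(t,j), with the time index first.

```matlab
% Hypothetical example data: T = 5 observations of n = 2 series.
y1 = [1.0; 1.2; 1.5; 1.4; 1.8];   % series 1, earliest observation first
y2 = [10; 11; 13; 12; 15];        % series 2, observed at the same times

% Concatenate the columns so that each column is one series
% and each row is one sample time.
Y = [y1 y2];

[T, n] = size(Y);   % T = 5 sample times, n = 2 series
y32 = Y(3, 2);      % observation at time t = 3 of series j = 2 (here, 13)
```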
Depending on the context, response data can have an extra dimension corresponding to separate, independent paths. For this type of data, use a three-dimensional array Y(t,j,p), where:
t is the time index of an observation,
j is the index of a time series,
p is the path index, 1 ≤ p.
For any path p, Y(t,j,p) is a time series.
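The following sketch shows one way to hold several paths in a three-dimensional array, with pages corresponding to paths. The sizes, the name numPaths, and the values are assumptions chosen only for illustration.

```matlab
T = 4; n = 2; numPaths = 3;      % hypothetical sizes
Y = zeros(T, n, numPaths);       % rows: times, columns: series, pages: paths

for p = 1:numPaths
    % Fill each page with made-up values so that the paths are distinguishable.
    Y(:, :, p) = p * [1 10; 2 20; 3 30; 4 40];
end

path2 = Y(:, :, 2);              % T-by-n matrix holding path p = 2
y = Y(3, 1, 2);                  % time t = 3, series j = 1, path p = 2
```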
Data_USEconModel ships with Econometrics Toolbox. It contains time series from the St. Louis Federal Reserve Economics Database (known as FRED). Enter load Data_USEconModel to load the data into your MATLAB workspace. The following items load into the workspace:
Data, a 249-by-14 matrix containing the 14 time series,
DataTable, a 249-by-14 timetable that packages the data,
dates, a 249-element vector containing the dates for Data,
Description, a character array containing a description of the data series and the key to the labels for each series,
series, a 1-by-14 cell array of labels for the time series.
Examine the data structures:
firstPeriod = dates(1)
firstPeriod = 711217
lastPeriod = dates(end)
lastPeriod = 733863
dates is a vector containing MATLAB serial date numbers, the number of days since the putative date January 1, 0000. (This “date” is not a real date, but is convenient for making date calculations; for more information, see Date Formats (Financial Toolbox).)
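To read serial date numbers as calendar dates, you can convert them to datetime values. The following sketch converts the two values displayed earlier; the names firstDate and lastDate are illustrative.

```matlab
% Serial date numbers displayed for dates(1) and dates(end).
firstPeriod = 711217;
lastPeriod = 733863;

% Convert the serial date numbers to datetime values.
firstDate = datetime(firstPeriod, 'ConvertFrom', 'datenum');   % 31-Mar-1947
lastDate = datetime(lastPeriod, 'ConvertFrom', 'datenum');     % 31-Mar-2009
```

The two dates are the ends of the first quarter of 1947 and the first quarter of 2009, which is consistent with 249 quarterly observations.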
The Data matrix contains 14 columns. These represent the time series labeled by the cell vector of strings series:
| Series | Description |
| --- | --- |
| COE | Paid compensation of employees in $ billions |
| CPIAUCSL | Consumer price index |
| FEDFUNDS | Effective federal funds rate |
| GCE | Government consumption expenditures and investment in $ billions |
| GDP | Gross domestic product in $ billions |
| GDPDEF | Gross domestic product price deflator |
| GPDI | Gross private domestic investment in $ billions |
| GS10 | Ten-year treasury bond yield |
| HOANBS | Non-farm business sector index of hours worked |
| M1SL | M1 money supply (narrow money) |
| M2SL | M2 money supply (broad money) |
| PCEC | Personal consumption expenditures in $ billions |
| TB3MS | Three-month treasury bill yield |
| UNRATE | Unemployment rate |
DataTable is a timetable array containing the same data as Data. However, as with tables, you can use dot notation to access a variable; for example, DataTable.UNRATE returns the unemployment rate time series. Also, the timetable contains the variable Time, which holds the row times as a datetime object. For more details, see Create Timetables (MATLAB) and Represent Dates and Times in MATLAB (MATLAB).
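The following sketch illustrates dot notation on a small timetable built by hand, so that it runs without the data file. The values are made up and are not taken from Data_USEconModel; only the variable names mirror the labels used there.

```matlab
% Hypothetical quarterly sample times and made-up values.
Time = datetime(2008, [3; 6; 9; 12], [31; 30; 30; 31]);
UNRATE = [5.0; 5.3; 6.0; 6.9];
GDP = [14000; 14100; 14050; 13900];

% Build the timetable; variable names come from the workspace names.
TT = timetable(Time, UNRATE, GDP);

u = TT.UNRATE;   % numeric column vector for one series
t = TT.Time;     % datetime vector of row times
```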
Your data might have characteristics that violate assumptions for linear multiple time series models. For example, you can have data with exponential growth, or data from multiple sources at different periodicities. You must preprocess your data to convert it into an acceptable form for analysis.
For time series with exponential growth, you can preprocess the data by taking the logarithm of the growing series. In some cases you then difference the result. For an example, see VAR Model Case Study.
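As a minimal sketch of this transformation, the following code uses a synthetic series that grows at a constant rate. The series and the rate of 0.05 per period are assumptions for illustration.

```matlab
t = (0:9)';
y = 100 * exp(0.05 * t);   % hypothetical exponentially growing series

logY = log(y);             % the log of the series is linear in t
dLogY = diff(logY);        % differenced logs recover the growth rate, 0.05
```

For real data, the differenced log series approximates the period-to-period growth rate rather than being exactly constant.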
For data from multiple sources, you must decide how best to fill in missing values. Commonly, you take the missing values as unchanged from the previous value, or as interpolated from neighboring values.
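Both strategies can be sketched with the MATLAB fillmissing function, assuming missing values are coded as NaN. The vector x is made up for illustration.

```matlab
x = [1; NaN; 3; NaN; NaN; 6];          % hypothetical series with gaps

xPrev = fillmissing(x, 'previous');    % carry the last value forward: [1;1;3;3;3;6]
xLin = fillmissing(x, 'linear');       % interpolate from neighbors:   [1;2;3;4;5;6]
```

Which method is appropriate depends on the series; carrying values forward suits slowly updated quantities, while interpolation suits smoothly varying ones.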
If you take a difference of a series, the series becomes one observation shorter. If you difference only some of the time series in a data set, truncate the other series so that all have the same length, or pad the differenced series with initial values.
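The following sketch shows both alignment options on a made-up two-column data set in which only the first series is differenced. Using NaN as the padding value is an assumption; substitute whatever initial value suits your analysis.

```matlab
% Hypothetical data: column 1 grows by 10% per period, column 2 is a rate.
Y = [100 5; 110 6; 121 5; 133.1 7];

dLog = diff(log(Y(:, 1)));            % 3-by-1, one observation shorter

% Option 1: truncate the undifferenced series to match.
Ytrunc = [dLog, Y(2:end, 2)];         % 3-by-2

% Option 2: pad the differenced series to keep the original length.
Ypad = [[NaN; dLog], Y(:, 2)];        % 4-by-2, NaN marks the missing first value
```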
You can test each column of time series data for stationarity using unit root tests. For details, see Unit Root Nonstationarity.
To fit a lagged model to data, partition the response data into up to three sections: presample data, estimation data, and forecast data.
The purpose of presample data is to provide initial values for lagged variables. When trying to fit a model to the estimation data, you need to access earlier times. For example, if the maximum lag in a model is 4, and if the earliest time in the estimation data is 50, then the model needs to access data at time 46 when fitting the observations at time 50. By default, estimate takes the required number of observations from the start of the response data to use as presample data. This reduces the effective sample size.
The estimation data contains the observations yt. estimate fits the model to this data explicitly. The number of observations in the estimation data is the effective sample size.
Use the forecast data to compare fitted model predictions against observed data. A forecast period is optional; use one to validate the predictive power of a fitted model.
The following figure shows how to arrange the data in the data matrix, with j presample rows and k forecast rows.
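The partition can be sketched with ordinary row indexing. In this example, the data matrix, the number of presample rows j (set equal to an assumed maximum lag of 4), and the number of forecast rows k are all hypothetical.

```matlab
T = 20; n = 2;
Y = reshape(1:T*n, T, n);       % hypothetical T-by-n response data

j = 4;                          % presample rows (assumed maximum lag)
k = 5;                          % forecast rows held out for validation

Ypre = Y(1:j, :);               % presample data: initial values for lags
Yest = Y(j+1:end-k, :);         % estimation data: the model is fit to these rows
Yfor = Y(end-k+1:end, :);       % forecast data: compare predictions to these rows

effectiveT = size(Yest, 1);     % effective sample size, T - j - k = 11
```

The three blocks are contiguous and in time order, so stacking them vertically reproduces the original matrix.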