This section focuses on using likelihood-based methods for multivariate normal regression. The parameters of the regression model are estimated via maximum likelihood estimation. For multiple series, this requires iteration until convergence. The complication due to the possibility of missing data is incorporated into the analysis with a variant of the EM algorithm known as the ECM algorithm.

The underlying theory of maximum likelihood estimation and the definition and significance of the Fisher information matrix can be found in Caines [1] and Cramér [2]. The underlying theory of the ECM algorithm can be found in Meng and Rubin [8] and Sexton and Swensen [9].


Suppose that you have a multivariate normal linear regression model in the form

$$\left[\begin{array}{c}{Z}_{1}\\ \vdots \\ {Z}_{m}\end{array}\right]\sim N\left(\left[\begin{array}{c}{H}_{1}b\\ \vdots \\ {H}_{m}b\end{array}\right],\text{\hspace{0.17em}}\left[\begin{array}{ccc}C& & 0\\ & \ddots & \\ 0& & C\end{array}\right]\right),$$

where the model has *m* observations of *n*-dimensional random variables **Z**_{1}, ..., **Z**_{m}, a linear regression model with a *p*-dimensional model parameter vector **b**, and a sequence of *m* design matrices **H**_{1}, ..., **H**_{m}, where each design matrix is a known *n*-by-*p* matrix.

Given a parameter vector **b** and a collection of design matrices, the collection of *m* independent variables **Z**_{k} is assumed to have independent, identically distributed multivariate normal residual errors with an *n*-by-*n* covariance matrix **C**. A concise way to write this model is

$${Z}_{k}\sim N\left({H}_{k}b,\text{\hspace{0.17em}}C\right)$$

for *k* = 1, ..., *m*.
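To make the model concrete, the following Python/NumPy sketch (an illustrative simulation, not a toolbox function; the name `simulate_mvn_regression` is hypothetical) draws one observation **z**_{k} ~ N(**H**_{k}**b**, **C**) for each design matrix:

```python
import numpy as np

def simulate_mvn_regression(H, b, C, rng):
    """Draw z_k ~ N(H_k b, C) for each n-by-p design matrix H_k.

    H : list of m design matrices, b : length-p parameter vector,
    C : n-by-n residual covariance matrix.
    """
    return [rng.multivariate_normal(Hk @ b, C) for Hk in H]

rng = np.random.default_rng(0)
H = [rng.standard_normal((2, 3)) for _ in range(5)]   # m = 5, n = 2, p = 3
b = np.array([1.0, -2.0, 0.5])
z = simulate_mvn_regression(H, b, 0.1 * np.eye(2), rng)
```

With **C** = **0**, each draw collapses to its mean **H**_{k}**b**, which is a convenient sanity check on the setup.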

The goal of multivariate normal regression is to obtain maximum
likelihood estimates for **b** and **C** given a collection of *m* observations **z**_{1}, ..., **z**_{m} of
the random variables **Z**_{1}, ..., **Z**_{m}.

Quasi-maximum likelihood estimation works with the same models but relaxes the assumption of normally distributed residuals. Even with this relaxation, the parameter estimates are asymptotically optimal.

To estimate the parameters of the multivariate normal linear
regression model using maximum likelihood estimation, it is necessary
to maximize the log-likelihood function over the estimation parameters
given observations **z**_{1},
..., **z**_{m}.

Given the multivariate normal model to characterize residual errors in the regression model, the log-likelihood function is

$$\begin{array}{c}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}b,\text{\hspace{0.17em}}C\right)=-\frac{1}{2}mn\mathrm{log}\left(2\pi \right)-\frac{1}{2}m\mathrm{log}\left(\mathrm{det}\left(C\right)\right)\\ -\frac{1}{2}{\displaystyle \sum _{k=1}^{m}{\left({z}_{k}-{H}_{k}b\right)}^{T}{C}^{-1}\left({z}_{k}-{H}_{k}b\right)}.\end{array}$$
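As a numerical check on this formula, the following Python/NumPy sketch (an illustrative translation, not a toolbox function; the name `mvn_loglik` is hypothetical) evaluates the log-likelihood with the conventional negative signs, so that larger values indicate a better fit:

```python
import numpy as np

def mvn_loglik(z, H, b, C):
    """Log-likelihood L(z_1, ..., z_m; b, C) of the multivariate
    normal regression model z_k ~ N(H_k b, C).

    z : list of m observation vectors (each length n)
    H : list of m design matrices (each n-by-p)
    b : length-p parameter vector, C : n-by-n covariance matrix
    """
    m = len(z)
    n = C.shape[0]
    Cinv = np.linalg.inv(C)
    _, logdet = np.linalg.slogdet(C)          # log(det(C)), numerically stable
    quad = sum((zk - Hk @ b) @ Cinv @ (zk - Hk @ b) for zk, Hk in zip(z, H))
    return -0.5 * (m * n * np.log(2 * np.pi) + m * logdet + quad)
```

When every residual is zero and **C** = **I**, the quadratic and determinant terms vanish, leaving the constant −(1/2)*mn* log(2π), which is a quick way to validate the implementation.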

Although the cross-sectional residuals must be independent,
you can use this log-likelihood function for quasi-maximum likelihood
estimation. In this case, the estimates for the parameters **b** and **C** provide
estimates to characterize the first and second moments of the residuals.
See Caines [1] for details.

Except for a special case (see Special Case of Multiple Linear Regression Model), if both
the model parameters in **b** and the
covariance parameters in **C** are to
be estimated, the estimation problem is intractably nonlinear and
a solution must use iterative methods. Denote estimates for the parameters **b** and **C** for
iteration *t* = 0, 1, ... with the superscript
notation **b**^{(t)} and **C**^{(t)}.

Given initial estimates **b**^{(0)} and **C**^{(0)} for the parameters,
the maximum likelihood estimates for **b** and **C** are obtained using a two-stage iterative
process with

$${b}^{\left(t+1\right)}={\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{\left({C}^{\left(t\right)}\right)}^{-1}{H}_{k}}\right)}^{-1}\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{\left({C}^{\left(t\right)}\right)}^{-1}{z}_{k}}\right)$$

and

$${C}^{\left(t+1\right)}=\frac{1}{m}{\displaystyle \sum _{k=1}^{m}\left({z}_{k}-{H}_{k}{b}^{\left(t+1\right)}\right){\left({z}_{k}-{H}_{k}{b}^{\left(t+1\right)}\right)}^{T}}$$

for *t* = 0, 1, ... .
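The two-stage iteration above can be sketched in Python/NumPy. This is an illustrative sketch, not the toolbox implementation (the name `mvn_regression_mle` is hypothetical); it starts from **C**^{(0)} = **I** and alternates the generalized least-squares update for **b** with the residual-covariance update for **C**:

```python
import numpy as np

def mvn_regression_mle(z, H, max_iter=100, tol=1e-8):
    """Two-stage iteration for multivariate normal regression MLE.

    Alternates b^(t+1) = (sum H_k' inv(C) H_k)^{-1} sum H_k' inv(C) z_k
    with C^(t+1) = (1/m) sum (z_k - H_k b)(z_k - H_k b)' until C
    stops changing.
    """
    m = len(z)
    n = z[0].shape[0]
    C = np.eye(n)                               # C^(0) = I
    b = None
    for _ in range(max_iter):
        Cinv = np.linalg.inv(C)
        A = sum(Hk.T @ Cinv @ Hk for Hk in H)
        r = sum(Hk.T @ Cinv @ zk for Hk, zk in zip(H, z))
        b = np.linalg.solve(A, r)               # b^(t+1): GLS step
        resid = [zk - Hk @ b for zk, Hk in zip(z, H)]
        C_new = sum(np.outer(e, e) for e in resid) / m   # C^(t+1)
        if np.max(np.abs(C_new - C)) < tol:     # covariance has stabilized
            C = C_new
            break
        C = C_new
    return b, C
```

On simulated data with enough observations for the residual covariance to be nonsingular, the iteration typically stabilizes in a handful of passes and recovers the generating parameter vector.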

The special case mentioned in Maximum Likelihood Estimation occurs if *n* =
1 so that the sequence of observations is a sequence of scalar observations.
This model is known as a multiple linear regression model. In this
case, the covariance matrix **C** is
a `1`-by-`1` matrix that drops out
of the maximum likelihood iterates so that a single-step estimate
for **b** and **C** can
be obtained with converged estimates **b**^{(1)} and **C**^{(1)}.

Another simplification of the general model is called least-squares
regression. If **b**^{(0)} = **0** and **C**^{(0)} = **I**, then **b**^{(1)} and **C**^{(1)} from the two-stage
iterative process are least-squares estimates for **b** and **C**, where

$${b}^{LS}={\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{H}_{k}}\right)}^{-1}\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{z}_{k}}\right)$$

and

$${C}^{LS}=\frac{1}{m}{\displaystyle \sum _{k=1}^{m}\left({z}_{k}-{H}_{k}{b}^{LS}\right){\left({z}_{k}-{H}_{k}{b}^{LS}\right)}^{T}}.$$
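The least-squares estimate **b**^{LS} is the same as ordinary least squares applied to the stacked system of all *m* observations. The following sketch (illustrative only; the name `least_squares_b` is hypothetical) implements the formula and can be cross-checked against a stacked solver:

```python
import numpy as np

def least_squares_b(z, H):
    """b^LS = (sum H_k' H_k)^{-1} (sum H_k' z_k), i.e., the first
    b-update of the two-stage iteration with C^(0) = I."""
    A = sum(Hk.T @ Hk for Hk in H)
    r = sum(Hk.T @ zk for Hk, zk in zip(H, z))
    return np.linalg.solve(A, r)
```

Stacking the **H**_{k} into one tall matrix and the **z**_{k} into one long vector and solving the single regression gives the same answer, which confirms that the summed normal equations are just the pooled OLS normal equations.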

A final simplification of the general model is to estimate the
mean and covariance of a sequence of *n*-dimensional
observations **z**_{1},
..., **z**_{m}.
In this case, the number of series is equal to the number of model
parameters, with each design matrix **H**_{k} equal to the *n*-by-*n* identity matrix, so that **b** is an estimate for the mean and **C** is an estimate for the covariance of the observations.

If the iterative process continues until the log-likelihood
function increases by no more than a specified amount, the resultant
estimates are said to be maximum likelihood estimates **b**^{ML} and **C**^{ML}.

If *n* = 1 (which implies a single data series),
convergence occurs after only one iterative step, which, in turn,
implies that the least-squares and maximum likelihood estimates are
identical. If, however, *n* > 1, the least-squares
and maximum likelihood estimates are usually distinct.

In Financial Toolbox™ software, both the change in the log-likelihood function and the norm of the change in parameter estimates are monitored. Whenever both changes fall below specified tolerances (typically somewhere between machine precision and its square root), the toolbox functions terminate under an assumption that convergence has been achieved.

Since maximum likelihood estimates are formed from samples of random variables, their estimators are random variables; an estimate derived from such samples has an uncertainty associated with it. To characterize these uncertainties, which are called standard errors, two quantities are derived from the total log-likelihood function.

The Hessian of the total log-likelihood function is

$${\nabla}^{2}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}\theta \right)$$

and the Fisher information matrix is

$$I\left(\theta \right)=-E\left[{\nabla}^{2}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}\theta \right)\right],$$

where the partial derivatives of the $${\nabla}^{2}$$ operator are taken with respect
to the combined parameter vector θ that contains the distinct
components of **b** and **C**, with a total of *q* = *p* + *n*(*n* +
1)/2 parameters.

Since maximum likelihood estimation is concerned with large-sample estimates, the central limit theorem applies to the estimates and the Fisher information matrix plays a key role in the sampling distribution of the parameter estimates. Specifically, maximum likelihood parameter estimates are asymptotically normally distributed such that

$$\left({\theta}^{\left(t\right)}-\theta \right)\sim N\left(0,\text{\hspace{0.17em}}{I}^{-1}\left({\theta}^{\left(t\right)}\right)\right)\text{ as }t\to \infty ,$$

where θ is the combined parameter vector and θ^{(t)} is
the estimate for the combined parameter vector at iteration *t* =
0, 1, ... .

The Fisher information matrix provides a lower bound, called a Cramér-Rao lower bound, for the standard errors of estimates of the model parameters.

Given an estimate for the combined parameter vector Θ, the squared standard errors are the diagonal elements of the inverse of the Fisher information matrix

$${s}^{2}\left({\widehat{\theta}}_{i}\right)={\left({I}^{-1}\left({\widehat{\theta}}_{i}\right)\right)}_{ii}$$

for *i* = 1, ..., *q*.

Since the standard errors are estimates for the standard deviations of the parameter estimates, you can construct confidence intervals so that, for example, a 95% interval for each parameter estimate is approximately

$${\widehat{\theta}}_{i}\pm 1.96s\left({\widehat{\theta}}_{i}\right)$$

for *i* = 1, ..., *q*.
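Given a computed Fisher information matrix, the standard errors and confidence intervals follow by direct matrix arithmetic. The sketch below (illustrative; the names `standard_errors` and `confidence_intervals` are hypothetical, and the Fisher matrix is assumed to be supplied, e.g., by a function such as `mvnrfish`):

```python
import numpy as np

def standard_errors(fisher):
    """Squared standard errors are the diagonal elements of the
    inverse of the Fisher information matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

def confidence_intervals(theta_hat, fisher, z=1.96):
    """Approximate (1 - alpha) confidence intervals theta_i +/- z * s(theta_i);
    z = 1.96 corresponds to a 95% interval."""
    s = standard_errors(fisher)
    return theta_hat - z * s, theta_hat + z * s
```

For a diagonal Fisher matrix the standard errors reduce to the reciprocal square roots of its diagonal entries, which is a convenient hand-check.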

Error ellipses at a level of significance α ∈ [0, 1] for the parameter estimates satisfy the inequality

$${\left(\theta -\widehat{\theta}\right)}^{T}I\left(\widehat{\theta}\right)\left(\theta -\widehat{\theta}\right)\le {\chi}_{1-\alpha ,q}^{2}$$

and follow a $${\chi}^{2}$$ distribution with *q* degrees-of-freedom.
Similar inequalities can be formed for any subcollection of the parameters.

In general, given parameter estimates, the computed Fisher information matrix, and the log-likelihood function, you can perform numerous statistical tests on the parameters, the model, and the regression.

`convert2sur` | `ecmlsrmle` | `ecmlsrobj` | `ecmmvnrfish` | `ecmmvnrmle` | `ecmmvnrobj` | `ecmmvnrstd` | `ecmnfish` | `ecmnhess` | `ecmninit` | `ecmnmle` | `ecmnobj` | `ecmnstd` | `mvnrfish` | `mvnrmle` | `mvnrobj` | `mvnrstd`