Deviance is a generalization of the residual sum of squares. It measures the
goodness of fit compared to a saturated model.

The deviance of a model *M*_{1} is twice the difference
between the loglikelihood of the saturated model *M*_{s} and
the loglikelihood of the model *M*_{1}. A saturated
model is a model with the maximum number of parameters that you can estimate.

For example, if you have *n* observations
(*y*_{i}, *i* = 1, 2, ..., *n*) with potentially different
values for *X*_{i}^{T}β,
then you can define a saturated model with *n* parameters. Let
*L*(*b*,*y*) denote the maximum value
of the likelihood function for a model with the parameters *b*. Then the
deviance of the model *M*_{1} is

$$-2\left(\log L(b_1, y) - \log L(b_s, y)\right),$$

where *b*_{1} and *b*_{s} contain the
estimated parameters for the model *M*_{1} and the
saturated model, respectively. Asymptotically, the deviance has a chi-square
distribution with *n* – *p* degrees of freedom, where *n* is the number of
parameters in the saturated model and *p* is the number of parameters in the
model *M*_{1}.
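
As an illustration of the definition, the following minimal Python sketch computes the deviance of a constant-only Poisson model against the saturated model. The data values and the helper name `poisson_loglik` are made up for this example, and the Poisson distribution is an assumption chosen because both models have closed-form maximum likelihood estimates.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu (hypothetical helper)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention: 0 * log(0) = 0, which covers y_i = 0 in the saturated model.
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

y = [2, 3, 6, 7, 8, 9, 10, 12, 15]  # made-up counts, n = 9

# Model M1: constant term only (p = 1). The MLE of the common mean is the sample mean.
mu_1 = [sum(y) / len(y)] * len(y)

# Saturated model Ms: one parameter per observation, so each fitted mean equals y_i.
mu_s = [float(yi) for yi in y]

# Deviance: -2 * (log L(b1, y) - log L(bs, y)).
deviance = -2.0 * (poisson_loglik(y, mu_1) - poisson_loglik(y, mu_s))

# Cross-check against the closed-form Poisson deviance.
closed_form = 2.0 * sum(
    (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    for yi, mi in zip(y, mu_1)
)

dof = len(y) - 1  # n - p degrees of freedom
print(f"deviance = {deviance:.4f}, closed form = {closed_form:.4f}, dof = {dof}")
```

The two computations agree because the log-factorial terms are the same in both loglikelihoods and cancel in the difference.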

Assume you have two different generalized linear regression models
*M*_{1} and
*M*_{2}, and
*M*_{1} has a subset of the terms in
*M*_{2}. You can assess the fit of the models by
comparing the deviances *D*_{1} and
*D*_{2} of the two models. The difference of the
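deviances is

$$D = D_1 - D_2 = -2\left(\log L(b_1, y) - \log L(b_2, y)\right),$$

where *b*_{2} contains the estimated parameters for the model
*M*_{2}. The saturated-model terms cancel in the difference.
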
Asymptotically, the difference *D* has a chi-square distribution with degrees
of freedom *v* equal to the difference in the number of parameters
estimated in *M*_{1} and
*M*_{2}. You can obtain the
*p*-value for this test by using `1 - chi2cdf(D,v)`.
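
The comparison of two nested models can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the toolbox implementation: the data are made up, the Poisson distribution is assumed so that both fits have closed-form estimates, and `chi2_sf` and `poisson_deviance` are hypothetical helpers. `chi2_sf(D, v)` plays the role of `1 - chi2cdf(D,v)` for positive integer `v`.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df (hypothetical helper)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        q, v = math.erfc(math.sqrt(half)), 1
    else:
        q, v = math.exp(-half), 2
    # Recurrence: Q(x; v + 2) = Q(x; v) + (x/2)^(v/2) * exp(-x/2) / Gamma(v/2 + 1).
    while v < df:
        q += half ** (v / 2.0) * math.exp(-half) / math.gamma(v / 2.0 + 1.0)
        v += 2
    return q

def poisson_deviance(y, mu):
    """Poisson deviance of fitted means mu relative to the saturated model."""
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )

group = [0, 0, 0, 0, 1, 1, 1, 1]  # made-up binary predictor
y = [2, 3, 1, 4, 7, 9, 6, 8]      # made-up counts

# M1: constant term only (1 parameter); fitted mean is the overall average.
mean_all = sum(y) / len(y)
mu_1 = [mean_all] * len(y)

# M2: constant term plus group indicator (2 parameters);
# with a single indicator, the fitted means are the group averages.
mean_g = {
    g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
    for g in (0, 1)
}
mu_2 = [mean_g[g] for g in group]

D = poisson_deviance(y, mu_1) - poisson_deviance(y, mu_2)
v = 2 - 1  # difference in the number of estimated parameters
p_value = chi2_sf(D, v)
print(f"D = {D:.4f}, v = {v}, p-value = {p_value:.5f}")
```

A small *p*-value indicates that the extra term in *M*_{2} improves the fit by more than chance alone would explain.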

Typically, you examine *D* by taking *M*_{1} to be a model with a
constant term and no predictors and *M*_{2} to be a model with *p*
parameters. Therefore, *D* has a chi-square distribution with *p* – 1 degrees
of freedom. If the dispersion is estimated, the difference divided
by the estimated dispersion has an *F* distribution with *p* – 1 numerator
degrees of freedom and *n* – *p* denominator degrees of freedom.