compare

Compare generalized linear mixed-effects models

Syntax

results = compare(glme,altglme)

results = compare(glme,altglme,CheckNesting=checknesting)

Description

results = compare(glme,altglme) returns the results of a likelihood ratio test that compares the generalized linear mixed-effects models glme and altglme. To conduct a valid likelihood ratio test, both models must use the same response vector in the fit, and glme must be nested in altglme. Always input the smaller model first, and the larger model second.

compare tests the following null and alternate hypotheses:

H₀: Observed response vector is generated by glme.
H₁: Observed response vector is generated by model altglme.

example

results = compare(glme,altglme,CheckNesting=checknesting) checks if the first input model, glme, is nested in the second input model, altglme.

Input Arguments

expand all

`glme` — Generalized linear mixed-effects model
`GeneralizedLinearMixedModel` object

Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.

You can create a GeneralizedLinearMixedModel object by fitting a generalized linear mixed-effects model to your sample data using fitglme. To conduct a valid likelihood ratio test on two models that have response distributions other than normal, you must fit both models using the "ApproximateLaplace" or "Laplace" fit method. Models with response distributions other than normal that are fitted using "MPL" or "REMPL" cannot be compared using a likelihood ratio test.

`altglme` — Alternative generalized linear mixed-effects model
`GeneralizedLinearMixedModel` object

Alternative generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. altglme be must fit to the same response vector as glme, but with different model specifications. glme must be nested in altglme, such that you can obtain glme from altglme by setting some of the model parameters of altglme to fixed values such as 0.

`checknesting` — Indicator to check nesting between two models
`true` (default) | `false`

Indicator to check nesting between two models, specified as either true or false. If checknesting is true, then compare checks if the smaller model glme is nested in the larger model altglme. If the nesting requirements are not satisfied, then compare returns an error. If checknesting is false, then compare does not perform this check.

Example: CheckNesting=true

Output Arguments

expand all

`results` — Results of likelihood ratio test
table

Results of the likelihood ratio test, returned as a table with two rows. The first row is for glme, and the second row is for altglme. The columns of results contain the following.

Column Name	Description
`Model`	Name of the model
`DF`	Degrees of freedom
`AIC`	Akaike information criterion for the model
`BIC`	Bayesian information criterion for the model
`LogLik`	Maximized log likelihood for the model
`LRStat`	Likelihood ratio test statistic for comparing `altglme` and `glme`
`deltaDF`	`DF` for `altglme` minus `DF` for `glme`
`pValue`	p-value for the likelihood ratio test

Examples

expand all

Compare Mixed-Effects Models

Open Live Script

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

Flag to indicate whether the batch used the new process (newprocess)
Processing time for each batch, in hours (time)
Temperature of the batch, in degrees Celsius (temp)
Categorical variable indicating the supplier of the chemical used in the batch (supplier)
Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a fixed-effects-only model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Specify the response distribution as Poisson, the link function as log, and the fit method as Laplace. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

FEglme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier','Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Fit a second model that uses the same fixed-effects predictors, response distribution, link function, and fit method. This time, include a random-effects intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations.

The number of defects can be modeled using a Poisson distribution

${defects}_{i j} \sim Poisson (μ_{i j})$

This corresponds to the generalized linear mixed-effects model

$\log (μ_{i j}) = β_{0} + β_{1} {newprocess}_{i j} + β_{2} {time_dev}_{i j} + β_{3} {temp_dev}_{i j} + β_{4} {supplier_C}_{i j} + β_{5} {supplier_B}_{i j} + b_{i},$

where

${defects}_{i j}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$ .
$μ_{i j}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, . . ., 20$ ) during batch $j$ (where $j = 1, 2, . . ., 5$ ).
${newprocess}_{i j}$ , ${time_dev}_{i j}$ , and ${temp_dev}_{i j}$ are the measurements for each variable that correspond to factory $i$ during batch $j$ . For example, ${newprocess}_{i j}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.
${supplier_C}_{i j}$ and ${supplier_B}_{i j}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$ .
$b_{i} \sim N (0, σ_{b}^{2})$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Compare the two models using a theoretical likelihood ratio test. Specify 'CheckNesting' as true, so compare returns a warning if the nesting requirements are not satisfied.

results = compare(FEglme,glme,'CheckNesting',true)

results = 
    Theoretical Likelihood Ratio Test

    Model     DF    AIC       BIC       LogLik     LRStat    deltaDF    pValue    
    FEglme    6     431.02    446.65    -209.51                                   
    glme      7     416.35    434.58    -201.17    16.672    1          4.4435e-05

Since compare did not return an error, the nesting requirements are satisfied. The small $p$ -value indicates that compare rejects the null hypothesis that the observed response vector is generated by the model FEglme, and instead accepts the alternate model glme. The smaller AIC and BIC values for glme also support the conclusion that glme provides a better fitting model for the response.

More About

expand all

Likelihood Ratio Test

A likelihood ratio test compares the specifications of two nested models by assessing the significance of restrictions to an extended model with unrestricted parameters. Under the null hypothesis H₀, the likelihood ratio test statistic has an approximate chi-squared reference distribution with degrees of freedom deltaDF.

When comparing two models, compare computes the p-value for the likelihood ratio test by comparing the observed likelihood ratio test statistic with this chi-squared reference distribution. A small p-value leads to a rejection of H₀ in favor of H₁, and acceptance of the alternate model altglme. On the other hand, a large p-value indicates that we cannot reject H₀, and reflects insufficient evidence to accept the model altglme.

The p-values obtained using the likelihood ratio test can be conservative when testing for the presence or absence of random-effects terms, and anti-conservative when testing for the presence or absence of fixed-effects terms. Instead, use the fixedEffects or coefTest methods to test for fixed effects.

To conduct a valid likelihood ratio test on GLME models, both models must be fitted using a Laplace or approximate Laplace fit method. Models fitted using a maximum pseudo likelihood (MPL) or restricted maximum pseudo likelihood (REMPL) method cannot be compared using a likelihood ratio test. When comparing models fitted using MPL, the maximized log likelihood of the pseudodata from the final pseudo likelihood iteration is used in the likelihood ratio test. If you compare models with non-normal distributions fitted using MPL, then compare gives a warning that the likelihood ratio test is using maximized log likelihood of pseudodata from the final pseudo likelihood iteration. To use the true maximized log likelihood in the likelihood ratio test, fit both glme and altglme using approximate Laplace or Laplace prior to model comparison.

Nesting Requirements

To conduct a valid likelihood ratio test, glme must be nested in altglme. The CheckNesting=true argument checks the following requirements, and returns an error if any are not satisfied:

You must fit both models (glme and altglme) using the "ApproximateLaplace" or "Laplace" fit method. You cannot compare GLME models fitted using "MPL" or "REMPL" using a likelihood ratio test.
You must fit both models using the same response vector, response distribution, and link function.
The smaller model (glme) must be nested within the larger model (altglme), such that you can obtain glme from altglme by setting some of the model parameters of altglme to fixed values such as 0.
The maximized log likelihood of the larger model (altglme) must be greater than or equal to the maximized log likelihood of the smaller model (glme).
The weight vectors used to fit glme and altglme must be identical.
The random-effects design matrix of the larger model (altglme) must contain the random-effects design matrix of the smaller model (glme).
The fixed-effects design matrix of the larger model (altglme) must contain the fixed-effects design matrix of the smaller model (glme).

Akaike and Bayesian Information Criteria

The Akaike information criterion (AIC) is AIC = –2logL_M + 2(param).

logL_M depends on the method used to fit the model.

If you use 'Laplace' or 'ApproximateLaplace', then logL_M is the maximized log likelihood.
If you use 'MPL', then logL_M is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration.
If you use 'REMPL', then logL_M is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

param is the total number of parameters estimated in the model. For most GLME models, param is equal to nc + p + 1, where nc is the total number of parameters in the random-effects covariance, excluding the residual variance, and p is the number of fixed-effects coefficients. However, if the dispersion parameter is fixed at 1.0 for binomial or Poisson distributions, then param is equal to (nc + p).

The Bayesian information criterion (BIC) is BIC = –2*logL_M + ln(n_eff)(param).

logL_M depends on the method used to fit the model.

If you use 'Laplace' or 'ApproximateLaplace', then logL_M is the maximized log likelihood.
If you use 'MPL', then logL_M is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration.
If you use 'REMPL', then logL_M is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

n_eff is the effective number of observations.

If you use 'MPL', 'Laplace', or 'ApproximateLaplace', then n_eff = n, where n is the number of observations.
If you use 'REMPL', then n_eff = n – p.

A lower value of deviance indicates a better fit. As the value of deviance decreases, both AIC and BIC tend to decrease. Both AIC and BIC also include penalty terms based on the number of parameters estimated, p. So, when the number of parameters increase, the values of AIC and BIC tend to increase as well. When comparing different models, the model with the lowest AIC or BIC value is considered as the best fitting model.

For models fitted using 'MPL' and 'REMPL', AIC and BIC are based on the log likelihood (or restricted log likelihood) of pseudo data from the final pseudo likelihood iteration. Therefore, a direct comparison of AIC and BIC values between models fitted using 'MPL' and 'REMPL' is not appropriate.

compare

Syntax

Description

Input Arguments

glme — Generalized linear mixed-effects model GeneralizedLinearMixedModel object

altglme — Alternative generalized linear mixed-effects model GeneralizedLinearMixedModel object

checknesting — Indicator to check nesting between two models true (default) | false

Output Arguments

results — Results of likelihood ratio test table

Examples

Compare Mixed-Effects Models

More About

Likelihood Ratio Test

Nesting Requirements

Akaike and Bayesian Information Criteria

See Also

`glme` — Generalized linear mixed-effects model
`GeneralizedLinearMixedModel` object

`altglme` — Alternative generalized linear mixed-effects model
`GeneralizedLinearMixedModel` object

`checknesting` — Indicator to check nesting between two models
`true` (default) | `false`

`results` — Results of likelihood ratio test
table