
# compare

Class: LinearMixedModel

Compare linear mixed-effects models

## Syntax

```
results = compare(lme,altlme)
results = compare(___,Name,Value)
[results,siminfo] = compare(lme,altlme,'NSim',nsim)
[results,siminfo] = compare(___,Name,Value)
```

## Description

`results = compare(lme,altlme)` returns the results of a likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`. Both models must use the same response vector in the fit, and `lme` must be nested in `altlme` for a valid theoretical likelihood ratio test. Always input the smaller model first, and the larger model second.

`compare` tests the following null and alternative hypotheses:

- H0: The observed response vector is generated by `lme`.
- H1: The observed response vector is generated by `altlme`.

It is recommended that you fit `lme` and `altlme` using the maximum likelihood (ML) method prior to model comparison. If you use the restricted maximum likelihood (REML) method, then both models must have the same fixed-effects design matrix.

To test for fixed effects, use `compare` with the simulated likelihood ratio test when `lme` and `altlme` are fit using ML, or use the `fixedEffects`, `anova`, or `coefTest` methods.
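As a minimal sketch of the theoretical test (the data and model formulas here are hypothetical, not from this documentation), fit a smaller and a larger nested model with the default ML method and pass the smaller model first:

```matlab
rng(0)                                   % reproducible hypothetical data
g = categorical(repmat((1:10)',20,1));   % grouping variable with 10 groups
x = randn(200,1);
y = 1 + 0.4*x + randn(200,1);            % simulated response
tbl = table(y,x,g);

% The smaller model is nested in the larger one: it is obtained by
% setting the random slope for x to 0.
lmeSmall = fitlme(tbl,'y ~ 1 + x + (1|g)');       % random intercept only
lmeLarge = fitlme(tbl,'y ~ 1 + x + (1 + x|g)');   % random intercept and slope

results = compare(lmeSmall,lmeLarge)   % smaller model first
```

`fitlme` fits by ML by default, so the theoretical likelihood ratio test applies without restrictions on the fixed-effects design matrices.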


`results = compare(___,Name,Value)` also returns the results of a likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can check whether the first input model is nested in the second input model.


`[results,siminfo] = compare(lme,altlme,'NSim',nsim)` returns the results of a simulated likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`.

You can fit `lme` and `altlme` using ML or REML, and `lme` does not have to be nested in `altlme`. If you use the restricted maximum likelihood (REML) method to fit the models, then both models must have the same fixed-effects design matrix.


`[results,siminfo] = compare(___,Name,Value)` also returns the results of a simulated likelihood ratio test that compares the linear mixed-effects models `lme` and `altlme`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can change the options for performing the simulated likelihood ratio test, or change the confidence level of the confidence interval for the p-value.

## Input Arguments


Linear mixed-effects model `lme`, specified as a `LinearMixedModel` object constructed using `fitlme` or `fitlmematrix`.

Alternative linear mixed-effects model `altlme`, fit to the same response vector but with different model specifications, specified as a `LinearMixedModel` object. `lme` must be nested in `altlme`, that is, `lme` should be obtained from `altlme` by setting some parameters to fixed values, such as 0. You can create a linear mixed-effects object using `fitlme` or `fitlmematrix`.

Number of replications `nsim` for the simulations in the simulated likelihood ratio test, specified as a positive integer. You must specify `nsim` to perform a simulated likelihood ratio test.

Example: `'NSim',1000`

Data Types: `double` | `single`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Significance level, specified as the comma-separated pair consisting of `'Alpha'` and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%.

For example, for 99% confidence intervals, you can specify the confidence level as follows.

Example: `'Alpha',0.01`

Data Types: `single` | `double`

Options for performing the simulated likelihood ratio test in parallel, specified as the comma-separated pair consisting of `'Options'`, and a structure created by `statset('LinearMixedModel')`.

These options require Parallel Computing Toolbox™.

`compare` uses the following fields.

| Field | Description |
| --- | --- |
| `'UseParallel'` | `false` (default) for serial computation; `true` for parallel computation. Parallel computation requires Parallel Computing Toolbox. |
| `'UseSubstreams'` | `false` (default) to not use a separate substream of the random number generator for each iteration; `true` to use a separate substream for each iteration. You can use this option only with random stream types that support substreams. |
| `'Streams'` | If `'UseSubstreams'` is `true`, then `'Streams'` must be a single random number stream, or a scalar cell array containing a single stream. The same requirement applies if `'UseSubstreams'` is `false` and `'UseParallel'` is `false`. If `'UseParallel'` is `true`, then the number of streams in `'Streams'` must equal the number of processors used; if a parallel pool is open, `'Streams'` must have the same length as the size of the pool. Although calling `compare` with `'UseParallel',true` can open a parallel pool for you, the length of `'Streams'` must match the number of workers, so it is best to open a pool explicitly using the `parpool` command before calling `compare` with the `'UseParallel',true` option. |
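For instance, here is a sketch of a reproducible simulated likelihood ratio test that supplies an explicit random number stream through `'Options'` (the data and model formulas are hypothetical, not from this documentation):

```matlab
rng(0)                                   % reproducible hypothetical data
g = categorical(repmat((1:10)',20,1));
x = randn(200,1);
y = 1 + 0.4*x + randn(200,1);
tbl = table(y,x,g);
lmeSmall = fitlme(tbl,'y ~ 1 + x + (1|g)');
lmeLarge = fitlme(tbl,'y ~ 1 + x + (1 + x|g)');

opt = statset('LinearMixedModel');
opt.UseSubstreams = true;                % one substream per replication
opt.Streams = RandStream('mlfg6331_64'); % a stream type that supports substreams
[results,siminfo] = compare(lmeSmall,lmeLarge,'NSim',500,'Options',opt);
```

Because each replication draws from its own substream, rerunning this comparison with the same stream reproduces the same simulated p-value.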

For information on parallel statistical computing at the command line, enter

`help parallelstats`

Data Types: `struct`

Indicator to check nesting between two models, specified as the comma-separated pair consisting of `'CheckNesting'` and one of the following.

| Value | Description |
| --- | --- |
| `false` | (default) No checks |
| `true` | `compare` checks whether the smaller model `lme` is nested in the bigger model `altlme` |

`lme` must be nested in the alternate model `altlme` for a valid theoretical likelihood ratio test. `compare` returns an error message if the nesting requirements are not satisfied.

Although valid for both tests, the nesting requirements are weaker for the simulated likelihood ratio test.

Example: `'CheckNesting',true`

Data Types: `logical`

## Output Arguments


Results of the likelihood ratio test or simulated likelihood ratio test, returned as a dataset array with two rows. The first row is for `lme`, and the second row is for `altlme`. The columns of `results` depend on whether the test is a likelihood ratio or a simulated likelihood ratio test.

• If you use the likelihood ratio test, then `results` contains the following columns.

| Column | Description |
| --- | --- |
| `Model` | Name of the model |
| `DF` | Degrees of freedom, that is, the number of free parameters in the model |
| `AIC` | Akaike information criterion for the model |
| `BIC` | Bayesian information criterion for the model |
| `LogLik` | Maximized log likelihood for the model |
| `LRStat` | Likelihood ratio test statistic for comparing `altlme` versus `lme` |
| `deltaDF` | `DF` for `altlme` minus `DF` for `lme` |
| `pValue` | p-value for the likelihood ratio test |
• If you use the simulated likelihood ratio test, then `results` contains the following columns.

| Column | Description |
| --- | --- |
| `Model` | Name of the model |
| `DF` | Degrees of freedom, that is, the number of free parameters in the model |
| `LogLik` | Maximized log likelihood for the model |
| `LRStat` | Likelihood ratio test statistic for comparing `altlme` versus `lme` |
| `pValue` | p-value for the simulated likelihood ratio test |
| `Lower` | Lower limit of the confidence interval for `pValue` |
| `Upper` | Upper limit of the confidence interval for `pValue` |

Simulation output, returned as a structure with the following fields.

| Field | Description |
| --- | --- |
| `nsim` | Value specified for `nsim` |
| `alpha` | Value specified for `'Alpha'` |
| `pvalueSim` | Simulation-based p-value |
| `pvalueSimCI` | Confidence interval for `pvalueSim`; the first element of the vector is the lower limit and the second element is the upper limit |
| `deltaDF` | `DF` for `altlme` minus `DF` for `lme`, that is, the number of free parameters in `altlme` minus the number of free parameters in `lme` |
| `TH0` | Vector of simulated likelihood ratio test statistics under the null hypothesis that the model `lme` generated the observed response vector |
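One way to use `siminfo` (a sketch with hypothetical simulated data, not from this documentation): the `TH0` field holds the simulated null distribution of the test statistic, which you can inspect directly, for example with a histogram.

```matlab
rng(0)                                   % reproducible hypothetical data
g = categorical(repmat((1:10)',20,1));
x = randn(200,1);
y = 1 + 0.4*x + randn(200,1);
tbl = table(y,x,g);
lme    = fitlme(tbl,'y ~ 1 + x + (1|g)');
altlme = fitlme(tbl,'y ~ 1 + x + (1 + x|g)');

[results,siminfo] = compare(lme,altlme,'NSim',200);
histogram(siminfo.TH0)                   % simulated LR statistics under H0
xline(results.LRStat(2))                 % observed statistic, for comparison
```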

## Examples


Load the sample data.

`load flu`

The `flu` dataset array has a `Date` variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the CDC).

To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into an array. The new dataset array, `flu2`, must have the response variable, `FluRate`, the nominal variable, `Region`, that shows which region each estimate is from, and the grouping variable `Date`.

```
flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
    'IndVarName','Region');
flu2.Date = nominal(flu2.Date);
```

Fit a linear mixed-effects model, with a varying intercept and varying slope for each region, grouped by `Date`.

`altlme = fitlme(flu2,'FluRate ~ 1 + Region + (1 + Region|Date)');`

Fit a linear mixed-effects model with fixed effects for the region and a random intercept that varies by `Date`.

`lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)');`

Compare the two models. Also check whether `lme` is nested in `altlme`.

`compare(lme,altlme,'CheckNesting',true)`
```
ans =

    Theoretical Likelihood Ratio Test

    Model     DF    AIC        BIC        LogLik     LRStat    deltaDF    pValue
    lme       11     318.71     364.35    -148.36
    altlme    55    -305.51    -77.346     207.76    712.22    44         0
```

The small $p$-value of 0 indicates that model `altlme` is significantly better than the simpler model `lme`.

Load the sample data.

`load(fullfile(matlabroot,'examples','stats','fertilizer.mat'));`

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a dataset array called `ds` for convenience, and define `Tomato`, `Soil`, and `Fertilizer` as categorical variables.

```
ds = fertilizer;
ds.Tomato = nominal(ds.Tomato);
ds.Soil = nominal(ds.Soil);
ds.Fertilizer = nominal(ds.Fertilizer);
```

Fit a linear mixed-effects model, where `Fertilizer` and `Tomato` are the fixed-effects variables, and the mean yield varies by the block (soil type) and the plots within blocks (tomato types within soil types) independently.

`lmeBig = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');`

Refit the model after removing the interaction term `Tomato:Fertilizer` and the random-effects term `(1 | Soil)`.

`lmeSmall = fitlme(ds,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');`

Compare the two models using the simulated likelihood ratio test with 1000 replications. Because the models differ in both fixed-effects and random-effects terms, you must use this test. Note that both models are fit using the default fitting method, ML, so there is no restriction on the fixed-effects design matrices. If you instead use the restricted maximum likelihood (REML) method, both models must have identical fixed-effects design matrices.

`[table,siminfo] = compare(lmeSmall,lmeBig,'nsim',1000)`
```
table =

    Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

    Model       DF    AIC       BIC       LogLik     LRStat    pValue     Lower      Upper
    lmeSmall    10    511.06    532       -245.53
    lmeBig      23    522.57    570.74    -238.29    14.491    0.57343    0.54211    0.60431
```
```
siminfo =

  struct with fields:

           nsim: 1000
          alpha: 0.0500
      pvalueSim: 0.5734
    pvalueSimCI: [0.5421 0.6043]
        deltaDF: 13
            TH0: [1000x1 double]
```

The high $p$-value suggests that the larger model, `lmeBig`, is not significantly better than the smaller model, `lmeSmall`. The smaller values of the Akaike and Bayesian information criteria for `lmeSmall` also support this.

Load the sample data.

`load carbig`

Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for the intercept, acceleration, and horsepower, and potentially correlated random effects for the intercept and acceleration, grouped by model year.

First, prepare the design matrices.

```
X = [ones(406,1) Acceleration Horsepower];
Z = [ones(406,1) Acceleration];
Model_Year = nominal(Model_Year);
G = Model_Year;
```

Now, fit the model using `fitlmematrix` with the defined design matrices and grouping variables.

```
lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
    {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
    {{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'});
```

Refit the model with uncorrelated random effects for intercept and acceleration. First prepare the random effects design and the random effects grouping variables.

```
Z = {ones(406,1),Acceleration};
G = {Model_Year,Model_Year};
lme2 = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
    {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
    {{'Intercept'},{'Acceleration'}},'RandomEffectGroups',...
    {'Model_Year','Model_Year'});
```

Compare `lme2` and `lme` using the simulated likelihood ratio test. Also check whether `lme2` is nested in `lme`.

```compare(lme2,lme,'CheckNesting',true,'NSim',1000) ```
```
ans =

    SIMULATED LIKELIHOOD RATIO TEST: NSIM = 1000, ALPHA = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue      Lower       Upper
    lme2     6     2194.5    2218.3    -1091.3
    lme      7     2193.5    2221.3    -1089.7    3.0323    0.094905    0.077462    0.11477
```

The high $p$-value indicates that `lme` is not a significantly better fit than `lme2`.

Load the sample data.

`load(fullfile(matlabroot,'examples','stats','fertilizer.mat'));`

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a dataset array called `ds` for convenience, and define `Tomato`, `Soil`, and `Fertilizer` as categorical variables.

```
ds = fertilizer;
ds.Tomato = nominal(ds.Tomato);
ds.Soil = nominal(ds.Soil);
ds.Fertilizer = nominal(ds.Fertilizer);
```

Fit a linear mixed-effects model, where `Fertilizer` and `Tomato` are the fixed-effects variables, and the mean yield varies by the block (soil type), and the plots within blocks (tomato types within soil types) independently.

`lme = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');`

Refit the model after removing the interaction term `Tomato:Fertilizer` and the random-effects term `(1|Soil)`.

`lme2 = fitlme(ds,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');`

Create the options structure for `LinearMixedModel`.

`opt = statset('LinearMixedModel')`
```
opt =

  struct with fields:

          Display: 'off'
      MaxFunEvals: []
          MaxIter: 10000
           TolBnd: []
           TolFun: 1.0000e-06
       TolTypeFun: []
             TolX: 1.0000e-12
         TolTypeX: []
          GradObj: []
         Jacobian: []
        DerivStep: []
      FunValCheck: []
           Robust: []
     RobustWgtFun: []
           WgtFun: []
             Tune: []
      UseParallel: []
    UseSubstreams: []
          Streams: {}
        OutputFcn: []
```

Change the options for parallel testing.

`opt.UseParallel = true;`

Start a parallel environment.

`mypool = parpool();`
```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
```

Compare `lme2` and `lme` using the simulated likelihood ratio test with 1000 replications and parallel computing.

`compare(lme2,lme,'nsim',1000,'Options',opt)`
```
ans =

    Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue     Lower    Upper
    lme2     10    511.06    532       -245.53
    lme      23    522.57    570.74    -238.29    14.491    0.52647    0.495    0.55779
```

The high $p$-value suggests that the larger model, `lme`, is not significantly better than the smaller model, `lme2`. The smaller values of `AIC` and `BIC` for `lme2` also support this.


## References

[1] Hox, J. Multilevel Analysis, Techniques and Applications. Lawrence Erlbaum Associates, Inc., 2002.

[2] Stram, D. O., and J. W. Lee. "Variance components testing in the longitudinal mixed-effects model." Biometrics, Vol. 50, No. 4, 1994, pp. 1171–1177.