LinearMixedModel
Linear mixed-effects model
Description
A LinearMixedModel object represents a model of a response variable with fixed and random effects. It comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a linear mixed-effects model. You can predict model responses with the predict function and generate random data at new design points using the random function.
Creation
Create a LinearMixedModel object using fitlme or fitlmematrix. You can fit a linear mixed-effects model using fitlme(tbl,formula) if your data is in a table or dataset array. Alternatively, if your model is not easily described using a formula, you can create matrices to define the fixed and random effects, and fit the model using fitlmematrix(X,y,Z,G).
Properties
Coefficient Estimates
Coefficients
— Fixed-effects coefficient estimates
dataset array
Fixed-effects coefficient estimates and related statistics, stored as a dataset array containing the following columns.
Column | Description |
---|---|
Name | Name of the term. |
Estimate | Estimated value of the coefficient. |
SE | Standard error of the coefficient. |
tStat | t-statistic for testing the null hypothesis that the coefficient is equal to zero. |
DF | Degrees of freedom for the t-test. The method for computing DF is specified by the 'DFMethod' name-value pair argument; Coefficients always uses the 'Residual' method. |
pValue | p-value for the t-test. |
Lower | Lower limit of the confidence interval for the coefficient. Coefficients always uses the 95% confidence level, that is, 'alpha' is 0.05. |
Upper | Upper limit of the confidence interval for the coefficient. Coefficients always uses the 95% confidence level, that is, 'alpha' is 0.05. |
You can change 'DFMethod' and 'alpha' when computing confidence intervals for, or testing hypotheses involving, fixed and random effects by using the coefCI and coefTest methods.
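As a rough check of how the Coefficients columns relate to one another, the sketch below recomputes tStat, pValue, and the 95% confidence limits for the intercept row of the flu example on this page from its Estimate, SE, and DF, where DF = 459 = 468 - 9 is the 'Residual' degrees of freedom (DFE). It assumes the usual Wald t-test formulas and uses tcdf and tinv from Statistics and Machine Learning Toolbox. Because the inputs are the rounded values from the display, the results agree only to the displayed precision.

```matlab
% Rounded values from the '(Intercept)' row of the flu example output
estimate = 1.2233;
se       = 0.096678;
n  = 468;                           % number of observations
p  = 9;                             % number of fixed-effects coefficients
df = n - p;                         % 'Residual' degrees of freedom (DFE), here 459

tStat  = estimate/se;               % t-statistic for H0: coefficient = 0
pValue = 2*tcdf(-abs(tStat),df);    % two-sided p-value
alpha  = 0.05;                      % 95% confidence level
tCrit  = tinv(1 - alpha/2,df);      % critical value of the t distribution
ci     = estimate + [-1 1]*tCrit*se % [Lower Upper]
```

Changing alpha or replacing df with a different degrees-of-freedom value is, in effect, what the 'alpha' and 'DFMethod' options of coefCI change.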
CoefficientCovariance
— Covariance of the estimated fixed-effects coefficients
p-by-p matrix
Covariance of the estimated fixed-effects coefficients of the linear mixed-effects model, stored as a p-by-p matrix, where p is the number of fixed-effects coefficients.
You can display the covariance parameters associated with the random effects using the covarianceParameters method.
Data Types: double
CoefficientNames
— Names of the fixed-effects coefficients
1-by-p cell array of character vectors
Names of the fixed-effects coefficients of a linear mixed-effects model, stored as a 1-by-p cell array of character vectors.
Data Types: cell
NumCoefficients
— Number of fixed-effects coefficients
positive integer value
Number of fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumEstimatedCoefficients
— Number of estimated fixed-effects coefficients
positive integer value
Number of estimated fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.
Data Types: double
Fitting Method
FitMethod
— Method used to fit the linear mixed-effects model
ML | REML
Method used to fit the linear mixed-effects model, stored as one of the following.
- ML, if the fitting method is maximum likelihood
- REML, if the fitting method is restricted maximum likelihood
Data Types: char
Input Data
Formula
— Specification of the fixed- and random-effects terms, and grouping variables
object
Specification of the fixed-effects terms, random-effects terms, and grouping variables that define the linear mixed-effects model, stored as an object.
For more information on how to specify the model to fit using a formula, see Formula.
NumObservations
— Number of observations
positive integer value
Number of observations used in the fit, stored as a positive integer value. This is the number of rows in the table or dataset array, or in the design matrices, minus the excluded rows and the rows with NaN values.
Data Types: double
NumPredictors
— Number of predictors
positive integer value
Number of variables used as predictors in the linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumVariables
— Total number of variables
positive integer value
Total number of variables including the response and predictors, stored as a positive integer value.
- If the sample data is in a table or dataset array tbl, NumVariables is the total number of variables in tbl, including the response variable.
- If the fit is based on matrix input, NumVariables is the total number of columns in the predictor matrix or matrices, and the response vector.

NumVariables includes variables, if there are any, that are not used as predictors or as the response.
Data Types: double
ObservationInfo
— Information about the observations
table
Information about the observations used in the fit, stored as a table. ObservationInfo has one row for each observation and the following four columns.
Column | Description |
---|---|
Weights | Weight value for the observation. The default value is 1. |
Excluded | true, if the observation was excluded from the fit using the 'Exclude' name-value pair argument, false, otherwise. 1 stands for true and 0 stands for false. |
Missing | true, if the observation is missing (for example, it contains NaN values), false, otherwise. |
Subset | true, if the observation was used in the fit, false, if it was not used because it is missing or excluded. |
Data Types: table
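The relationship among the Excluded, Missing, and Subset columns, and the resulting value of NumObservations, can be illustrated with a small hypothetical data set in base MATLAB. This is a sketch of the bookkeeping described above, not the toolbox's internal code: the variables y, X, and excluded are made up for illustration, and only NaN values are treated as missing here.

```matlab
% Hypothetical response and predictors for five observations
y = [3.1; NaN; 2.4; 5.0; 4.2];
X = [1 0.5; 1 1.5; 1 NaN; 1 2.0; 1 2.5];
excluded = logical([0; 0; 0; 1; 0]);   % rows removed via the 'Exclude' argument

missing = any(isnan([y X]),2);         % NaN in the response or any predictor
subset  = ~missing & ~excluded;        % rows actually used in the fit

numObservations = sum(subset)          % plays the role of NumObservations
```

Rows 2 and 3 are missing and row 4 is excluded, so only two of the five observations would be used in the fit.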
ObservationNames
— Names of observations
cell array of character vectors
Names of observations used in the fit, stored as a cell array of character vectors.
- If the data is in a table or dataset array tbl that contains observation names, ObservationNames has those names.
- If the data is provided in matrices, or in a table or dataset array without observation names, then ObservationNames is an empty cell array.
Data Types: cell
PredictorNames
— Names of predictors
cell array of character vectors
Names of the variables that you use as predictors in the fit, stored as a cell array of character vectors that has the same length as NumPredictors.
Data Types: cell
ResponseName
— Name of response variable
character vector
Name of the variable used as the response variable in the fit, stored as a character vector.
Data Types: char
Variables
— Variables
table
Variables, stored as a table.
- If the fit is based on a table or dataset array tbl, then Variables is identical to tbl.
- If the fit is based on matrix input, then Variables is a table containing all the variables in the predictor matrix or matrices, and the response variable.
Data Types: table
VariableInfo
— Information about the variables
table
Information about the variables used in the fit, stored as a table. VariableInfo has one row for each variable and contains the following four columns.
Column | Description |
---|---|
Class | Class of the variable ('double', 'cell', 'nominal', and so on). |
Range | Value range of the variable. |
InModel | Indicator of whether the variable is in the fitted model. |
IsCategorical | Indicator of whether the variable is categorical. |
Data Types: table
VariableNames
— Names of the variables
cell array of character vectors
Names of the variables used in the fit, stored as a cell array of character vectors.
- If sample data is in a table or dataset array tbl, VariableNames contains the names of the variables in tbl.
- If sample data is in matrix format, then VariableNames includes the variable names you supply while fitting the model. If you do not supply the variable names, then VariableNames contains the default names.
Data Types: cell
Summary Statistics
DFE
— Residual degrees of freedom
positive integer value
Residual degrees of freedom, stored as a positive integer value. DFE = n – p, where n is the number of observations, and p is the number of fixed-effects coefficients.
This corresponds to the 'Residual' method of calculating degrees of freedom in the fixedEffects and randomEffects methods.
Data Types: double
LogLikelihood
— Maximized log or restricted log likelihood
scalar value
Maximized log likelihood or maximized restricted log likelihood of the fitted linear mixed-effects model depending on the fitting method you choose, stored as a scalar value.
Data Types: double
ModelCriterion
— Model criterion
dataset array
Model criterion to compare fitted linear mixed-effects models, stored as a dataset array with the following columns.
AIC | Akaike Information Criterion |
BIC | Bayesian Information Criterion |
Loglikelihood | Log likelihood value of the model |
Deviance | –2 times the log likelihood of the model |
If n is the number of observations used in fitting the model, and p is the number of fixed-effects coefficients, then for calculating AIC and BIC:
- The total number of parameters is nc + p + 1, where nc is the total number of parameters in the random-effects covariance excluding the residual variance.
- The effective number of observations is
  - n, when the fitting method is maximum likelihood (ML)
  - n – p, when the fitting method is restricted maximum likelihood (REML)
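The parameter counts above can be checked against the flu example on this page, which is fit by ML with p = 9 fixed-effects coefficients and a single random-effects covariance parameter, the standard deviation of the random intercept (so nc = 1). The sketch assumes the standard definitions AIC = -2*logL + 2*k and BIC = -2*logL + k*log(nEff), where k is the total number of parameters and nEff is the effective number of observations, and it starts from the rounded Deviance in the display.

```matlab
% Values from the flu example (ML fit)
deviance = 296.71;        % -2 times the maximized log likelihood
n  = 468;                 % number of observations
p  = 9;                   % number of fixed-effects coefficients
nc = 1;                   % random-effects covariance parameters, excluding residual variance

k    = nc + p + 1;        % total number of parameters, including the residual variance
nEff = n;                 % effective number of observations for ML (n - p for REML)

AIC = deviance + 2*k
BIC = deviance + log(nEff)*k
```

Repeating the calculation for the REML fit in the carbig example (nc = 3, p = 3, nEff = 392 - 3) gives 2188.9 + 14 = 2202.9 for AIC and a BIC that agrees with the displayed 2230.7 to within rounding.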
MSE
— ML or REML estimate of the residual variance
positive scalar value
ML or REML estimate of σ², depending on the fitting method, stored as a positive scalar value. σ² is the residual variance, that is, the variance of the observation error term of the linear mixed-effects model.
Data Types: double
Rsquared
— Proportion of variability in the response explained by the fitted model
structure
Proportion of variability in the response explained by the fitted model, stored as a structure. It is the coefficient of determination, or R-squared. Rsquared has two fields.
Field | Description |
---|---|
Ordinary | R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 – SSE./SST |
Adjusted | R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. Rsquared.Adjusted = 1 – (SSE./SST)*(DFT./DFE), where DFE = n – p, DFT = n – 1, n is the number of observations, and p is the number of fixed-effects coefficients. |
Data Types: struct
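Assuming the definition Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE), with DFE = n - p and DFT = n - 1, the Adjusted value in the carbig example on this page (n = 392 observations, p = 3 fixed-effects coefficients) can be reproduced from its Ordinary value. This is arithmetic on the rounded display values only.

```matlab
% Values from the carbig example
rsqOrdinary = 0.7866;
n = 392;                  % number of observations
p = 3;                    % number of fixed-effects coefficients

DFE = n - p;              % residual degrees of freedom
DFT = n - 1;              % total degrees of freedom
% SSE/SST equals 1 - Rsquared.Ordinary, so the adjustment needs only rsqOrdinary
rsqAdjusted = 1 - (1 - rsqOrdinary)*(DFT/DFE)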
SSE
— Sum of squared errors
positive scalar
Sum of squared errors, stored as a positive scalar.
SSE is equal to the sum of the squared conditional residuals, that is, SSE = sum((y - F).^2), where y is the response vector and F is the fitted conditional response of the linear mixed-effects model. The conditional model has contributions from both fixed and random effects.
If the model was trained with observation weights, the sum of squares in the SSE calculation is the weighted sum of squares.
Data Types: double
SSR
— Regression sum of squares
positive scalar
Regression sum of squares, stored as a positive scalar.
SSR is the sum of squares explained by the linear mixed-effects regression, and is equal to the sum of the squared deviations between the fitted values and the mean of the response, that is, SSR = sum((F - mean(y)).^2), where F is the fitted conditional response of the linear mixed-effects model and y is the response vector. The conditional model has contributions from both fixed and random effects.
If the model was trained with observation weights, the sum of squares in the SSR calculation is the weighted sum of squares.
Data Types: double
SST
— Total sum of squares
positive scalar
Total sum of squares, stored as a positive scalar.
For a linear mixed-effects model with an intercept, SST is calculated as SST = SSE + SSR, where SST is the total sum of squares, SSE is the sum of squared errors, and SSR is the regression sum of squares.
For a linear mixed-effects model without an intercept, SST is calculated as the sum of the squared deviations of the observed response values from their mean, that is, SST = sum((y - mean(y)).^2), where y is the response vector.
If the model was trained with observation weights, the sum of squares in the SST calculation is the weighted sum of squares.
Data Types: double
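The unweighted SSE, SSR, and SST formulas above can be sketched in base MATLAB. The vectors y and F below are hypothetical; for a fitted model you would obtain them with response(lme) and fitted(lme). The hasIntercept flag is an assumption of this sketch that selects between the two SST rules stated above, and observation weights are not handled.

```matlab
% Hypothetical observed responses y and conditional fitted values F
% (for a fitted model, y = response(lme) and F = fitted(lme))
y = [1; 2; 3; 4];
F = [1.1; 1.9; 3.2; 3.8];
hasIntercept = true;          % assumption: the model includes an intercept

SSE = sum((y - F).^2);        % sum of squared conditional residuals
SSR = sum((F - mean(y)).^2);  % squared deviations of fitted values from mean(y)
if hasIntercept
    SST = SSE + SSR;          % rule for models with an intercept
else
    SST = sum((y - mean(y)).^2);
end
rsqOrdinary = 1 - SSE/SST     % corresponds to Rsquared.Ordinary
```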
Object Functions
anova | Analysis of variance for linear mixed-effects model |
coefCI | Confidence intervals for coefficients of linear mixed-effects model |
coefTest | Hypothesis test on fixed and random effects of linear mixed-effects model |
compare | Compare linear mixed-effects models |
covarianceParameters | Extract covariance parameters of linear mixed-effects model |
designMatrix | Fixed- and random-effects design matrices |
fitted | Fitted responses from a linear mixed-effects model |
fixedEffects | Estimates of fixed effects and related statistics |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
plotResiduals | Plot residuals of linear mixed-effects model |
predict | Predict response of linear mixed-effects model |
random | Generate random responses from fitted linear mixed-effects model |
randomEffects | Estimates of random effects and related statistics |
residuals | Residuals of fitted linear mixed-effects model |
response | Response vector of the linear mixed-effects model |
Examples
Random Intercept Model with Categorical Predictor
Load the sample data.
load flu
The flu dataset array has a Date variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the Centers for Disease Control and Prevention, CDC).
To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into an array. The new dataset array, flu2, must have the response variable, FluRate, the nominal variable, Region, that shows which region each estimate is from, and the grouping variable, Date.
flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
    'IndVarName','Region');
flu2.Date = nominal(flu2.Date);
Fit a linear mixed-effects model with fixed effects for region and a random intercept that varies by Date.
Because region is a nominal variable, fitlme takes the first region, NE, as the reference and creates eight dummy variables representing the other eight regions. For example, $I_{[1]}$ is the dummy variable representing the region MidAtl. For details, see Dummy Variables.
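The corresponding model is

$$y_{im} = \beta_0 + \beta_1 I_{[1]i} + \beta_2 I_{[2]i} + \cdots + \beta_8 I_{[8]i} + b_{0m} + \varepsilon_{im}, \qquad m = 1, 2, \ldots, 52,$$

where $y_{im}$ is observation $i$ for level $m$ of the grouping variable Date, $\beta_j$, $j = 0, 1, \ldots, 8$, are the fixed-effects coefficients, $b_{0m}$ is the random effect for level $m$ of the grouping variable Date, and $\varepsilon_{im}$ is the observation error for observation $i$. The random effect has the prior distribution $b_{0m} \sim N(0, \sigma_b^2)$, and the error term has the distribution $\varepsilon_{im} \sim N(0, \sigma^2)$.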
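The dummy-variable coding described above, with NE as the reference level and one indicator column for each of the other eight regions, can be sketched by hand in base MATLAB. The short region vector below is hypothetical, and this is only an illustration of the coding scheme; fitlme builds these columns itself, and dummyvar in Statistics and Machine Learning Toolbox offers a related encoding.

```matlab
% Hypothetical region labels; the category order puts the reference level NE first
region = categorical({'NE';'MidAtl';'Pac';'NE';'Pac'}, ...
    {'NE','MidAtl','ENCentral','WNCentral','SAtl','ESCentral','WSCentral','Mtn','Pac'});
lvls = categories(region);              % 9 levels, NE first

% One indicator column per non-reference level (8 columns for 9 regions)
D = zeros(numel(region), numel(lvls) - 1);
for k = 2:numel(lvls)
    D(:,k-1) = double(region == lvls{k});   % 1 if the observation is in region lvls{k}
end
D
```

Rows for NE are all zeros, so the intercept represents the reference region and each coefficient measures a difference from NE.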
lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)')
lme = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             468
    Fixed effects coefficients           9
    Random effects coefficients         52
    Covariance parameters                2

Formula:
    FluRate ~ 1 + Region + (1 | Date)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    318.71    364.35    -148.36          296.71

Fixed effects coefficients (95% CIs):
    Name                      Estimate    SE          tStat      DF     pValue        Lower        Upper
    {'(Intercept)'     }      1.2233      0.096678    12.654     459    1.085e-31     1.0334       1.4133
    {'Region_MidAtl'   }      0.010192    0.052221    0.19518    459    0.84534       -0.092429    0.11281
    {'Region_ENCentral'}      0.051923    0.052221    0.9943     459    0.3206        -0.050698    0.15454
    {'Region_WNCentral'}      0.23687     0.052221    4.5359     459    7.3324e-06    0.13424      0.33949
    {'Region_SAtl'     }      0.075481    0.052221    1.4454     459    0.14902       -0.02714     0.1781
    {'Region_ESCentral'}      0.33917     0.052221    6.495      459    2.1623e-10    0.23655      0.44179
    {'Region_WSCentral'}      0.069       0.052221    1.3213     459    0.18705       -0.033621    0.17162
    {'Region_Mtn'      }      0.046673    0.052221    0.89377    459    0.37191       -0.055948    0.14929
    {'Region_Pac'      }      -0.16013    0.052221    -3.0665    459    0.0022936     -0.26276     -0.057514

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1              Name2              Type       Estimate    Lower     Upper
    {'(Intercept)'}    {'(Intercept)'}    {'std'}    0.6443      0.5297    0.78368

Group: Error
    Name           Estimate    Lower      Upper
    {'Res Std'}    0.26627     0.24878    0.285
The p-values 7.3324e-06 and 2.1623e-10 show that the fixed effects of the flu rates in regions WNCentral and ESCentral, respectively, are significantly different relative to the flu rates in region NE.
The confidence limits for the standard deviation of the random-effects term, $\sigma_b$, do not include 0 (0.5297, 0.78368), which indicates that the random-effects term is significant. You can also test the significance of the random-effects terms using the compare method.
The estimated value of an observation is the sum of the fixed effects and the random-effect value at the grouping variable level corresponding to that observation. For example, the estimated best linear unbiased predictor (BLUP) of the flu rate for region WNCentral in week 10/9/2005 is

$$\hat{y}_{\text{WNCentral},\,10/9/2005} = \hat{\beta}_0 + \hat{\beta}_3 + \hat{b}_{0,\,10/9/2005} = 1.2233 + 0.23687 + \hat{b}_{0,\,10/9/2005}.$$

This is the fitted conditional response, since it includes contributions to the estimate from both the fixed and random effects. You can compute this value as follows.
beta = fixedEffects(lme);
[~,~,STATS] = randomEffects(lme); % Compute the random-effects statistics (STATS)
STATS.Level = nominal(STATS.Level);
y_hat = beta(1) + beta(4) + STATS.Estimate(STATS.Level=='10/9/2005')
y_hat = 1.2884
You can also display the fitted value directly using the fitted method.
F = fitted(lme);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')
ans = 1.2884
Compute the fitted marginal response for region WNCentral in week 10/9/2005.
F = fitted(lme,'Conditional',false);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')
ans = 1.4602
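The marginal response contains only the fixed effects, so for region WNCentral it is the intercept plus the Region_WNCentral coefficient. The sketch below checks this with the rounded estimates from the display above and, as a by-product, recovers the predicted random intercept for week 10/9/2005 as the difference between the conditional and marginal fitted values. It is arithmetic on displayed numbers, not a replacement for randomEffects.

```matlab
% Rounded values from the display above
beta0         = 1.2233;      % (Intercept), reference region NE
betaWNCentral = 0.23687;     % Region_WNCentral
yConditional  = 1.2884;      % fitted(lme) for WNCentral in week 10/9/2005

yMarginal = beta0 + betaWNCentral      % fixed effects only
bDate     = yConditional - yMarginal   % implied random intercept for 10/9/2005
```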
Linear Mixed-Effects Model with a Random Slope
Load the sample data.
load carbig
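Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration and horsepower, and potentially correlated random effects for intercept and acceleration grouped by the model year. This model corresponds to

$$MPG_{im} = \beta_0 + \beta_1\,Acc_{im} + \beta_2\,HP_{im} + b_{0m} + b_{1m}\,Acc_{im} + \varepsilon_{im},$$

with the random-effects terms having the following prior distribution:

$$b_m = \begin{pmatrix} b_{0m} \\ b_{1m} \end{pmatrix} \sim N\left(0, \begin{pmatrix} \sigma_{b_0}^2 & \sigma_{b_0,b_1} \\ \sigma_{b_0,b_1} & \sigma_{b_1}^2 \end{pmatrix}\right),$$

where $m$ represents the model year, $Acc_{im}$ and $HP_{im}$ are the acceleration and horsepower of observation $i$ in model year $m$, and $\varepsilon_{im}$ is the observation error.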
First, prepare the design matrices for fitting the linear mixed-effects model.
X = [ones(406,1) Acceleration Horsepower];
Z = [ones(406,1) Acceleration];
Model_Year = nominal(Model_Year);
G = Model_Year;
Now, fit the model using fitlmematrix with the defined design matrices and grouping variables. Use the restricted maximum likelihood (REML) fitting method.
lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
    {'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
    {{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'},...
    'FitMethod','REML')
lme = 
Linear mixed-effects model fit by REML

Model information:
    Number of observations             392
    Fixed effects coefficients           3
    Random effects coefficients         26
    Covariance parameters                4

Formula:
    y ~ Intercept + Acceleration + Horsepower + (Intercept + Acceleration | Model_Year)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    2202.9    2230.7    -1094.5          2188.9

Fixed effects coefficients (95% CIs):
    Name                  Estimate    SE           tStat      DF     pValue        Lower       Upper
    {'Intercept'   }      50.064      2.3176       21.602     389    1.4185e-68    45.507      54.62
    {'Acceleration'}      -0.57897    0.13843      -4.1825    389    3.5654e-05    -0.85112    -0.30681
    {'Horsepower'  }      -0.16958    0.0073242    -23.153    389    3.5289e-75    -0.18398    -0.15518

Random effects covariance parameters (95% CIs):
Group: Model_Year (13 Levels)
    Name1               Name2               Type        Estimate    Lower       Upper
    {'Intercept'   }    {'Intercept'   }    {'std' }    3.72        1.5215      9.0954
    {'Acceleration'}    {'Intercept'   }    {'corr'}    -0.8769     -0.98274    -0.33846
    {'Acceleration'}    {'Acceleration'}    {'std' }    0.3593      0.19418     0.66483

Group: Error
    Name           Estimate    Lower     Upper
    {'Res Std'}    3.6913      3.4331    3.9688
The fixed-effects coefficients display includes the estimates, standard errors (SE), and the 95% confidence interval limits (Lower and Upper). The p-values (pValue) indicate that all three fixed-effects coefficients are significant.
The confidence intervals for the standard deviations and the correlation between the random effects for intercept and acceleration do not include zero, so these random-effects parameters appear to be significant. Use the compare method to test the random effects formally.
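A minimal sketch of such a test, assuming that the formula-based fitlme call below is equivalent to the fitlmematrix fit above: it compares the model with correlated random effects against a nested model with independent random effects for intercept and acceleration, using the '(1 | g1) + (-1 + X1 | g1)' pattern from the Formula section. Both models have the same fixed effects, which is what allows a likelihood ratio test between REML fits; compare expects the simpler (nested) model as its first argument. The table and variable names are choices made for this sketch.

```matlab
% Load the sample data and collect the variables in a table
load carbig
tbl = table(MPG,Acceleration,Horsepower,categorical(Model_Year), ...
    'VariableNames',{'MPG','Acceleration','Horsepower','Model_Year'});

% Possibly correlated random intercept and slope (as in the fit above)
lmeFull = fitlme(tbl,'MPG ~ Acceleration + Horsepower + (Acceleration|Model_Year)', ...
    'FitMethod','REML');

% Independent random intercept and slope (no correlation parameter)
diagFormula = 'MPG ~ Acceleration + Horsepower + (1|Model_Year) + (-1 + Acceleration|Model_Year)';
lmeDiag = fitlme(tbl,diagFormula,'FitMethod','REML');

% Likelihood ratio test: the nested model comes first
results = compare(lmeDiag,lmeFull)
```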
Display the covariance matrix of the estimated fixed-effects coefficients.
lme.CoefficientCovariance
ans = 3×3
5.3711 -0.2809 -0.0126
-0.2809 0.0192 0.0005
-0.0126 0.0005 0.0001
The diagonal elements show the variances of the fixed-effects coefficient estimates. For example, the variance of the estimate of the intercept is 5.3711. The standard errors of the estimates are the square roots of the variances. For example, the standard error of the intercept is 2.3176, which is sqrt(5.3711).
The off-diagonal elements show the covariances between the fixed-effects coefficient estimates. For example, the covariance between the intercept and acceleration estimates is –0.2809, and the covariance between the acceleration and horsepower estimates is 0.0005.
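To turn the covariance matrix into standard errors and correlations, divide each covariance by the product of the corresponding standard errors. The sketch uses the rounded matrix displayed above, so the smaller entries (and the standard errors derived from them) are only approximate; with a fitted model, apply the same two lines to lme.CoefficientCovariance, or use corrcov from Statistics and Machine Learning Toolbox for the normalization.

```matlab
% Covariance matrix as displayed above (rounded to four decimals)
C = [ 5.3711 -0.2809 -0.0126
     -0.2809  0.0192  0.0005
     -0.0126  0.0005  0.0001];

SE = sqrt(diag(C))        % standard errors of the fixed-effects estimates
R  = C ./ (SE*SE')        % correlation matrix of the estimates
```

The intercept and acceleration estimates turn out to be strongly negatively correlated (about -0.87), even though their covariance, -0.2809, looks small.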
Display the coefficient of determination for the model.
lme.Rsquared
ans = struct with fields:
Ordinary: 0.7866
Adjusted: 0.7855
The adjusted value is the R-squared value adjusted for the number of fixed-effects coefficients in the model.
More About
Formula
In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms.
Suppose a table tbl contains the following:
- A response variable, y
- Predictor variables, Xj, which can be continuous or grouping variables
- Grouping variables, g1, g2, ..., gR

where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.
Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation.
Wilkinson notation describes the factors present in models. It relates to the factors themselves, not to the multipliers (coefficients) of those factors.
Wilkinson Notation | Factors in Standard Notation |
---|---|
1 | Constant (intercept) term |
X^k, where k is a positive integer | X, X^2, ..., X^k |
X1 + X2 | X1, X2 |
X1*X2 | X1, X2, X1.*X2 (elementwise multiplication of X1 and X2) |
X1:X2 | X1.*X2 only |
- X2 | Do not include X2 |
X1*X2 + X3 | X1, X2, X3, X1.*X2 |
X1 + X2 + X3 + X1:X2 | X1, X2, X3, X1.*X2 |
X1*X2*X3 - X1:X2:X3 | X1, X2, X3, X1.*X2, X1.*X3, X2.*X3 |
X1*(X2 + X3) | X1, X2, X3, X1.*X2, X1.*X3 |
Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1.
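To make the correspondence between terms and factors concrete, the sketch below builds by hand the columns that the table associates with 'y ~ X1*X2' (constant term included by default) and with 'y ~ -1 + X1:X2' (constant removed, product term only), for two hypothetical continuous predictors. It illustrates the column structure only; the toolbox builds the design matrices itself (see designMatrix), and its column order may differ.

```matlab
% Hypothetical predictor values for four observations
X1 = [1; 2; 3; 4];
X2 = [0; 1; 0; 1];

% Columns implied by 'y ~ X1*X2': constant, X1, X2, and X1.*X2
Xfull = [ones(size(X1)) X1 X2 X1.*X2]

% Columns implied by 'y ~ -1 + X1:X2': the product term only, no constant
Xint = X1.*X2
```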
Here are some examples of linear mixed-effects model specifications.
Formula | Description |
---|---|
'y ~ X1 + X2' | Fixed effects for the intercept, X1, and X2. This is equivalent to 'y ~ 1 + X1 + X2'. |
'y ~ -1 + X1 + X2' | No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1. |
'y ~ 1 + (1 | g1)' | Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1. |
'y ~ X1 + (1 | g1)' | Random intercept model with a fixed slope. |
'y ~ X1 + (X1 | g1)' | Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)'. |
'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' | Independent random effects terms for intercept and slope. |
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)' | Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect. |
Version History
Introduced in R2013b