# stats

Analysis of variance (ANOVA) table

Since R2022b

## Syntax

``s = stats(aov)``
``s = stats(aov,type)``
``s = stats(aov,Component=sstype)``
``[s,ems] = stats(___)``

## Description

`s = stats(aov)` returns a component ANOVA table for the `anova` object `aov`. The component ANOVA table contains statistics for the model terms, error, and total. For more information, see `s`.

`s = stats(aov,type)` specifies whether to return a component or summary ANOVA table. The summary ANOVA table includes summary statistics for the linear and nonlinear model terms, regression, error, and total. For more information, see `s`.

`s = stats(aov,Component=sstype)` specifies the sum of squares type used to create the component table.

`[s,ems] = stats(___)` also returns a table of information about the expected mean squares `ems` for each term and the error. If you specify `sstype` in the call to `stats`, then the software creates the `ems` table with the specified sum of squares type.

## Examples

Load the popcorn yield data.

`load popcorn.mat`

The columns of the 6-by-3 matrix `popcorn` contain popcorn yield observations in cups for the brands Gourmet, National, and Generic. The first three rows of `popcorn` correspond to popcorn that was popped using an air popper and the last three rows correspond to popcorn popped in oil.

Create string arrays of factor values for the brand and type of popper using the `repmat` function.

```
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"],[3,1]);
factors = {brand,popperType};
```

Perform a two-way ANOVA to test the null hypothesis that the mean popcorn yield is not affected by the brand of popcorn and popper type.

`aov = anova(factors,popcorn(:),FactorNames=["Brand","PopperType"],ModelSpecification="interactions")`
```
aov = 
2-way anova, constrained (Type III) sums of squares.

Y ~ 1 + Brand*PopperType

                        SumOfSquares    DF    MeanSquares     F        pValue  
                        ____________    __    ___________    ____    __________

    Brand                     15.75      2         7.875     56.7     7.679e-07
    PopperType                  4.5      1           4.5     32.4    0.00010037
    Brand:PopperType       0.083333      2      0.041667      0.3       0.74622
    Error                    1.6667     12       0.13889                       
    Total                        22     17                                     

  Properties, Methods
```

By default, `anova` displays a component ANOVA table.
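The same component table can be retrieved with `stats` and checked against the column definitions given under Output Arguments. The following is a minimal sketch, assuming that the rows of the returned table can be indexed by the row names shown in the display (`Brand`, `Error`, `Total`, and so on); the variable names `termRows`, `dfError`, and `fBrand` are illustrative only.

```matlab
% Rebuild the anova object from the example above.
load popcorn.mat
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"],[3,1]);
aov = anova({brand,popperType},popcorn(:), ...
    FactorNames=["Brand","PopperType"],ModelSpecification="interactions");

% Component table (the default table type).
s = stats(aov);

% Assumption: rows are named as displayed, so the model terms are all
% rows except Error and Total.
termRows = setdiff(s.Properties.RowNames,{'Error','Total'});

% DF of the error is the total DF minus the DF of the model terms.
dfError = s{"Total","DF"} - sum(s{termRows,"DF"});

% F for a term is its mean squares divided by the mean squared error.
fBrand = s{"Brand","MeanSquares"}/s{"Error","MeanSquares"};
```

If the assumptions hold, `dfError` and `fBrand` should agree with the `DF` value in the `Error` row and the `F` value in the `Brand` row of the displayed table.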

Generate a summary ANOVA table.

`s = stats(aov,"summary")`
```
s=5×5 table
                  SumOfSquares    DF    MeanSquares      F        pValue  
                  ____________    __    ___________    _____    __________

    Linear              20.25      3          6.75      48.6    5.4835e-07
    NonLinear        0.083333      2      0.041667       0.3       0.74622
    Regression         20.333      5        4.0667     29.28    2.5065e-06
    Error              1.6667     12       0.13889                        
    Total                  22     17        1.2941                        
```

The row `Linear` corresponds to the terms `Brand` and `PopperType` in the ANOVA model. The small p-value in the `Linear` row indicates that `Brand` and `PopperType` have a statistically significant combined effect on the popcorn yield. The row `NonLinear` corresponds to the term `Brand:PopperType`. The large p-value in the `NonLinear` row indicates that the interaction term does not have a statistically significant effect on the popcorn yield. The small p-value in the row `Regression` indicates that the ANOVA model is a better predictor of the response data than the mean of the data.
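The rows of the summary table are tied together by simple arithmetic, which the following sketch checks on the popcorn data. It assumes that the rows of `s` can be indexed by the names shown in the display and uses `fcdf` for the upper-tail F probability; the variable names are illustrative only.

```matlab
% Rebuild the anova object from the example above.
load popcorn.mat
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"],[3,1]);
aov = anova({brand,popperType},popcorn(:), ...
    FactorNames=["Brand","PopperType"],ModelSpecification="interactions");
s = stats(aov,"summary");

% Linear and NonLinear sums of squares add up to the Regression row.
ssModel = s{"Linear","SumOfSquares"} + s{"NonLinear","SumOfSquares"};

% Regression and Error sums of squares add up to the Total row.
ssTotal = s{"Regression","SumOfSquares"} + s{"Error","SumOfSquares"};

% Each F-statistic uses the full-model mean squared error as denominator.
mse = s{"Error","MeanSquares"};
fLinear = s{"Linear","MeanSquares"}/mse;

% The upper-tail F probability gives the p-value for the Linear row.
pLinear = fcdf(fLinear,s{"Linear","DF"},s{"Error","DF"},"upper");
```

Because the denominator is the mean squared error of the full model, `fLinear` is not the F-statistic you would obtain by refitting a model without the interaction term.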

Load the `carsmall` data set.

`load carsmall`

Data for the country of origin, model year, and mileage is stored in the variables `Origin`, `Model_Year`, and `MPG`, respectively.

Perform a two-way ANOVA to test the null hypothesis that mean mileage is not affected by the country of origin or model year.

`aov = anova({Origin, Model_Year},MPG,RandomFactors=[1 2],FactorNames=["Origin" "Year"])`
```
aov = 
2-way anova, constrained (Type III) sums of squares.

Y ~ 1 + Origin + Year

              SumOfSquares    DF    MeanSquares      F         pValue  
              ____________    __    ___________    ______    __________

    Origin        1078.1       5       215.62      10.675    5.3303e-08
    Year          2638.4       2       1319.2      65.312    5.5975e-18
    Error           1737      86       20.198                          
    Total         6005.3      93                                       

  Properties, Methods
```

Display an expected mean squares table for the ANOVA.

`[~,ems] = stats(aov)`
```
ems=3×5 table
                Type         ExpectedMeanSquares        MeanSquaresDenominator    DFDenominator    FDenominator
              ________    __________________________    ______________________    _____________    ____________

    Origin    "random"    "9.159*V(Origin)+V(Error)"            20.198                  86          MS(Error)  
    Year      "random"    "29.5014*V(Year)+V(Error)"            20.198                  86          MS(Error)  
    Error     "random"    "V(Error)"                                                                           
```

The formulas for the expected mean squares of the random factors `Origin` and `Year` contain terms for their respective variance components. You can use the expected mean squares formulas to compare how much of the expected mean squares is due to the variance in the error and how much is due to the variance components of the random terms.
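One way to make this comparison concrete is a method-of-moments calculation: set each observed mean square equal to its expected mean squares formula and solve for the variance component. The sketch below does this by hand with the values copied from the two tables above. It is an illustration of the arithmetic under that assumption, not necessarily how the software estimates variance components, and all variable names are hypothetical.

```matlab
% Values copied from the ANOVA and expected mean squares tables above.
msOrigin = 215.62;   % MeanSquares for Origin
msYear   = 1319.2;   % MeanSquares for Year
msError  = 20.198;   % MeanSquares for Error, which estimates V(Error)

% Coefficients of the variance components in ExpectedMeanSquares.
cOrigin = 9.159;
cYear   = 29.5014;

% Method of moments: equate each observed mean square to its expected
% value and solve for the variance component.
vOrigin = (msOrigin - msError)/cOrigin;
vYear   = (msYear - msError)/cYear;

% Share of each mean square attributed to the term's own variance
% component rather than to the error variance.
shareOrigin = cOrigin*vOrigin/msOrigin;
shareYear   = cYear*vYear/msYear;

fprintf("V(Origin) ~ %.2f, V(Year) ~ %.2f\n",vOrigin,vYear)
```

With these inputs the estimates are roughly 21.3 for `V(Origin)` and 44.0 for `V(Year)`, so in both rows most of the mean square comes from the variance component of the term rather than from the error variance.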

## Input Arguments

`aov`

Analysis of variance results, specified as an `anova` object. The properties of `aov` contain the factors and response data used by `stats` to compute the statistics in the ANOVA table.

`type`

Type of ANOVA table, specified as `"component"` or `"summary"`. If you do not specify `type`, `stats` returns a component table.

Example: `"summary"`

Data Types: `char` | `string`

`sstype`

Type of the sum of squares used to perform the ANOVA, specified as `"three"`, `"two"`, `"one"`, or `"hierarchical"`. The `stats` function ignores `sstype` unless the ANOVA type is `"component"`. For a model containing main effects but no interactions, the value of `sstype` affects the computations only when the data is unbalanced.

The sum of squares of a term ($S{S}_{Term}$) is defined as the reduction in the sum of squares error (SSE) obtained by adding the term to a model that excludes it. The formula for the sum of squares of a term Term has the form

$$SS_{Term}=\underbrace{\sum_{i=1}^{n}\left(y_{i}-f_{excl}\left(g_{1},\ldots,g_{N}\right)\right)^{2}}_{SSE_{f_{excl}}}-\underbrace{\sum_{i=1}^{n}\left(y_{i}-f_{incl}\left(g_{1},\ldots,g_{N}\right)\right)^{2}}_{SSE_{f_{incl}}}$$

where $n$ is the number of observations, ${y}_{i}$ are the response data, ${g}_{1},...,{g}_{N}$ are the factors used to perform the ANOVA, ${f}_{excl}$ is a model that excludes Term, and ${f}_{incl}$ is a model that includes Term. The variables $SS{E}_{{f}_{excl}}$ and $SS{E}_{{f}_{incl}}$ are the sum of squares errors for ${f}_{excl}$ and ${f}_{incl}$, respectively. The value of `sstype` determines ${f}_{excl}$ and ${f}_{incl}$, as described in the following table.

| Option | Type of Sum of Squares |
| --- | --- |
| `"three"` (default) | ${f}_{incl}$ is the full ANOVA model specified in the property `Formula`. ${f}_{excl}$ is a model composed of all terms in ${f}_{incl}$ except Term. The model ${f}_{excl}$ has the same sigma-restricted coding as ${f}_{incl}$. This type of sum of squares is known as Type III. |
| `"two"` | ${f}_{excl}$ is a model composed of all terms in the ANOVA model specified in the property `Formula` that do not contain Term. If Term is a continuous term, then powers of Term are treated as separate terms that do not contain Term. ${f}_{incl}$ is a model composed of Term and all the terms in ${f}_{excl}$. This type of sum of squares is known as Type II. |
| `"one"` | ${f}_{excl}$ is a model composed of all the terms that precede Term in the ANOVA model specified in the property `Formula`. ${f}_{incl}$ is a model composed of Term and all the terms in ${f}_{excl}$. This type of sum of squares is known as Type I. |
| `"hierarchical"` | ${f}_{excl}$ and ${f}_{incl}$ are defined as in Type II, except powers of Term are treated as terms that contain Term. |

Example: `Component="hierarchical"`

Data Types: `char` | `string`
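The definition of $S{S}_{Term}$ as a difference of two sum of squares errors can be reproduced by hand with ordinary least squares. The sketch below uses a small, made-up unbalanced data set with two two-level factors and a main-effects model. It codes each factor with a single 0/1 dummy variable, which is an assumption made for simplicity; for a main-effects model the resulting sums of squares do not depend on the coding. All data and variable names are hypothetical.

```matlab
% Hypothetical unbalanced data: two factors with two levels each.
y = [10; 12; 11; 15; 18; 20; 21];
a = [0; 0; 0; 0; 1; 1; 1];   % dummy variable for factor A
b = [0; 0; 1; 1; 0; 1; 1];   % dummy variable for factor B
one = ones(size(y));         % constant term

% Sum of squares error of the least-squares fit with design matrix X.
sse = @(X) sum((y - X*(X\y)).^2);

% Type I for A in the model 1 + A + B: no term precedes A except the
% constant, so f_excl = 1 and f_incl = 1 + A.
ssA_typeI = sse(one) - sse([one a]);

% Type II for A: f_excl holds every term that does not contain A, so
% f_excl = 1 + B and f_incl = 1 + A + B. Without interactions, Type III
% uses the same pair of models.
ssA_typeII = sse([one b]) - sse([one a b]);

fprintf("Type I SS(A) = %.3f, Type II SS(A) = %.3f\n",ssA_typeI,ssA_typeII)
```

Because the cell counts are unequal, the two values differ. With a balanced design the columns for A and B would be orthogonal after centering, and both calculations would return the same sum of squares.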

## Output Arguments

`s`

ANOVA statistics, returned as a table.

The contents of `s` depend on the ANOVA type specified in `type`.

• If `type` is `"component"`, then `s` contains ANOVA statistics for each variable in the model except the constant (intercept) term. The table includes these columns for each variable:

| Column | Description |
| --- | --- |
| `SumOfSquares` | Sum of squares explained by the term and calculated depending on `sstype`. |
| `DF` | Degrees of freedom. `DF` of a numeric variable is 1. `DF` of a categorical variable is the number of dummy variables created for the category (number of categories – 1). `DF` of an error term is the difference between the `DF` of the total and the sum of the `DF` for the model terms. `DF` of the total is `aov.NumObservations` – 1. |
| `MeanSquares` | Mean squares, defined by `MeanSquares` = `SumOfSquares`/`DF`. `MeanSquares` for the error term is the mean squared error (MSE). |
| `F` | F-statistic value to test the null hypothesis that the corresponding coefficient is zero; computed by `F` = `MeanSquares`/`MSE`. When the null hypothesis is true, the F-statistic follows the F-distribution. |
| `pValue` | p-value of the F-statistic value. |

• If `type` is `"summary"`, then `s` contains summary statistics of grouped terms for each row. The summary statistics are calculated using Type I sum of squares. The table includes the same columns as `"component"` and these rows:

| Row | Description |
| --- | --- |
| `Total` | Total statistics. `SumOfSquares`: Total sum of squares, which is the sum of the squared deviations of the response around its mean. `DF`: Sum of degrees of freedom of `Regression` and `Error`. |
| `Regression` | Statistics for the model as a whole. `SumOfSquares`: Model sum of squares, which is the sum of the squared deviations of the fitted value around the response mean. `F` and `pValue`: These values provide a test of whether the model as a whole fits significantly better than a degenerate model consisting of only a constant term. |
| `Linear` | Statistics for linear terms. `SumOfSquares`: Sum of squares for linear terms, which is the difference between the model sum of squares and the sum of squares for nonlinear terms. `F` and `pValue`: These values provide a test of whether the model with only linear terms fits better than a degenerate model consisting of only a constant term. `stats` uses the mean squared error that is based on the full model to compute this F-value, so the F-value obtained by dropping the nonlinear terms and repeating the test is not the same as the value in this row. |
| `NonLinear` | Statistics for nonlinear terms. `SumOfSquares`: Sum of squares for nonlinear (higher-order or interaction) terms, which is the increase in the residual sum of squares obtained by keeping only the linear terms and dropping all nonlinear terms. `F` and `pValue`: These values provide a test of whether the full model fits significantly better than a smaller model consisting of only the linear terms. |
| `Error` | Statistics for error. `SumOfSquares`: Residual sum of squares, which is the sum of the squared residual values. `MeanSquares`: Mean squared error, used to compute the F-statistic values for `Regression`, `Linear`, and `NonLinear`. |

If the data contains replications (multiple observations sharing the same factor values), `s` also contains rows for `LackOfFit` and `PureError`. `LackOfFit` and `PureError` break down `Error` further.

| Row | Description |
| --- | --- |
| `LackOfFit` | Lack-of-fit statistics. `SumOfSquares`: Sum of squares due to lack of fit, which is the difference between the residual sum of squares and the replication sum of squares. `F` and `pValue`: The F-statistic value is the ratio of lack-of-fit `MeanSquares` to pure error `MeanSquares`. The ratio provides a test of bias by measuring whether the variation of the residuals is larger than the variation of the replications. A low p-value implies that adding additional terms to the model can improve the fit. |
| `PureError` | Statistics for pure error. `SumOfSquares`: Replication sum of squares, obtained by finding the sets of points with identical predictor values, computing the sum of squared deviations around the mean within each set, and pooling the computed values. `MeanSquares`: Model-free pure error variance estimate of the response. |
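The split of the error into lack of fit and pure error can be sketched with base MATLAB functions. The example below uses made-up data with replicated predictor values and a straight-line fit; the data, the model, and the variable names are assumptions for illustration and do not come from `stats`.

```matlab
% Hypothetical data with replicated predictor values.
x = [1; 1; 2; 2; 3; 3; 4; 4];
y = [2.1; 1.9; 4.2; 3.8; 5.1; 5.5; 9.0; 8.6];

% Residual sum of squares of a straight-line least-squares fit.
X = [ones(size(x)) x];
sse = sum((y - X*(X\y)).^2);
dfError = numel(y) - size(X,2);

% Pure error: pool the squared deviations around the mean within each
% set of observations that share the same predictor value.
g = findgroups(x);
ssPure = sum(splitapply(@(v) sum((v - mean(v)).^2),y,g));
dfPure = numel(y) - max(g);

% Lack of fit: the part of the residual sum of squares that the
% replication sum of squares does not account for.
ssLack = sse - ssPure;
dfLack = dfError - dfPure;

% F-statistic: lack-of-fit mean squares over pure error mean squares.
fLack = (ssLack/dfLack)/(ssPure/dfPure);
```

A large `fLack` indicates that the residuals vary more than the replications do, which suggests that the straight line is missing structure in the data.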

`ems`

Expected mean squares information, returned as a table. The `ems` table contains a row for each term and a row for the error, and has the following variables.

• `Type` — An indicator of whether the term is fixed or random.

• `ExpectedMeanSquares` — A formula of the expected mean squares.

• `MeanSquaresDenominator` — The value of the denominator in the calculation of the F-statistic.

• `DFDenominator` — The value of the degrees of freedom in the calculation of the F-statistic denominator.

• `FDenominator` — A formula for the denominator in the calculation of the F-statistic. The denominator changes depending on whether `aov.Formula` has random interaction terms.

You can use the `ems` table to determine whether the variance of a random term has a large effect on the expected mean squares.

Data Types: `table`

## Version History

Introduced in R2022b