Documentation

# groupsummary

Group summary computations

## Syntax

``G = groupsummary(T,groupvars)``
``G = groupsummary(T,groupvars,method)``
``G = groupsummary(T,groupvars,method,datavars)``
``G = groupsummary(T,groupvars,groupbins)``
``G = groupsummary(T,groupvars,groupbins,method)``
``G = groupsummary(T,groupvars,groupbins,method,datavars)``
``G = groupsummary(___,Name,Value)``
``B = groupsummary(A,groupvars,method)``
``B = groupsummary(A,groupvars,groupbins,method)``
``B = groupsummary(___,Name,Value)``
``[B,BG] = groupsummary(A,___)``
``[B,BG,BC] = groupsummary(A,___)``

## Description

example

````G = groupsummary(T,groupvars)` returns a table containing the computed groups and the number of elements in each group for data in a table or timetable `T`. A group contains the unique combinations of grouping variables in `groupvars`. For example, ```G = groupsummary(T,'Gender')``` returns the number of `Male` elements and the number of `Female` elements in the variable `Gender`.```

example

````G = groupsummary(T,groupvars,method)` also returns the computations specified in `method`. For example, `G = groupsummary(T,'Gender','median')` returns the median of all nongrouping variables in `T` for both genders, in addition to the number of elements in each group.```

example

````G = groupsummary(T,groupvars,method,datavars)` specifies the variables to apply the computations to.```
````G = groupsummary(T,groupvars,groupbins)` specifies how to bin the data in `groupvars`. For example, `G = groupsummary(T,'SaleDate','year')` gives the group counts for all sales in `T` within each year according to the grouping variable `SaleDate`.```
````G = groupsummary(T,groupvars,groupbins,method)` bins the data according to `groupbins` for the computations specified in `method`.```

example

````G = groupsummary(T,groupvars,groupbins,method,datavars)` bins the data according to `groupbins` and specifies the variables to apply the computations to.```

example

````G = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value pairs for any of the previous syntaxes. For example, ```G = groupsummary(T,'Category1','IncludeMissingGroups',false)``` excludes the group made from missing `categorical` data indicated by `<undefined>`.```
````B = groupsummary(A,groupvars,method)` returns the computations according to the unique combinations of grouping vectors in `groupvars` when `A` is a vector or matrix. `groupvars` can be a column vector, matrix, or cell array of column vectors.```

example

````B = groupsummary(A,groupvars,groupbins,method)` bins the data according to `groupbins`.```

example

````B = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value pairs for either of the previous array syntaxes.```
````[B,BG] = groupsummary(A,___)` also returns the groups for each grouping vector.```

example

````[B,BG,BC] = groupsummary(A,___)` also returns the group counts for each group.```

## Examples

collapse all

Compute summary statistics on table variables.

Create a table `T` that contains information about five individuals.

```Gender = ["male";"female";"female";"male";"male"]; Age = [38;43;38;40;49]; Height = [71;69;64;67;64]; Weight = [176;163;131;133;119]; T = table(Gender,Age,Height,Weight)```
```T=5×4 table Gender Age Height Weight ________ ___ ______ ______ "male" 38 71 176 "female" 43 69 163 "female" 38 64 131 "male" 40 67 133 "male" 49 64 119 ```

Compute the counts of males and females by specifying `'Gender'` as the grouping variable.

`G = groupsummary(T,'Gender')`
```G=2×2 table Gender GroupCount ________ __________ "female" 2 "male" 3 ```

Compute the mean age, height, and weight of females and males separately.

`G = groupsummary(T,'Gender','mean')`
```G=2×5 table Gender GroupCount mean_Age mean_Height mean_Weight ________ __________ ________ ___________ ___________ "female" 2 40.5 66.5 147 "male" 3 42.333 67.333 142.67 ```

Still grouping by gender, compute the median height only.

`G = groupsummary(T,'Gender','median','Height')`
```G=2×3 table Gender GroupCount median_Height ________ __________ _____________ "female" 2 66.5 "male" 3 67 ```

Group table data using two grouping variables.

Create a table `T` that contains information about five individuals.

```Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119]; T = table(Gender,Smoker,Weight)```
```T=5×3 table Gender Smoker Weight ________ ______ ______ "male" true 176 "female" false 163 "male" true 131 "female" false 133 "male" true 119 ```

Compute the mean weight, grouped by gender and smoking status. By default, two combinations of gender and smoking status are not represented in the output because they are empty groups.

`G = groupsummary(T,{'Gender','Smoker'},'mean','Weight')`
```G=2×4 table Gender Smoker GroupCount mean_Weight ________ ______ __________ ___________ "female" false 2 148 "male" true 3 142 ```

Set the `'IncludeEmptyGroups'` parameter value to `true` in order to see all group combinations, including the empty ones.

`G = groupsummary(T,{'Gender','Smoker'},'mean','Weight','IncludeEmptyGroups',true)`
```G=4×4 table Gender Smoker GroupCount mean_Weight ________ ______ __________ ___________ "female" false 2 148 "female" true 0 NaN "male" false 0 NaN "male" true 3 142 ```

Group data according to specified bins.

Create a timetable containing sales information for days within a single month.

```TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;... 2017 3 14; 2017 3 31; 2017 3 25;... 2017 3 29; 2017 3 21; 2017 3 18]); Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; TotalItemsSold = [14 13 8 5 10 16 8 6 7 11]'; TT = timetable(TimeStamps,Profit,TotalItemsSold)```
```TT=10×2 timetable TimeStamps Profit TotalItemsSold ___________ ______ ______________ 04-Mar-2017 2032 14 02-Mar-2017 3071 13 15-Mar-2017 1185 8 10-Mar-2017 2587 5 14-Mar-2017 1998 10 31-Mar-2017 2899 16 25-Mar-2017 3112 8 29-Mar-2017 909 6 21-Mar-2017 2619 7 18-Mar-2017 3085 11 ```

Compute the mean profit grouped by the total items sold, binning the groups into intervals of item numbers.

```format shorte G = groupsummary(TT,'TotalItemsSold',[0 4 8 12 16],'mean','Profit')```
```G=3×3 table disc_TotalItemsSold GroupCount mean_Profit ___________________ __________ ___________ [4, 8) 3.0000e+00 2.0383e+03 [8, 12) 4.0000e+00 2.3450e+03 [12, 16] 3.0000e+00 2.6673e+03 ```

Compute the mean profit grouped by day of the week.

`G = groupsummary(TT,'TimeStamps','dayname','mean','Profit')`
```G=5×3 table dayname_TimeStamps GroupCount mean_Profit __________________ __________ ___________ Tuesday 2.0000e+00 2.3085e+03 Wednesday 2.0000e+00 1.0470e+03 Thursday 1.0000e+00 3.0710e+03 Friday 2.0000e+00 2.7430e+03 Saturday 3.0000e+00 2.7430e+03 ```

Create a vector of dates and a vector of corresponding profit values.

```timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';```

Compute the mean profit by day of the week. Display the means, the group names, and the number of members in each group.

```format shorte [meanDailyProfit,dayOfWeek,dailyCounts] = groupsummary(profit,timeStamps,'dayname','mean')```
```meanDailyProfit = 5×1 2.3085e+03 1.0470e+03 3.0710e+03 2.7430e+03 2.7430e+03 ```
```dayOfWeek = 5x1 categorical array Tuesday Wednesday Thursday Friday Saturday ```
```dailyCounts = 5×1 2 2 1 2 3 ```

Compute the mean weights for four groups based on their gender and smoker status.

Store patient information as three vectors of different types.

```Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119];```

Grouping by gender and smoker status, compute the mean weights. `B` contains the mean for each group (`NaN` for empty groups). `BG` is a cell array containing two vectors that describe the groups as you look at their elements rowwise. For instance, the first row of `BG{1}` says that the patients in the first group are female, and the first row of `BG{2}` says that they are nonsmokers. Finally, `BC` contains the number of members in each group for the corresponding groups in `BG`.

```[B,BG,BC] = groupsummary(Weight,{Gender,Smoker},'mean','IncludeEmptyGroups',true); B```
```B = 4×1 148 NaN NaN 142 ```
`BG{1}`
```ans = 4x1 string array "female" "female" "male" "male" ```
`BG{2}`
```ans = 4x1 logical array 0 1 0 1 ```
`BC`
```BC = 4×1 2 0 0 3 ```

## Input Arguments

collapse all

Input data, specified as a table or timetable.

Input array, specified as a vector or matrix.

Grouping variables or vectors, specified as a scalar, vector, matrix, cell array, or function handle.

For table or timetable input data, `groupvars` indicates which columns to use to compute the groups, and can be one of the following:

• A character vector or scalar string specifying a single table variable name

• A cell array of character vectors or a string array, where each element is a table variable name

• A vector of table variable indices

• A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

• A function handle that takes a table variable as input and returns a logical scalar

For array input, `groupvars` can be either a column vector with the same number of rows as `A` or a group of column vectors arranged in a matrix or cell array.

Example: `'Age'`

Example: `{'Height','Weight'}`

Computation method, specified as one of the following:

• `'sum'` — sum

• `'mean'` — mean

• `'median'` — median

• `'mode'` — mode

• `'var'` — variance

• `'std'` — standard deviation

• `'min'` — minimum

• `'max'` — maximum

• `'range'` — maximum minus minimum

• `'nummissing'` — number of missing elements

• `'nnz'` — number of nonzero and non-`NaN` elements

• `'all'` — all computations previously listed

You can also specify a function handle that returns one entity per group whose first dimension has length 1. To specify multiple computations at a time, list the options in a cell array, such as `{'mean','median'}`.

Data variables for table or timetable input, specified as a scalar, vector, cell array, or function handle. `datavars` indicates which variables of the input table to apply the methods to, and can be one of the following options:

• A character vector or scalar string specifying a single table variable name

• A cell array of character vectors or string array where each element is a table variable name

• A vector of table variable indices

• A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

• A function handle that takes a table variable as input and returns a logical scalar

When `datavars` is not specified, `groupsummary` applies the computations to each nongrouping variable.

Example: `'Profit'`

Example: `{'Income','Expenses'}`

Example: `@isnumeric`

Binning scheme, specified as one of the following options:

• `'none'`, indicating no binning

• A list of bin edges, specified as a numeric vector, or a `datetime` vector for `datetime` grouping variables or vectors

• A number of bins, specified as an integer scalar

• A time duration, specified as a scalar of type `duration` or `calendarDuration` indicating bin widths (for `datetime` or `duration` grouping variables or vectors only)

• A time bin for `datetime` and `duration` grouping variables or vectors only, specified as one of the following character vectors:

ValueDescriptionData Type
`'second'`

Each bin is 1 second.

`datetime` and `duration`
`'minute'`

Each bin is 1 minute.

`datetime` and `duration`
`'hour'`

Each bin is 1 hour.

`datetime` and `duration`
`'day'`

Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts.

`datetime` and `duration`
`'week'`Each bin is 1 calendar week.`datetime` only
`'month'`Each bin is 1 calendar month.`datetime` only
`'quarter'`Each bin is 1 calendar quarter.`datetime` only
`'year'`

Each bin is 1 calendar year. This value accounts for leap days.

`datetime` and `duration`
`'decade'`Each bin is 1 decade (10 calendar years).`datetime` only
`'century'`Each bin is 1 century (100 calendar years).`datetime` only
`'secondofminute'`

Bins are seconds from 0 to 59.

`datetime` only
`'minuteofhour'`

Bins are minutes from 0 to 59.

`datetime` only
`'hourofday'`

Bins are hours from 0 to 23.

`datetime` only
`'dayofweek'`

Bins are days from 1 to 7. The first day of the week is Sunday.

`datetime` only
`'dayname'`Bins are full day names such as `'Sunday'`.`datetime` only
`'dayofmonth'`Bins are days from 1 to 31.`datetime` only
`'dayofyear'`Bins are days from 1 to 366.`datetime` only
`'weekofmonth'`Bins are weeks from 1 to 6.`datetime` only
`'weekofyear'`Bins are weeks from 1 to 54.`datetime` only
`'monthname'`Bins are full month names such as `'January'`.`datetime` only
`'monthofyear'`

Bins are months from 1 to 12.

`datetime` only
`'quarterofyear'`Bins are quarters from 1 to 4.`datetime` only

• A cell array listing binning rules for each grouping variable or vector

When multiple grouping variables are specified, you can provide a single binning rule that is applied to all grouping variables, or a cell array containing a binning method for each grouping variable such as `{'none',[0 2 4 Inf]}`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: ```G = groupsummary(T,groupvars,groupbins,'IncludedEdge','right')```

Included bin edge, specified as either `'left'` or `'right'`, indicating which end of the bin interval is inclusive.

This name-value pair can only be specified when `groupbins` is specified, and the value is applied to all binning schemes for all grouping variables or vectors.

Missing groups indicator, specified as `true` or `false`. When the parameter value is `true`, `groupsummary` displays groups made up of missing values, such as `NaN`. When the parameter value is `false`, `groupsummary` does not display the missing groups.

Data Types: `logical`

Empty groups indicator, specified as `true` or `false`. When the parameter value is `false`, `groupsummary` does not display groups with zero elements. When the parameter value is `true`, `groupsummary` displays the empty groups.

Data Types: `logical`

## Output Arguments

collapse all

Output table, returned as a table containing the specified computations for each group.

Output array, returned as a vector or matrix containing the group computations. When you specify multiple methods, `groupsummary` horizontally concatenates the computations in the order that they were listed.

Groups for array input data, returned as a column vector or cell array of column vectors each corresponding to a grouping vector.

When you provide more than one grouping vector, `BG` is a cell array containing column vectors of equal length. The group information can be found by looking at the elements rowwise across all vectors in `BG`. Each group maps to the corresponding row of the output array `B`.

Group counts for array input data, returned as a column vector containing the number of elements in each group. The length of `BC` is the same as the length of the group column vectors returned in `BG`.

## Tips

• When making many calls to `groupsummary`, consider converting grouping variables to type `categorical` or `logical` when possible for improved performance. For example, if you have a grouping variable of type `char` (such as `Gender` with elements `'Male'` and `'Female'`), you can convert it to a categorical value using the command `categorical(Gender)`.