Main Content

groupfilter

Filter by group

Description

Table Data

example

G = groupfilter(T,groupvars,method) returns the rows of table or timetable T that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = groupfilter(T,"Trial",@(x) numel(x) > 5) groups the data in T by Trial, and keeps the rows that belong to groups with more than five trials.

example

G = groupfilter(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping. For example, G = groupfilter(T,"SaleDate","year",@(x) numel(x) > 5) bins the data in SaleDate by year, and keeps the rows whose bin has more than five elements.

example

G = groupfilter(___,datavars) specifies the table variables to apply the filtering method to for any of the previous syntaxes. For example, G = groupfilter(T,"Trial",@(x) x == max(x),"Height") keeps the rows of T that correspond to the maximum height for each trial.

G = groupfilter(___,"IncludedEdge",LR) specifies the included bin edge as "left" or "right" to indicate which end of the bin interval is inclusive. You can use IncludeEdge with any previous syntax that specifies groupbins.

Array Data

B = groupfilter(A,groupvars,method) returns the rows of vector or matrix A that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to all column vectors. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

example

B = groupfilter(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

B = groupfilter(A,groupvars,groupbins,method,"IncludedEdge",LR) specifies whether to include the left or right edge in each bin when groupbins is specified.

example

[B,BG] = groupfilter(A,___) also returns the unique grouping vector combinations corresponding to the rows in B.

Examples

collapse all

Create a table containing two variables.

groupID = [1 1 1 2 2 3]';
sample = [3 1 2 9 8 5]';
T = table(groupID,sample)
T=6×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       2         9   
       2         8   
       3         5   

Group by ID number, and return rows corresponding to groups with more than two samples.

Gnumel = groupfilter(T,"groupID",@(x) numel(x) > 2)
Gnumel=3×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   

Return rows whose group samples are between 0 and 6.

Gvals = groupfilter(T,"groupID",@(x) min(x) > 0 && max(x) < 6)
Gvals=4×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       3         5   

Create a table containing two variables that represent a day number and temperature.

daynum = [1 1 1 1 2 2 2 2]';
temp = [67 65 71 55 61 79 58 78]';
T = table(daynum,temp)
T=8×2 table
    daynum    temp
    ______    ____

      1        67 
      1        65 
      1        71 
      1        55 
      2        61 
      2        79 
      2        58 
      2        78 

Group by day number, and return the largest two temperatures for each day.

G = groupfilter(T,"daynum",@(x) ismember(x,maxk(x,2)))
G=4×2 table
    daynum    temp
    ______    ____

      1        67 
      1        71 
      2        79 
      2        78 

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10; ...
                       2017 4 14; 2017 4 30; 2017 5 25; ...
                       2017 5 29; 2017 5 21]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]';
T = table(timeStamps,profit)
T=9×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Apr-2017     2587 
    14-Apr-2017     1998 
    30-Apr-2017     2899 
    25-May-2017     3112 
    29-May-2017      909 
    21-May-2017     2619 

Group the dates by month, and return rows that correspond to the maximum profit for that month.

Gmax = groupfilter(T,"timeStamps","month",@(x) x == max(x))
Gmax=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    02-Mar-2017     3071         Mar-2017    
    30-Apr-2017     2899         Apr-2017    
    25-May-2017     3112         May-2017    

Return rows whose month had an average profit greater than $2300.

Gavg = groupfilter(T,"timeStamps","month",@(x) mean(x) > 2300)
Gavg=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    10-Apr-2017     2587         Apr-2017    
    14-Apr-2017     1998         Apr-2017    
    30-Apr-2017     2899         Apr-2017    

Create a table T that contains information about nine individuals.

groupID = [1 2 3 1 2 3 1 2 3]';
Height = [62 61 59 66 70 72 57 67 71]';
HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Poor"; "Fair"; "Excellent"; "Poor"; "Excellent"; "Fair"]);
T = table(groupID,Height,HealthStatus)
T=9×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62       Poor       
       2         61       Good       
       3         59       Fair       
       1         66       Poor       
       2         70       Fair       
       3         72       Excellent  
       1         57       Poor       
       2         67       Excellent  
       3         71       Fair       

Group by ID number, and return rows for groups that contain only members with a minimum height of 60.

G1 = groupfilter(T,"groupID",@(x) min(x) >= 60,"Height")
G1=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       2         61       Good       
       2         70       Fair       
       2         67       Excellent  

Group by ID number, and return rows for groups that contain only members whose health status is Poor.

G2 = groupfilter(T,"groupID",@(x) all(x == "Poor"),"HealthStatus")
G2=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62          Poor    
       1         66          Poor    
       1         57          Poor    

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.

[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps, ...
    "dayname",@(x) x == max(x))
maxDailyProfit = 5×1

        3071
        1185
        2899
        3112
        2619

dayOfWeek = 5x1 categorical
     Thursday 
     Wednesday 
     Friday 
     Saturday 
     Tuesday 

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

  • For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or a cell array.

  • For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

    OptionDescriptionExamples
    Variable name

    A character vector or string scalar specifying a single table variable name

    'Var1'

    "Var1"

    Vector of variable names

    A cell array of character vectors or string array, where each element is a table variable name

    {'Var1' 'Var2'}

    ["Var1" "Var2"]

    Scalar or vector of variable indices

    A scalar or vector of table variable indices

    1

    [1 3 5]

    Logical vector

    A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

    [true false true]

    Function handle

    A function handle that takes a table variable as input and returns a logical scalar

    @isnumeric

    vartype subscript

    A table subscript generated by the vartype function

    vartype("numeric")

Example: groupfilter(T,"Var3",method)

Filtering method, specified as a function handle.

method defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select.

  • If the function returns a logical scalar, then either all members of the group are filtered out (when the value is false) or none are filtered out (when the value is true).

  • If the function returns a logical vector, then members of groups are filtered out when the corresponding element is false, and members are kept when the corresponding element is true.

To define the function handle, use a syntax of the form @(inputargs) mymethod, where mymethod depends on inputargs.

  • A function can filter for rows corresponding to groups that meet a condition. For example, @(x) mean(x) > 10 passes to the output only rows corresponding to groups with a group mean greater than 10.

  • A function can filter for rows that meet a condition within their corresponding group. For example, @(x) x == max(x) passes to the output only rows corresponding to the maximum value of rows within their group.

For more information, see Create Function Handle and Anonymous Functions.

When groupfilter applies the method to more than one nongrouping variable at a time, the method returns a logical scalar or vector for each variable. For each row, the corresponding values in all returned scalars or vectors must be true to pass the row to the output.

Binning scheme, specified as one of these options:

  • "none", indicating no binning

  • A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables or vectors

  • A number of bins, specified as a positive integer scalar

  • A time duration, specified as a scalar of type duration or calendarDuration, indicating bin widths (for datetime or duration grouping variables or vectors only)

  • A cell array listing binning methods for each grouping variable or vector

  • A time bin for datetime and duration grouping variables or vectors only, specified as one of these strings.

    ValueDescriptionData Type
    "second"

    Each bin is 1 second.

    datetime and duration
    "minute"

    Each bin is 1 minute.

    datetime and duration
    "hour"

    Each bin is 1 hour.

    datetime and duration
    "day"

    Each bin is 1 calendar day. This value accounts for daylight saving time shifts.

    datetime and duration
    "week"Each bin is 1 calendar week.datetime only
    "month"Each bin is 1 calendar month.datetime only
    "quarter"Each bin is 1 calendar quarter.datetime only
    "year"

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    "decade"Each bin is 1 decade (10 calendar years).datetime only
    "century"Each bin is 1 century (100 calendar years).datetime only
    "secondofminute"

    Bins are seconds from 0 to 59.

    datetime only
    "minuteofhour"

    Bins are minutes from 0 to 59.

    datetime only
    "hourofday"

    Bins are hours from 0 to 23.

    datetime only
    "dayofweek"

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    "dayname"Bins are full day names such as "Sunday".datetime only
    "dayofmonth"Bins are days from 1 to 31.datetime only
    "dayofyear"Bins are days from 1 to 366.datetime only
    "weekofmonth"Bins are weeks from 1 to 6.datetime only
    "weekofyear"Bins are weeks from 1 to 54.datetime only
    "monthname"Bins are full month names such as "January".datetime only
    "monthofyear"

    Bins are months from 1 to 12.

    datetime only
    "quarterofyear"Bins are quarters from 1 to 4.datetime only

When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables or vectors, or a cell array containing a binning method for each grouping variable or vector such as {"none",[0 2 4 Inf]}.

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the filtering methods to. Other variables not specified by datavars pass through to the output without being operated on. groupfilter applies the filtering methods to the specified variables and uses the results to remove rows from all variables. When datavars is not specified, groupfilter operates on each nongrouping variable.

OptionDescriptionExamples
Variable name

A character vector or string scalar specifying a single table variable name

'Var1'

"Var1"

Vector of variable names

A cell array of character vectors or string array, where each element is a table variable name

{'Var1' 'Var2'}

["Var1" "Var2"]

Scalar or vector of variable indices

A scalar or vector of table variable indices

1

[1 3 5]

Logical vector

A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

[true false true]

Function handle

A function handle that takes a table variable as input and returns a logical scalar

@isnumeric

vartype subscript

A table subscript generated by the vartype function

vartype("numeric")

Example: groupfilter(T,groupvars,method,["Var1" "Var2" "Var4"])

Included bin edge, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

This argument can be specified only when groupbins is specified, and the value applies to all binning schemes for all grouping variables or vectors.

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable. G contains the rows in T that satisfy the group-wise filtering method.

Output array for array input data, returned as a vector or matrix. B contains the rows in A that satisfy the group-wise filtering method.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Extended Capabilities

Version History

Introduced in R2019b

expand all