filloutliers

Detect and replace outliers in data

Syntax

B = filloutliers(A,fillmethod)

B = filloutliers(A,fillmethod,findmethod)

B = filloutliers(A,fillmethod,"percentiles",threshold)

B = filloutliers(A,fillmethod,movmethod,window)

B = filloutliers(___,dim)

B = filloutliers(___,Name=Value)

[B,TF] = filloutliers(___)

[B,TF,L,U,C] = filloutliers(___)

Description

B = filloutliers(A,fillmethod) finds outliers in A and replaces them according to fillmethod. For example, filloutliers(A,"previous") replaces outliers with the previous nonoutlier element.

If A is a matrix, then filloutliers operates on each column of A separately.
If A is a multidimensional array, then filloutliers operates along the first dimension of A whose size does not equal 1.
If A is a table or timetable, then filloutliers operates on each variable of A separately.

By default, an outlier is a value that is more than three scaled median absolute deviations (MAD) from the median.

You can use filloutliers functionality interactively by adding the Clean Outlier Data task to a live script.

B = filloutliers(A,fillmethod,findmethod) specifies a method for detecting outliers. For example, filloutliers(A,"previous","mean") defines an outlier as an element of A more than three standard deviations from the mean.

B = filloutliers(A,fillmethod,"percentiles",threshold) defines outliers as points outside of the percentiles specified in threshold. The threshold argument is a two-element row vector containing the lower and upper percentile thresholds, such as [10 90].
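As a minimal sketch of the percentile syntax, the following code reuses the sample vector from the examples on this page. The threshold [10 90] and the "clip" fill method are illustrative choices, not recommendations.

```matlab
% Sample data with two unusually large values.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Treat values outside the 10th and 90th percentiles as outliers and
% clip each one to the threshold that it exceeds.
[B,TF,L,U] = filloutliers(A,"clip","percentiles",[10 90]);

% After clipping, every element lies between the two thresholds.
allInside = all(B >= L & B <= U);
```

Here L and U hold the values of the lower and upper percentile thresholds, so they can be inspected to see exactly where the data was clipped.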

B = filloutliers(A,fillmethod,movmethod,window) detects local outliers using a moving window mean or median with window length window. For example, filloutliers(A,"previous","movmean",5) identifies outliers as elements more than three local standard deviations from the local mean within a five-element window.

B = filloutliers(___,dim) specifies the dimension of A to operate along for any of the previous syntaxes. For example, filloutliers(A,"linear",2) operates on each row of a matrix A.

B = filloutliers(___,Name=Value) specifies additional parameters for detecting and replacing outliers using one or more name-value arguments. For example, filloutliers(A,"previous",SamplePoints=t) detects outliers in A relative to the corresponding elements of a time vector t.

[B,TF] = filloutliers(___) also returns a logical array TF that indicates the position of the filled elements of B that were previously outliers.

[B,TF,L,U,C] = filloutliers(___) also returns the lower threshold L, upper threshold U, and center value C used by the outlier detection method.

Examples

Interpolate Outliers in Vector

Fill outliers in a vector of data using the "linear" method, and visualize the filled data.

Create a vector of data containing two outliers.

A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

Replace the outliers using linear interpolation.

B = filloutliers(A,"linear");

Plot the original data and the data with the outliers filled.

plot(A)
hold on
plot(B,"o-")
legend("Original Data","Filled Data")

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent Original Data, Filled Data.

Use Mean Detection and Nearest Fill Methods

Identify potential outliers in a timetable of data, fill any outliers using the "nearest" fill method, and visualize the cleaned data.

Create a timetable of data, and visualize the data to detect potential outliers.

T = hours(1:15);
V = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
A = timetable(T',V');
plot(A.Time,A.Var1)

Fill outliers in the data, where an outlier is defined as a point more than three standard deviations from the mean. Replace the outlier with the nearest element that is not an outlier.

B = filloutliers(A,"nearest","mean")

B=15×1 timetable
    Time     Var1
    _____    ____

    1 hr      57 
    2 hr      59 
    3 hr      60 
    4 hr     100 
    5 hr      59 
    6 hr      58 
    7 hr      57 
    8 hr      58 
    9 hr      61 
    10 hr     61 
    11 hr     62 
    12 hr     60 
    13 hr     62 
    14 hr     58 
    15 hr     57

In the same graph, plot the original data and the data with the outlier filled.

hold on
plot(B.Time,B.Var1,"o-")
legend("Original Data","Filled Data")

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent Original Data, Filled Data.

Use Moving Detection Method

Use a moving median to detect and fill local outliers within a sine wave that corresponds to a time vector.

Create a vector of data containing a local outlier.

x = -2*pi:0.1:2*pi;
A = sin(x);
A(47) = 0;

Create a time vector that corresponds to the data in A.

t = datetime(2017,1,1,0,0,0) + hours(0:length(x)-1);

Define outliers as points more than three local scaled MAD from the local median within a sliding window. Find the location of the outlier in A relative to the points in t with a window size of 5 hours. Fill the outlier with the computed threshold value using the method "clip".

[B,TF,L,U,C] = filloutliers(A,"clip","movmedian",hours(5),SamplePoints=t);

Plot the original data and the data with the outlier filled.

plot(t,A)
hold on
plot(t,B,"o-")
legend("Original Data","Filled Data")

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent Original Data, Filled Data.

Fill Outliers in Matrix Rows

Create a matrix of data containing outliers along the diagonal.

A = randn(5,5) + diag(1000*ones(1,5))

A = 5×5
10³ ×

    1.0005   -0.0013   -0.0013   -0.0002    0.0007
    0.0018    0.9996    0.0030   -0.0001   -0.0012
   -0.0023    0.0003    1.0007    0.0015    0.0007
    0.0009    0.0036   -0.0001    1.0014    0.0016
    0.0003    0.0028    0.0007    0.0014    1.0005

Fill outliers with zeros based on the data in each row, and display the new values.

[B,TF] = filloutliers(A,0,2);
B

B = 5×5

         0   -1.3077   -1.3499   -0.2050    0.6715
    1.8339         0    3.0349   -0.1241   -1.2075
   -2.2588    0.3426         0    1.4897    0.7172
    0.8622    3.5784   -0.0631         0    1.6302
    0.3188    2.7694    0.7147    1.4172         0

You can access the detected outlier values and their filled values using TF as an index vector.

[A(TF) B(TF)]

ans = 5×2
10³ ×

    1.0005         0
    0.9996         0
    1.0007         0
    1.0014         0
    1.0005         0

Specify Outlier Locations

Create a vector containing two outliers and detect their locations.

A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
detect = isoutlier(A)

detect = 1×15 logical array

   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0

Fill the outliers using the "nearest" method. Instead of using a detection method, provide the outlier locations detected by isoutlier.

B = filloutliers(A,"nearest",OutlierLocations=detect)

B = 1×15

    57    59    60    59    59    58    57    58    61    61    62    60    62    58    57

Return Outlier Thresholds

Replace the outlier in a vector of data using the "clip" fill method.

Create a vector of data with an outlier.

A = [60 59 49 49 58 100 61 57 48 58];

Detect outliers with the default method "median", and replace the outlier with the upper threshold value by using the "clip" fill method.

[B,TF,L,U,C] = filloutliers(A,"clip");

Plot the original data, the data with the outlier filled, and the thresholds and center value determined by the outlier detection method. The center value is the median of the data, and the upper and lower thresholds are three scaled MAD above and below the median.

plot(A)
hold on
plot(B,"o-")
yline([L U C],":",["Lower Threshold","Upper Threshold","Center Value"])
legend("Original Data","Filled Data")

Figure contains an axes object. The axes object contains 5 objects of type line, constantline. These objects represent Original Data, Filled Data.

Fill Values Above Scalar Threshold

Since R2024a

Create a table and fill outliers defined as values greater than 10. Create a table of logical variables loc that indicates the locations of outliers to fill. Then, specify the known outlier locations for filloutliers using the OutlierLocations name-value argument.

A = [1; 4; 9; 12; 3];
B = [9; 0; 6; 2; 1];
C = [14; 4; 2; 3; 8];
T = table(A,B,C)

T=5×3 table
    A     B    C 
    __    _    __

     1    9    14
     4    0     4
     9    6     2
    12    2     3
     3    1     8

loc = T>10

loc=5×3 table
      A        B        C  
    _____    _____    _____

    false    false    true 
    false    false    false
    false    false    false
    true     false    false
    false    false    false

T = filloutliers(T,10,OutlierLocations=loc)

T=5×3 table
    A     B    C 
    __    _    __

     1    9    10
     4    0     4
     9    6     2
    10    2     3
     3    1     8

Input Arguments

`A` — Input data
vector | matrix | multidimensional array | table | timetable

Input data, specified as a vector, matrix, multidimensional array, or a table or timetable with numeric variables.

If A is a table, then its variables must be of type double or single, or you can use the DataVariables argument to list double or single variables explicitly. Specifying variables is useful when you are working with a table that contains variables with data types other than double or single.
If A is a timetable, then filloutliers operates only on the table elements. If row times are used as sample points, then they must be unique and listed in ascending order.

Data Types: double | single | table | timetable

`fillmethod` — Fill method
numeric scalar | `"center"` | `"clip"` | `"previous"` | `"next"` | `"nearest"` | `"linear"` | `"spline"` | `"pchip"` | `"makima"`

Fill method for replacing outliers, specified as one of these values.

Fill Method	Description
Numeric scalar	Specified scalar value
`"center"`	Center value determined by `findmethod`
`"clip"`	Lower threshold value for elements smaller than the lower threshold determined by `findmethod`; upper threshold value for elements larger than the upper threshold determined by `findmethod`
`"previous"`	Previous nonoutlier value
`"next"`	Next nonoutlier value
`"nearest"`	Nearest nonoutlier value
`"linear"`	Linear interpolation of neighboring, nonoutlier values
`"spline"`	Piecewise cubic spline interpolation
`"pchip"`	Shape-preserving piecewise cubic spline interpolation
`"makima"`	Modified Akima cubic Hermite interpolation (numeric, `duration`, and `datetime` data types only)

Data Types: double | single | char | string
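To make the differences between fill methods concrete, the following sketch applies several of them to the sample vector used in the examples on this page. It relies on the default detection method, which flags elements 4 and 9 (the values 100 and 300).

```matlab
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

Bprev   = filloutliers(A,"previous");  % last nonoutlier before each outlier
Bnext   = filloutliers(A,"next");      % first nonoutlier after each outlier
Blinear = filloutliers(A,"linear");    % interpolate between nonoutlier neighbors
Bcenter = filloutliers(A,"center");    % center value (the median by default)
Bzero   = filloutliers(A,0);           % fixed scalar value

% One row per fill method, one column per outlier position.
filled = [Bprev([4 9]); Bnext([4 9]); Blinear([4 9]); Bcenter([4 9]); Bzero([4 9])]
```

For this data, the rows of filled are [60 58], [59 61], [59.5 59.5], [59 59], and [0 0].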

`findmethod` — Method for detecting outliers
`"median"` (default) | `"mean"` | `"quartiles"` | `"grubbs"` | `"gesd"`

Method for detecting outliers, specified as one of these values.

Method	Description
`"median"`	Outliers are defined as elements more than three scaled MAD from the median. The scaled MAD is defined as `c*median(abs(A-median(A)))`, where `c=-1/(sqrt(2)*erfcinv(3/2))`.
`"mean"`	Outliers are defined as elements more than three standard deviations from the mean. This method is faster but less robust than `"median"`.
`"quartiles"`	Outliers are defined as elements more than 1.5 interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). This method is useful when the data in `A` is not normally distributed.
`"grubbs"`	Outliers are detected using Grubbs’ test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the data in `A` is normally distributed.
`"gesd"`	Outliers are detected using the generalized extreme Studentized deviate test for outliers. This iterative method is similar to `"grubbs"` but can perform better when multiple outliers are masking each other. The maximum outlier count specified by `MaxNumOutliers` depends on the number of elements in `A`.
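The detection methods can disagree on the same data. As a small illustration, this sketch runs three of them on the sample vector from the examples on this page and records which elements each one flags. The "previous" fill method is an arbitrary choice, because only the TF output is inspected.

```matlab
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

[~,tfMedian]    = filloutliers(A,"previous","median");
[~,tfMean]      = filloutliers(A,"previous","mean");
[~,tfQuartiles] = filloutliers(A,"previous","quartiles");

idxMedian    = find(tfMedian)      % indices flagged by the scaled MAD rule
idxMean      = find(tfMean)        % indices flagged by the 3-sigma rule
idxQuartiles = find(tfQuartiles)   % indices flagged by the 1.5*IQR rule
```

For this vector, "median" and "quartiles" flag elements 4 and 9, while "mean" flags only element 9. The value 300 inflates the standard deviation enough that 100 falls inside three standard deviations of the mean.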

`threshold` — Percentile thresholds
two-element row vector with values in [0, 100]

Percentile thresholds, specified as a two-element row vector whose elements are in the interval [0,100]. The first element indicates the lower percentile threshold, and the second element indicates the upper percentile threshold. The first element of threshold must be less than the second element.

For example, a threshold of [10 90] defines outliers as points below the 10th percentile and above the 90th percentile.

`movmethod` — Moving method for detecting outliers
`"movmedian"` | `"movmean"`

Moving method for detecting outliers, specified as one of these values.

Method	Description
`"movmedian"`	Outliers are defined as elements more than three local scaled MAD from the local median over a window length specified by `window`. This method is also known as a Hampel filter.
`"movmean"`	Outliers are defined as elements more than three local standard deviations from the local mean over a window length specified by `window`.

`window` — Window length
positive integer scalar | two-element vector of positive integers | positive duration scalar | two-element vector of positive durations

Window length for moving method, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.

When window is a positive integer scalar, the window is centered about the current element and contains window-1 neighboring elements. If window is even, then the window is centered about the current and previous elements.

When window is a two-element vector of positive integers [b f], the window contains the current element, b elements backward, and f elements forward.

When A is a timetable or SamplePoints is specified as a datetime or duration vector, window must be of type duration, and the windows are computed relative to the sample points.

For more information about the window position, see Moving Window Size.
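The following sketch shows the scalar and two-element forms of window on a short vector with one local spike. The data and window sizes are made up for illustration.

```matlab
A = [1 2 3 50 5 6 7];

% Three-element window centered on each element. Detected outliers are
% replaced by linear interpolation of the neighboring nonoutliers.
[B,TF] = filloutliers(A,"linear","movmedian",3);

% Asymmetric window: the current element, two elements backward,
% and one element forward.
[Bback,TFback] = filloutliers(A,"linear","movmedian",[2 1]);
```

In both calls the spike at element 4 is flagged. With the centered window, the filled value is 4, the midpoint of its neighbors 3 and 5.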

`dim` — Array dimension to operate along
positive integer scalar

Array dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

Consider an m-by-n input matrix, A:

filloutliers(A,fillmethod,1) fills outliers according to the data in each column of A and returns an m-by-n matrix.
filloutliers(A,fillmethod,2) fills outliers according to the data in each row of A and returns an m-by-n matrix.

For table or timetable input data, dim is not supported and operation is along each table or timetable variable separately.
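As a brief illustration of how dim changes the result, this sketch fills the same small matrix along each dimension. The matrix is invented for the example.

```matlab
A = [1 2 100 3 2;
     5 6   5 7 6];

% Default: operate along the first dimension (each column). With only
% two elements per column, no element is flagged.
Bcols = filloutliers(A,"previous");

% Operate along the second dimension (each row). The value 100 stands
% out from the rest of row 1 and is replaced by the previous value, 2.
Brows = filloutliers(A,"previous",2);
```

Bcols equals A, whereas Brows is [1 2 2 3 2; 5 6 5 7 6].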

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: filloutliers(A,"center","mean",ThresholdFactor=4)

Data Options

`SamplePoints` — Sample points
vector | table variable name | scalar | function handle | table `vartype` subscript

Sample points, specified as a vector of sample point values or one of the options in the following table when the input data is a table. The sample points represent the x-axis locations of the data, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. The vector [1 2 3 ...] is the default.

When the input data is a table, you can specify the sample points as a table variable using one of these options.

Indexing Scheme	Examples
Variable name: A string scalar or character vector	`"A"` or `'A'` — A variable named `A`
Variable index: An index number that refers to the location of a variable in the table, or a logical vector. Typically, the logical vector is the same length as the number of variables, but you can omit trailing `0` or `false` values	`3` — The third variable from the table; `[false false true]` — The third variable
Function handle: A function handle that takes a table variable as input and returns a logical scalar	`@isnumeric` — One variable containing numeric values
Variable type: A `vartype` subscript that selects one variable of a specified type	`vartype("numeric")` — One variable containing numeric values

Note

This name-value argument is not supported when the input data is a timetable. Timetables use the vector of row times as the sample points. To use different sample points, you must edit the timetable so that the row times contain the desired sample points.

Moving windows are defined relative to the sample points. For example, if t is a vector of times corresponding to the input data, then filloutliers(rand(1,10),"previous","movmean",3,SamplePoints=t) has a window that represents the time interval between t(i)-1.5 and t(i)+1.5.

When the sample points vector has data type datetime or duration, the moving window length must have type duration.

Example: filloutliers([1 100 3 4],"nearest",SamplePoints=[1 2.5 3 4])

Example: filloutliers(T,"nearest",SamplePoints="Var1")

Data Types: single | double | datetime | duration
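Sample points also affect interpolating fill methods. The sketch below, built on the first example call above but with the "linear" fill method swapped in as an assumption, fills the same outlier with and without nonuniform sample points.

```matlab
A = [1 100 3 4];

% Default sample points [1 2 3 4]: the outlier sits midway between
% its neighbors, so linear interpolation gives 2.
Bdefault = filloutliers(A,"linear");

% Nonuniform sample points: the outlier sits at x = 2.5, so linear
% interpolation between (1,1) and (3,3) gives 2.5.
Bspaced = filloutliers(A,"linear",SamplePoints=[1 2.5 3 4]);
```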

`DataVariables` — Table variables to operate on
table variable name | scalar | vector | cell array | pattern | function handle | table `vartype` subscript

Table variables to operate on, specified as one of the options in this table. The DataVariables value indicates which variables of the input table to fill. The data type associated with the indicated variables must be double or single.

Other variables in the table not specified by DataVariables pass through to the output without being filled.

Indexing Scheme	Values to Specify	Examples
Variable name	A string scalar or character vector; a string array or cell array of character vectors; a `pattern` object	`"A"` or `'A'` — A variable named `A`; `["A" "B"]` or `{'A','B'}` — Two variables named `A` and `B`; `"Var"+digitsPattern(1)` — Variables named `"Var"` followed by a single digit
Variable index	An index number that refers to the location of a variable in the table; a vector of numbers; a `logical` vector. Typically, the logical vector is the same length as the number of variables, but you can omit trailing `0` (`false`) values.	`3` — The third variable from the table; `[2 3]` — The second and third variables from the table; `[false false true]` — The third variable
Function handle	A function handle that takes a table variable as input and returns a `logical` scalar	`@isnumeric` — All the variables containing numeric values
Variable type	A `vartype` subscript that selects variables of a specified type	`vartype("numeric")` — All the variables containing numeric values

Example: filloutliers(A,"previous",DataVariables=["Var1" "Var2" "Var4"])

`ReplaceValues` — Replace values in table
`true` or `1` (default) | `false` or `0`

Whether to replace values in a table or timetable, specified as one of these values:

true or 1 — Replace input table variables containing outliers with filled table variables.
false or 0 — Append the input table with all table variables that were checked for outliers. The outliers in the appended variables are filled.

For array input data, ReplaceValues is not supported.

Example: filloutliers(T,"previous",ReplaceValues=false)
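The following sketch contrasts the two ReplaceValues settings on a small table. The table, its variable names x and label, and the choice of "previous" are illustrative assumptions.

```matlab
x = [1; 2; 100; 3; 2];
label = ["a"; "b"; "c"; "d"; "e"];
T = table(x,label);

% Default behavior: the filled values overwrite variable x.
Treplaced = filloutliers(T,"previous",DataVariables="x");

% Keep x untouched and append a filled copy as an extra variable.
Tappended = filloutliers(T,"previous",DataVariables="x",ReplaceValues=false);

widthReplaced = width(Treplaced);   % same number of variables as T
widthAppended = width(Tappended);   % one more variable than T
```

In Tappended, the original x still contains 100 in row 3, and the appended last variable contains the filled value 2 in that row.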

Outlier Detection Options

`ThresholdFactor` — Detection threshold factor
nonnegative scalar

Detection threshold factor, specified as a nonnegative scalar.

For methods "median" and "movmedian", the detection threshold factor replaces the number of scaled MAD, which is 3 by default.

For methods "mean" and "movmean", the detection threshold factor replaces the number of standard deviations from the mean, which is 3 by default.

For methods "grubbs" and "gesd", the detection threshold factor is a scalar ranging from 0 to 1. Values close to 0 result in a smaller number of outliers, and values close to 1 result in a larger number of outliers. The default detection threshold factor is 0.05.

For the "quartiles" method, the detection threshold factor replaces the number of interquartile ranges, which is 1.5 by default.

This name-value argument is not supported when the specified method is "percentiles".
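As a short illustration of the threshold factor for the "median" method, this sketch widens the detection band on the sample vector from the examples on this page. The factor 20 is an arbitrary value chosen so that only the most extreme element is still flagged.

```matlab
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Default factor of 3 scaled MAD.
[~,tfDefault] = filloutliers(A,"center","median");

% Wider band of 20 scaled MAD.
[~,tfWide] = filloutliers(A,"center","median",ThresholdFactor=20);

idxDefault = find(tfDefault)   % elements 4 and 9
idxWide    = find(tfWide)      % element 9 only
```

The scaled MAD of this vector is about 2.97 and the median is 59, so a factor of 20 places the upper threshold near 118. The value 100 is then no longer treated as an outlier.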

`MaxNumOutliers` — Maximum filled outliers by GESD
positive integer scalar

Maximum filled outliers by GESD, specified as a positive integer scalar. The MaxNumOutliers value specifies the maximum number of outliers that are filled by the "gesd" method. For example, filloutliers(A,"linear","gesd",MaxNumOutliers=5) fills no more than five outliers.

The default value for MaxNumOutliers is the integer nearest to 10 percent of the number of elements in A. Setting a larger value for the maximum number of outliers makes it more likely that all outliers are detected but at the cost of reduced computational efficiency.

The "gesd" method assumes the nonoutlier input data is sampled from an approximate normal distribution. When the data is not sampled in this way, the number of filled outliers might exceed the MaxNumOutliers value.

`OutlierLocations` — Known outlier indicator
logical array | table

Known outlier indicator, specified as a logical array or a table with logical variables (since R2024a).

If OutlierLocations is an array, it must be the same size as A. If OutlierLocations is a table or timetable, it must contain logical variables with the same sizes and names as the input table variables to operate on.

Elements with a value of 1 (true) indicate the locations of outliers in A. Elements with a value of 0 (false) indicate nonoutliers. When you specify OutlierLocations, filloutliers does not use an outlier detection method. Instead, it uses the elements of the known outlier indicator to define outliers.

You cannot specify OutlierLocations if you specify findmethod.

Data Types: logical | table | timetable

Output Arguments

`B` — Filled data
vector | matrix | multidimensional array | table | timetable

Filled data, returned as a vector, matrix, multidimensional array, or a table or timetable with numeric variables.

B is the same size as A unless the value of ReplaceValues is false. If the value of ReplaceValues is false, then the width of B is the sum of the input data width and the number of data variables specified.

`TF` — Filled data indicator
vector | matrix | multidimensional array

Filled data indicator, returned as a vector, matrix, or multidimensional array. Elements with a value of 1 (true) correspond to filled elements of B that were previously outliers. Elements with a value of 0 (false) correspond to unchanged elements.

TF is the same size as B.

Data Types: logical

`L` — Lower threshold
vector | matrix | multidimensional array | table | timetable

Lower threshold used by the outlier detection method, returned as a vector, matrix, multidimensional array, or a table or timetable with numeric variables. For example, the lower threshold value of the default outlier detection method is three scaled MAD below the median of the input data.

If findmethod is used for outlier detection, then L has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then L has the same size as A.

`U` — Upper threshold
vector | matrix | multidimensional array | table | timetable

Upper threshold used by the outlier detection method, returned as a vector, matrix, multidimensional array, or a table or timetable with numeric variables. For example, the upper threshold value of the default outlier detection method is three scaled MAD above the median of the input data.

If findmethod is used for outlier detection, then U has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then U has the same size as A.

`C` — Center value
vector | matrix | multidimensional array | table | timetable

Center value used by the outlier detection method, returned as a vector, matrix, multidimensional array, or a table or timetable with numeric variables. For example, the center value of the default outlier detection method is the median of the input data.

If findmethod is used for outlier detection, then C has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then C has the same size as A.
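The size rules for L, U, and C can be checked with a quick sketch. The input magic(5) and the window length 3 are arbitrary choices.

```matlab
A = magic(5);

% Global detection method: one threshold per column.
[~,~,Lglobal,Uglobal,Cglobal] = filloutliers(A,"center","median");

% Moving detection method: one threshold per element.
[~,~,Lmoving,Umoving,Cmoving] = filloutliers(A,"center","movmedian",3);

sizeGlobal = size(Lglobal)   % 1-by-5
sizeMoving = size(Lmoving)   % 5-by-5
```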

More About

Median Absolute Deviation

For a finite-length vector A made up of N scalar observations, the median absolute deviation (MAD) is defined as

$\mathrm{MAD} = \operatorname{median}\left(\left|A_{i} - \operatorname{median}(A)\right|\right)$

for i = 1,2,...,N.

The scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)). [1]
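As a worked check of this definition, the sketch below computes the scaled MAD thresholds by hand for the sample vector from the examples on this page and compares them with the outputs of the default detection method. The variable names are illustrative.

```matlab
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

c = -1/(sqrt(2)*erfcinv(3/2));              % approximately 1.4826
scaledMAD = c*median(abs(A - median(A)));   % median is 59, unscaled MAD is 2

center = median(A);
lower  = center - 3*scaledMAD;              % approximately 50.10
upper  = center + 3*scaledMAD;              % approximately 67.90
manualTF = A < lower | A > upper;

% Thresholds and outlier flags reported by the default "median" method.
[~,TF,L,U,C] = filloutliers(A,"center");
```

The hand-computed flags match TF, which marks elements 4 and 9, and lower, upper, and center agree with L, U, and C.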

Moving Window Size

This table illustrates the window position across the default uniformly spaced sample points vector [1 2 3 4 5 6 7].

Description	Window Size and Location	Sample Points in Window
For a scalar window size, the leading edge of the window is excluded and the trailing edge of the window is included.	`window = 3` Current sample point = 4	3, 4, 5
	`window = 4` Current sample point = 4	2, 3, 4, 5
For a vector window size, the leading edge and the trailing edge are included.	`window = [2 2]` Current sample point = 4	2, 3, 4, 5, 6
For sample points near the endpoints of the input data, `filloutliers` truncates the window so it begins at the first sample point or ends at the last sample point.	`window = [2 2]` Current sample point = 2	1, 2, 3, 4
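The rows of this table can be reproduced with a small model of the window rules. The helper functions scalarWindow and vectorWindow are hypothetical and written only for this illustration; they are not part of filloutliers and say nothing about its internal implementation.

```matlab
x = 1:7;   % default sample points

% Scalar window w: the trailing edge (cur - w/2) is included and the
% leading edge (cur + w/2) is excluded.
scalarWindow = @(cur,w) x(x >= cur - w/2 & x < cur + w/2);

% Vector window [b f]: both edges are included. Selecting from x
% truncates the window at the ends of the data.
vectorWindow = @(cur,bf) x(x >= cur - bf(1) & x <= cur + bf(2));

w3    = scalarWindow(4,3)        % 3 4 5
w4    = scalarWindow(4,4)        % 2 3 4 5
w22   = vectorWindow(4,[2 2])    % 2 3 4 5 6
wEdge = vectorWindow(2,[2 2])    % 1 2 3 4
```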

Alternative Functionality

Live Editor Task

You can use filloutliers functionality interactively by adding the Clean Outlier Data task to a live script.

Clean Outlier Data task in the Live Editor

References

[1] NIST/SEMATECH e-Handbook of Statistical Methods, https://www.itl.nist.gov/div898/handbook/, 2013.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The filloutliers function supports tall arrays with the following usage notes and limitations:

The "percentiles", "grubbs", and "gesd" methods are not supported.
The "movmedian" and "movmean" methods do not support tall timetables.
The SamplePoints and MaxNumOutliers name-value arguments are not supported.
The value of DataVariables cannot be a function handle.
The OutlierLocations name-value argument cannot specify a table or timetable.
Computation of filloutliers(A,fillmethod), filloutliers(A,fillmethod,"median",…) or filloutliers(A,fillmethod,"quartiles",…) along the first dimension is supported only when A is a tall column vector.
The syntaxes filloutliers(A,"spline",…) and filloutliers(A,"makima",…) are not supported.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The "movmean" and "movmedian" methods for detecting outliers do not support timetable input data, datetime SamplePoints values, or duration SamplePoints values.
Only the "center", "clip", and numeric scalar methods for filling outliers are supported when the input data is a timetable or when the SamplePoints value has type datetime or duration.
To use the "spline" and "pchip" fill methods, you must enable support for variable-size arrays.
String and character array inputs must be constant.
The OutlierLocations name-value argument cannot specify a table or timetable.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Refer to the usage notes and limitations in the C/C++ Code Generation section. The same usage notes and limitations apply to GPU code generation.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

The filloutliers function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

Version History

Introduced in R2017a

R2024b: Support `"makima"` as input value to fill method

The fill method now supports "makima" as an input value for C/C++ code generation.

R2024a: Define outlier locations as table

Define the locations of outliers by specifying the OutlierLocations name-value argument as a table containing logical variables with names present in the input table. Previously, you could specify OutlierLocations only as a vector, matrix, or multidimensional array.

R2022a: Append filled values

For table or timetable input data, append the input table with all table variables that were checked for outliers. The outliers in the appended variables are filled. Append, rather than replace, table variables by setting the ReplaceValues name-value argument to false.

R2021b: Specify sample points as table variable

For table input data, specify the sample points as a table variable using the SamplePoints name-value argument.
