fillmissing
Fill missing values
Syntax
Description
F = fillmissing(A,'constant',v) fills missing entries of an array or table with the constant value v. If A is a matrix or multidimensional array, then v can be either a scalar or a vector. When v is a vector, each element specifies the fill value in the corresponding column of A. If A is a table or timetable, then v can also be a cell array whose elements contain fill values for each table variable.
Missing values are defined according to the data type of A:
NaN: double, single, duration, and calendarDuration
NaT: datetime
<missing>: string
<undefined>: categorical
' ': char
{''}: cell of character arrays
If A is a table, then the data type of each column defines the missing value for that column.
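As a rough illustration of how the missing-value marker depends on the data type, the sketch below fills a double vector, a string array, and a datetime vector. The variable names and fill values are chosen here for illustration only and are not part of the reference text.

```matlab
% Numeric data: NaN marks the missing entry.
x = [1 NaN 3];
Fx = fillmissing(x,'constant',0);

% String data: <missing> marks the missing entry.
s = ["a" missing "c"];
Fs = fillmissing(s,'constant',"none");

% Datetime data: NaT marks the missing entry.
d = [datetime(2020,1,1) NaT datetime(2020,1,3)];
Fd = fillmissing(d,'previous');
```

In each case only the entry that counts as missing for that type is replaced; the other entries pass through unchanged.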
F = fillmissing(A,fillfun,gapwindow) fills gaps of missing entries using a custom method specified by a function handle fillfun and a fixed window surrounding each gap from which the fill values are computed. fillfun must have the input arguments xs, ts, and tq, which are vectors containing the sample data xs of length gapwindow, the sample data locations ts of length gapwindow, and the missing data locations tq. The locations in ts and tq are a subset of the sample points vector.
F = fillmissing(___,Name,Value) specifies additional parameters for filling missing values using one or more name-value pair arguments. For example, if t is a vector of time values, then fillmissing(A,'linear','SamplePoints',t) interpolates the data in A relative to the times in t.
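To make the effect of 'SamplePoints' concrete, the following minimal sketch fills the same data twice, once with the default sample points and once with a nonuniform vector t. The data values are illustrative assumptions.

```matlab
A = [1 NaN 3 4];

% Default sample points are 1:numel(A), so the gap sits midway between
% the neighbors 1 and 3.
F1 = fillmissing(A,'linear');

% With nonuniform sample points, the gap sits at 2.5 on the x-axis, so the
% interpolated value shifts toward the right-hand neighbor.
t = [1 2.5 3 4];
F2 = fillmissing(A,'linear','SamplePoints',t);
```

Here F1(2) is 2 and F2(2) is 2.5, because linear interpolation is evaluated at the sample point location rather than at the element index.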
Examples
Vector with NaN Values
Create a vector that contains NaN values and replace each NaN with the previous nonmissing value.
A = [1 3 NaN 4 NaN NaN 5];
F = fillmissing(A,'previous')
F = 1×7
1 3 3 4 4 4 5
Matrix with NaN Values
Create a 2-by-2 matrix with a NaN value in each column. Fill NaN with 100 in the first column and 1000 in the second column.
A = [1 NaN; NaN 2]
A = 2×2
1 NaN
NaN 2
F = fillmissing(A,'constant',[100 1000])
F = 2×2
1 1000
100 2
Interpolate Missing Data
Use interpolation to replace NaN values in nonuniformly sampled data.
Define a vector of nonuniform sample points and evaluate the sine function over the points.
x = [-4*pi:0.1:0, 0.1:0.2:4*pi];
A = sin(x);
Inject NaN values into A.
A(A < 0.75 & A > 0.5) = NaN;
Fill the missing data using linear interpolation, and return the filled vector F and the logical vector TF. The value 1 (true) in entries of TF corresponds to the values of F that were filled.
[F,TF] = fillmissing(A,'linear','SamplePoints',x);
Plot the original data and filled data.
plot(x,A,'.', x(TF),F(TF),'o')
xlabel('x');
ylabel('sin(x)')
legend('Original Data','Filled Missing Data')
Replace NaN with Moving Median
Use a moving median to fill missing numeric data.
Create a vector of sample points x and a vector of data A that contains missing values.
x = linspace(0,10,200);
A = sin(x) + 0.5*(rand(size(x))-0.5);
A([1:10 randi([1 length(x)],1,50)]) = NaN;
Replace NaN values in A using a moving median with a window of length 10, and plot both the original data and the filled data.
F = fillmissing(A,'movmedian',10);
plot(x,F,'r.',x,A,'b.')
legend('Filled Missing Data','Original Data')
Fill with Previous Value Using Custom Function
Define a custom function to fill NaN values with the previous nonmissing value.
Define a vector of sample points t and a vector of corresponding data A containing NaN values. Plot the data.
t = 10:10:100;
A = [0.1 0.2 0.3 NaN NaN 0.6 0.7 NaN 0.9 1];
plot(t,A,'o')
Use the local function forwardfill (defined at the end of the example) to fill missing gaps with the previous nonmissing value. The function handle inputs include:
xs: data values used for filling
ts: locations of the values used for filling relative to the sample points
tq: locations of the missing values relative to the sample points
n: number of values in the gap to fill
n = 2;
gapwindow = [10 0];
[F,TF] = fillmissing(A,@(xs,ts,tq) forwardfill(xs,ts,tq,n),gapwindow,'SamplePoints',t);
The gap window value [10 0] tells fillmissing to consider one data point before a missing gap and no data points after a gap, since the previous nonmissing value is located 10 units prior to the gap. The function handle input values determined by fillmissing for the first gap are:
xs = 0.3
ts = 30
tq = [40 50]
The function handle input values for the second gap are:
xs = 0.7
ts = 70
tq = 80
Plot the original data and the filled data.
plot(t,A,'o',t(TF),F(TF),'ro')
function y = forwardfill(xs,ts,tq,n)
% Fill n values in the missing gap using the previous nonmissing value
y = NaN(1,numel(tq));
y(1:min(numel(tq),n)) = xs;
end
Matrix with Missing Endpoints
Create a matrix with missing entries and fill across the columns (second dimension) one row at a time using linear interpolation. For each row, fill leading and trailing missing values with the nearest nonmissing value in that row.
A = [NaN NaN 5 3 NaN 5 7 NaN 9 NaN; 8 9 NaN 1 4 5 NaN 5 NaN 5; NaN 4 9 8 7 2 4 1 1 NaN]
A = 3×10
NaN NaN 5 3 NaN 5 7 NaN 9 NaN
8 9 NaN 1 4 5 NaN 5 NaN 5
NaN 4 9 8 7 2 4 1 1 NaN
F = fillmissing(A,'linear',2,'EndValues','nearest')
F = 3×10
5 5 5 3 4 5 7 8 9 9
8 9 5 1 4 5 5 5 5 5
4 4 9 8 7 2 4 1 1 1
Table with Multiple Data Types
Fill missing values for table variables with different data types.
Create a table whose variables include categorical, double, and char data types.
A = table(categorical({'Sunny';'Cloudy';''}),[66;NaN;54],{'';'N';'Y'},[37;39;NaN],...
    'VariableNames',{'Description' 'Temperature' 'Rain' 'Humidity'})
A=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy NaN {'N' } 39
<undefined> 54 {'Y' } NaN
Replace all missing entries with the value from the previous entry. Since there is no previous element in the Rain variable, the missing character vector is not replaced.
F = fillmissing(A,'previous')
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 66 {'N' } 39
Cloudy 54 {'Y' } 39
Replace the NaN values from the Temperature and Humidity variables in A with 0.
F = fillmissing(A,'constant',0,'DataVariables',{'Temperature','Humidity'})
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 0 {'N' } 39
<undefined> 54 {'Y' } 0
Alternatively, use the isnumeric function to identify the numeric variables to operate on.
F = fillmissing(A,'constant',0,'DataVariables',@isnumeric)
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 0 {'N' } 39
<undefined> 54 {'Y' } 0
Now fill the missing values in A with a separate constant for each table variable. The constants are contained in a cell array with one element per variable.
F = fillmissing(A,'constant',{categorical({'None'}),1000,'Unknown',1000})
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ ___________ ________
Sunny 66 {'Unknown'} 37
Cloudy 1000 {'N' } 39
None 54 {'Y' } 1000
Specify Maximum Gap
Create a time vector t in seconds and a corresponding vector of data A that contains NaN values.
t = seconds([2 4 8 17 98 134 256 311 1001]);
A = [1 3 23 NaN NaN NaN 100 NaN 233];
Fill only the missing values in A whose gap size is at most 250 seconds. The first gap spans 248 seconds (from 8 to 256 seconds), so it is filled. Since the second gap is larger than 250 seconds, its NaN value is not filled.
F = fillmissing(A,'linear','SamplePoints',t,'MaxGap',seconds(250))
F = 1×9
1.0000 3.0000 23.0000 25.7944 50.9435 62.1210 100.0000 NaN 233.0000
Input Arguments
A
— Input data
vector | matrix | multidimensional array | table | timetable
Input data, specified as a vector, matrix, multidimensional array, table, or timetable.
When the input argument is a cell array, it must be a cell array of character vectors. If A is a timetable, then only table values are filled. If the associated vector of row times contains a NaT or NaN value, then fillmissing produces an error. Row times must be unique and listed in ascending order.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | table | timetable | categorical | datetime | duration | calendarDuration
v
— Fill constant
scalar | vector | cell array
Fill constant, specified as a scalar, vector, or cell array.
v can be a vector when A is a matrix or multidimensional array, indicating a different fill value for each operating dimension. The length of v must match the length of the operating dimension.
v can be a cell array of fill values when A is a table or timetable, indicating a different fill value for each variable. The number of elements in the cell array must match the number of variables in the table.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | cell | categorical | datetime | duration
method
— Fill method
'previous' | 'next' | 'nearest' | 'linear' | 'spline' | 'pchip' | 'makima'
Fill method, specified as one of the following:
Method | Description
'previous' | previous nonmissing value
'next' | next nonmissing value
'nearest' | nearest nonmissing value
'linear' | linear interpolation of neighboring, nonmissing values (numeric, duration, and datetime data types only)
'spline' | piecewise cubic spline interpolation (numeric, duration, and datetime data types only)
'pchip' | shape-preserving piecewise cubic spline interpolation (numeric, duration, and datetime data types only)
'makima' | modified Akima cubic Hermite interpolation (numeric, duration, and datetime data types only)
movmethod
— Moving method
'movmean' | 'movmedian'
Moving method to fill missing data, specified as one of the following:
Method | Description
'movmean' | Moving average over a window of length window (numeric data types only)
'movmedian' | Moving median over a window of length window (numeric data types only)
fillfun
— Custom fill method
function handle
Example: @(xs,ts,tq) myfun(xs,ts,tq)
Custom fill method, specified as a function handle. Valid function handles must include the following three input arguments:
Input Argument | Description
xs | Vector containing data values used for filling. The length of xs must match the length of the specified window.
ts | Vector containing locations of the values used for filling. The length of ts must match the length of the specified window. ts is a subset of the sample points vector.
tq | Vector containing locations of the missing values. tq is a subset of the sample points vector.
The function must return either a scalar or a vector with the same length as tq.
window
— Window length
positive integer scalar | two-element vector of positive integers | positive duration scalar | two-element vector of positive durations
Window length for moving methods, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations. The window is defined relative to the sample points.
When window is a positive integer scalar, then the window is centered about the current element and contains window-1 neighboring elements. If window is even, then the window is centered about the current and previous elements. If window is a two-element vector of positive integers [b f], then the window contains the current element, b elements backward, and f elements forward.
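The two window forms can be compared on a small vector. This is a minimal sketch with illustrative data, using 'movmean' so that the fill value is simply the mean of the nonmissing values inside the window.

```matlab
A = [1 2 NaN 4 5];

% Scalar window of length 3: the current element plus window-1 = 2
% neighbors, one on each side. The fill value is the mean of 2 and 4.
F1 = fillmissing(A,'movmean',3);

% Two-element window [b f] = [2 0]: the current element and the two
% elements before it. The fill value is the mean of 1 and 2.
F2 = fillmissing(A,'movmean',[2 0]);
```

The centered window yields 3 for the missing entry, while the backward-looking window yields 1.5, since it never sees the values after the gap.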
When A is a timetable or 'SamplePoints' is specified as a datetime or duration vector, window must be of type duration.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | duration
gapwindow
— Gap window length
positive integer scalar | two-element vector of positive integers | positive duration scalar | two-element vector of positive durations
Gap window length for custom fill functions, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations. The gap window is defined relative to the sample points.
When specifying a function handle fillfun for the fill method, the value of gapwindow represents a fixed window length that surrounds each gap of missing values in the input data. The fill value is then computed by fillfun using the values in that window. For example, for default sample points t = 1:10 and data A = [10 20 NaN NaN 50 60 70 NaN 90 100], a window length gapwindow = 3 specifies the first gap window as [20 NaN NaN 50], on which fillfun operates to compute the fill value. The second gap window on which fillfun operates is [70 NaN 90].
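The gap windows described above can be exercised with a very small fill function. In this sketch, meanfill is a hypothetical handle introduced only for illustration; it ignores ts and tq and returns the mean of the nonmissing values that fillmissing passes in as xs.

```matlab
A = [10 20 NaN NaN 50 60 70 NaN 90 100];

% Hypothetical fill function: replace every missing value in a gap with
% the mean of the nonmissing values inside that gap's window.
meanfill = @(xs,ts,tq) mean(xs);

% A gap window of length 3 reaches one sample point to either side of
% each gap for the default sample points 1:10.
[F,TF] = fillmissing(A,meanfill,3);
```

Because the handle returns a scalar, the same value is used for every missing entry in a gap. For the first gap the window holds the nonmissing values 20 and 50, and for the second gap it holds 70 and 90.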
When A is a timetable or 'SamplePoints' is specified as a datetime or duration vector, gapwindow must be of type duration.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | duration
dim
— Dimension to operate along
positive integer scalar
Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.
When A is a table or timetable, dim is not supported. fillmissing operates along each table or timetable variable separately.
Consider a two-dimensional input array, A.
If dim=1, then fillmissing fills A column by column.
If dim=2, then fillmissing fills A row by row.
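The difference between the two operating dimensions is easiest to see with the 'previous' method, which only looks backward along the chosen dimension. The matrix below is an illustrative assumption.

```matlab
A = [1 NaN 3;
     4 5   NaN;
     NaN 8 9];

% dim = 1: each column is filled from the entry above it. The NaN at the
% top of column 2 has no previous value and stays missing.
F1 = fillmissing(A,'previous',1);

% dim = 2: each row is filled from the entry to its left. The NaN at the
% start of row 3 has no previous value and stays missing.
F2 = fillmissing(A,'previous',2);
```

The same input therefore keeps a different leading NaN depending on whether the fill runs down the columns or across the rows.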
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Name-Value Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: fillmissing(A,'DataVariables',{'Temperature','Altitude'}) fills only the columns corresponding to the Temperature and Altitude variables of an input table
SamplePoints
— Sample points
vector | table variable name | scalar | function handle | table vartype subscript
Sample points, specified as the comma-separated pair consisting of 'SamplePoints' and either a vector of sample point values or one of the options in the following table when the input data is a table. The sample points represent the x-axis locations of the data, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. The vector [1 2 3 ...] is the default.
When the input data is a table, you can specify the sample points as a table variable using one of the following options.
Option for Table Input | Description | Examples
Variable name | A character vector or scalar string specifying a single table variable name |
Scalar variable index | A scalar table variable index |
Logical vector | A logical vector whose elements each correspond to a table variable, where |
Function handle | A function handle that takes a table variable as input and returns a logical scalar, which must be |
vartype subscript | A table subscript generated by the |
Note
This name-value pair is not supported when the input data is a timetable. Timetables always use the vector of row times as the sample points. To use different sample points, you must edit the timetable so that the row times contain the desired sample points.
Moving windows are defined relative to the sample points. For example, if t is a vector of times corresponding to the input data, then fillmissing(rand(1,10),'movmean',3,'SamplePoints',t) has a window that represents the time interval between t(i)-1.5 and t(i)+1.5.
When the sample points vector has data type datetime or duration, then the moving window length must have type duration.
Example: fillmissing([1 NaN 3 4],'linear','SamplePoints',[1 2.5 3 4])
Example: fillmissing(T,'linear','SamplePoints',"Var1")
Data Types: single | double | datetime | duration
DataVariables
— Table variables to operate on
table variable name | scalar | vector | cell array | function handle | table vartype subscript
Table variables to operate on, specified as the comma-separated pair consisting of 'DataVariables' and one of the options in this table. The 'DataVariables' value indicates which variables of the input table to fill. Other variables in the table not specified by 'DataVariables' pass through to the output without being operated on.
Option | Description | Examples
Variable name | A character vector or scalar string specifying a single table variable name |
Vector of variable names | A cell array of character vectors or string array where each element is a table variable name |
Scalar or vector of variable indices | A scalar or vector of table variable indices |
Logical vector | A logical vector whose elements each correspond to a table variable, where |
Function handle | A function handle that takes a table variable as input and returns a logical scalar |
vartype subscript | A table subscript generated by the |
Example: fillmissing(T,'linear','DataVariables',["Var1" "Var2" "Var4"])
EndValues
— Method for handling endpoints
'extrap' (default) | 'previous' | 'next' | 'nearest' | 'none' | scalar
Method for handling endpoints, specified as the comma-separated pair consisting of 'EndValues' and one of 'extrap', 'previous', 'next', 'nearest', 'none', or a constant scalar value. The endpoint fill method handles leading and trailing missing values based on the following definitions:
Method | Description
'extrap' | same as method
'previous' | previous nonmissing value
'next' | next nonmissing value
'nearest' | nearest nonmissing value
'none' | no fill value
scalar | constant value (numeric, duration, and datetime data types only)
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | datetime | duration
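The endpoint options differ only in how leading and trailing missing values are treated. The sketch below, with illustrative data, applies four of them to a vector that has a NaN at each end and one in the interior.

```matlab
A = [NaN 2 NaN 4 NaN];

% Default 'extrap': endpoints use the same method as the interior, so the
% line through (2,2) and (4,4) is extended in both directions.
Fextrap = fillmissing(A,'linear');

% 'nearest': endpoints copy the closest nonmissing value.
Fnearest = fillmissing(A,'linear','EndValues','nearest');

% 'none': endpoints are left missing; only the interior gap is filled.
Fnone = fillmissing(A,'linear','EndValues','none');

% Scalar: endpoints are set to the given constant.
Fconst = fillmissing(A,'linear','EndValues',0);
```

In all four calls the interior NaN is filled with 3 by linear interpolation; only the first and last entries change with the 'EndValues' setting.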
MissingLocations
— Known missing indicator
vector | matrix | multidimensional array
Known missing indicator, specified as the comma-separated pair consisting of 'MissingLocations' and a logical vector, matrix, or multidimensional array of the same size as A. The indicator elements can be true to indicate a missing value in the corresponding location of A or false otherwise.
Data Types: logical
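One possible use of 'MissingLocations' is data in which a sentinel value, rather than NaN, marks the missing entries. The sentinel -999 and the variable names below are assumptions made for this sketch.

```matlab
% Data in which -999 is a placeholder for a missing measurement.
A = [1 -999 3 4];

% Logical indicator of the same size as A: true where a value is missing.
known = (A == -999);

% Fill the indicated locations by linear interpolation.
F = fillmissing(A,'linear','MissingLocations',known);
```

The placeholder is treated as a gap and replaced with the interpolated value 2, while the remaining entries are left as they are.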
MaxGap
— Maximum gap size to fill
numeric scalar | duration scalar | calendarDuration scalar
Maximum gap size to fill, specified as a numeric scalar, duration scalar, or calendarDuration scalar. Gaps are clusters of consecutive missing values whose size is the distance between the nonmissing values surrounding the gap. The gap size is computed relative to the sample points. Gaps smaller than or equal to the max gap size are filled, and gaps larger than the gap size are not.
For example, consider the vector y = [25 NaN NaN 100] using the default sample points [1 2 3 4]. The gap size in the vector is computed from the sample points as 4 - 1 = 3, so a MaxGap value of 2 leaves the missing values unaltered, while a MaxGap value of 3 fills in the missing values.
For missing values at the beginning or end of the data:
A single missing value at the end of the input data has a gap size of 0 and is always filled.
Clusters of missing values occurring at the beginning or end of the input data are not completely surrounded by nonmissing values, so the gap size is computed using the nearest existing sample points. For the default sample points 1:N, this produces a gap size that is 1 smaller than if the same cluster occurred in the middle of the data.
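The interior-gap example with y = [25 NaN NaN 100] can be written out as follows. This is a minimal sketch that uses linear interpolation as the fill method, which is an assumption, since the text does not name one.

```matlab
y = [25 NaN NaN 100];

% Gap size is 4 - 1 = 3 for the default sample points, which exceeds a
% MaxGap of 2, so the gap is left alone.
F2 = fillmissing(y,'linear','MaxGap',2);

% A MaxGap of 3 is equal to the gap size, so the gap is filled.
F3 = fillmissing(y,'linear','MaxGap',3);
```

F2 is identical to y, while F3 replaces the two NaN values with points on the line from 25 to 100.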
Output Arguments
F
— Filled data
vector | matrix | multidimensional array | table | timetable
Filled data, returned as a vector, matrix, multidimensional array, table, or timetable. F is the same size as A.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | table | timetable | categorical | datetime | duration | calendarDuration
TF
— Filled data indicator
vector | matrix | multidimensional array
Filled data indicator, returned as a vector, matrix, or multidimensional array. TF is a logical array where 1 (true) corresponds to entries in F that were filled and 0 (false) corresponds to unchanged entries. TF is the same size as A and F.
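A short sketch of how TF might be used after a fill, for example to count or locate the filled entries. The data and the variable names numFilled and filledIdx are illustrative.

```matlab
A = [1 NaN 3 NaN NaN 6];
[F,TF] = fillmissing(A,'linear');

% Number of entries that were filled.
numFilled = nnz(TF);

% Positions of the filled entries in A and F.
filledIdx = find(TF);

% Values that fillmissing wrote into those positions.
filledValues = F(TF);
```

Since TF has the same size as A and F, it can be used directly as a logical index into either array.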
Data Types: logical
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
The 'spline' and 'makima' methods are not supported.
Function handle fill methods are not supported.
The 'MaxGap', 'SamplePoints', and 'MissingLocations' name-value pairs are not supported.
The 'DataVariables' name-value pair cannot specify a function handle.
The 'EndValues' name-value pair can only specify 'extrap'.
The syntax fillmissing(A,movmethod,window) is not supported when A is a tall timetable.
The syntax fillmissing(A,'constant',v) must specify a scalar value for v.
The syntax fillmissing(A,___) does not support character vector variables when A is a tall table or tall timetable.
For more information, see Tall Arrays.
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
The 'MaxGap' name-value pair is not supported.
The 'makima' option is not supported.
When the 'SamplePoints' value has type datetime or the input data is a timetable with datetime row times, only the methods 'constant', 'movmean', and 'movmedian' are supported.
Function handle inputs for the fillmethod argument are not supported.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
See Also
ismissing | standardizeMissing | rmmissing | filloutliers | isnan | missing | Clean Missing Data