Measures of Dispersion
The purpose of measures of dispersion is to find out how spread out the data values are on the number line. Another term for these statistics is measures of spread.
The table gives the function names and descriptions.
mad       Mean absolute deviation
moment    Central moment of all orders
The range (the difference between the maximum and minimum values) is the simplest measure of spread. But if there is an outlier in the data, it will be the minimum or maximum value. Thus, the range is not robust to outliers.
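As a minimal sketch of this sensitivity, the snippet below computes the range directly from its definition with max and min. The sample values and the variable names clean, dirty, rClean, and rDirty are hypothetical and chosen only for illustration.

```matlab
% Hypothetical sample: five values near 10, with and without one outlier
clean = [9 10 10 11 12];
dirty = [clean 250];               % append a single outlying value

% Range from its definition: maximum minus minimum
rClean = max(clean) - min(clean);  % 12 - 9
rDirty = max(dirty) - min(dirty);  % the outlier becomes the maximum: 250 - 9

fprintf('range without outlier: %g\n', rClean);
fprintf('range with outlier:    %g\n', rDirty);
```

One added value moves the range from 3 to 241, because the outlier itself becomes one of the two values that define the statistic.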
The standard deviation and the variance are popular measures of spread that are optimal for normally distributed samples. The sample variance is the minimum variance unbiased estimator (MVUE) of the normal parameter σ². The standard deviation is the square root of the variance and has the desirable property of being in the same units as the data. That is, if the data is in meters, the standard deviation is in meters as well. The variance is in meters², which is more difficult to interpret.
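The relationship between the two statistics can be checked with a short sketch. It assumes the default normalization of var and std (division by n - 1) and uses hypothetical measurements in meters; the names v and s are illustrative only.

```matlab
% Hypothetical measurements, in meters
x = [2 4 4 4 5 5 7 9];
n = numel(x);

% Sample variance from its definition (normalized by n - 1), in meters squared
v = sum((x - mean(x)).^2) / (n - 1);

% Standard deviation is the square root of the variance, back in meters
s = sqrt(v);

% Differences from the built-in functions should be zero up to round-off
fprintf('variance: %.4f (difference from var: %.2g)\n', v, abs(v - var(x)));
fprintf('std dev:  %.4f (difference from std: %.2g)\n', s, abs(s - std(x)));
```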
Neither the standard deviation nor the variance is robust to outliers. A data value that is separate from the body of the data can increase the value of the statistics by an arbitrarily large amount.
The mean absolute deviation (MAD) is also sensitive to outliers. But the MAD does not move quite as much as the standard deviation or variance in response to bad data.
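The following sketch illustrates both points by moving a single outlier farther from the body of the data. To stay self-contained it computes the mean absolute deviation from its definition through a hypothetical anonymous function, meanAbsDev, rather than calling mad; the outlier values are arbitrary.

```matlab
% Hypothetical helper: mean absolute deviation from its definition
meanAbsDev = @(v) mean(abs(v - mean(v)));

outliers = [10 100 1000];          % arbitrary positions for the single outlier
s = zeros(size(outliers));
m = zeros(size(outliers));

for k = 1:numel(outliers)
    x = [ones(1,6), outliers(k)];  % six ones plus one outlying value
    s(k) = std(x);                 % sample standard deviation
    m(k) = meanAbsDev(x);          % mean absolute deviation
end

% Rows: outlier value, standard deviation, mean absolute deviation
disp([outliers; s; m])
```

Both statistics keep growing as the outlier moves away, so neither is robust, but the mean absolute deviation stays below the standard deviation for every outlier value in this sample.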
The interquartile range (IQR) is the difference between the 75th and 25th percentiles of the data. Since only the middle 50% of the data affects this measure, it is robust to outliers.
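A minimal sketch of this definition follows, assuming the prctile function is available. The data and the names clean, dirty, iqrClean, and iqrDirty are hypothetical. The result should agree with what iqr returns for the same vectors.

```matlab
% Hypothetical data: eight evenly spaced values, then the largest replaced by an outlier
clean = 1:8;
dirty = [1:7, 800];

% IQR from its definition: 75th percentile minus 25th percentile
qClean = prctile(clean, [25 75]);
qDirty = prctile(dirty, [25 75]);
iqrClean = qClean(2) - qClean(1);
iqrDirty = qDirty(2) - qDirty(1);

fprintf('IQR without outlier: %g\n', iqrClean);
fprintf('IQR with outlier:    %g\n', iqrDirty);
```

Because the outlier lies outside the middle 50% of the sorted values, the two results are the same.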
Compare Measures of Dispersion
This example shows how to compute and compare measures of dispersion for sample data that contains one outlier.
Generate sample data that contains one outlier value.
x = [ones(1,6),100]
x = 1×7

     1     1     1     1     1     1   100
Compute the interquartile range, mean absolute deviation, range, and standard deviation of the sample data.
stats = [iqr(x),mad(x),range(x),std(x)]
stats = 1×4

         0   24.2449   99.0000   37.4185
The interquartile range (iqr) is the difference between the 75th and 25th percentiles of the sample data, and is robust to outliers. The range (range) is the difference between the maximum and minimum values in the data, and is strongly influenced by the presence of an outlier.
Both the mean absolute deviation (mad) and the standard deviation (std) are sensitive to outliers. However, the mean absolute deviation is less sensitive than the standard deviation.