Fit a Nonparametric Distribution with Pareto Tails
This example shows how to fit a nonparametric probability distribution to sample data using Pareto tails to smooth the distribution in the tails.
Step 1. Generate sample data.
Generate sample data that contains more outliers than expected from a standard normal distribution.
rng('default') % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];
In the data, 80% of the values come from a standard normal distribution, 10% from an exponential distribution with a mean of 5, and 10% from a negated exponential distribution with a mean of 1 (so these values are negative, with an expected value of -1). Compared to a standard normal distribution, the exponential values are more likely to be outliers, especially in the upper tail.
Step 2. Fit probability distributions to the data.
Fit a normal distribution and a t location-scale distribution to the data, and plot for a visual comparison.
probplot(data);
p = fitdist(data,'tlocationscale');
h = probplot(gca,p);
set(h,'color','r','linestyle','-');
title('Probability Plot')
legend('Normal','Data','t location-scale','Location','SE')
Both distributions appear to fit reasonably well in the center, but neither the normal distribution nor the t location-scale distribution fits the tails very well.
Step 3. Generate an empirical distribution.
To obtain a better fit, use ecdf to generate an empirical cdf based on the sample data.
The empirical distribution provides a perfect fit, but the outliers make the tails very discrete. Random samples generated from this distribution using the inversion method might include, for example, values near 4.33 and 9.25, but no values in between.
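The ecdf call described in this step might look like the following sketch. It regenerates the data from Step 1 so that it runs on its own. The variable names F, xi, topValues, and gaps are illustrative choices, not part of the original example. Listing the largest observations and the spacing between them is one simple way to see how sparse the upper tail is.

```matlab
% Recreate the sample from Step 1 so this block is self-contained
rng('default') % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];

% Empirical cdf: F holds the cdf values, xi the points where they are evaluated
[F,xi] = ecdf(data);

% Plot the empirical cdf as a step function
figure
stairs(xi,F)
title('Empirical cdf')

% Illustrative check: the five largest observations and the gaps between them
sortedData = sort(data,'descend');
topValues = sortedData(1:5);
gaps = -diff(topValues); % positive spacing between consecutive top values
disp([topValues [gaps; NaN]])
```

Large entries in gaps mark ranges of x where the empirical cdf is flat, which is where inversion sampling from it cannot produce values.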
Step 4. Fit a distribution using Pareto tails.
Use paretotails to generate an empirical cdf for the middle 80% of the data and fit generalized Pareto distributions to the lower and upper 10%.
pfit = paretotails(data,0.1,0.9)
pfit =
Piecewise distribution with 3 segments
      -Inf < x < -1.24623    (0 < p < 0.1): lower tail, GPD(-0.334156,0.798745)
  -1.24623 < x < 1.48551  (0.1 < p < 0.9): interpolated empirical cdf
   1.48551 < x < Inf        (0.9 < p < 1): upper tail, GPD(1.23681,0.581868)
To obtain a better fit, paretotails fits a distribution by piecing together an ecdf or kernel distribution in the center of the sample, and smooth generalized Pareto distributions (GPDs) in the tails. Use paretotails to create a paretotails probability distribution object. You can access information about the fit and perform further calculations on the object using the object functions of the paretotails object. For example, you can evaluate the cdf or generate random numbers from the distribution.
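As a sketch of the object functions mentioned above, the following block queries the segment boundaries and tail parameters, then evaluates the inverse cdf and draws random numbers. It regenerates the data so that it runs on its own. The last part rests on an assumption: that the upper tail is assembled as p_U + (1 - p_U)*G(x - q_U), where G is the GPD cdf and upperparams returns the parameters in [shape scale] order. If that reading of the piecewise construction is right, the two cdf values should agree. The names med, r, x0, Fobj, and Fman are illustrative.

```matlab
% Recreate the sample and the fit so this block is self-contained
rng('default') % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];
pfit = paretotails(data,0.1,0.9);

% Boundary cumulative probabilities p and corresponding quantiles q
[p,q] = boundary(pfit);

% GPD parameters of each tail (assumed order: [shape scale])
lower = lowerparams(pfit);
upper = upperparams(pfit);

% Inverse cdf at the median, and a random sample from the fitted object
med = icdf(pfit,0.5);
r = random(pfit,1000,1);

% Assumption check: rebuild the upper-tail cdf by hand at one point
x0 = q(2) + 2;                                   % a point inside the upper tail
Fobj = cdf(pfit,x0);                             % cdf from the object
Fman = p(2) + (1 - p(2))*gpcdf(x0 - q(2),upper(1),upper(2));
fprintf('cdf from object: %.6f, rebuilt by hand: %.6f\n',Fobj,Fman)
```

Because r is drawn from smooth GPD tails, it can contain values between the extreme observations, unlike a sample drawn by inverting the empirical cdf.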
Step 5. Compute and plot the cdf.
Compute and plot the cdf of the fitted paretotails distribution.
x = -4:0.01:10;
plot(x,cdf(pfit,x))
The paretotails cdf closely fits the data but is smoother in the tails than the ecdf generated in Step 3.
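One way to see this comparison directly is to overlay the two curves on the same axes. The following is a minimal sketch rather than part of the original example. It regenerates the data and the fit so that it runs on its own, and the colors and legend labels are arbitrary choices.

```matlab
% Recreate the sample and the fit so this block is self-contained
rng('default') % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];
pfit = paretotails(data,0.1,0.9);

% Empirical cdf from Step 3 and fitted cdf from Step 5 on a common grid
[F,xi] = ecdf(data);
x = -4:0.01:10;
y = cdf(pfit,x);

% Overlay the step-function ecdf and the paretotails cdf
figure
stairs(xi,F,'b')
hold on
plot(x,y,'r')
hold off
title('Empirical cdf and Pareto tails cdf')
legend('Empirical cdf','Pareto tails cdf','Location','SE')
```

In the central segment the two curves should nearly coincide, and the differences should show up in the lower and upper 10%, where the GPD segments replace the steps.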