Proving CLT using real data
I'm trying to understand the CLT with real data, not only with a simulation, but for now, consider the following simulation that I made.
n = 50; % Sample size
m = 1e3; % Number of simulated samples
mu = 1;
sigma2 = 2;
X = normrnd(mu, sqrt(sigma2), n, m); % Supposing that my data follows a N(1, 2) distribution
X_bar = mean(X); % Column-wise mean: one sample mean per simulated sample
% Checking that the mean follows a normal distribution, according to the CLT
histogram(X_bar, 'Normalization', 'pdf')
hold on
x = linspace(0, 2, 5e2);
plot(x, normpdf(x, mu, sqrt(sigma2 / n)))
legend('Distr. of X\_bar', 'Asymp. Normal')
hold off
This code produces the following result:
As expected, the sample mean follows a normal distribution, but in this example I already know that the data themselves are normal. Suppose instead that I have raw data in the vector 'Temperatures_Countries_Democracies.Data' (please find the needed files here), which holds the 1913 temperatures of the countries that are democracies. How can I show that the mean of that raw data follows a normal distribution, i.e., proving the CLT, without knowing the data distribution?
You can get that raw data executing the following script:
datetime.setDefaultFormats('default','yyyy-MM-dd');
T = readtable('GlobalLandTemperaturesByCountry.csv');
T.Properties.VariableNames = {'Date' 'Data' 'AverageTemperatureUncertainty' 'Country'};
T = clean_table(T);
Countries_Democracies = readtable('Full_Flawed_Democracy.csv');
Temperatures_Countries_Democracies = get_data(T,"1913",Countries_Democracies);
Data = Temperatures_Countries_Democracies.Data; % The raw data
% Clearly it does not follow a normal distribution, nor a Poisson or exponential one...
create_histogram(Temperatures_Countries_Democracies,0,0.5,0,'Expected tmp. Democracies')
This is my best try so far:
% ... continuing the code
n = length(Data);
m = 1e3;
X = normrnd(mean(Data), std(Data), n, m);
X_bar = mean(X);
histogram(X_bar, 'Normalization', 'pdf') % This is normal, but only because I used normrnd(·)
Answers (1)
Jeff Miller
on 5 May 2019
I'm not sure exactly what you would accept as proving the CLT, but you might find this helpful:
X_bar = bootstrp(1000,@mean,Data);
histogram(X_bar,'Normalization', 'pdf');
The bootstrp function will produce 1000 different means by sampling the original data with replacement, and the histogram will show that these means look a lot more normal than the original scores.
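To make the comparison more concrete, here is a rough sketch that overlays the normal curve suggested by the CLT on the bootstrap histogram, the same way the simulation in the question does. Two things in it are my assumptions rather than anything from your files: the skewed stand-in for Data (generated only so the snippet runs on its own; replace it with your real vector), and the use of mean(Data) and std(Data)/sqrt(n) as plug-in estimates of the unknown mean and standard error. Note that this only illustrates the CLT on one data set; it is not a proof.

```matlab
% Stand-in for the real temperature vector (ASSUMPTION: replace with your own Data).
rng(0);                              % fix the seed so the run is repeatable
Data = -10 * log(rand(200, 1));      % skewed, clearly non-normal sample (exponential)

n = numel(Data);
B = 1000;                            % number of bootstrap resamples
X_bar = bootstrp(B, @mean, Data);    % B means of resamples drawn with replacement

% Normal curve suggested by the CLT, with parameters estimated from the data
mu_hat = mean(Data);
se_hat = std(Data) / sqrt(n);        % estimated standard error of the mean

histogram(X_bar, 'Normalization', 'pdf')
hold on
x = linspace(mu_hat - 4*se_hat, mu_hat + 4*se_hat, 500);
plot(x, normpdf(x, mu_hat, se_hat))
legend('Bootstrap means', 'Normal approx.')
hold off

% Rough numeric check: the means should be much less skewed than the raw data
fprintf('Skewness: data %.2f, bootstrap means %.2f\n', ...
    skewness(Data), skewness(X_bar));
```

If the histogram of the bootstrap means tracks the curve and their skewness is close to zero, the normal approximation is probably reasonable for your sample size. If it is still visibly skewed, the sample may simply be too small for the approximation to have kicked in.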