Proving CLT using real data

Martín Mateos Sánchez on 4 May 2019
Answered: Jeff Miller on 5 May 2019
I'm trying to understand the CLT with real data, not only with a simulation, but for now, consider the following simulation that I made.
n = 50; % Sample size (observations per sample)
m = 1e3; % Number of simulated samples
mu = 1;
sigma2 = 2;
X = normrnd(mu, sqrt(sigma2), n, m); % Assuming the data follow a N(1, 2) distribution; one sample per column
X_bar = mean(X);
% Checking that the mean follows a normal distribution, according to the CLT
histogram(X_bar, 'Normalization', 'pdf')
hold on
x = linspace(0, 2, 5e2);
plot(x, normpdf(x, mu, sqrt(sigma2 / n)))
legend('Distr. of X\_bar', 'Asymp. Normal')
hold off
This code produces the following result:
As expected, the sample mean follows a normal distribution, but in this example I already know that the data themselves are normal. Suppose instead I have raw data in the vector 'Temperatures_Countries_Democracies.Data' (please find the needed files here), which holds the 1913 temperatures of the countries that are democracies. How can I prove that the mean of that raw data follows a normal distribution, i.e., prove the CLT, without knowing the distribution of the data?
You can get that raw data by running the following script:
datetime.setDefaultFormats('default','yyyy-MM-dd');
T = readtable('GlobalLandTemperaturesByCountry.csv');
T.Properties.VariableNames = {'Date' 'Data' 'AverageTemperatureUncertainty' 'Country'};
T = clean_table(T);
Countries_Democracies = readtable('Full_Flawed_Democracy.csv');
Temperatures_Countries_Democracies = get_data(T,"1913",Countries_Democracies);
Data = Temperatures_Countries_Democracies.Data; % The raw data
% Clearly it does not follow a normal distribution, nor a Poisson or exponential one...
create_histogram(Temperatures_Countries_Democracies,0,0.5,0,'Expected tmp. Democracies')
This is my best try so far:
% ... continuing the code
n = length(Data);
m = 1e3;
X = normrnd(mean(Data), std(Data), n, m);
X_bar = mean(X);
histogram(X_bar, 'Normalization', 'pdf') % This is normal, but only because I used normrnd(·)

Answers (1)

Jeff Miller on 5 May 2019
I'm not sure exactly what you would accept as proving the CLT, but you might find this helpful:
X_bar = bootstrp(1000,@mean,Data);
histogram(X_bar,'Normalization', 'pdf');
The bootstrp function will produce 1000 different means by sampling the original data with replacement, and the histogram will show that these means look a lot more normal than the original scores.
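To go one step beyond eyeballing the histogram, the bootstrap means can be compared with the normal curve the CLT suggests, centred on the sample mean with the standard error std(Data)/sqrt(n) as its standard deviation. The following is only a rough sketch: the `Data` vector here is a made-up skewed stand-in (not the temperature data from the question) so that the code runs on its own, and the skewness/kurtosis check is just one informal way to judge normality. Note that this illustrates the CLT on a data set rather than proving it; the theorem itself is proved mathematically.

```matlab
% Stand-in for the question's Data vector (hypothetical values, used only so
% this sketch runs on its own); replace it with the real raw data.
rng(0);                              % make the resampling reproducible
Data = exprnd(10, 200, 1);           % deliberately skewed, non-normal sample

B = 1000;                            % number of bootstrap resamples
X_bar = bootstrp(B, @mean, Data);    % B means of resamples drawn with replacement

% Normal curve suggested by the CLT: centred on the sample mean, with the
% standard error of the mean as its standard deviation.
n  = numel(Data);
se = std(Data) / sqrt(n);
x  = linspace(min(X_bar), max(X_bar), 500);

histogram(X_bar, 'Normalization', 'pdf')
hold on
plot(x, normpdf(x, mean(Data), se))
legend('Bootstrap means', 'Normal approx.')
hold off

% Informal numerical check: for a normal distribution the skewness is 0
% and the kurtosis is 3, so values near those support the visual impression.
fprintf('skewness = %.3f, kurtosis = %.3f\n', skewness(X_bar), kurtosis(X_bar));
```

If the histogram tracks the curve and the two statistics are close to 0 and 3, the means behave approximately normally for this sample size, even though the raw data do not.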
