How to check if data is normally distributed

Hi all,
I want to run an F-test on two samples to see whether their variances are equal. Wikipedia says that the F-test is sensitive to non-normality of the samples. How can I check whether my samples are normally distributed?
I read on some forums that I can use kstest and lillietest. When should I use each? I get the result h=0. Does that mean my data is normally distributed?
Thanks. Nancy

Accepted Answer

Tom Lane on 7 Aug 2012
The functions you mention return H=0 when a test cannot reject the hypothesis of a normal distribution. They can't prove that the distribution is normal, but they don't find much evidence against that hypothesis.
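As a rough illustration of when each function applies (a sketch with made-up data, not taken from the original question): kstest compares against a fully specified distribution, the standard normal N(0,1) by default, while lillietest tests for a normal distribution whose mean and variance are estimated from the sample. The variable names and the mean and standard deviation used below are assumptions chosen for the example.

```matlab
% Illustrative sketch only: the data and parameters are invented.
rng(0);                               % reproducible random numbers
x = 5 + 2*randn(200,1);               % normal data with mean 5, std 2

% lillietest: H0 is "normal with unknown mean and variance".
[hL, pL] = lillietest(x);

% kstest with no extra arguments: H0 is "standard normal N(0,1)".
% The data are normal but not N(0,1), so this test rejects.
[hK, pK] = kstest(x);

% kstest against a fully specified normal, for when the parameters
% are known in advance rather than estimated from the same data.
pd = makedist('Normal', 'mu', 5, 'sigma', 2);
[hS, pS] = kstest(x, 'CDF', pd);

fprintf('lillietest:            h = %d, p = %.3g\n', hL, pL);
fprintf('kstest vs N(0,1):      h = %d, p = %.3g\n', hK, pK);
fprintf('kstest vs N(5,2^2):    h = %d, p = %.3g\n', hS, pS);
```

So if the mean and variance are not known beforehand, lillietest is usually the more appropriate of the two. In both cases h = 0 only means the test did not reject normality at the chosen significance level.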
The VARTESTN function has an option that is robust to non-normal distributions.
Tom Lane on 9 Aug 2012
Suppose you would normally do
x1 = randn(20,1); x2 = 1.5*randn(25,1);
[h,p] = vartest2(x1,x2)
Then you can do something like this instead:
grp = [ones(size(x1)); 2*ones(size(x2))];
vartestn([x1;x2], grp)
I believe the two-sample vartestn test is not identical to the vartest2 test, but the p-values are likely to be similar. Then you can add options to do a robust test using vartestn.
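A minimal sketch of what adding those options might look like, using the 'TestType' and 'Display' name-value arguments of vartestn. The choice of 'LeveneAbsolute' and 'BrownForsythe' here is only an example, and the data are the same simulated samples as above.

```matlab
% Sketch: compare the classical tests with robust alternatives.
rng(3);                                       % reproducible example data
x1 = randn(20,1);
x2 = 1.5*randn(25,1);
grp = [ones(size(x1)); 2*ones(size(x2))];     % group label for each value

% Classical two-sample F-test (sensitive to non-normality).
[~, pF] = vartest2(x1, x2);

% vartestn default is Bartlett's test (also sensitive to non-normality).
pBartlett = vartestn([x1; x2], grp, 'Display', 'off');

% Robust alternatives: Levene's test and the Brown-Forsythe test.
pLevene = vartestn([x1; x2], grp, 'TestType', 'LeveneAbsolute', 'Display', 'off');
pBF     = vartestn([x1; x2], grp, 'TestType', 'BrownForsythe',  'Display', 'off');

fprintf('vartest2 (F-test):  p = %.4f\n', pF);
fprintf('Bartlett:           p = %.4f\n', pBartlett);
fprintf('Levene (absolute):  p = %.4f\n', pLevene);
fprintf('Brown-Forsythe:     p = %.4f\n', pBF);
```

Setting 'Display' to 'off' suppresses the table and box plot that vartestn shows by default, which is convenient when only the p-value is needed.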

More Answers (2)

Sean on 7 Aug 2012
Hello Nancy,
You cannot tell from only 2 samples whether they are normally distributed or not. If you have a larger sample set and you are only testing them in pairs, then you could use the larger sample set to test for a particular distribution.
For example: (simple q-q plot)
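data = randn(100);      % 100x100 matrix of normally distributed values
ref1 = randn(100);      % 100x100 matrix of normally distributed reference values
ref2 = rand(100);       % 100x100 matrix of uniformly distributed reference values
% assumed step: sort all values so that plotting pairs up matching quantiles
x  = sort(data(:));
y1 = sort(ref1(:));
y2 = sort(ref2(:));
subplot(1,2,1); plot(x,y1);   % data vs. normal reference
subplot(1,2,2); plot(x,y2);   % data vs. uniform reference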
The first plot should be close to a straight line, indicating that the data distribution matches the reference distribution. The second plot isn't a straight line, indicating that the distributions do not match.
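The same idea can be checked numerically rather than by eye. The sketch below is an illustration under assumptions, not part of the original answer: it compares sorted sample values against theoretical normal quantiles from norminv and summarizes straightness with a correlation coefficient. The plotting positions (i - 0.5)/n are one common convention. The toolbox functions qqplot and normplot draw comparable plots directly.

```matlab
% Sketch: how straight is the q-q relationship against a normal reference?
rng(1);                              % reproducible example data
x = randn(500,1);                    % normally distributed sample
u = rand(500,1);                     % uniformly distributed sample

n  = numel(x);
pp = ((1:n)' - 0.5) / n;             % plotting positions in (0,1)
q  = norminv(pp);                    % theoretical standard normal quantiles

% Correlation between sorted data and normal quantiles:
% values very close to 1 indicate a nearly straight q-q plot.
cx = corrcoef(sort(x), q);  rNormal  = cx(1,2);
cu = corrcoef(sort(u), q);  rUniform = cu(1,2);

fprintf('normal sample:  r = %.4f\n', rNormal);
fprintf('uniform sample: r = %.4f\n', rUniform);
```

Because the reference quantiles are computed from the sample size, this works for samples of any length and does not need a second data set of equal size.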
Nancy on 7 Aug 2012
The data samples you have given have equal sizes. What would I do if the sizes are unequal? I need to compare the variances across a lot of samples. I am wondering whether there is a test like the t-test for doing so. If I submit a report, I would just like to report the p-values.
Thanks for your help Sean.
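For several samples of unequal size, one possible approach (a sketch using simulated data; the sample sizes, scales, and the choice of the Brown-Forsythe test are assumptions for illustration) is to stack all values into one vector with a matching group-label vector and pass both to vartestn, which returns a single p-value for the hypothesis that all groups have equal variance.

```matlab
% Sketch: compare variances across several samples of unequal size.
rng(4);                                        % reproducible example data
samples = {randn(15,1), 2*randn(40,1), randn(27,1), 0.5*randn(33,1)};

x     = vertcat(samples{:});                   % all values in one column
sizes = cellfun(@numel, samples);              % number of values per sample
grp   = repelem((1:numel(samples))', sizes(:)); % group label for each value

% Brown-Forsythe test: a robust test of equal variances across groups.
[p, stats] = vartestn(x, grp, 'TestType', 'BrownForsythe', 'Display', 'off');

fprintf('p-value for equal variances across %d groups: %.3g\n', ...
        numel(samples), p);
```

A small p-value indicates that at least one group variance differs from the others. It does not say which one.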

Sarutahiko on 11 Dec 2013
Assuming you agree with the Anderson-Darling test for Normality, I'd just use Matlab's prebuilt function for that. It is
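The function name is cut off in the post above. One function in the Statistics and Machine Learning Toolbox that performs an Anderson-Darling test is adtest, so the sketch below assumes that is what was meant. The data are simulated for illustration.

```matlab
% Sketch, assuming the function referred to is adtest.
rng(2);                          % reproducible example data
x = randn(100,1);                % normally distributed sample
y = exprnd(1, 200, 1);           % clearly non-normal (exponential) sample

% By default adtest tests H0: the data come from a normal distribution
% with unknown mean and variance.
[hx, px] = adtest(x);
[hy, py] = adtest(y);

fprintf('normal sample:      h = %d, p = %.3g\n', hx, px);
fprintf('exponential sample: h = %d, p = %.3g\n', hy, py);
```

As with kstest and lillietest, h = 0 means normality was not rejected, not that it was proven.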
