P-value and sigma consideration

I'm using a statistical test on a population of size N. This test returns a p-value, which I use to decide whether or not to reject the null hypothesis.
Let's say I use a two-sided 5 sigma criterion, that is, a confidence level of 0.999999426696856 (a p-value threshold of about 5.7e-7), or a 1 in 1.7 million chance.
The question is: must my population's size be at least 1.7 million, or is the p-value test result independent of the population size?
As an example, for a 2.2 sigma criterion, is the result valid only if the population size N >= 36?
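
For reference, the figures quoted above can be reproduced in base MATLAB. This is a minimal sketch, not part of the original post: the variable names are illustrative, and it assumes the sigma levels refer to two-sided tails of a normal distribution.

```matlab
% Two-sided tail probability for a k-sigma criterion on a normal distribution.
% erfc(k/sqrt(2)) equals 1 - erf(k/sqrt(2)) but keeps precision for large k.
k = [1 2.2 5];                    % sigma levels mentioned in this thread
pTwoSided = erfc(k ./ sqrt(2));   % probability of falling outside +/- k sigma
oneIn = 1 ./ pTwoSided;           % the "1 in N" odds quoted in the question

for idx = 1:numel(k)
    fprintf('%.1f sigma: p = %.3e, about 1 in %.4g\n', ...
        k(idx), pTwoSided(idx), oneIn(idx));
end
```

The 5 sigma entry gives roughly 1 in 1.7 million and the 2.2 sigma entry roughly 1 in 36, which is where the sample sizes in the question come from.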

 Accepted Answer

Star Strider on 23 Feb 2014
Edited: Star Strider on 23 Feb 2014
As a general rule, the test for ‘statistical significance’ is p < 0.05. That means that, if the null hypothesis were true, the probability of getting results at least as extreme as yours from a random process alone would be < 0.05. (It is necessary to consider your sample size, but this should be part of the experimental design and the statistics you choose at that time to evaluate your results. The t-distribution takes this into account.)
The p-value criterion itself is independent of the population size: a 1 in 1.7 million threshold does not require 1.7 million observations. The sample size enters through the distribution of the test statistic that is used to compute the p-value.

4 Comments

Star Strider, as always thank you for a very fast reply.
I have actually asked this because of MATLAB's lack of warning or any tool to relate the statistical significance to the population size.
Consider the following code:
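%% Failure-to-reject rate of the t-test according to population size
% rand draws from U(0,1) (true mean 0.5) and ttest checks H0: mean = 0,
% so h = 1 is a correct rejection and h = 0 is a missed detection.
nTrials = 1000;
h2 = 0;        % rejections at p < 0.05 with N = 4
h2sigma = 0;   % rejections at the 1 sigma level with N = 4
h22 = 0;       % rejections at p < 0.05 with N = 22
for i = 1:nTrials
    % N = 4, appropriate for 68% significance
    [h,p] = ttest(rand(1,4));
    h2 = h2 + h;                                 % 95% significance
    h2sigma = h2sigma + (p < 1-erf(1/sqrt(2)));  % 68% significance, 1 sigma
    % N = 22, appropriate for 95% significance
    [h,~] = ttest(rand(1,22));
    h22 = h22 + h;
end
% Fraction of trials in which the null hypothesis was not rejected
perFail2 = (nTrials-h2)/nTrials
perFail2sigma = (nTrials-h2sigma)/nTrials
perFail22 = (nTrials-h22)/nTrials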
All three fractions (perFail2, perFail2sigma and perFail22) should be near 0.
You can see that perFail2 will be quite high, since the population size 4 is incompatible with the standard 95% significance level. On the other hand, if the significance level is adjusted to a compatible level, perFail2sigma goes to 0, and perFail22 is always zero, given that the population size 22 is more than enough for the 95% statistical significance.
Shouldn't MATLAB warn the user the population size of the data is not enough to cover extremes of the model for the given statistical significance or is this left up to the user?
My pleasure!
You seem to be exploring the properties of the t-distribution in the code you posted.
A sample size of 4 is definitely compatible with the standard p < 0.05. The t-distribution takes sample size into account, which is one of the reasons it is so useful. (It describes the distribution of the standardized mean of a sample taken from the population, not the distribution of the population itself, which is characteristically described by the normal distribution.)
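
One way to check this is to simulate data for which the null hypothesis really is true and count how often the test rejects it. The sketch below is an illustration added here, not the code from the thread: it assumes the Statistics Toolbox (`ttest`), normally distributed data with true mean 0, and an arbitrary seed and trial count.

```matlab
% Simulate one-sample t-tests where the null hypothesis (mean = 0) is TRUE.
% Every rejection is then a false positive, and the rejection rate should
% sit near alpha = 0.05 regardless of the sample size.
rng(0);                          % fixed seed so the run is repeatable
nTrials = 10000;                 % number of simulated experiments (arbitrary)
sampleSizes = [4 22];            % sample sizes discussed in this thread
falsePositiveRate = zeros(size(sampleSizes));

for idx = 1:numel(sampleSizes)
    n = sampleSizes(idx);
    x = randn(n, nTrials);       % each column is one sample with true mean 0
    h = ttest(x);                % column-wise t-tests at the default alpha = 0.05
    falsePositiveRate(idx) = mean(h);
end

disp(falsePositiveRate)
```

Under these assumptions both rates should come out close to 0.05, for N = 4 as well as for N = 22. What the small sample costs is power (the ability to detect a real effect), not control of the false-positive rate.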
Returning to the sample size of 4, evaluating this statement:
tscore = tinv(0.025,3)
(with n-1 = 3 degrees-of-freedom) yields:
tscore = -3.1824
as opposed to the population z-score (from the normal distribution) of about -1.96. The larger the sample size, the closer the t-distribution approaches the normal distribution.
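
The convergence can be seen by tabulating the two-tailed critical values for a few sample sizes. This is a small sketch assuming the Statistics Toolbox functions `tinv` and `norminv`; the list of sample sizes is chosen for illustration.

```matlab
% Two-tailed critical values at alpha = 0.05 for several sample sizes,
% compared with the normal-distribution limit.
alpha = 0.05;
n = [4 22 36 1000];              % sample sizes (illustrative)
tCrit = tinv(alpha/2, n - 1);    % lower critical t-score with n-1 degrees of freedom
zCrit = norminv(alpha/2);        % lower critical z-score, about -1.96

for idx = 1:numel(n)
    fprintf('n = %4d: critical t = %.4f\n', n(idx), tCrit(idx));
end
fprintf('normal limit: critical z = %.4f\n', zCrit);
```

The critical t-score shrinks in magnitude toward the z-score as n grows, which is how the test compensates for the extra uncertainty in a small sample.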
You asked: ‘Shouldn't MATLAB warn the user the population size of the data is not enough to cover extremes of the model for the given statistical significance or is this left up to the user?’
The answer is that it is entirely left up to the user (MATLAB can’t design your experiment for you), although MATLAB has a special section in the Statistics Toolbox on Design of Experiments that can help you with these decisions. I’m not sure what you’re studying and analysing, but you have encountered a frequent problem in biomedical and other research involving small samples, and therefore having to use small-sample statistics. You have to design your experiment with those in mind. This frequently requires that you go to the literature to see what others have done in their designs in similar studies, or with similar populations to yours.
Problems like yours are important in research design. Books have been written on this, and the Statistics Toolbox has a large section devoted to it.
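
As a hedged illustration of that kind of design calculation, the Statistics Toolbox function `sampsizepwr` can estimate the power of a one-sample t-test, or the sample size needed for a target power. The numbers below are assumptions taken from the posted simulation (uniform data with true mean 0.5 and standard deviation sqrt(1/12), tested against a mean of 0), and the normal-theory power is only an approximation for uniform data.

```matlab
% Power and sample-size sketch for a one-sample, two-sided t-test at alpha = 0.05.
% Assumed scenario: H0 mean = 0, true mean = 0.5, std = sqrt(1/12) (as for rand).
mu0 = 0;
mu1 = 0.5;
sigma = sqrt(1/12);

power4  = sampsizepwr('t', [mu0 sigma], mu1, [], 4);   % power with n = 4
power22 = sampsizepwr('t', [mu0 sigma], mu1, [], 22);  % power with n = 22
nFor95  = sampsizepwr('t', [mu0 sigma], mu1, 0.95);    % n needed for 95% power

fprintf('power at n = 4:  %.3f\n', power4);
fprintf('power at n = 22: %.3f\n', power22);
fprintf('n for 95%% power: %d\n', nFor95);
```

Under these assumptions, one minus the power is the fraction of missed detections, which is the quantity the posted loop estimates as perFail2 and perFail22.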
With respect to your sample of 36 (35 degrees of freedom), the critical t-score for a two-tailed significance of p = 0.05 is -2.0301.
Thank you for your comprehensive reply and the link regarding design of experiments. I'll give that a good reading.
Star Strider on 24 Feb 2014
Edited: Star Strider on 24 Feb 2014
My pleasure!
I’ll be glad to provide what help I can, especially if you’re just beginning your research.


Asked: on 23 Feb 2014
Edited: on 24 Feb 2014
