Meaning of the p-value calculation for Kolmogorov-Smirnov test in KSTEST2 function

I am using the Kolmogorov-Smirnov test to determine whether two samples of data come from the same distribution. I understand how the KS statistic is calculated: basically, it is the maximum absolute difference between the empirical CDFs of the two samples. The null hypothesis is that the samples are from the same distribution, so if the p-value is lower than alpha (say, alpha = 0.05), we reject the null hypothesis and conclude that the two samples are from different distributions. What I don't understand is how the p-value is calculated. I am using the MATLAB KSTEST2 function. The relevant parts of its p-value calculation are:
n = n1 * n2 /(n1 + n2);
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic , 0);
j = (1:101)';
pValue = 2 * sum((-1).^(j-1).*exp(-2*lambda*lambda*j.^2));
pValue = min(max(pValue, 0), 1);
Setting aside the max and min, which are probably just making sure that lambda isn't negative and that the p-value isn't above 1 or below 0, I don't understand how and why the p-value is calculated this way. Can anyone explain?

Accepted Answer

Abhas on 29 Dec 2024
The p-value is derived from the limiting (large-sample) distribution of the KS statistic under the null hypothesis. The KS test compares the empirical CDFs of the two samples, and the p-value is the probability of observing a KS statistic at least as extreme as the one calculated, assuming the null hypothesis is true.
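Written out, the calculation quoted in the question corresponds to the following expressions. The symbols n_e (the code's n), D (the code's KSstatistic) and Q_KS are my own labels, and the code truncates the infinite sum at j = 101:

```latex
n_e = \frac{n_1 n_2}{n_1 + n_2}, \qquad
\lambda = \left( \sqrt{n_e} + 0.12 + \frac{0.11}{\sqrt{n_e}} \right) D, \qquad
p \approx Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^{2} \lambda^{2}}
```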
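  • The term "exp(-2*lambda*lambda*j.^2)" comes from the theory of Brownian bridges, which describes the asymptotic behavior of the KS statistic under the null hypothesis.
  • The alternating signs "(-1).^(j-1)" come from the inclusion-exclusion (reflection) argument used to compute the probability that a Brownian bridge leaves the band between -lambda and +lambda.
  • The summation, multiplied by 2, gives the tail probability: the asymptotic probability that the scaled KS statistic exceeds the observed lambda (that is, one minus the cumulative probability up to lambda).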
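To get a feel for how the series behaves, here is a minimal sketch that evaluates it for a few values of lambda with 101 terms (as in the quoted code) and with only 5 terms. The helper names ksSeries and clamp01 are hypothetical and not part of kstest2, and the lambda values are arbitrary.

```matlab
% Hypothetical helpers for illustration; not part of kstest2.
j = (1:101)';                                   % same truncation as the quoted code
% Partial sum of the series using the first m terms (unclamped).
ksSeries = @(lambda, m) 2 * sum((-1).^(j(1:m)-1) .* exp(-2 * lambda^2 * j(1:m).^2));
% Keep the result inside [0, 1], as the quoted code does.
clamp01  = @(p) min(max(p, 0), 1);

lambdas = [0 0.5 1.0 1.36 2.0];                 % illustrative values only
for k = 1:numel(lambdas)
    pFull  = ksSeries(lambdas(k), 101);         % 101 terms, unclamped
    pShort = ksSeries(lambdas(k), 5);           % 5 terms, unclamped
    fprintf('lambda = %.2f: 101 terms = %.6f, 5 terms = %.6f, clamped = %.6f\n', ...
            lambdas(k), pFull, pShort, clamp01(pFull));
end
```

At lambda = 0 every exponential equals 1, so the truncated sum is 2 before clamping; that is one case where the min/max step matters. For lambda around 1 or larger, the first few terms already determine the result to many decimal places.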
After extensive searching, I came across some valuable resources on the topic. One is Numerical Recipes by Press et al. (2007), pages 736–740: "http://e-maxx.ru/bookz/files/numerical_recipes.pdf". It gives the same approximation, including the 0.12 + 0.11/sqrt(n) correction in lambda, and references a more technical source: Stephens (1970), Journal of the Royal Statistical Society: Series B (Methodological), pages 115–122: "https://www.jstor.org/stable/2984408?seq=1#page_scan_tab_contents".
To interpret the result:
  • If pValue < α (e.g., α = 0.05): reject the null hypothesis and conclude that the two samples come from different distributions.
  • If pValue ≥ α: fail to reject the null hypothesis; there is insufficient evidence to claim that the distributions differ.
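As a rough end-to-end sketch, the statistic, the p-value and the decision can be computed by hand as shown here. The samples x1 and x2 are made up for illustration, the empirical CDFs are computed directly rather than the way kstest2 does it internally, and the p-value lines simply repeat the formula quoted in the question.

```matlab
% Made-up samples (row vectors), chosen only for illustration.
x1 = [0.1 0.4 0.7 1.2 1.5 1.9 2.3 2.8];
x2 = [1.6 2.0 2.5 2.9 3.3 3.8 4.2 4.6 5.0 5.4];
alpha = 0.05;

% Empirical CDFs of both samples, evaluated at every pooled data point.
pooled = sort([x1 x2]);
F1 = arrayfun(@(t) mean(x1 <= t), pooled);
F2 = arrayfun(@(t) mean(x2 <= t), pooled);

% KS statistic: largest absolute gap between the two empirical CDFs.
D = max(abs(F1 - F2));

% Asymptotic p-value, following the lines quoted in the question.
n1 = numel(x1);
n2 = numel(x2);
n = n1 * n2 / (n1 + n2);                          % effective sample size
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * D, 0);
j = (1:101)';
pValue = 2 * sum((-1).^(j-1) .* exp(-2 * lambda^2 * j.^2));
pValue = min(max(pValue, 0), 1);                  % clamp to [0, 1]

% Decision rule: reject the null hypothesis when pValue < alpha.
rejectNull = pValue < alpha;
fprintf('D = %.3f, p = %.4f, reject H0 at alpha = %.2f: %d\n', ...
        D, pValue, alpha, rejectNull);
```

If the Statistics and Machine Learning Toolbox is available, the output of [h, p] = kstest2(x1, x2) can be compared against these values; agreement is expected only to the extent that the installed version uses the formula quoted in the question.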
For more details, see the MathWorks documentation for kstest2: https://www.mathworks.com/help/stats/kstest2.html
I hope this helps!
