Assumption of Normality > Kolmogorov-Smirnov Test
- What is the Kolmogorov-Smirnov Test?
- How to run the test by hand
- Using software
- K-S Test P-Value Table
- Advantages and Disadvantages
- Kolmogorov-Smirnov Distribution
The Kolmogorov-Smirnov Goodness of Fit Test (K-S test) compares your data with a known distribution and lets you know if they have the same distribution. Although the test is nonparametric — it doesn’t assume any particular underlying distribution — it is commonly used as a test for normality to see if your data is normally distributed.It’s also used to check the assumption of normality in Analysis of Variance.
Lilliefors test, a corrected version of the K-S test for normality, generally gives a more accurate approximation of the test statistic’s distribution. In fact, many statistical packages (like SPSS) combine the two tests as a “Lilliefors corrected” K-S test.
Note: If you’ve never compared an experimental distribution to a hypothetical distribution before, you may want to read the empirical distribution article first. It’s a short article, and includes an example where you compare two data sets simply— using a scatter plot instead of a hypothesis test.
Back to top
Need help with the steps? Check out our tutoring page!
The hypotheses for the test are:
- Null hypothesis (H0): the data comes from the specified distribution.
- Alternate Hypothesis (H1): at least one value does not match the specified distribution.
H0: P = P0, H1: P ≠ P0.
Where P is the distribution of your sample (i.e. the EDF) and P0 is a specified distribution.
The general steps to run the test are:
- Create an EDF for your sample data (see Empirical Distribution Function for steps),
- Specify a parent distribution (i.e. one that you want to compare your EDF to),
- Graph the two distributions together.
- Measure the greatest vertical distance between the two graphs.
- Calculate the test statistic.
- Find the critical value in the KS table.
- Compare to the critical value.
Calculating the Test Statistic
The K-S test statistic measures the largest distance between the EDF Fdata(x) and the theoretical function F0(x), measured in a vertical direction (Kolmogorov as cited in Stephens 1992). The test statistic is given by:
Where (for a two-tailed test):
- F0(x) = the cdf of the hypothesized distribution,
- Fdata(x) = the empirical distribution function of your observed data.
Step 1: Find the EDF. In the EDF article, I generated an EDF using Excel that I’ll use for this example.
Step 2: Specify the parent distribution. In the same article, I also calculated the corresponding values for the gamma function.
Step 3: Graph the functions together. A snapshot of the scatter graph looked like this:
Step 4: Measure the greatest vertical distance. Let’s assume that I graphed the entire sample and the largest vertical distance separating my two graphs is .04 (in the yellow highlighted box).
Step 6: Compare the results from Step 4 and Step 5. Since .04 is less than .190, the null hypothesis (that the distributions are the same) is accepted.
Most software packages can run this test.
- The test is distribution free. That means you don’t have to know the underlying population distribution for your data before running this test.
- The D statistic (not to be confused with Cohen’s D) used for the test is easy to calculate.
- It can be used as a goodness of fit test following regression analysis.
- There are no restrictions on sample size; Small samples are acceptable.
- Tables are readily available.
Although the K-S test has many advantages, it also has a few limitations:
- In order for the test to work, you must specify the location, scale, and shape parameters. If these parameters are estimated from the data, it invalidates the test. If you don’t know these parameters, you may want to run a less formal test (like the one outlined in the empirical distribution function article).
- It generally can’t be used for discrete distributions, especially if you are using software (most software packages don’t have the necessary extensions for discrete K-S Test and the manual calculations are convoluted).
- Sensitivity is higher at the center of the distribution and lower at the tails.
The term “Kolmogorov Smirnov Distribution” refers to the distribution of the K-S statistic. In the vast majority of cases, the assumption is that the underlying cumulative distribution function is continuous. In those cases, the K-S distribution can be globally approximated by the general beta distribution .
However, some real-life applications call for a discrete distribution. While it is possible to fit a discrete distribution, working around the jump discontinuities has been problematic. In fact, there are no generally accepted efficient or exact computational methods that deal with this situation. Dimitrova et. al  did provide one method for a discontinuous Kolmogorov Smirnov distribution; it results in exact p-values for the test. The approach is beyond the scope of this article, but is involves expressing the complementary PDF through the rectangle probability for uniform order statistics via Fast Fourier Transform.
Kolmogorov-Smirnov Distribution: References
 Zhang, J. & Wu, Y. (2001). Beta Approximation to the Distribution of Kolmogorov-Smirnov Statistic. Ann. Inst. Statistical Math. Vol 54. No. 3, 577-584. Retrieved November 1, 2021 from: https://www.ism.ac.jp/editsec/aism/pdf/054_3_0577.pdf
 Dimitrova, D. et al. (2020). Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed, or Continuous. Journal of Statistical Software. Volume 95, Issue 10 (Oct). KS.
 Kolmogorov–Smirnov distribution. Retrieved November 15, 2021 from: http://www.math.wm.edu/~leemis/chart/UDR/PDFs/Kolmogorovsmirnov.pdf
Chakravarti, Laha, and Roy, (1967). Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons, pp. 392-394.
Ruppert, D. (2004). Statistics and Finance: An Introduction. Springer Science and Business Media.
Stephens M.A. (1992) Introduction to Kolmogorov (1933) On the Empirical Determination of a Distribution. In: Kotz S., Johnson N.L. (eds) Breakthroughs in Statistics. Springer Series in Statistics (Perspectives in Statistics). Springer, New York, NY
Stephanie Glen. "Kolmogorov-Smirnov Goodness of Fit Test" From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/kolmogorov-smirnov-test/
Feel like “cheating” at Statistics? Check out our Practically Cheating Statistics Handbook, which gives you hundreds of easy-to-follow answers in a PDF format. Recommended reading at top universities!
Comments? Need to post a correction? Please post a comment on our Facebook page.