What is the Kolmogorov-Smirnov Test?
The Kolmogorov-Smirnov Goodness of Fit Test (K-S test) compares your data with a known distribution and lets you know if they have the same distribution. Although the test is nonparametric (it doesn’t assume any particular underlying distribution), it is commonly used as a test for normality to see if your data is normally distributed. The two versions of the test are:
- Two-sample test: tests to see if two observed samples come from the same distribution.
- One-sample test: tests to see if the observed data set comes from a specified continuous distribution.
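Both versions can be tried out in software. The snippet below is a minimal sketch, assuming Python with NumPy and SciPy installed; the simulated samples, the seed, and the variable names are made up for illustration and are not part of the original article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# One-sample test: compare a sample against a fully specified N(0, 1).
sample = rng.normal(loc=0.0, scale=1.0, size=200)
one_sample = stats.kstest(sample, "norm", args=(0.0, 1.0))

# Two-sample test: compare two samples against each other.
# The second sample is shifted by 1, so the two distributions differ.
other = rng.normal(loc=1.0, scale=1.0, size=200)
two_sample = stats.ks_2samp(sample, other)

print(f"one-sample: D = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3f}")
print(f"two-sample: D = {two_sample.statistic:.3f}, p = {two_sample.pvalue:.3f}")
```

A small p-value is evidence against the null hypothesis that the data follow the stated distribution (one-sample) or that both samples share a distribution (two-sample).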
The hypotheses for the test are:
- H0: the data comes from the specified distribution.
- H1: the data does not come from the specified distribution.
For manual calculations, the test statistic is given by:
D = sup_x |F0(x) - Fdata(x)|
Where (for a two-tailed test):
- F0(x) = the cdf of the hypothesized distribution,
- Fdata(x) = the empirical distribution function of your observed data.
For a one-tailed test, omit the absolute values from the formula.
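The manual calculation of the two-tailed statistic can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `ks_statistic` and the data values are hypothetical, and the hypothesized distribution is assumed to be a standard normal. Because the empirical distribution function is a step function, the largest gap has to be checked on both sides of each jump.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Largest absolute gap between the hypothesized cdf and the empirical cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at x,
        # so compare F0(x) with the step just before and just after the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2]  # made-up values
d = ks_statistic(data, NormalDist(mu=0.0, sigma=1.0).cdf)
print(f"D = {d:.4f}")
```

Any continuous cdf can be passed as the second argument, as long as its parameters are fixed in advance rather than fitted to the same data.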
If D is greater than the critical value, the null hypothesis is rejected. Critical values for D are found in the table below.
K-S Test P-Value Table
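For larger samples, critical values are often approximated with a formula instead of a table lookup. The sketch below assumes the common large-sample approximation for the one-sample, two-tailed test, D_crit ≈ sqrt(-ln(α/2) / 2) / sqrt(n), which is usually quoted for n above roughly 35; for smaller samples an exact table should be used. The function names are hypothetical.

```python
import math


def ks_critical_value(n, alpha=0.05):
    """Approximate two-tailed critical value of D for a large sample of size n."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))  # about 1.36 for alpha = 0.05
    return c_alpha / math.sqrt(n)


def reject_null(d, n, alpha=0.05):
    """Reject the null hypothesis when D exceeds the critical value."""
    return d > ks_critical_value(n, alpha)


for n in (50, 100, 500):
    print(n, round(ks_critical_value(n), 4))
```

The critical value shrinks as the sample grows, so the same D can be non-significant in a small sample and significant in a large one.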
Advantages and Disadvantages
Advantages of the K-S test include:
- The test is distribution free.
- The D statistic is easy to calculate.
- It can be used as a goodness of fit test following regression analysis.
- There aren’t any restrictions on sample size.
- Tables are readily available.
Although the K-S test has many advantages, it also has a few limitations:
- In order for the test to work, you must specify the location, scale, and shape parameters in advance. If these parameters are estimated from the data, the standard critical values are no longer valid.
- It generally can’t be used for discrete distributions, especially if you are using software: most software packages don’t have the necessary extensions for a discrete K-S test, and the manual calculations are convoluted.
- Sensitivity is higher at the center of the distribution and lower at the tails.
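The first limitation can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original article: it draws normal samples, then applies the test once with the true parameters and once with the mean and standard deviation estimated from the same sample, using the large-sample critical value sqrt(-ln(α/2) / 2) / sqrt(n) in both cases. All names, the seed, and the settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(data, cdf):
    """Two-tailed K-S statistic for a continuous hypothesized cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def rejection_rates(n=50, trials=1000, alpha=0.05, seed=1):
    """Share of truly normal samples rejected, with specified vs. estimated parameters."""
    rng = random.Random(seed)
    crit = math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
    true_cdf = NormalDist(0.0, 1.0).cdf
    specified = 0
    estimated = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Fitting the parameters to the sample pulls the cdf toward the data.
        fitted_cdf = NormalDist(mean(sample), stdev(sample)).cdf
        if ks_statistic(sample, true_cdf) > crit:
            specified += 1
        if ks_statistic(sample, fitted_cdf) > crit:
            estimated += 1
    return specified / trials, estimated / trials


rate_specified, rate_estimated = rejection_rates()
print(f"parameters specified in advance: {rate_specified:.3f}")
print(f"parameters estimated from data:  {rate_estimated:.3f}")
```

With estimated parameters the fitted curve hugs the data, so D tends to be smaller and the test rejects far less often than the nominal α. Corrected procedures such as the Lilliefors test exist for this situation.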