## What is the Kolmogorov-Smirnov Test?

The Kolmogorov-Smirnov Goodness of Fit Test (K-S test) compares your data with a known distribution and lets you know whether they have the same distribution. Although the test is nonparametric (it doesn’t assume any particular underlying distribution), it is commonly used as a test for normality, to see whether your data is normally distributed. The two versions of the test are:

- Two-sample test: tests whether two observed samples come from the same distribution.
- One-sample test: tests whether the observed data set comes from a specified continuous distribution.

The hypotheses for the test are:

- $H_0$: the data comes from the specified distribution.
- $H_1$: the data does not come from the specified distribution.

Most statistical software packages (e.g. R) can run this test, and several online calculators are also available.

For manual calculations, the test statistic is given by:

$$D = \sup_x \left| F_0(x) - F_{data}(x) \right|$$

Where (for a two-tailed test):

- $F_0(x)$ = the cdf of the hypothesized distribution,
- $F_{data}(x)$ = the empirical distribution function of your observed data.

For a one-tailed test, omit the absolute values from the formula.
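To make the manual calculation concrete, here is a minimal Python sketch of the two-tailed D statistic for a one-sample test against a normal distribution. It uses only the standard library. The function names (`normal_cdf`, `ks_statistic`) and the sample values are made up for illustration, and the mean and standard deviation are assumed to be specified in advance rather than estimated from the data.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """Two-tailed one-sample K-S statistic: the largest gap between
    the hypothesized cdf and the empirical distribution function."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The empirical distribution function jumps from (i-1)/n to i/n at x,
        # so the gap has to be checked on both sides of the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

# Illustrative sample (made-up values), tested against a standard normal.
sample = [-1.2, -0.5, 0.1, 0.4, 1.3]
d = ks_statistic(sample, normal_cdf)
print(round(d, 4))
```

Because the empirical distribution function is a step function, the largest gap can only occur at one of the observed data points, which is why a single pass over the sorted sample is enough.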

If D is greater than the critical value, the null hypothesis is rejected. Critical values for D are found in the table below.
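When no table is at hand, a widely used large-sample approximation for the two-tailed critical value is $c(\alpha)/\sqrt{n}$, with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ (about 1.36 at $\alpha = 0.05$). The sketch below applies that approximation. The function names are hypothetical, and the result should only be trusted for larger samples (a common rule of thumb is n above roughly 35 to 40); for small samples, exact tabulated values are the safer choice.

```python
import math

def ks_critical_value(n, alpha=0.05):
    """Approximate two-tailed critical value for D (large-sample formula)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha / math.sqrt(n)

def reject_null(d, n, alpha=0.05):
    """Reject the null hypothesis when D exceeds the critical value."""
    return d > ks_critical_value(n, alpha)

# Example: a hypothetical D of 0.15 computed from 100 observations.
print(ks_critical_value(100))
print(reject_null(0.15, 100))
```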

## K-S Test P-Value Table

## Advantages and Disadvantages

Advantages include:

- The test is distribution free.
- The D statistic is easy to calculate.
- It can be used as a goodness of fit test following regression analysis.
- There aren’t any restrictions on sample size.
- Tables are readily available.

Although the K-S test has many advantages, it also has a few limitations:

- For the test to be valid, the hypothesized distribution must be fully specified in advance, including its location, scale, and shape parameters. Estimating these parameters from the data invalidates the test.
- It generally can’t be used for discrete distributions, especially if you are using software (most software packages don’t have the necessary extensions for a discrete K-S test, and the manual calculations are convoluted).
- Sensitivity is higher at the center of the distribution and lower at the tails.
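The first limitation can be checked with a small simulation. The sketch below draws repeated samples from a standard normal distribution, so the null hypothesis is true every time, and counts how often the test rejects at the approximate 5% critical value $1.358/\sqrt{n}$. It does this once with the true parameters and once with the mean and standard deviation estimated from each sample. The sample size, repetition count, seed, and function names are arbitrary choices for illustration.

```python
import math
import random
import statistics

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, mu, sigma):
    """Two-tailed K-S statistic against a normal(mu, sigma) distribution."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = normal_cdf(x, mu, sigma)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

def rejection_rates(n=50, reps=2000, seed=0):
    """Share of true-null samples rejected, with specified vs. estimated parameters."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # large-sample 5% critical value (approximation)
    rejected_specified = 0
    rejected_estimated = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Parameters specified in advance: the test behaves as designed.
        if ks_statistic(sample, 0.0, 1.0) > crit:
            rejected_specified += 1
        # Parameters estimated from the same sample: the fit is artificially close.
        m = statistics.mean(sample)
        s = statistics.stdev(sample)
        if ks_statistic(sample, m, s) > crit:
            rejected_estimated += 1
    return rejected_specified / reps, rejected_estimated / reps

rate_specified, rate_estimated = rejection_rates()
print(rate_specified, rate_estimated)
```

With specified parameters, the rejection rate should land near the nominal 5%. With estimated parameters it should fall well below that, because fitting the distribution to the sample shrinks D, so the standard critical values no longer apply.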
