What is a truncated distribution?
A truncated distribution has its domain (the x-values) restricted to a certain range of values. For example, you might restrict your x-values to between 0 and 100, written in math notation as {0 ≤ x ≤ 100}. There are several types of truncated distributions:
- Truncated from above: high values of x are cut off, so your range is from negative infinity to some maximum value of x {-∞, xmax}.
- Truncated from below: low values of x are cut off, so your range is from some minimum value of x to positive infinity {xmin, ∞}.
- Double truncation: both the low and the high values of x are cut off {xmin, xmax}.
- If values range from negative infinity to infinity {-∞, ∞}, there is no truncation.
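As a rough illustration of the three cases, the sketch below applies each kind of truncation to a small list of numbers. The function name `truncate_sample` and the data values are made up for this example, and treating the bounds as inclusive is an assumption rather than part of the definition.

```python
import math

def truncate_sample(values, lower=-math.inf, upper=math.inf):
    """Keep only the values inside [lower, upper] (bounds inclusive by choice)."""
    return [v for v in values if lower <= v <= upper]

# Hypothetical data, chosen only to show which values survive each cut.
data = [-40, -5, 0, 12, 55, 100, 130, 999]

from_above = truncate_sample(data, upper=100)            # high values cut off
from_below = truncate_sample(data, lower=0)              # low values cut off
double = truncate_sample(data, lower=0, upper=100)       # both ends cut off
untouched = truncate_sample(data)                        # -inf to inf: no truncation

print(from_above)
print(from_below)
print(double)
print(untouched)
```

Leaving a bound at its default of negative or positive infinity corresponds to not truncating on that side.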
Truncation happens when values outside a certain range are excluded from a dataset or were never recorded. Let’s say you wanted to study income data from census forms, but only for people whose income falls below some cutoff. Because every income above the cutoff is excluded, the income distribution you work with is truncated from above.
The truncated normal distribution
The truncated normal distribution is derived from a normal distribution with a given mean (μ) and standard deviation (σ), restricted to a chosen range with an upper bound, a lower bound, or both. More specifically, it arises from bounding a normally distributed random variable from above, below, or both. Within the chosen range, the truncated normal keeps the shape of the normal curve, while extreme values outside the range are excluded.
As an example, the truncated normal distribution is often used in elementary statistics to introduce the normal distribution; it is often truncated at three standard deviations either side of the mean — excluding data that falls four, five, or even 100 standard deviations from the mean. This is to make the analysis more manageable and results in a z-table that is more compact.
The truncated normal distribution is widely used in statistics and econometrics, particularly to model binary outcomes in the probit model and censored data in the tobit model [1]. It can be used to estimate the mean and standard deviation of a population without considering extremes, and to test hypotheses about the “bulk” of a population. The truncated normal distribution has four key parameters:
- μ: the mean.
- σ: the standard deviation.
- a: the lower x-value (can be as low as -∞).
- b: the upper x-value (can be as high as ∞).
The following normal distribution has had the x-values restricted from 0 to ∞ (a lower truncation), giving the probability density function (pdf) of just half the normal curve:
You’ve probably used truncation without even realizing it; in elementary statistics classes, the normal distribution (and the accompanying z-table) is often truncated to 3 standard deviations around the mean. The empirical rule tells us that while 99.7% of data does tend to fall within that boundary, the remaining 0.3% falls somewhere outside (there could be an oddball value that’s 10, 20, or even 100 standard deviations away from the mean!). Truncating allows us to deal with the bulk of the data in a reasonable way.
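The share of a normal distribution that lies beyond three standard deviations can be checked directly from the standard normal CDF. The sketch below uses only Python's `math` module; the helper name `std_normal_cdf` is chosen for this example.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability mass within 3 standard deviations of the mean, and the remainder.
inside = std_normal_cdf(3.0) - std_normal_cdf(-3.0)
outside = 1.0 - inside

print(f"inside  +/- 3 sd: {inside:.2%}")
print(f"outside +/- 3 sd: {outside:.2%}")
```

The inside share comes out at roughly 99.73%, which is where the rounded 99.7% figure in the empirical rule comes from.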
Properties of the truncated normal distribution
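Informally, the truncated normal probability density function (pdf) is defined as follows:
- Choose a general normal pdf by specifying parameters µ and σ.
- Choose a truncation range (a, b).
- Set the pdf to zero outside the truncation range.
- Rescale the pdf inside the range so that the total area under it equals 1.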
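One way to see the informal recipe in action is rejection sampling: draw from the chosen normal distribution and discard every draw that lands outside (a, b). The sketch below is a simple illustration under that approach, not an efficient sampler; the function name, the default sample size, and the fixed seed are assumptions made for the example.

```python
import math
import random

def sample_truncated_normal(mu, sigma, a=-math.inf, b=math.inf, n=1000, seed=0):
    """Draw n values from a normal(mu, sigma) restricted to [a, b] by rejection."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    kept = []
    while len(kept) < n:
        x = rng.gauss(mu, sigma)   # draw from the parent normal
        if a <= x <= b:            # keep only draws inside the truncation range
            kept.append(x)
    return kept

# Lower truncation at 0 of a standard normal (the "half normal" case).
draws = sample_truncated_normal(mu=0.0, sigma=1.0, a=0.0, n=2000)
sample_mean = sum(draws) / len(draws)
print(min(draws) >= 0.0, round(sample_mean, 2))
```

Rejection becomes very slow when the range (a, b) sits far out in a tail of the parent normal, because almost every draw is discarded; dedicated samplers in statistics packages avoid that problem.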
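More formally, the pdf can be calculated with [3]:

$$
f(x;\, \mu, \sigma, a, b) =
\begin{cases}
\dfrac{\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)} & \text{if } a \le x \le b, \\[2ex]
0 & \text{otherwise.}
\end{cases}
$$

Where
- φ = pdf of the standard normal distribution
- Φ = cumulative distribution function (CDF) of the standard normal distribution
- μ = mean of the parent (untruncated) normal distribution
- σ = standard deviation of the parent (untruncated) normal distribution.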
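The pdf can be sketched in a few lines of Python, assuming the usual form of the truncated normal density: the parent normal density divided by the probability the parent normal assigns to the range from a to b, and zero outside that range. The function names and the example parameter values below are chosen for illustration only.

```python
import math

def phi(z):
    """pdf of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_pdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """Density at x of a normal(mu, sigma) truncated to [a, b]."""
    if x < a or x > b:
        return 0.0                                   # zero outside the range
    mass = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)  # parent mass on [a, b]
    return phi((x - mu) / sigma) / (sigma * mass)    # rescaled parent density

# Rough check that the density integrates to 1 (midpoint rule on [a, b]).
mu, sigma, a, b = 0.5, 1.5, -1.0, 2.0   # hypothetical example values
steps = 10_000
width = (b - a) / steps
total = sum(
    truncated_normal_pdf(a + (i + 0.5) * width, mu, sigma, a, b) * width
    for i in range(steps)
)
print(round(total, 6))
```

Dividing by the mass on the range is what rescales the curve so that the area under the truncated pdf is 1.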
The mean of the truncated distribution is tedious to find by hand, because it requires evaluating the standard normal pdf and CDF at both truncation bounds. You’ll want to use software; most statistics packages can handle these calculations. W. Joel Schneider (who calls it a “bit of a mess”) has created a handy Excel spreadsheet on their WordPress blog to find the mean and standard deviation of a truncated distribution. The spreadsheet is available from the blog post listed in the references.
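For readers who prefer code to a spreadsheet, the mean and standard deviation can also be computed from the standard closed-form expressions for a truncated normal. The sketch below is a minimal version under that assumption; the function names are made up for the example, and it makes no attempt at numerical care for ranges far out in the tails.

```python
import math

def phi(z):
    """pdf of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_phi(z):
    """z * phi(z), taken as 0 at infinite bounds (its limiting value)."""
    return 0.0 if math.isinf(z) else z * phi(z)

def truncated_normal_mean_sd(mu, sigma, a=-math.inf, b=math.inf):
    """Mean and standard deviation of a normal(mu, sigma) truncated to [a, b]."""
    alpha = (a - mu) / sigma            # standardized lower bound
    beta = (b - mu) / sigma             # standardized upper bound
    mass = Phi(beta) - Phi(alpha)       # parent probability inside [a, b]
    shift = (phi(alpha) - phi(beta)) / mass
    mean = mu + sigma * shift
    variance = sigma ** 2 * (1.0 + (z_phi(alpha) - z_phi(beta)) / mass - shift ** 2)
    return mean, math.sqrt(variance)

# Standard normal truncated below at 0, and truncated to 3 sd either side.
half_mean, half_sd = truncated_normal_mean_sd(0.0, 1.0, a=0.0)
tri_mean, tri_sd = truncated_normal_mean_sd(0.0, 1.0, a=-3.0, b=3.0)
print(round(half_mean, 4), round(half_sd, 4))
print(round(tri_mean, 4), round(tri_sd, 4))
```

With no truncation on either side the function returns the parent mean and standard deviation unchanged, which is a quick sanity check on the formulas.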
References
- Lecture 7: Models for Censored and Truncated Data — Tobit Model.
- Image: By 018 (talk) via Wikimedia Commons. CC BY 3.0.
- Burkardt, J. (2014). The Truncated Normal Distribution. Retrieved May 31, 20223 from: https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
- Schneider, W. (2014). Using the truncated normal distribution. Retrieved June 4, 2014 from: https://assessingpsyche.wordpress.com/2014/06/04/using-the-truncated-normal-distribution/