Truncation in Statistics

What is truncation in statistics?

Truncation, in general, means shortening. In statistics, it can mean:

• Limiting analysis to data that meets specific criteria (e.g., weights larger than a certain poundage).
• Eliminating values in a probability distribution above or below a certain point. For example, you might truncate a distribution after the 75th percentile — cutting off the last quartile.
• Eliminating, rather than rounding, digits beyond a certain decimal place. This often gives a different result from rounding. For example, truncating 3.19345 to one decimal place gives 3.1, while rounding gives 3.2. Truncating an unsigned (non-negative) number is the same as rounding toward zero or taking the floor [1]. The floor function in mathematics rounds a number down to the nearest integer.
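
The digit-dropping sense of truncation can be sketched in a few lines of Python. This is an illustration only; the helper names `truncate_decimal` and `round_decimal` are made up for this example, and the `decimal` module is used so that binary floating-point error does not blur the comparison.

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate_decimal(value: str, places: int) -> Decimal:
    """Drop every digit beyond `places` decimal places, without rounding."""
    quantum = Decimal(1).scaleb(-places)  # places=1 -> Decimal("0.1")
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


def round_decimal(value: str, places: int) -> Decimal:
    """Round to `places` decimal places, raising when the dropped part is >= 5."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


truncated = truncate_decimal("3.19345", 1)  # Decimal("3.1")
rounded = round_decimal("3.19345", 1)       # Decimal("3.2")

# For non-negative numbers, truncation and floor agree ...
same_for_positive = math.trunc(3.7) == math.floor(3.7)  # both give 3
# ... but for negative numbers they differ: trunc goes toward zero, floor goes down.
trunc_negative = math.trunc(-3.7)  # -3
floor_negative = math.floor(-3.7)  # -4
```

The last two lines show why the equivalence with floor only holds for non-negative values.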

Rectification vs truncation

Truncated and rectified probability distributions are different things, although in some cases they can have similar shapes:

• Truncating a probability distribution removes the values that fall outside a specific range.
• A rectified probability distribution takes a standard probability distribution and replaces all negative values with 0.
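
The difference is easiest to see on simulated data. The sketch below assumes a standard normal distribution and a cut-off at 0, both chosen only for illustration. Filtering the draws is a simple stand-in for sampling from the truncated distribution.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
n_negative = sum(x < 0.0 for x in samples)

# Truncation: values outside the range [0, infinity) are removed entirely,
# so the truncated sample is smaller than the original.
truncated = [x for x in samples if x >= 0.0]

# Rectification: every value is kept, but negatives are replaced with 0,
# which piles up probability mass at exactly 0.
rectified = [max(0.0, x) for x in samples]

print(len(samples), len(truncated), len(rectified))
print("zeros in rectified sample:", rectified.count(0.0))
```

Both results contain no negative values, but only the rectified sample keeps the original sample size and gains a spike at 0.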

Censoring vs truncation

A sample is truncated when some observations are excluded. For instance, let’s say we want to study the relationship between income (y) and education (x). If we only have data on individuals with an income above $40,000 per year, then we have a truncated sample. On the other hand, a sample is censored when no observations are excluded, but some information is suppressed. In simpler terms, a truncated sample lacks certain observations, while a censored sample includes observations but with incomplete information [3].
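
As a rough sketch of the distinction, the snippet below builds both kinds of sample from the same made-up data. The (education, income) pairs are hypothetical, and the censoring scheme (recording low incomes only as "at or below the threshold") is one common form, assumed here for illustration.

```python
THRESHOLD = 40_000

# Hypothetical (years of education, annual income) pairs.
population = [
    (10, 28_000),
    (12, 35_000),
    (14, 41_000),
    (16, 52_000),
    (18, 75_000),
]

# Truncated sample: people at or below the threshold are missing altogether.
truncated_sample = [(x, y) for x, y in population if y > THRESHOLD]

# Censored sample: everyone is present, but incomes at or below the threshold
# are recorded only as the threshold itself, with a flag marking the censoring.
censored_sample = [
    (x, max(y, THRESHOLD), y <= THRESHOLD) for x, y in population
]

print(len(truncated_sample))  # 3 observations survive truncation
print(len(censored_sample))   # all 5 observations remain
print(sum(flag for _, _, flag in censored_sample))  # 2 are censored
```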

When to round and when to truncate in statistics

A general rule of thumb is as follows: when the digit being dropped is 5, truncate if the previous digit is even, but raise if it is odd (often called rounding half to even). Alternatively, truncate if the previous digit is odd, but raise if it is even (rounding half to odd).

For example,

• Under the first rule, 1.15 – 1.25 becomes 1.2, and 1.26 – 1.34 becomes 1.3.
• Under the alternative rule, 1.16 – 1.24 becomes 1.2, and 1.25 – 1.35 becomes 1.3.
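
Both versions of the rule of thumb can be checked with a short sketch. The function `round_tie` is a hypothetical helper written for this example; it works on integer hundredths (115 stands for 1.15) so that no floating-point error creeps in, and it returns tenths (12 stands for 1.2).

```python
from collections import Counter


def round_tie(hundredths: int, tie: str) -> int:
    """Round a value in hundredths to tenths.

    tie="even": a dropped 5 is truncated when the kept digit is even, raised when odd.
    tie="odd":  a dropped 5 is truncated when the kept digit is odd, raised when even.
    """
    tenths, dropped = divmod(hundredths, 10)
    if dropped < 5:
        return tenths          # truncate
    if dropped > 5:
        return tenths + 1      # raise
    kept_is_even = tenths % 2 == 0
    if tie == "even":
        return tenths if kept_is_even else tenths + 1
    return tenths + 1 if kept_is_even else tenths


values = range(115, 136)  # 1.15, 1.16, ..., 1.35
half_even = Counter(round_tie(v, "even") for v in values)
half_odd = Counter(round_tie(v, "odd") for v in values)

print(dict(half_even))  # {12: 11, 13: 9, 14: 1}
print(dict(half_odd))   # {11: 1, 12: 9, 13: 11}
```

Under the first rule eleven values land on 1.2 and nine on 1.3; under the alternative the counts are reversed.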

However, the general rule of thumb splits the values unevenly between the two results. For example, if 1.16 – 1.24 becomes 1.2 and 1.25 – 1.35 becomes 1.3, nine values go one way and eleven the other. To avoid this problem, try another rule [4]:

When the digit being dropped, y, is 0-4, truncate; when it is 5-9, raise.
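
A minimal sketch of this rule, assuming a single dropped digit, uses the `ROUND_HALF_UP` mode of Python's `decimal` module, which truncates on 0-4 and raises on 5-9. The range 1.15 to 1.34 is chosen only to mirror the earlier example.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# 1.15, 1.16, ..., 1.34 as exact decimals (twenty values).
values = [Decimal(n) / 100 for n in range(115, 135)]

# Dropped digit 0-4 -> truncate; dropped digit 5-9 -> raise.
tally = Counter(
    v.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP) for v in values
)

print(tally[Decimal("1.2")])  # 10 values: 1.15 through 1.24
print(tally[Decimal("1.3")])  # 10 values: 1.25 through 1.34
```

Each tenth now receives exactly ten of the twenty values, so the split is even.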

These are rules of thumb, though, not rules set in stone. Whether you use rounding or truncation in statistics will largely depend on your professor’s guidelines and those of any journal you might want to publish in.

References

1. Northwestern University. Rounding.
2. Unknown author, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons
3. Ao, X. An introduction to censored, truncated or sample-selected data. Retrieved July 24, 2023 from: https://www.hbs.edu/research-computing-services/Shared%20Documents/Training/censored_selected_truncated.pdf
4. Guare, C. A New System for Rounding Numbers.