Benford Distribution: Definition, Examples


What is the Benford Distribution?

The Benford distribution, denoted X ∼ Benford, describes the distribution of random variables that follow Benford’s law. Benford’s law (also called the first-digit law) states that in a wide range of number collections, the first non-zero digit (1 through 9) is not uniformly distributed, as one might expect. Instead, the first digits follow the non-uniform Benford distribution, a probability distribution for the first digit of the numbers in a data set. For example, approximately 30% of the numbers appearing in a text have a leading digit of 1. If all leading digits were equally likely, the digit 1 would occur only 1/9, or about 11.1%, of the time.
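As a quick check on these percentages, the minimal Python sketch below evaluates Benford’s first-digit probabilities, log10(1 + 1/d) for d = 1, ..., 9, next to the uniform value 1/9. The helper name benford_pmf is ours for illustration, not part of any library.

```python
import math

def benford_pmf(d: int) -> float:
    """Benford probability that the first significant digit equals d (1-9)."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

# Compare Benford's law with a uniform guess of 1/9 for each leading digit.
for d in range(1, 10):
    print(f"{d}: Benford {benford_pmf(d):6.1%}   uniform {1 / 9:6.1%}")
```

Running it shows digit 1 at about 30.1% and digit 9 at about 4.6%, against 11.1% for every digit under a uniform distribution.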

The law applies to the first significant digit, so numbers are read in standard (scientific) form, ignoring leading zeros and sign. For example, 2072 and −0.02072 both have a first digit of 2.
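A minimal Python sketch of extracting the first significant digit, assuming the inputs are ordinary Python ints or floats. It reads the digits from repr(), Python’s shortest round-trip decimal string, so that a float such as 0.3 is treated as the decimal 0.3 rather than as its binary approximation 0.2999...; the function name first_digit is hypothetical.

```python
import math

def first_digit(x: float) -> int:
    """First significant digit of x, ignoring its sign and any leading zeros."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("the first digit is undefined for 0, inf, and nan")
    # repr() gives the shortest decimal string that round-trips: '2072', '0.02072', '2.072e+20'.
    mantissa = repr(abs(x)).split("e")[0]           # drop any exponent part
    digits = mantissa.replace(".", "").lstrip("0")  # drop the decimal point and leading zeros
    return int(digits[0])

print(first_digit(2072), first_digit(-0.02072))  # 2 2
```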

Use of the Benford Distribution

While the law is popular for analyzing numbers in texts, it also appears in a wide range of probability and statistics applications, including products of i.i.d. random variables, mixtures of random samples, and some stochastic models [1].
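To see the product effect in action, the simulation below multiplies i.i.d. Uniform(0, 1] variables and tallies the leading digits of the products. The uniform factors, 20 factors per product, 20,000 samples, the fixed seed, and the function names are all assumptions made for illustration, not settings taken from [1].

```python
import math
import random

def leading_digit(x: float) -> int:
    """First significant digit of a positive float, read from the fractional part of log10(x)."""
    mantissa = 10 ** (math.log10(x) % 1)  # x = mantissa * 10**k with 1 <= mantissa < 10
    return min(max(int(mantissa), 1), 9)  # clamp guards against round-off at the edges

def simulate_product_digits(n_factors: int = 20, n_samples: int = 20_000, seed: int = 1) -> list[float]:
    """Leading-digit frequencies (digits 1-9) of products of i.i.d. Uniform(0, 1] variables."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_samples):
        product = 1.0
        for _ in range(n_factors):
            product *= 1.0 - rng.random()  # 1 - U[0, 1) lies in (0, 1], so the product is never 0
        counts[leading_digit(product)] += 1
    return [counts[d] / n_samples for d in range(1, 10)]

for d, freq in enumerate(simulate_product_digits(), start=1):
    print(f"{d}: simulated {freq:.3f}   Benford {math.log10(1 + 1 / d):.3f}")
```

With these settings the simulated frequencies land close to the Benford values. With a single factor, by contrast, the leading digits of a Uniform(0, 1] variable are uniform (1/9 each), so it is the multiplication that pulls them toward Benford.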

In real life, the distribution can model many kinds of accounting data, census statistics, and stock market data [2]. A specific use is auditing financial records: the numbers in these records should, in theory, follow the Benford distribution, and if they do not, that may be a sign the records have been falsified. Because Benford’s law isn’t widely known outside statistical circles, anyone falsifying records is unlikely to know to distribute the fake numbers according to the Benford distribution [3].
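One common way to compare a set of records with the Benford distribution is Pearson’s chi-square goodness-of-fit test on first-digit counts. The sketch below assumes positive integer amounts (for example, amounts in cents), a 5% significance level (critical value 15.507 for 8 degrees of freedom), and hypothetical names; it is not necessarily the procedure used in [3], and a large statistic only flags records for closer review rather than proving fraud.

```python
import math

# Benford's expected proportions for first digits 1-9: log10(1 + 1/d).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level.
CHI2_CRIT_8DF_5PCT = 15.507

def benford_chi_square(amounts: list[int]) -> float:
    """Pearson chi-square statistic comparing first-digit counts with Benford's law."""
    if not amounts:
        raise ValueError("need at least one amount")
    counts = [0] * 9
    for amount in amounts:
        if not isinstance(amount, int) or amount <= 0:
            raise ValueError("amounts must be positive integers (e.g. amounts in cents)")
        counts[int(str(amount)[0]) - 1] += 1  # first character of a positive int is its first digit
    n = len(amounts)
    # Sum of (observed - expected)^2 / expected over the nine digits.
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

# Synthetic ledgers for illustration only: a geometric series spanning three decades
# has nearly Benford first digits, while every amount from 100 to 999 has uniform first digits.
geometric = [int(100 * 1.001 ** k) for k in range(6911)]
evenly_spread = list(range(100, 1000))
for name, data in [("geometric", geometric), ("evenly spread", evenly_spread)]:
    stat = benford_chi_square(data)
    print(f"{name}: chi-square = {stat:.1f}, flagged = {stat > CHI2_CRIT_8DF_5PCT}")
```

In this synthetic example the geometric series passes the test, while the evenly spread amounts produce a statistic far above the critical value.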

Benford’s law also holds for some parametric survival distributions: several popular parametric lifetime models follow the Benford distribution for certain parameter values [4]. Data that obey Benford’s law follow a Benford distribution, and a random variable with this distribution is denoted X ∼ Benford [5].

Properties

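The Benford distribution is discrete, taking the values x = 1, 2, ..., 9, so its probability density function (PDF) is a probability mass function:

$$f(x) = \log_{10}\left(1 + \frac{1}{x}\right), \qquad x = 1, 2, \ldots, 9$$

And its cumulative distribution function (CDF) is:

$$F(x) = \log_{10}(x + 1), \qquad x = 1, 2, \ldots, 9$$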
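Writing the PDF as f(d) = log10(1 + 1/d) for d = 1, ..., 9, the CDF has a closed form because the sum telescopes once 1 + 1/d is written as (d + 1)/d:

$$F(x) = \sum_{d=1}^{x} \log_{10}\!\left(\frac{d+1}{d}\right) = \sum_{d=1}^{x} \left[\log_{10}(d+1) - \log_{10} d\right] = \log_{10}(x+1) - \log_{10} 1 = \log_{10}(x+1)$$

In particular, F(9) = log10(10) = 1, so the nine probabilities sum to 1, and F(1) = log10(2) ≈ 0.301 recovers the roughly 30% figure for a leading 1.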

Why do numbers follow a Benford distribution?

At first, it might seem counterintuitive that the digits 1 through 9 aren’t equally likely to appear first. One informal explanation is that counting starts at 1: a list of numbers that counts up from 1 and stops at an arbitrary point always passes through the numbers beginning with 1 before those beginning with 2, 3, and so on, so smaller leading digits tend to appear more often. Everyday examples that favor a leading 1 include:

  • US phone numbers are often written or dialed with a leading “1” (the country code) before the area code.
  • Many written texts were published in a year beginning with “1” (1999, 1987, 1892,…).
  • About a third of the days of a 31-day month begin with 1 (the 1st and the 10th through 19th, 11 days in all); another 11 begin with 2, but only three (the 3rd, 30th, and 31st) begin with 3.
  • A larger share of the living population has an age beginning with “1” (around 15%) than an age beginning with 5, 6, 7, 8, or 9 [5].
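The day-of-month example is easy to verify by counting the leading digit of every day number in a 31-day month (shorter months only drop days that begin with 2 or 3):

```python
from collections import Counter

# Leading digit of each day number 1-31 in a 31-day month.
day_counts = Counter(int(str(day)[0]) for day in range(1, 32))
for digit in range(1, 10):
    print(digit, day_counts[digit])
```

The counts are 11 for digit 1, 11 for digit 2, 3 for digit 3, and 1 each for digits 4 through 9.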

References

  1. Berger, A. & Hill, T. (2011). A Basic Theory of Benford’s Law. Probability Surveys, 8, 1-126.
  2. Hill, T. (1995). A Statistical Derivation of the Significant Digit Law. Statistical Science, 10(4), 354-363.
  3. Tam Cho, W. & Gaines, B. Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance.
  4. Leemis, L. et al. Survival Distributions Satisfying Benford’s Law. Retrieved November 9, 2021 from: http://www.math.wm.edu/~leemis/2000amstat.pdf
  5. Benford Distribution. Retrieved November 9, 2021 from: http://www.math.wm.edu/~leemis/chart/UDR/PDFs/Benford.pdf
