# Pólya Distribution

## What is a Pólya Distribution?

The Pólya distribution (also called the Pólya-Eggenberger distribution), named after George Pólya, is a discrete probability distribution related to Pólya’s urn (also called the Pólya-Eggenberger urn scheme) [1]. It describes the number of red balls drawn in the first n trials of Pólya’s urn. The number of black balls follows a negative Pólya-Eggenberger distribution [2]. The Pólya distribution has applications in fields as diverse as genetics, insurance, and modeling epidemics. The multivariate Pólya distribution, sometimes called the Dirichlet-multinomial distribution or Dirichlet compound multinomial distribution, is an extension of the univariate beta binomial distribution.

## Process and PMF for the Pólya Distribution

The distribution models a simple process: draw a ball at random from an urn containing r red balls and N − r black balls. Record the color of the ball, then return it to the urn along with c additional balls of the same color. Repeat the process for n draws. If X is the number of red balls drawn in the first n trials, then the random variable X follows a Pólya distribution. The probability mass function is [3]:

$$P(X = x) = \binom{n}{x} \frac{\prod_{i=0}^{x-1} (r + ic) \, \prod_{j=0}^{n-x-1} (N - r + jc)}{\prod_{k=0}^{n-1} (N + kc)}, \quad x = 0, 1, \ldots, n,$$

where N, n, r, and c are natural numbers and an empty product equals 1. When the urn is large enough, the Pólya distribution can be approximated by the binomial distribution with parameters n and p. In general, this holds if N tends to infinity while p = 1 – q = r/N remains constant [4].
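As a rough illustration, the PMF for the urn process described above can be evaluated exactly with rational arithmetic. The sketch below uses the standard Pólya-Eggenberger product form; the function names `rising_product` and `polya_pmf` are made up for this example and do not come from the cited references.

```python
from fractions import Fraction
from math import comb


def rising_product(start, step, count):
    """Return start * (start + step) * ... with `count` factors (1 if count == 0)."""
    result = 1
    for i in range(count):
        result *= start + i * step
    return result


def polya_pmf(x, n, N, r, c):
    """P(X = x): x red balls in n draws from an urn that starts with
    r red and N - r black balls, adding c balls of the drawn color each time."""
    if not 0 <= x <= n:
        return Fraction(0)
    numerator = (
        comb(n, x)
        * rising_product(r, c, x)          # red draws
        * rising_product(N - r, c, n - x)  # black draws
    )
    return Fraction(numerator, rising_product(N, c, n))


# One red and one black ball, c = 1, two draws: every outcome is equally likely.
print([polya_pmf(x, 2, 2, 1, 1) for x in range(3)])
# → [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

Setting c = 0 turns the process into ordinary sampling with replacement, so `polya_pmf(x, n, N, r, 0)` should agree with the binomial probability for p = r/N, which is a convenient sanity check.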

## Rutherford distribution (inspired by Pólya distribution)

Rutherford’s contagious distribution (or simply the Rutherford distribution) was inspired by the Pólya distribution, or Pólya urn model, from which it arises naturally [5]. The distribution, built on prior work by Woodbury [6], concerns the case where the probability of a success at any trial depends linearly on the number of previous successes. The distribution was proposed by R.S.G. Rutherford; there is no connection to Ernest Rutherford’s distribution that describes the scattering of alpha particles in physics.

Woodbury considered a general Bernoulli scheme where the probability of a success depends on the number of previous successes, formulating the equation

$$P(n + 1, x + 1) = p_x P(n, x) + (1 - p_{x+1}) P(n, x + 1),$$

where

• $p_x$ = probability of success after x previous successes,
• $P(n, x)$ = probability of x successes in n trials.
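The recurrence above can be iterated directly, starting from P(0, 0) = 1. The following is a minimal sketch rather than Woodbury's own method: the function name `woodbury_pmf` and the linear rule in the usage example are illustrative assumptions.

```python
from fractions import Fraction


def woodbury_pmf(n, p_of_x):
    """Return [P(n, 0), ..., P(n, n)], where p_of_x(x) is the success
    probability after x previous successes."""
    probs = [Fraction(1)]  # P(0, 0) = 1: no trials, no successes
    for _ in range(n):
        nxt = [Fraction(0)] * (len(probs) + 1)
        for x, mass in enumerate(probs):
            p = Fraction(p_of_x(x))
            nxt[x + 1] += p * mass    # success: x -> x + 1
            nxt[x] += (1 - p) * mass  # failure: x unchanged
        probs = nxt
    return probs


# A constant success probability recovers the ordinary binomial distribution.
print(woodbury_pmf(3, lambda x: Fraction(1, 2)))
# → [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# A hypothetical linear rule, chosen so every probability stays below 1.
linear = woodbury_pmf(4, lambda x: Fraction(1, 4) + Fraction(1, 10) * x)
```

Exact fractions are used so that the probabilities can be checked to sum to 1 without rounding error; floats would work just as well for larger n.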

The assumption here is that all pairs of $p_x$’s are equal. Rutherford’s contagious distribution detailed a special case of the formula. The idea is that when a white ball is drawn from the urn, it is replaced with α other balls. This case of the Pólya distribution leads to a clustering of secondary cases around the first ball drawn. Rutherford used a linear function in which $p_x$ is determined by just two parameters, p and the slope α (positive in the contagious case):

$$p_x = p + \alpha x.$$

Because every $p_x$ must remain a valid probability, with q = 1 – p, this implies that

• n < q/α if α > 0, and
• n < –p/α if α < 0.

## Arfwedson distribution

The Arfwedson distribution is a discrete probability distribution for an urn sampling problem in which balls are drawn with replacement.

“An urn contains N numbered balls. We make n drawings replacing the ball into the urn each time. What is the probability of getting v different balls?” Arfwedson [7].

The distribution has been called other names, such as:

• The coupon-collecting distribution, because it describes the probability that a person with n randomly selected coupons will have at least one of each of the k equally likely varieties [8].
• The classical occupancy distribution [9].
• Stirling2 distribution, because of the presence of the Stirling numbers of the second kind [10].
• Dixie cup [11].
• Stevens-Craig [12, 13].

There are many different formulas for the Arfwedson distribution. They differ according to whether the random variable counts occupied or unoccupied bins; counting unoccupied bins reverses the probability mass function (PMF).
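One widely used form counts occupied bins, that is, the number v of distinct balls seen in n draws with replacement from N balls. The sketch below uses the standard inclusion-exclusion expression for this count. It is not necessarily the form given in any of the cited sources, and the function name `occupancy_pmf` is an assumption made for this example.

```python
from fractions import Fraction
from math import comb


def occupancy_pmf(N, n, v):
    """P(exactly v distinct balls appear in n draws with replacement
    from an urn of N >= 1 numbered balls)."""
    if not 0 <= v <= N:
        return Fraction(0)
    # Number of ways to map n draws onto v chosen balls so that each is hit
    # at least once (inclusion-exclusion); equals v! times S(n, v), where
    # S(n, v) is a Stirling number of the second kind.
    onto = sum((-1) ** j * comb(v, j) * (v - j) ** n for j in range(v + 1))
    return Fraction(comb(N, v) * onto, N ** n)


# Two draws from three balls: same ball twice, or two different balls.
print(occupancy_pmf(3, 2, 1), occupancy_pmf(3, 2, 2))
# → 1/3 2/3
```

In the coupon-collecting reading, `occupancy_pmf(k, n, k)` is the probability that n random coupons cover all k varieties.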

Haight [14] lists the distribution as

## References

[1] Kaiser, H. & Stefansky, W. A Polya Distribution for Teaching. The Teacher’s Corner. Retrieved November 13, 2021 from: https://www.jstor.org/stable/2682866

[2] Marshall, A. (1990). Bivariate Distributions Generated from Pólya-Eggenberger Urn Models. Journal of Multivariate Analysis 35, 48-65

[3] Teerapabolarn, K. (2014). An Improved Binomial Distribution to Approximate the Pólya Distribution. International Journal of Pure and Applied Mathematics Volume 93 No. 5, 629-632 ISSN: 1314-3395 (on-line version)

[4] Teerapabolarn, K. (2014). An Improved Binomial Distribution to Approximate the Pólya Distribution. International Journal of Pure and Applied Mathematics Volume 93 No. 5, 629-632 ISSN: 1311-8080 (printed version)

[5] Rutherford, R. S. G. (1954). On a Contagious Distribution. The Annals of Mathematical Statistics, 25(4), 703–713. http://www.jstor.org/stable/2236654

[6] Woodbury, M. (1949). On a probability distribution. The Annals of Mathematical Statistics, 20, pp. 311-313.

[7] Arfwedson, G. (1951). A probability distribution connected with Stirling’s second class numbers. Skand. Aktuarietidskr. 34, 121–132.

[8] David, F. N., and Barton, D. E. (1962). Combinatorial Chance, London: Griffin.

[9] O’Neill, B. (2019). The Classical Occupancy Distribution: Computation and Approximation. The American Statistician. DOI: 10.1080/00031305.2019.1699445

[10] Williamson, P. P., Mays, D. P., Abay Asmerom, G., and Yang, Y. (2009). Revisiting the Classical Occupancy Problem. The American Statistician, 63, 356–360.

[11] Johnson, N. L., and Kotz, S. (1977). Urn Models and Their Application, New York: Wiley.

[12] Stevens, W. L. (1937). Significance of grouping, Annals of Eugenics, London, 8, 57–60.

[13] Craig, C. C. (1953). On the utilization of marked specimens in estimating populations of flying insects, Biometrika, 40, 170–176.

[14] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.