Pólya Distribution


What is a Pólya Distribution?

Figure: Pólya distribution urn model. Urn models are simple ways to represent real-life probabilities.

The Pólya distribution (also called the Pólya-Eggenberger distribution), named after George Pólya, is a discrete probability distribution related to Pólya’s urn (also called the Pólya-Eggenberger urn scheme) [1]. It describes the number of red balls drawn in the first n trials of Pólya’s urn. If draws instead continue until a fixed number of red balls has been drawn, the number of black balls drawn follows a negative Pólya-Eggenberger distribution [2]. The Pólya distribution has applications in fields as diverse as genetics, insurance, and epidemic modeling. The multivariate Pólya distribution, sometimes called the Dirichlet-multinomial distribution or Dirichlet compound multinomial distribution, is a multivariate extension of the univariate beta-binomial distribution.

Process and PMF for the Pólya Distribution

The distribution models a simple process: draw a ball at random from an urn containing r red balls and N − r black balls. Record the color of the ball, then return it to the urn along with c additional balls of the same color. Repeat the process for n draws. If X is the number of red balls drawn in the first n trials, then the random variable X follows a Pólya distribution. The probability mass function is [3]:

$$P(X = x) = \binom{n}{x} \frac{\prod_{i=0}^{x-1}(r + ic)\,\prod_{j=0}^{n-x-1}(N - r + jc)}{\prod_{k=0}^{n-1}(N + kc)}, \quad x = 0, 1, \ldots, n,$$

where N, n, r, and c are natural numbers. When N is large relative to n and c, the Pólya distribution can be approximated by the binomial distribution with parameters n and p = r/N. In general, the approximation holds if N tends to infinity while p = 1 – q = r/N remains constant [4].
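The urn process and the binomial limit can be checked numerically. The following is a minimal Python sketch, assuming the standard product form of the Pólya-Eggenberger PMF; the function names `polya_pmf` and `binomial_pmf` and the example values (n = 5, c = 1, p = 0.4) are illustrative choices, not taken from the cited papers.

```python
from fractions import Fraction
from math import comb


def polya_pmf(x, n, N, r, c):
    """P(X = x): x red balls in n draws from an urn that starts with
    r red and N - r black balls, adding c balls of the drawn color each time."""
    if not 0 <= x <= n:
        return Fraction(0)
    num = Fraction(1)
    for i in range(x):            # red draws: r, r + c, r + 2c, ...
        num *= r + i * c
    for j in range(n - x):        # black draws: N - r, N - r + c, ...
        num *= (N - r) + j * c
    den = 1
    for k in range(n):            # urn size before each draw: N, N + c, ...
        den *= N + k * c
    return comb(n, x) * num / den


def binomial_pmf(x, n, p):
    """Ordinary binomial PMF, used as the large-N approximation."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, c = 5, 1
pmf = [polya_pmf(x, n, N=10, r=4, c=c) for x in range(n + 1)]
total = sum(pmf)  # exact arithmetic, so this is exactly 1

# Hold p = r/N = 0.4 fixed and let N grow; record the largest PMF gap.
gaps = {}
for N in (10, 100, 10_000):
    r = 4 * N // 10
    gaps[N] = max(
        abs(float(polya_pmf(x, n, N, r, c)) - binomial_pmf(x, n, 0.4))
        for x in range(n + 1)
    )
    print(N, gaps[N])
```

Exact fractions are used for the Pólya PMF so that the probabilities sum to exactly 1; the gap to the binomial PMF shrinks as N grows with r/N held fixed. Setting c = 0 in this sketch reduces the process to ordinary sampling with replacement.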


Rutherford distribution (inspired by Pólya distribution)

Rutherford’s contagious distribution (or simply the Rutherford distribution) was inspired by the Pólya distribution or the Pólya urn model, from which it arises naturally [5]. The distribution, which builds on prior work by Woodbury [6], concerns trials in which the probability of a success depends linearly on the number of previous successes. The distribution was proposed by R.S.G. Rutherford; there is no connection to Ernest Rutherford’s distribution that describes the scattering of alpha particles in physics.

Woodbury considered a general Bernoulli scheme where the probability of a success depends on the number of previous successes, formulating the equation

$$P(n + 1, x + 1) = p_x P(n, x) + (1 - p_{x+1}) P(n, x + 1),$$

where

  • p_x = probability of success after x previous successes,
  • P(n, x) = probability of x successes in n trials.
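As a small worked check of the recursion, the first two trials can be written out by hand. The boundary conventions used here, P(0, 0) = 1, P(n, x) = 0 for x > n, and P(n + 1, 0) = (1 − p_0) P(n, 0), are assumptions made for illustration; they follow from the same reasoning but are not stated explicitly above.

```latex
\begin{align*}
P(1, 0) &= 1 - p_0, \qquad P(1, 1) = p_0 P(0, 0) = p_0, \\
P(2, 0) &= (1 - p_0)^2, \\
P(2, 1) &= p_0 P(1, 0) + (1 - p_1) P(1, 1) = p_0 (2 - p_0 - p_1), \\
P(2, 2) &= p_1 P(1, 1) = p_0 p_1, \\
P(2, 0) + P(2, 1) + P(2, 2) &= 1 .
\end{align*}
```

When every p_x equals the same value p, these expressions reduce to the binomial probabilities for n = 2.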

The assumption here is that no two of the p_x are equal. Rutherford’s contagious distribution is a special case of this scheme. The idea is that when a white ball is drawn from the urn, it is replaced with α other balls. This case of the Pólya distribution leads to a clustering of secondary cases around the first ball drawn. Rutherford used a linear function in which p_x is determined by just two parameters:

p_x = p + αx (α > 0 in the contagious case).

For p_x to remain a valid probability for every x up to n, this implies that

  • n < q/α if α > 0, and
  • n < –p/α if α < 0,

where q = 1 – p.
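The linear special case can be sketched in Python by running Woodbury's recursion forward one trial at a time. This is a minimal sketch under stated assumptions: the function name `rutherford_pmf` is hypothetical, the slope of the linear function is called `alpha`, and the bounds on n are enforced as simple input checks rather than anything prescribed by Rutherford's paper.

```python
def rutherford_pmf(n, p, alpha):
    """Return [P(n, 0), ..., P(n, n)] when the success probability after
    x previous successes is p_x = p + alpha * x."""
    q = 1.0 - p
    # The bounds on n keep p_x strictly inside (0, 1) for every x <= n.
    if alpha > 0 and not n < q / alpha:
        raise ValueError("need n < q/alpha when alpha > 0")
    if alpha < 0 and not n < -p / alpha:
        raise ValueError("need n < -p/alpha when alpha < 0")

    probs = [1.0]  # before any trial: zero successes with probability 1
    for _ in range(n):
        nxt = [0.0] * (len(probs) + 1)
        for x, mass in enumerate(probs):
            p_x = p + alpha * x
            nxt[x + 1] += p_x * mass       # success moves x to x + 1
            nxt[x] += (1.0 - p_x) * mass   # failure leaves x unchanged
        probs = nxt
    return probs


pmf = rutherford_pmf(10, p=0.2, alpha=0.05)
print(sum(pmf))
```

Pushing probability mass forward from each state (x successes) to its two successors is equivalent to the recursion for P(n + 1, x + 1) given above. With `alpha = 0` the sketch reproduces the ordinary binomial distribution.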

Arfwedson distribution

The Arfwedson distribution is a discrete probability distribution for an urn sampling problem in which balls are drawn with replacement.

“An urn contains N numbered balls. We make n drawings replacing the ball into the urn each time. What is the probability of getting v different balls?” Arfwedson [7].

The distribution has been called other names, such as:

  • The coupon-collecting distribution, because it describes the probability that a person with n randomly selected coupons will have at least one of each of the k equally likely varieties [8].
  • The classical occupancy distribution [9].
  • The Stirling2 distribution, because of the presence of the Stirling numbers of the second kind [10].
  • The Dixie cup distribution [11].
  • The Stevens-Craig distribution [12, 13].

There are many different formulas for the Arfwedson distribution. They differ according to whether the formula counts occupied or unoccupied bins; counting unoccupied bins reverses the probability mass function (PMF).
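One common way to write the PMF uses Stirling numbers of the second kind: the probability of seeing exactly v different balls in n draws is N(N − 1)···(N − v + 1) · S(n, v) / N^n. The Python sketch below assumes this standard form, which is not necessarily the notation used in any particular reference; the function names `stirling2` and `occupancy_pmf` are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by inclusion-exclusion."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)


def occupancy_pmf(v, n, N):
    """P(exactly v different balls appear in n draws with replacement
    from an urn of N numbered balls)."""
    if not 0 <= v <= min(n, N):
        return Fraction(0)
    falling = factorial(N) // factorial(N - v)  # N (N - 1) ... (N - v + 1)
    return Fraction(falling * stirling2(n, v), N**n)


n, N = 6, 4
pmf = {v: occupancy_pmf(v, n, N) for v in range(N + 1)}

# Counting unoccupied bins instead: u = N - v simply reverses the PMF.
pmf_unoccupied = {N - v: prob for v, prob in pmf.items()}

# Coupon-collector reading: chance that all N varieties appear in n draws.
all_seen = occupancy_pmf(N, n, N)
```

Exact fractions avoid rounding error in the alternating sum. The `pmf_unoccupied` dictionary illustrates the reversal mentioned above: the probability of u unoccupied bins equals the probability of N − u occupied bins.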

Haight [14] lists the distribution as

[Formula image not reproduced: Arfwedson distribution formula from Haight]


References

[1] Kaiser, H. & Stefansky, W. A Polya Distribution for Teaching. The Teacher’s Corner. Retrieved November 13, 2021 from: https://www.jstor.org/stable/2682866

[2] Marshall, A. (1990). Bivariate Distributions Generated from Pólya-Eggenberger Urn Models. Journal of Multivariate Analysis 35, 48-65

[3] Teerapabolarn, K. (2014). An Improved Binomial Distribution to Approximate the Pólya Distribution. International Journal of Pure and Applied Mathematics, 93(5), 629-632. ISSN: 1314-3395 (on-line version).

[4] Teerapabolarn, K. (2014). An Improved Binomial Distribution to Approximate the Pólya Distribution. International Journal of Pure and Applied Mathematics, 93(5), 629-632. ISSN: 1311-8080 (printed version).

[5] Rutherford, R. S. G. (1954). On a Contagious Distribution. The Annals of Mathematical Statistics, 25(4), 703–713. http://www.jstor.org/stable/2236654

[6] Woodbury, M. (1949). On a probability distribution. The Annals of Mathematical Statistics, 20, pp. 311-313.

[7] Arfwedson, G. (1951). A probability distribution connected with Stirling’s second class numbers. Skand. Aktuarietidskr. 34, 121–132.

[8] David, F. N., and Barton, D. E. (1962). Combinatorial Chance, London: Griffin.

[9] O’Neill, B. (2019). The Classical Occupancy Distribution: Computation and Approximation. The American Statistician. DOI: 10.1080/00031305.2019.1699445

[10] Williamson, P. P., Mays, D. P., Abay Asmerom, G., and Yang, Y. (2009). Revisiting the Classical Occupancy Problem. The American Statistician, 63, 356–360.

[11] Johnson, N. L., and Kotz, S. (1977). Urn Models and Their Application, New York: Wiley.

[12] Stevens, W. L. (1937). Significance of grouping, Annals of Eugenics, London, 8, 57–60.

[13] Craig, C. C. (1953). On the utilization of marked specimens in estimating populations of flying insects, Biometrika, 40, 170–176.

[14] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.

