Negative Hypergeometric Distribution / Romanovsky Distribution


What is the negative hypergeometric distribution?

The negative hypergeometric distribution, also called the Romanovsky distribution [1], gives the probability of drawing a given number of successes before a fixed number of failures is reached when sampling without replacement from a finite population.

This distribution is applicable when each item in the population falls into one of two mutually exclusive groups, such as Black/White or Economy class/Business class. Because items are drawn without replacement, the probability of success changes with each draw as the remaining population shrinks.

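The probability mass function (pmf) of X, the number of successes observed before the r-th failure, is given by

\[ P(X = k) = \frac{\binom{k + r - 1}{k} \binom{N - r - k}{m - k}}{\binom{N}{m}}, \qquad k = 0, 1, \ldots, m \]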

Where

N = total number of elements or objects in the population,
m = number of successes in the population,
r = number of failures (the stopping point, where you draw until r failures are reached),
k = number of observed successes when r failures are drawn.
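
The definitions above can be sketched in Python. This is a minimal illustration, assuming the standard form of the pmf, C(k + r - 1, k) · C(N - r - k, m - k) / C(N, m); the function name `neg_hypergeom_pmf` and the small urn used in the usage lines are hypothetical, chosen only for the example.

```python
from math import comb


def neg_hypergeom_pmf(k: int, N: int, m: int, r: int) -> float:
    """P(X = k): probability of exactly k successes before the r-th failure.

    N = population size, m = successes in the population,
    r = number of failures at which drawing stops.
    """
    if not 1 <= r <= N - m:
        raise ValueError("r must be between 1 and the number of failures, N - m")
    if k < 0 or k > m:
        return 0.0
    # C(k + r - 1, k): orderings of k successes and r - 1 failures
    #                  ahead of the final (r-th) failure.
    # C(N - r - k, m - k): placements of the remaining successes
    #                      among the items left after stopping.
    return comb(k + r - 1, k) * comb(N - r - k, m - k) / comb(N, m)


# Usage: an urn with N = 10 items, m = 4 successes, stopping at r = 2 failures.
pmf = [neg_hypergeom_pmf(k, N=10, m=4, r=2) for k in range(5)]
print(pmf)
print(sum(pmf))  # the probabilities over k = 0..m should add up to 1
```

Integer binomial coefficients from `math.comb` are used so that the only floating-point step is the final division.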

Example

Question: In a population of 150 rodents, there are 50 rats and 100 mice. If we capture 20 rodents, what is the probability that half are rats?

Solution: Substitute the information into the formula to get:

Negative hypergeometric distribution vs. hypergeometric distribution

Both distributions describe sampling without replacement from a finite population with two categories. They differ in what is fixed in advance. The hypergeometric distribution gives the probability of a certain number of successes in a sample of fixed size. The negative hypergeometric distribution gives the probability of a certain number of successes by the time a predetermined number of failures has been drawn, so the sample size itself is random. In other words:

Hypergeometric distribution: the sample size is fixed,
Negative hypergeometric distribution: the number of failures is fixed.

For example, suppose we have a population of 100 people, of which 20 are successes and 80 are failures. If we want to calculate the probability of getting 5 successes in a sample of 10, then we would use the hypergeometric distribution. However, if we want to calculate the probability of getting 5 successes by the time the 8th failure is drawn, then we would use the negative hypergeometric distribution.
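
The two calculations in this example can be compared side by side. The sketch below assumes the standard pmf of each distribution and reads the second question as "exactly 5 successes before the 8th failure"; the variable names are illustrative.

```python
from math import comb

N, successes, failures = 100, 20, 80

# Hypergeometric: exactly 5 successes in a fixed sample of 10 draws.
p_hyper = comb(successes, 5) * comb(failures, 5) / comb(N, 10)

# Negative hypergeometric: exactly k = 5 successes before the r = 8th failure.
r, k = 8, 5
p_neg_hyper = (
    comb(k + r - 1, k) * comb(N - r - k, successes - k) / comb(N, successes)
)

print(f"hypergeometric:          {p_hyper:.4f}")
print(f"negative hypergeometric: {p_neg_hyper:.4f}")
```

The two probabilities differ because the events differ: the first fixes the number of draws at 10, while the second lets the number of draws vary and ends on the 13th draw (5 successes plus 8 failures).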

Negative hypergeometric distribution vs. negative binomial distribution

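Both the negative hypergeometric distribution and the negative binomial distribution keep sampling until a predetermined count of one outcome is reached (r failures in the notation above, although some texts label the stopping outcome as successes). The difference is that the negative binomial distribution assumes sampling with replacement, or equivalently an infinite population, so the probability of success stays constant, while the negative hypergeometric distribution deals with a finite population sampled without replacement.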
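
The relationship can be illustrated numerically: as the population grows with the share of successes held fixed, the negative hypergeometric probability approaches the negative binomial one. This is a sketch under stated assumptions: both pmfs count successes before the r-th failure, the success share is fixed at 1/5, and the function names are hypothetical.

```python
from math import comb


def neg_hypergeom_pmf(k: int, N: int, m: int, r: int) -> float:
    """k successes before the r-th failure, drawing without replacement."""
    return comb(k + r - 1, k) * comb(N - r - k, m - k) / comb(N, m)


def neg_binom_pmf(k: int, r: int, p: float) -> float:
    """k successes before the r-th failure, constant success probability p."""
    return comb(k + r - 1, k) * p**k * (1 - p) ** r


r, k = 8, 5
target = neg_binom_pmf(k, r, p=0.2)

for N in (100, 1_000, 100_000):
    m = N // 5  # keep the share of successes at 1/5 as N grows
    finite = neg_hypergeom_pmf(k, N, m, r)
    print(f"N={N:>7}: finite={finite:.6f}  gap={abs(finite - target):.2e}")

print(f"negative binomial: {target:.6f}")
```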

Romanovsky distribution

The name “Romanovsky distribution” is also used to describe a restricted occupancy distribution in “ball and urn” investigations [3]. It is named after V.I. Romanovsky [4], who proposed the distribution to construct a hypothesis test concerning the homogeneity (similarity) of two samples.

Suppose we have two ordered samples from the same population S, of sizes N and M, with unknown continuous probability density f(x):

x₁ ≤ x₂ ≤ … ≤ x_N,

y₁ ≤ y₂ ≤ … ≤ y_M (N ≥ 1, M ≥ 1).

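Also suppose that the first sample has n members not exceeding x_{n+1} and N - n - 1 members not less than x_{n+1}. Then the probability that the second sample has μ members not exceeding x_{n+1} and M - μ members exceeding x_{n+1} is:

\[ P(\mu) = \frac{\binom{n + \mu}{n} \binom{N + M - n - \mu - 1}{N - n - 1}}{\binom{N + M}{N}}, \qquad \mu = 0, 1, \ldots, M \]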
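
This setup can be checked numerically on small samples. The sketch below assumes a continuous population (no ties), so that every assignment of pooled ranks to the first sample is equally likely. Counting the assignments with n first-sample and μ second-sample values below x_{n+1} gives a closed form, which the code compares with brute-force enumeration. The function names are illustrative and do not come from the cited papers.

```python
from itertools import combinations
from math import comb


def romanovsky_pmf(mu: int, N: int, M: int, n: int) -> float:
    """P(exactly mu of the M second-sample values fall below x_(n+1))."""
    if not 0 <= n <= N - 1 or not 0 <= mu <= M:
        return 0.0
    # C(n + mu, n): orderings of the n x-values and mu y-values below x_(n+1).
    # C(N + M - n - mu - 1, N - n - 1): orderings of the values above it.
    return (
        comb(n + mu, n)
        * comb(N + M - n - mu - 1, N - n - 1)
        / comb(N + M, N)
    )


def brute_force_pmf(mu: int, N: int, M: int, n: int) -> float:
    """Enumerate which pooled ranks (0-based) belong to the first sample."""
    hits = total = 0
    for x_ranks in combinations(range(N + M), N):
        total += 1
        # x_ranks is sorted, so x_ranks[n] is the pooled rank of x_(n+1).
        # n of the values below it are x-values; the rest are y-values.
        if x_ranks[n] - n == mu:
            hits += 1
    return hits / total


# Usage: first sample of size 4, second of size 3, looking at x_(2) (n = 1).
for mu in range(4):
    print(mu, romanovsky_pmf(mu, 4, 3, 1), brute_force_pmf(mu, 4, 3, 1))
```

With r = n + 1 and a pooled population of N + M items, this is the negative hypergeometric pmf in different notation, which is consistent with the two names being used for the same distribution.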

Historical Notes on the Romanovsky Distribution

Haight [5] lists Romanovsky’s distribution in the index, but the index points only to a sparse entry titled “Romanovsky’s generalization”.

The references in that entry point to Biometrika, where “Romanovsky’s generalised curve” is mentioned in [6] as a generalization of Pearson distributions, also called Pearson frequency curves.

Wishart [6] reports that Romanovsky’s curves “do not appear to better the existing types, owing to the expansion in terms of functions which are not suited to the purpose.” In addition, he notes that, apart from a tiny region, the series expansions, which express a function as an infinite sum of simpler functions, are not convergent (they do not settle on a particular result), and “hence cannot give us really satisfactory fits.”

References

[1] Yusupova, A.K., Gafforov, R.A. Refining One Theorem For The Romanovsky Distribution. The American Journal of Interdisciplinary Innovations and Research, Vol. 3, No. 06 (2021).
[2] Top image: T113355, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons.
[3] Charalambides, Ch. A. On Restricted and Pseudo-Contagious Occupancy Distributions. Journal of Applied Probability, Vol. 20, No. 4 (Dec. 1983), pp. 872-876. Applied Probability Trust.
[4] Romanovsky, V.I. Ordered samples from the same continuous population. Proceedings of the Institute of Mathematics and Mechanics, Tashkent, 1949, pp. 5-19.
[5] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.
[6] Wishart, J. On Romanovsky’s Generalised Frequency Curves. Biometrika, Vol. 18, No. 1/2 (Jul. 1926), pp. 221-228. Oxford University Press.