What is a Compound Probability Distribution?
A compound probability distribution has random variables drawn from a “compound” parametric distribution, where one or more of the distribution’s parameters (e.g. the mean) are themselves drawn from other probability distributions [1]. Compounding distributions makes it easier to analyze and visualize data as a whole, instead of analyzing separate components.
In simple terms, a compound distribution isn’t a unique distribution but rather one made up of two or more other probability distributions.
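The definition can be written as a formula. The notation here is an assumption introduced for illustration, not taken from the source: θ is the random parameter, g(θ) is its density, and f(x | θ) is the parametric density of the observed variable. Under those assumptions, the unconditional density averages the parametric density over the parameter:

```latex
f_X(x) = \int f(x \mid \theta)\, g(\theta)\, d\theta
```

If the parameter is discrete rather than continuous, the integral becomes a sum over its possible values.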
Compound Random Variables
Compound random variables are any variables drawn from a compound distribution. They are defined more precisely with the following equation:
$$Y = \sum_{j=1}^{N} X_j$$
where:
- Y is a compound random variable,
- the Xj are independent and identically distributed (iid) random variables from a single experiment,
- N is a non-negative, discrete random variable.
In series form, the formula can be written as the random sum Y = X1 + X2 + … + XN [2],
where
- The number of terms N is not known (for example, an unknown number of policy claims or customers)
- Xi are iid with common distribution X
- Each Xi is independent of N.
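The random sum above can be simulated directly. The following is a minimal sketch, not a definitive implementation: it assumes N is Poisson and each Xi is exponential, and the rate `lam` and mean claim size `mean_x` are hypothetical values chosen only for illustration. It uses Knuth's multiplication method to draw N, because Python's standard library has no built-in Poisson sampler.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_random_sum(lam, mean_x, rng):
    """Draw Y = X_1 + ... + X_N with N ~ Poisson(lam), X_i ~ Exponential(mean mean_x)."""
    n = sample_poisson(lam, rng)
    # An empty sum (N = 0) is 0, e.g. a period with no claims.
    return sum(rng.expovariate(1.0 / mean_x) for _ in range(n))


rng = random.Random(0)
lam, mean_x = 3.0, 2.0  # hypothetical claim rate and mean claim size
draws = [sample_random_sum(lam, mean_x, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

# Because N is independent of the X_i, E[Y] = E[N] * E[X] (Wald's identity).
print(f"simulated E[Y] = {sample_mean:.3f}, E[N] * E[X] = {lam * mean_x:.3f}")
```

The simulated mean should land close to E[N] · E[X], which holds because each Xi is independent of N.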
Example
Princeton University professor Sam Wang created a compound probability distribution for the probability of Democrats or independents gaining control of the Senate in 2014. The distribution was an amalgam of all contested races in that year.
The compound graph is much easier to read than one where all the individual races are listed: you can see the overall trend instead of trying to make sense of a large number of individual graphs.
Mixture vs. Compound Distribution
A mixture distribution is a blend of a countable, finite number of distributions (in rare cases, you could have a countably infinite* number of distributions). The compound distribution is the general case of a mixture distribution, where the component distributions are uncountable.
* The term “countably infinite” comes from set theory, and means that the set has elements that are in a one-to-one correspondence with the set of natural numbers. Although it would take forever to count all of the elements in the set, you could theoretically get to a certain element in a finite period of time.
If you’re familiar with chemistry, you may already intuitively know the difference between a compound and a mixture. Compound and mixture distributions follow the same idea:
- Compound: a substance composed of more than one type of atom bonded together [3]. In the same way, a compound distribution is “bonded” to make one distribution.
- Mixture: a combination of two or more elements or compounds which have not reacted to form a bond. In the same way, mixture distributions retain features of both parent distributions.
Mixture distributions are formed by merging two or more parent distributions, each representing a different population. An example is blending a normal distribution with a uniform distribution: the resulting density is a weighted sum of the two parent densities, its mean is the weighted average of the parent means, and its support is the union of the parent supports. Compound distributions, on the other hand, are formed by letting a parameter of a single parent distribution vary at random according to another distribution.
Feature | Mixture distribution | Compound distribution |
---|---|---|
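Number of parent distributions | Two or more (finite or countably many) | One parametric family with a randomly varying parameter |
Parent distributions | Represent different populations | Represent the same population, conditional on the parameter value |
Mixing variable | Discrete label that selects a parent distribution | Parameter drawn from its own distribution |
Support | Union of the supports of the parent distributions | Union of the parent's supports over all possible parameter values |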
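The contrast can be sketched in a few lines of simulation. This is an illustrative example under stated assumptions, not something taken from the source: the mixture blends a Normal(0, 1) and a Uniform(4, 6) with a hypothetical weight of 0.7, and the compound is a normal whose mean is itself normally distributed.

```python
import random

rng = random.Random(1)


def sample_mixture(weight, rng):
    """Mixture: a discrete label picks one of two fixed parent distributions."""
    if rng.random() < weight:
        return rng.gauss(0.0, 1.0)   # parent 1: Normal(mean 0, sd 1)
    return rng.uniform(4.0, 6.0)     # parent 2: Uniform(4, 6)


def sample_compound(rng):
    """Compound: the mean of a normal is itself drawn from another distribution."""
    mu = rng.gauss(10.0, 2.0)        # parameter: mean ~ Normal(mean 10, sd 2)
    return rng.gauss(mu, 1.0)        # observation: Normal(mean mu, sd 1)


def mean_and_variance(values):
    """Return the sample mean and (population-style) sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


n = 100_000
mix_mean, mix_var = mean_and_variance([sample_mixture(0.7, rng) for _ in range(n)])
cmp_mean, cmp_var = mean_and_variance([sample_compound(rng) for _ in range(n)])

# Theory: mixture mean is the weighted average of parent means, 0.7*0 + 0.3*5 = 1.5.
print(f"mixture:  mean = {mix_mean:.2f}, variance = {mix_var:.2f}")
# Theory: a normal with a normal mean is Normal(10, variance 2**2 + 1**2 = 5).
print(f"compound: mean = {cmp_mean:.2f}, variance = {cmp_var:.2f}")
```

In the mixture, every draw comes from one of two fixed parents. In the compound, every draw comes from the same family but with a freshly drawn parameter, which is why its variance exceeds the conditional variance of 1.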
Types of compound distribution
There are many different types of compound probability distributions, each with its own applications. Some of the most common ones include:
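- Compound Poisson Distribution: Used to model situations where there are multiple sources of randomness, such as the total number of page views generated by visitors to a website over time. The Poisson distribution models the number of arrivals N (e.g. visits to a website), and the compound part means that each arrival contributes its own random amount Xi, so the total is the random sum Y = X1 + X2 + … + XN. When the arrival rate is instead itself a random variable that follows some other underlying distribution, the result is usually called a mixed Poisson distribution.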
- Compound Gamma Distribution: Used to model situations where there is an underlying continuous random process with multiple sources of randomness, such as the length of time people spend on your website. The gamma distribution models waiting times (e.g. how long someone spends on your site) and the compound part just means that one of the gamma’s parameters (commonly the scale or rate parameter) is itself a random variable that follows some other underlying distribution.
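To illustrate what happens when a Poisson arrival rate is itself random, the following minimal sketch draws the rate from a gamma distribution and then draws a Poisson count at that rate. The gamma prior and the `shape` and `scale` values are assumptions chosen for illustration, not taken from the source. The Poisson sampler is again Knuth's multiplication method.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_gamma_poisson(shape, scale, rng):
    """Draw a count whose Poisson rate is itself Gamma(shape, scale) distributed."""
    rate = rng.gammavariate(shape, scale)  # random arrival rate, mean shape * scale
    return sample_poisson(rate, rng)


rng = random.Random(2)
shape, scale = 2.0, 1.5  # hypothetical parameters of the rate distribution
counts = [sample_gamma_poisson(shape, scale, rng) for _ in range(50_000)]
count_mean = sum(counts) / len(counts)
count_var = sum((c - count_mean) ** 2 for c in counts) / len(counts)

# Theory: mean = shape*scale = 3.0, variance = shape*scale + shape*scale**2 = 7.5.
print(f"mean = {count_mean:.2f}, variance = {count_var:.2f}")
```

A plain Poisson count has variance equal to its mean. Here the variance is noticeably larger than the mean, which is the extra spread contributed by the random rate.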
References
- [1] A Dictionary of Statistical Terms, 5th edition, prepared for the International Statistical Institute by F.H.C. Marriott. Published for the International Statistical Institute by Longman Scientific and Technical.
- [2] Applied Probability and Statistics in Actuarial Science and Financial Economics. Retrieved June 29, 2023 from: https://mathmodelsblog.wordpress.com/2010/01/17/an-introduction-to-compound-distributions/
- [3] 3.4: Classifying Matter According to Its Composition