# Dirichlet Distribution: Simple Definition, PDF, Mean

## 1. What is a Dirichlet Distribution?

A Dirichlet distribution (pronounced Deer-eesh-lay) is a way to model random probability mass functions (PMFs) for finite sets. It is also sometimes used as a prior in Bayesian statistics. A draw from the distribution is a random vector (X1, …, Xn) of n positive numbers that add up to 1. It is therefore closely related to the multinomial distribution, whose n category probabilities must also sum to 1.

The distribution is named after the 19th-century German mathematician Johann Peter Gustav Lejeune Dirichlet.

## 2. What are Random PMFs?

When probability is introduced in basic statistics, one of the common topics to come up is rolling a fair die. The “fair die” is almost certainly a myth: manufacturing processes are pretty good, but they aren’t perfect. In theory, the probability of any particular number showing up (i.e. a 1, 2, 3, 4, 5, or 6) is 1/6. However, if you roll 1000 dice you won’t get exactly that distribution in a real experiment, in part because of manufacturing defects. No die is perfectly weighted; there will always be a tiny bias toward one side of a die or another. If you have ten dice, each die will have its own probability mass function (PMF), and a Dirichlet distribution is a way to describe how those PMFs vary from die to die.

Another example of a random PMF is the distribution of words in books and other documents: a document written with a vocabulary of k words has a word-frequency PMF of length k, and the way that PMF varies from document to document can be modeled by a Dirichlet distribution.
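To make the idea of a random PMF concrete, here is a minimal sketch that draws ten slightly different "die" PMFs from a Dirichlet distribution. It uses only Python's standard library and relies on the standard construction of a Dirichlet draw as independent gamma variables divided by their sum. The helper name `sample_dirichlet` and the parameter value of 200 per face are illustrative assumptions, not something taken from a particular library or data set.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one vector from Dirichlet(alphas).

    Uses the normalized-gamma construction: draw independent
    Gamma(alpha_i, 1) variables and divide each by their sum.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed parameters: large, equal values keep every draw close to
# the "fair" PMF (1/6 for each face) while still allowing small defects.
alphas = [200.0] * 6

# Ten dice, each with its own PMF over the faces 1..6
dice = [sample_dirichlet(alphas, rng) for _ in range(10)]

for pmf in dice:
    print([round(p, 3) for p in pmf])
```

Each printed row is one die's PMF: six positive numbers that sum to 1. Smaller parameter values would spread the draws further from 1/6.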

## 3. The Dirichlet Process

The Dirichlet process is a way to model the randomness of a probability mass function (PMF) with unlimited options (e.g. an unlimited number of dice in a bag). The process is similar to Pólya’s urn, only instead of having a set number of ball colors you have an unlimited number.

• Start out with an empty urn.
• Randomly pick a colored ball and place it in the urn.
• Then, at every step after that, choose one of two options:
1. Randomly pick a new colored ball from outside the urn and place it in the urn.
2. Randomly remove a colored ball from the urn, then put it back with another ball of the same color.

As the number of balls in the urn increases, the probability of picking a new color decreases. The proportions of the colors in the urn after an infinite number of draws form a draw from a Dirichlet process.
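The urn scheme can be sketched in a few lines of Python. The text above does not say how the choice between the two options is made, so this sketch assumes the usual Blackwell-MacQueen rule: with n balls already in the urn, a new color is added with probability alpha / (alpha + n), where `alpha` is a concentration parameter. The function name `polya_urn` and the value `alpha=2.0` are illustrative assumptions.

```python
import random
from collections import Counter


def polya_urn(num_draws, alpha, rng):
    """Simulate the urn; colors are labeled 0, 1, 2, ... in order of first appearance."""
    urn = []        # one entry per ball, holding that ball's color label
    next_color = 0  # label to use for the next brand-new color
    for _ in range(num_draws):
        n = len(urn)
        # Assumed rule: a new color arrives with probability alpha / (alpha + n).
        # When the urn is empty (n == 0) this probability is 1.
        if rng.random() < alpha / (alpha + n):
            urn.append(next_color)
            next_color += 1
        else:
            # Pick a ball uniformly at random and add another of the same color
            urn.append(rng.choice(urn))
    return Counter(urn)


rng = random.Random(1)
counts = polya_urn(num_draws=1000, alpha=2.0, rng=rng)
total = sum(counts.values())
proportions = {color: c / total for color, c in counts.items()}

print(len(counts), "distinct colors after", total, "draws")
```

Because alpha / (alpha + n) shrinks as n grows, new colors become rarer over time, and the color proportions settle down as the number of draws increases.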

## 4. PDF/Mean/Variance

The explanation above gives an outline of a Dirichlet distribution. The actual math behind the distribution is a little more complex; the PDF, mean, and variance are given below.

## PDF

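The probability density function (PDF) is:

$$f(\theta_1, \ldots, \theta_m; a_1, \ldots, a_m) = \frac{\Gamma(A)}{\prod_{j=1}^{m} \Gamma(a_j)} \prod_{j=1}^{m} \theta_j^{a_j - 1}$$

Where:
θj ≥ 0 for j=1,…,m with θ1 + … + θm = 1,
A = a1 + … + am,
Γ is the gamma function,
and a1, …, am are parameters with ai > 0 for i=1,…,m.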
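As a rough sketch of how the density can be evaluated, the function below computes the log of the standard Dirichlet PDF, Γ(A) / (Γ(a1)⋯Γ(am)) · θ1^(a1−1)⋯θm^(am−1) with A = a1 + … + am, using `math.lgamma` for numerical stability. The function name `dirichlet_log_pdf` and the tolerance used for the sum-to-one check are assumptions made for illustration.

```python
import math


def dirichlet_log_pdf(theta, a):
    """Log-density of Dirichlet(a) at the point theta (a PMF of the same length)."""
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    if any(ai <= 0 for ai in a):
        raise ValueError("all parameters must be positive")
    if any(t <= 0 for t in theta) or abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must be positive and sum to 1")

    A = sum(a)
    # log of the normalizing constant Gamma(A) / prod Gamma(a_j)
    log_norm = math.lgamma(A) - sum(math.lgamma(ai) for ai in a)
    # log of prod theta_j^(a_j - 1)
    log_kernel = sum((ai - 1.0) * math.log(t) for ai, t in zip(a, theta))
    return log_norm + log_kernel


# All parameters equal to 1: the density is the constant Gamma(3) = 2
flat_density = math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))

# m = 2 with parameters (2, 3): matches the Beta(2, 3) density at 0.4
beta_density = math.exp(dirichlet_log_pdf([0.4, 0.6], [2.0, 3.0]))

print(flat_density, beta_density)
```

Working on the log scale avoids overflow in the gamma function when the parameters are large.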

## Mean

The mean of θj is:

$$E(\theta_j) = \frac{a_j}{A}$$

## Variance

The variance of θj is:

$$\operatorname{var}(\theta_j) = \frac{a_j (A - a_j)}{A^2 (A + 1)}$$
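A small worked example may help here. The sketch below evaluates the standard closed forms E(θj) = aj / A and var(θj) = aj(A − aj) / (A²(A + 1)), with A the sum of the parameters, for the illustrative choice a = (1, 2, 3). Integer parameters and `fractions.Fraction` are used only so the results come out as exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def dirichlet_mean(a):
    """Mean of each component: a_j / A, where A is the sum of the parameters."""
    A = sum(a)
    return [Fraction(aj, A) for aj in a]


def dirichlet_variance(a):
    """Variance of each component: a_j (A - a_j) / (A^2 (A + 1))."""
    A = sum(a)
    return [Fraction(aj * (A - aj), A * A * (A + 1)) for aj in a]


a = [1, 2, 3]  # illustrative integer parameters, so A = 6
means = dirichlet_mean(a)
variances = dirichlet_variance(a)

print(means)      # 1/6, 1/3, 1/2
print(variances)  # 5/252, 2/63, 1/28
```

Note that the means always sum to 1, as they must for a distribution over PMFs.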

## 5. Similarity to Other Distributions

• The Dirichlet is the multivariate generalization of the beta distribution. It extends the beta distribution to model probabilities for two or more disjoint events; when m=2 (see the PDF above), the Dirichlet distribution reduces to the beta distribution.
• The Dirichlet equals the uniform distribution over all PMFs of length m when all parameters (a1…am) are equal to 1.
• The Dirichlet distribution is a conjugate prior to the categorical and multinomial distributions. A compound variant is the Dirichlet-multinomial.
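Conjugacy is easiest to see with numbers. With a Dirichlet prior on the category probabilities and multinomial (or categorical) data, the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. The sketch below applies that update to a die; the roll counts are made up for illustration, and `posterior_parameters` is a hypothetical helper name.

```python
def posterior_parameters(prior, counts):
    """Dirichlet posterior parameters: prior parameter plus observed count, per category."""
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + c for a, c in zip(prior, counts)]


# Uniform prior over die PMFs: all six parameters equal to 1
prior = [1.0] * 6

# Hypothetical data: how often each face came up in 60 rolls
counts = [12, 9, 11, 8, 10, 10]

posterior = posterior_parameters(prior, counts)

# Posterior mean of each face probability: a_j / A
A = sum(posterior)
posterior_mean = [a / A for a in posterior]

print(posterior)
print([round(p, 3) for p in posterior_mean])
```

The posterior mean for each face is pulled from the raw frequency toward 1/6, and the pull weakens as more rolls are observed.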

