Statistics How To

Dirichlet Distribution: Simple Definition, PDF, Mean


Contents:

  1. What is a Dirichlet Distribution?
  2. What are Random PMFs?
  3. The Dirichlet Process.
  4. PDF/Mean/Variance.
  5. Similarity to Other Distributions.

1. What is a Dirichlet Distribution?


A Dirichlet distribution (pronounced Deer-eesh-lay) is a way to model random probability mass functions (PMFs) for finite sets. It is also sometimes used as a prior in Bayesian statistics. The distribution produces a random vector (X1, …, Xn) of n positive numbers that add up to 1. It is therefore closely related to the multinomial distribution, whose n category probabilities must also sum to 1.

The distribution is named after the 19th-century German mathematician Johann Peter Gustav Lejeune Dirichlet.

2. What are Random PMFs?


When probability is introduced in basic statistics, one of the first examples is usually rolling a fair die. The “fair die” is almost certainly a myth: manufacturing processes are pretty good, but they aren’t perfect. In theory, the probability of any particular face (1, 2, 3, 4, 5, or 6) showing up is 1/6. However, if you roll 1000 real dice you won’t get exactly that distribution, because of manufacturing defects. No die is perfectly weighted; there will always be a slight bias toward one side or another. If you have ten dice, each die will have its own probability mass function (PMF).
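The ten-dice idea can be sketched in a few lines of Python. This is only an illustration: it uses the standard construction of a Dirichlet draw (independent gamma draws divided by their sum), and the helper name `sample_dirichlet`, the seed, and the parameter value 50 for each face are assumptions chosen for the example, not values from the text.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one PMF from a Dirichlet distribution.

    Standard construction: independent Gamma(alpha_i, 1) draws divided
    by their sum, so the result is positive and sums to 1.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)   # fixed seed so the run is repeatable
alphas = [50.0] * 6       # assumed values: larger means closer to a fair die

# Ten dice, each with its own slightly different PMF over the faces 1..6.
dice_pmfs = [sample_dirichlet(alphas, rng) for _ in range(10)]

for i, pmf in enumerate(dice_pmfs, start=1):
    print(f"die {i}: " + " ".join(f"{p:.3f}" for p in pmf))
```

Each printed row is one die's PMF: six probabilities near 1/6 that sum to 1, with no two dice exactly alike.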

Another example of a random PMF is the distribution of words in books and other documents: for a vocabulary of k words, each document has its own PMF of length k, and those PMFs can be modeled with a Dirichlet distribution.

3. The Dirichlet Process


The Dirichlet process is a way to model the randomness of a probability mass function (PMF) with unlimited options (e.g. an unlimited number of dice in a bag). The process is similar to Pólya’s urn, except that instead of a fixed number of ball colors you have an unlimited number.

  • Start out with an empty urn.
  • Randomly pick a colored ball and place it in the urn.
  • Then choose one option:
    1. Randomly pick a colored ball and place it in the urn.
    2. Randomly remove a colored ball from the urn, then put it back with another ball of the same color.

As the number of balls in the urn increases, the probability of picking a new color decreases. The limiting proportions of the colors in the urn, after infinitely many draws, are distributed according to a Dirichlet process.
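The urn scheme above can be simulated in a few lines. The steps do not say how to choose between the two options; the sketch below assumes the usual rule from the Blackwell-MacQueen urn scheme, in which a new color is added with probability alpha / (alpha + n), where n is the number of balls already in the urn. The concentration parameter `alpha` and the function name `polya_urn` are assumptions introduced for the example.

```python
import random
from collections import Counter

def polya_urn(num_draws, alpha, rng):
    """Simulate an urn with an unlimited supply of new colors.

    alpha (an assumed concentration parameter) controls how readily a
    brand-new color is added instead of copying a ball already in the urn.
    Colors are labeled 0, 1, 2, ... in order of first appearance.
    """
    urn = []
    next_color = 0
    for _ in range(num_draws):
        n = len(urn)
        # New color with probability alpha / (alpha + n); certain when n == 0.
        if rng.random() < alpha / (alpha + n):
            urn.append(next_color)
            next_color += 1
        else:
            # Draw a ball uniformly at random and add another of the same color.
            urn.append(rng.choice(urn))
    return Counter(urn)

rng = random.Random(1)
counts = polya_urn(1000, alpha=2.0, rng=rng)
total = sum(counts.values())
top_proportions = {color: c / total for color, c in counts.most_common(5)}
print(len(counts), "colors after", total, "draws")
print(top_proportions)
```

Because the chance of a new color shrinks as n grows, a handful of early colors typically end up holding most of the balls.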

4. PDF/Mean/Variance


The explanation above gives an outline of a Dirichlet distribution. The actual math behind the distribution is a little more complex. To fully understand the distribution, you should have an idea about its PDF, mean, and variance.

PDF

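The probability density function (PDF) is:

$$f(\theta_1, \ldots, \theta_m) = \frac{\Gamma(A)}{\prod_{j=1}^{m} \Gamma(a_j)} \prod_{j=1}^{m} \theta_j^{a_j - 1}$$

Where:
θj ≥ 0 with θ1 + … + θm = 1, A = a1 + … + am, and a1, …, am are parameters with ai > 0 for i=1,…,m.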
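As a rough sketch, the standard Dirichlet density (a ratio of gamma functions, Γ(A) over the product of the Γ(aj), times the product of the θj raised to aj - 1, with A the sum of the parameters) can be evaluated with Python's standard library. The function name `dirichlet_log_pdf` is an assumption; working on the log scale is a common choice to avoid overflow in the gamma functions.

```python
import math

def dirichlet_log_pdf(theta, a):
    """Log of the Dirichlet density at theta for parameters a.

    theta: positive numbers summing to 1; a: positive parameters, same length.
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    if any(t <= 0 for t in theta) or abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must be positive and sum to 1")
    A = sum(a)
    # log of Gamma(A) / (Gamma(a_1) * ... * Gamma(a_m))
    log_norm = math.lgamma(A) - sum(math.lgamma(ai) for ai in a)
    return log_norm + sum((ai - 1.0) * math.log(t) for ai, t in zip(a, theta))

# With a = (1, 1, 1) the density is constant: Gamma(3) / Gamma(1)^3 = 2.
flat = math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
print(flat)  # approximately 2

# With m = 2 the value matches the Beta(2, 3) density at 0.5, which is 1.5.
beta_case = math.exp(dirichlet_log_pdf([0.5, 0.5], [2.0, 3.0]))
print(beta_case)  # approximately 1.5
```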

Mean

The mean of θj is:
E(θj) = aj / A.

Variance

The variance of θj is:
var(θj) = aj(A - aj) / (A²(A + 1)).
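A small worked example may help here. It uses the standard formulas E(θj) = aj / A and var(θj) = aj(A - aj) / (A²(A + 1)), where A is the sum of the parameters, and exact fractions so the arithmetic is easy to check by hand. The function name `dirichlet_mean_var` and the parameters (2, 3, 5) are assumptions picked for illustration.

```python
from fractions import Fraction

def dirichlet_mean_var(a):
    """Return (means, variances) of each component as exact fractions."""
    a = [Fraction(x) for x in a]
    A = sum(a)
    means = [aj / A for aj in a]
    variances = [aj * (A - aj) / (A**2 * (A + 1)) for aj in a]
    return means, variances

means, variances = dirichlet_mean_var([2, 3, 5])
print(means)      # [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
print(variances)  # [Fraction(4, 275), Fraction(21, 1100), Fraction(1, 44)]
```

Here A = 10, so the means are 2/10, 3/10 and 5/10, and they sum to 1 as the components of a PMF must.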

5. Similarity to Other Distributions

  • The Dirichlet is the multivariate generalization of the beta distribution. It extends the beta distribution to model probabilities for two or more disjoint events; when m=2 (see PDF above), the Dirichlet PDF reduces to the PDF of the beta distribution.
  • The Dirichlet equals the uniform distribution over all PMFs of length m when all parameters (a1…am) are equal to 1.
  • The Dirichlet distribution is a conjugate prior to the categorical and multinomial distributions. A compound variant is the Dirichlet-multinomial.
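The conjugate-prior property in the last bullet has a simple practical meaning: after observing category counts, the posterior is again a Dirichlet whose parameters are the prior parameters plus the counts. A minimal sketch, where the function name `dirichlet_posterior` and the die-roll tallies are made up for illustration:

```python
def dirichlet_posterior(prior, counts):
    """Posterior Dirichlet parameters after observing category counts.

    Conjugacy means the update is element-wise addition.
    """
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + n for a, n in zip(prior, counts)]

prior = [1, 1, 1, 1, 1, 1]        # all-ones prior over the six faces of a die
counts = [8, 12, 9, 11, 10, 10]   # made-up tallies from 60 rolls
posterior = dirichlet_posterior(prior, counts)

total = sum(posterior)
posterior_mean = [a / total for a in posterior]
print(posterior)       # [9, 13, 10, 12, 11, 11]
print(posterior_mean)  # estimated face probabilities, summing to 1
```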



Dirichlet Distribution: Simple Definition, PDF, Mean was last modified: October 12th, 2017 by Andale