## What is a dispersion parameter in GLM?

**Dispersion parameters** describe the degree of variance inflation or overdispersion in generalized linear models.

*General* linear models are a set of procedures that relate a response variable to a linear combination of continuous or categorical predictor variables. **Generalized linear models**, on the other hand, can accommodate non-constant variance and response distributions from the exponential family, such as the binomial and Poisson distributions.

In GLMs, probability distributions of the response variable are usually parametrized by the mean (μ) and **dispersion parameter** (*φ*):

- μ is a location parameter.
- *φ* is a scale parameter.

This parametrization uses the mean μ in place of the natural parameter (also called the canonical parameter) theta (θ); the dispersion parameter phi (*φ*) appears alongside either one.
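To make the roles of θ, μ, and *φ* concrete, one common way to write the distributions used in GLMs is the exponential dispersion form sketched below. The functions a, b, and c are generic placeholders from the standard textbook presentation, not notation taken from this article, and some texts use slightly different conventions.

```latex
f(y;\theta,\varphi) = \exp\!\left\{ \frac{y\theta - b(\theta)}{a(\varphi)} + c(y,\varphi) \right\},
\qquad
\mathbb{E}[Y] = \mu = b'(\theta),
\qquad
\operatorname{Var}(Y) = b''(\theta)\, a(\varphi)
```

Under this form the mean depends only on θ, while *φ* scales the variance, which is why μ acts as a location parameter and *φ* as a scale parameter.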

## Statistical dispersion vs. dispersion parameter

The dispersion parameter in GLM is closely related to the statistical dispersion of the response variable. Although the terms are sometimes used (incorrectly) to mean the same thing, they have different definitions and applications:

- *Dispersion parameters* help define a probability distribution in a GLM.
- *Statistical dispersion* describes how data are spread about the mean. Common measures of dispersion include the variance and standard deviation.

That said, the dispersion parameter in a GLM *can* be interpreted as a measure of statistical dispersion of the response variable, given the values of the explanatory variables.

## Dispersion parameter examples

The dispersion parameter of a binomial or Poisson distribution is fixed at 1. Neither distribution has a separate variance parameter, because the variance is assumed to be determined by the number of trials and the success probability (for the binomial) or by the mean (for the Poisson). An estimated dispersion greater than 1 suggests overdispersion, and a value above about 2 is generally taken as a clear sign of it, although large values can also be caused by outliers or a poorly specified model.
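As a rough illustration of how such a value might be checked in practice, the sketch below computes the Pearson chi-square statistic divided by the residual degrees of freedom, a commonly used dispersion estimate. It assumes the simplest possible case, an intercept-only Poisson model whose fitted mean is just the sample mean. The function name and both data sets are made up for this example.

```python
from statistics import mean


def pearson_dispersion(counts, n_params=1):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Assumes an intercept-only Poisson model, so every fitted mean equals
    the sample mean and the Poisson variance function is V(mu) = mu.
    """
    mu = mean(counts)
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (len(counts) - n_params)


# Hypothetical count data, invented for illustration only.
steady_counts = [4, 5, 6, 5, 4, 6, 5, 5]
bursty_counts = [0, 1, 0, 14, 2, 0, 19, 4]

steady_phi = pearson_dispersion(steady_counts)
bursty_phi = pearson_dispersion(bursty_counts)
print(f"steady: {steady_phi:.2f}, bursty: {bursty_phi:.2f}")
```

Both samples have the same mean, but the second is far more spread out than a Poisson model would allow, so its estimate lands well above 2. With real predictors, the fitted means would come from the fitted model rather than the sample mean.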

For the normal (Gaussian) distribution, the dispersion parameter is the error variance, which is assumed to be independent of the mean.
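For the Gaussian case, the error variance is usually estimated from the residuals. The sketch below fits a straight line by ordinary least squares and divides the residual sum of squares by the residual degrees of freedom. The function name and the data are hypothetical, and a single predictor is assumed to keep the arithmetic in plain Python.

```python
from statistics import mean


def gaussian_dispersion(x, y):
    """Estimate the error variance of a straight-line fit y = a + b*x
    as the residual sum of squares over the residual degrees of freedom."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)  # two estimated coefficients


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

sigma2_hat = gaussian_dispersion(x, y)
print(f"estimated error variance: {sigma2_hat:.4f}")
```

Unlike the Poisson and binomial cases, nothing ties this estimate to the fitted means, which is the sense in which the Gaussian dispersion is independent of the mean.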

The negative binomial distribution contains an additional dispersion parameter *k*. This parameter is either estimated (by the method of moments or maximum likelihood estimation) or set to a fixed value [1].

For example, if Y is a count that follows a negative binomial distribution with mean μ, then the variance of Y is [2]:

$$\operatorname{Var}(Y) = \mu + \frac{\mu^2}{k}$$

As the dispersion parameter *k* gets larger and larger, the variance converges to the mean and the negative binomial converges to a Poisson distribution. If the dispersion parameter is fixed as a constant, the negative binomial distribution can be treated as a member of the exponential family. However, if there is evidence of overdispersion, the parameter can be used to adjust the variance independently of the mean.
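The convergence can be seen numerically with a short sketch. It assumes the common parametrization in which a negative binomial count with mean μ and dispersion parameter *k* has variance μ + μ²/*k*. The function name and the chosen mean are hypothetical.

```python
def negative_binomial_variance(mu, k):
    """Variance of a negative binomial count with mean mu and dispersion k,
    assuming the parametrization Var(Y) = mu + mu**2 / k."""
    if mu < 0 or k <= 0:
        raise ValueError("mu must be non-negative and k must be positive")
    return mu + mu ** 2 / k


mean_count = 10.0  # hypothetical mean, chosen for illustration
variances = {
    k: negative_binomial_variance(mean_count, k)
    for k in (0.5, 1, 10, 100, 10_000)
}

for k, variance in variances.items():
    print(f"k = {k:>7}: variance = {variance:.2f}")
```

Small values of *k* give a variance many times the mean, while very large values leave the variance almost equal to the mean, as it would be under a Poisson model.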

## References

[1] Generalized Linear Models Theory

[2] University of Virginia. Getting Started with Negative Binomial Regression Modeling. Retrieved October 14, 2023 from: https://library.virginia.edu/data/articles/getting-started-with-negative-binomial-regression-modeling