Hurdle distribution
A hurdle distribution (also called a zero-altered distribution) is a two-part mixture distribution that accounts for excess zeros in count data. It is called a hurdle distribution because each observation must first clear a “hurdle” (the zero versus non-zero decision) before a positive count is generated. This makes it well suited to data with many zeros, such as records of rare phenomena.
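One way to write the two parts in symbols is shown below. The notation is our own assumption, not taken from the cited sources: π is the probability of a zero (the hurdle is not cleared), and f is a baseline count distribution, such as the Poisson, that is rescaled so its mass sits only on the positive integers.

```latex
P(Y = 0) = \pi, \qquad
P(Y = k) = (1 - \pi)\,\frac{f(k)}{1 - f(0)}, \quad k = 1, 2, 3, \ldots

% Poisson baseline (hurdle Poisson):
f(k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad 1 - f(0) = 1 - e^{-\lambda}
```

Dividing by 1 − f(0) is what truncates the baseline at zero, so the positive probabilities add up to 1 − π.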
“[The hurdle distribution] provides a natural means for modeling overdispersion and underdispersion of the data”
Mullahy, 1986, p. 54 [1]
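The quoted claim can be checked numerically. The Python sketch below assumes a hurdle Poisson, where `pi` is the probability of a zero and `lam` is the rate of the Poisson that is truncated at zero. The function names are illustrative only and do not come from any library or from the cited papers. The sketch computes the variance-to-mean ratio, which is below 1 for underdispersed data and above 1 for overdispersed data.

```python
import math


def hurdle_poisson_moments(pi: float, lam: float) -> tuple[float, float]:
    """Mean and variance of a hurdle Poisson variable (illustrative helper).

    pi  -- probability of a zero (the hurdle is not cleared)
    lam -- rate of the untruncated Poisson behind the positive counts
    """
    # Share of Poisson mass on k >= 1; dividing by it truncates at zero.
    p_pos = 1.0 - math.exp(-lam)
    # A zero-truncated Poisson Z has E[Z] = lam / p_pos and
    # E[Z^2] = (lam + lam**2) / p_pos; Y equals Z with probability 1 - pi.
    mean = (1.0 - pi) * lam / p_pos
    second_moment = (1.0 - pi) * (lam + lam**2) / p_pos
    return mean, second_moment - mean**2


def dispersion_index(pi: float, lam: float) -> float:
    """Variance-to-mean ratio: > 1 overdispersed, < 1 underdispersed."""
    mean, var = hurdle_poisson_moments(pi, lam)
    return var / mean


if __name__ == "__main__":
    lam = 2.0
    # Zero probabilities below, equal to, and above the Poisson's own P(0).
    for pi in (0.01, math.exp(-lam), 0.5):
        print(f"pi = {pi:.4f}: variance/mean = {dispersion_index(pi, lam):.3f}")
```

With `lam = 2`, this prints ratios of roughly 0.71, 1.00 and 1.84. A zero probability below the Poisson's own e^(-λ) gives underdispersion, the value e^(-λ) recovers the ordinary Poisson, and a larger zero probability gives overdispersion.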
The hurdle distribution was first proposed by Cragg in 1971 [2]. Since then, it has gained popularity and is commonly used in epidemiology, genetics, insurance claims, marketing and medicine.
The two parts of a hurdle distribution
The number of events in a hurdle distribution is generated by two processes [3]:
- A binary process (a binomial distribution with a single trial, i.e. a Bernoulli distribution) that determines whether a zero or a non-zero count is observed. A value of zero can only come from this part of the model.
- A zero-truncated Poisson or zero-truncated negative binomial distribution that generates the non-zero counts (1, 2, 3, …).
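The two-step mechanism can be sketched in Python for the Poisson case. This is a minimal illustration under stated assumptions: `pi` is the probability of a zero, `lam` is the Poisson rate, and the function names are hypothetical, not a library API. The first step is a Bernoulli draw that decides zero versus non-zero. The second step uses inverse-transform sampling on the zero-truncated Poisson, which is adequate for moderate `lam` but not tuned for very large rates.

```python
import math
import random


def hurdle_poisson_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a hurdle Poisson with zero probability pi and rate lam."""
    if k < 0:
        return 0.0
    if k == 0:
        # Part 1: zeros come only from the binary (hurdle) component.
        return pi
    # Part 2: Poisson pmf rescaled to the positive integers (zero-truncated).
    poisson_k = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (1.0 - pi) * poisson_k / (1.0 - math.exp(-lam))


def sample_hurdle_poisson(pi: float, lam: float, rng: random.Random) -> int:
    """Draw one value: a Bernoulli trial, then a zero-truncated Poisson."""
    if rng.random() < pi:
        return 0  # hurdle not cleared
    # Inverse-transform sampling over k = 1, 2, 3, ... of the truncated pmf.
    # Assumes moderate lam; exp(-lam) underflows for very large rates.
    u = rng.random()
    k = 1
    prob = lam * math.exp(-lam) / (1.0 - math.exp(-lam))  # P(Z = 1)
    cumulative = prob
    # The prob > 0 guard stops the loop if rounding keeps cumulative below u.
    while u > cumulative and prob > 0.0:
        k += 1
        prob *= lam / k  # Poisson recurrence: P(k) = P(k - 1) * lam / k
        cumulative += prob
    return k


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [sample_hurdle_poisson(0.3, 2.0, rng) for _ in range(100_000)]
    print("share of zeros:", sum(d == 0 for d in draws) / len(draws))
    print("P(Y = 0..3):", [round(hurdle_poisson_pmf(k, 0.3, 2.0), 4) for k in range(4)])
```

With a fixed seed, the simulated share of zeros lands close to `pi = 0.3`. Replacing the Poisson step with a zero-truncated negative binomial would give the other variant mentioned above.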
Another way to model data with excess zeros is with zero-inflated models, such as the zero-inflated Poisson (ZIP) distribution; negative binomial variants exist for both zero-inflated and hurdle models [4]. The two approaches differ in where zeros come from. In zero-inflated models, zeros arise both from an extra “structural zero” component and as ordinary outcomes of the count distribution. In hurdle models, zeros come only from the binary part, because the count distribution is truncated at zero [5].
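The difference in where zeros come from can be made concrete with a short Python sketch. It assumes Poisson count parts throughout. `omega` (the ZIP's structural-zero probability), `pi` (the hurdle's zero probability) and the function names are illustrative choices, not a library API. For one fixed parameter setting, the sketch splits the ZIP's zeros into their two sources. It then shows that a hurdle Poisson whose `pi` equals the ZIP's total zero probability assigns the same probabilities, because every zero is routed through the binary part instead.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability P(X = k)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k: int, omega: float, lam: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability omega,
    otherwise an ordinary Poisson draw (which can itself be zero)."""
    base = (1.0 - omega) * poisson_pmf(k, lam)
    return omega + base if k == 0 else base


def hurdle_pmf(k: int, pi: float, lam: float) -> float:
    """Hurdle Poisson: all zeros come from the binary part (probability pi);
    positive counts follow a Poisson truncated at zero."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    omega, lam = 0.2, 1.5
    structural = omega                          # zeros from the inflation part
    sampling = (1.0 - omega) * math.exp(-lam)   # Poisson draws that equal 0
    print(f"ZIP zeros: {structural:.3f} structural + {sampling:.3f} from the count part")

    # Hurdle model with pi set to the ZIP's total zero probability.
    pi = structural + sampling
    for k in range(5):
        print(k, round(zip_pmf(k, omega, lam), 4), round(hurdle_pmf(k, pi, lam), 4))
```

The reverse does not always hold. A hurdle model can set `pi` below the Poisson's own e^(-λ), giving fewer zeros than a Poisson, which a ZIP with a non-negative `omega` cannot reproduce.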
References
- Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3), 341–365.
- Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39, 829–844.
- Martin, P. (2022). Regression Models for Categorical and Count Data. SAGE Publications.
- Min, Y., & Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modelling, 5(1), 1–19.
- Zuniga, F. (2021). A New Trivariate Model and Generalized Linear Model for Stochastic Episodes’ Duration, Magnitude and Maximum. Dissertation.