
The **zero-inflated Poisson distribution** (also called the *ZIP distribution*) is a generalization of the regular Poisson distribution that accounts for extra zeros. It is often preferred to the Poisson distribution when a dataset contains more zeros than a Poisson model would predict [1].

The ZIP model has a wide range of applications in fields such as business, ecology, economics, and epidemiology. For example, it could be used to model the number of days each employee at a company is absent, especially if many employees have no absences at all (i.e., zeros in the dataset). Predictors of the number of days of absence in the current year could include lower scores on yearly appraisals or a history of arriving late to work. The model can also account for overdispersion in count data by assuming that there are two types of data points: structural zeros, which occur with probability *π*, and counts drawn from a Poisson distribution, which occur with probability 1 − *π*. Because a Poisson count can itself be zero, not every zero in the data is a structural zero.
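The two-type mixture can be simulated directly. The sketch below is a minimal illustration; the helper name `sample_zip` and the values π = 0.3 and λ = 4 are hypothetical choices. It draws a structural zero with probability π and a Poisson count otherwise, then compares the sample moments with the ZIP mean (1 − π)λ and variance (1 − π)λ(1 + πλ). Because the variance exceeds the mean, the simulated data are overdispersed relative to a Poisson distribution.

```python
import numpy as np

def sample_zip(pi, lam, size, rng):
    """Draw ZIP samples as a two-component mixture (hypothetical helper).

    With probability pi a point is a structural zero; otherwise it is drawn
    from Poisson(lam), which can itself produce a zero.
    """
    structural = rng.random(size) < pi      # True -> structural zero
    counts = rng.poisson(lam, size)         # Poisson component
    return np.where(structural, 0, counts)

rng = np.random.default_rng(42)
pi, lam = 0.3, 4.0                          # illustrative parameter values
y = sample_zip(pi, lam, 200_000, rng)

# Theoretical ZIP moments and zero probability
mean_theory = (1 - pi) * lam
var_theory = (1 - pi) * lam * (1 + pi * lam)
zero_theory = pi + (1 - pi) * np.exp(-lam)

print(f"mean:  sample {y.mean():.3f} vs theory {mean_theory:.3f}")
print(f"var:   sample {y.var():.3f} vs theory {var_theory:.3f}")
print(f"zeros: sample {np.mean(y == 0):.3f} vs theory {zero_theory:.3f}")
```

A plain Poisson model would force the variance to equal the mean; here the theoretical variance (6.16) is more than twice the mean (2.8).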

## Zero-inflated Poisson distribution PMF

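The probability mass function (PMF) of the ZIP distribution is [3]

$$
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda}, & y = 0, \\
(1 - \pi)\, \dfrac{\lambda^{y} e^{-\lambda}}{y!}, & y = 1, 2, \ldots
\end{cases}
$$

where 0 ≤ π ≤ 1 and λ ≥ 0.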

The π parameter inflates the zeros: it is the probability that an observation is a structural zero. When π = 0, the zero-inflated Poisson distribution reduces to the ordinary Poisson distribution with mean λ.
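The PMF translates directly into code. In this minimal sketch, `zip_pmf` and `poisson_pmf` are hypothetical helper names and π = 0.25, λ = 3 are arbitrary values. It checks that the probabilities sum to 1, that the ZIP puts more mass at zero than the Poisson, and that setting π = 0 recovers the Poisson PMF.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = y)."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def zip_pmf(y: int, pi: float, lam: float) -> float:
    """ZIP probability P(Y = y) with zero-inflation pi and Poisson mean lam."""
    if y == 0:
        # Structural zeros plus the zeros produced by the Poisson component
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(y, lam)

pi, lam = 0.25, 3.0
total = sum(zip_pmf(y, pi, lam) for y in range(100))
print(f"sum of PMF over 0..99: {total:.10f}")
print(f"P(Y = 0): ZIP {zip_pmf(0, pi, lam):.4f} vs Poisson {poisson_pmf(0, lam):.4f}")

# With pi = 0 the ZIP PMF coincides with the Poisson PMF
print(all(zip_pmf(y, 0.0, lam) == poisson_pmf(y, lam) for y in range(20)))  # True
```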

## Zero-inflated vs. zero-modified distributions

The main difference between a zero-inflated model and a zero-modified model lies in how they handle excess zeros in data. Zero-inflated models separate the zeros into distinct components, while zero-modified models modify the count distribution itself to account for the excess zeros.

In the **zero-inflated distribution**, it is assumed that the data contain two types of zeros with different generating processes:

- Structural zeros: the true absence of events,
- Sampling zeros: zeros that occur by chance in the count process.

The excess (structural) zeros are modeled separately from the count process using a mixture distribution.
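For the ZIP mixture, the zero probability π + (1 − π)e^(−λ) splits into a structural part π and a sampling part (1 − π)e^(−λ). The sketch below uses a hypothetical function name and illustrative values (π = 0.2, λ = 1.5) to compute both parts and, by Bayes' rule, the probability that an observed zero is structural.

```python
import math

def zero_decomposition(pi: float, lam: float) -> dict:
    """Split P(Y = 0) of a ZIP model into its two zero-generating sources."""
    structural = pi                          # zeros from the "always zero" component
    sampling = (1 - pi) * math.exp(-lam)     # zeros the Poisson component produces by chance
    total = structural + sampling
    return {
        "p_zero": total,
        "structural": structural,
        "sampling": sampling,
        # Probability that an observed zero is structural (Bayes' rule)
        "p_structural_given_zero": structural / total,
    }

parts = zero_decomposition(pi=0.2, lam=1.5)
for name, value in parts.items():
    print(f"{name:>24}: {value:.4f}")
```

Even with only 20% structural zeros, roughly half of the observed zeros come from the structural component here, because a Poisson with mean 1.5 rarely produces a zero.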

On the other hand, a zero-modified model assumes that the counts follow a known distribution (such as a Poisson or negative binomial distribution), together with a modification term that accounts for the excess zeros. This term could be:

- An additional mass at zero (i.e., there is a probability of generating a zero that is greater than what the distribution would normally generate), or
- An additional distribution that generates zero counts.
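One common way to write the "additional mass at zero" variant for Poisson counts is to replace P(Y = 0) with a chosen value `p0_m` and rescale the nonzero Poisson probabilities so that they sum to 1 − `p0_m`. This formulation, and the names `zero_modified_poisson_pmf` and `p0_m`, are assumptions for illustration rather than the only possible definition.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zero_modified_poisson_pmf(k: int, p0_m: float, lam: float) -> float:
    """Zero-modified Poisson (assumed formulation): P(Y = 0) is set to p0_m and
    the nonzero Poisson probabilities are rescaled so the PMF sums to 1.
    Requires lam > 0 and 0 <= p0_m <= 1."""
    if k == 0:
        return p0_m
    p0 = math.exp(-lam)                      # the Poisson's own P(Y = 0)
    return (1 - p0_m) / (1 - p0) * poisson_pmf(k, lam)

lam, p0_m = 2.0, 0.4                         # p0_m exceeds exp(-2) ≈ 0.135: extra mass at zero
total = sum(zero_modified_poisson_pmf(k, p0_m, lam) for k in range(100))
print(f"P(Y = 0) = {zero_modified_poisson_pmf(0, p0_m, lam)}, Poisson P(Y = 0) = {math.exp(-lam):.3f}")
print(f"sum of PMF over 0..99: {total:.10f}")
```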

Which model you choose depends on the type of data you’re working with and the research question being addressed.

## Zero-inflated Poisson regression

In recent decades, zero-inflated distributions have become popular in regression analysis. This popularity is perhaps due to Lambert's influential paper [4], which showed that ZIP regression can fit data with many zeros better than ordinary Poisson regression. Zero-inflated models most likely originated in the field of econometrics [5].

The main difference between a zero-inflated Poisson *distribution* and zero-inflated Poisson *regression* lies in their application. The zero-inflated Poisson distribution is a probability distribution used to model count data with a large proportion of zeros. Zero-inflated Poisson regression, on the other hand, extends standard Poisson regression to model zero inflation in data while relating the counts to predictors. Instead of assuming that the count data follow a standard Poisson distribution, the model assumes that the data are generated from a mixture of two processes:

- One process that generates only zeros (the structural zeros), and
- One process that generates counts from a Poisson distribution, which can also include zeros.
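The two-process structure can be fitted by maximum likelihood. The sketch below is a minimal illustration, not a production implementation: it simulates hypothetical data with one predictor, assumes a log link for the Poisson mean λ and a logit link for the zero probability π (a common choice), and minimizes the average negative ZIP log-likelihood with SciPy's general-purpose `minimize`. All variable names and "true" parameter values are made up for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(0)

# Hypothetical simulated data: one predictor x drives both processes.
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])       # design matrix with an intercept
true_beta = np.array([0.5, 0.8])           # count process: log(lambda) = X @ beta
true_gamma = np.array([-0.5, 1.0])         # zero process:  logit(pi) = X @ gamma
lam_true = np.exp(X @ true_beta)
pi_true = expit(X @ true_gamma)
# Structural zero with probability pi, otherwise a Poisson draw
y = np.where(rng.random(n) < pi_true, 0, rng.poisson(lam_true))

def neg_log_lik(params, X, y):
    """Average negative ZIP log-likelihood (log link for lambda, logit link for pi)."""
    k = X.shape[1]
    beta, gamma = params[:k], params[k:]
    eta = X @ beta                         # log(lambda_i)
    lam = np.exp(eta)
    z = X @ gamma                          # logit(pi_i)
    log_pi = -np.logaddexp(0.0, -z)        # log(pi), computed stably
    log_1m_pi = -np.logaddexp(0.0, z)      # log(1 - pi), computed stably
    # y = 0: log(pi + (1 - pi) * exp(-lambda))
    ll_zero = np.logaddexp(log_pi, log_1m_pi - lam)
    # y > 0: log(1 - pi) + y * log(lambda) - lambda - log(y!)
    ll_pos = log_1m_pi + y * eta - lam - gammaln(y + 1)
    return -np.mean(np.where(y == 0, ll_zero, ll_pos))

fit = minimize(neg_log_lik, x0=np.zeros(4), args=(X, y), method="BFGS")
beta_hat, gamma_hat = fit.x[:2], fit.x[2:]
print("beta_hat: ", beta_hat.round(3), "true:", true_beta)
print("gamma_hat:", gamma_hat.round(3), "true:", true_gamma)
```

Averaging rather than summing the log-likelihood leaves the maximizer unchanged but keeps the gradient on a scale that suits BFGS's default tolerance. This sketch only recovers point estimates; a dedicated implementation (for example, `ZeroInflatedPoisson` in statsmodels) would also report the standard errors needed for hypothesis tests.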

Zero-inflated Poisson regression allows for estimation of model parameters as well as testing hypotheses about the significance of predictors.

#### References

[1] Böhning, D. (1998) Zero-inflated Poisson models and C.A.MAN: a tutorial collection of evidence. *Biometrical Journal*, 40(7), 833–843. Zbl 0914.62091.

[2] Synergy42, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[3] Beckett, S. et al. (2014) Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural calamities. *Involve: A Journal of Mathematics*, 7(6).

[4] Lambert, D. (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. *Technometrics*, 34, 1-14.

[5] Ridout, M. Zero-inflated models. Retrieved May 4, 2023 from: https://www.kent.ac.uk/smsas/personal/msr/webfiles/zip/zip.html