Zero-Inflated Poisson distribution


The zero-inflated Poisson distribution (also called the ZIP distribution) is a generalization of the Poisson distribution that accounts for excess zeros. It is often preferred to the Poisson distribution when a dataset contains more zeros than a Poisson model would predict [1].

The ZIP model has a wide range of applications in fields such as business, ecology, economics, and epidemiology. For example, it could be used to model the number of days each employee of a company is absent in a year, especially if many employees have no absences at all (i.e., zeros in the dataset). Predictors of the number of days of absence in the current year could include lower scores on yearly appraisals or a history of arriving late to work. The model can also be used to account for overdispersion in count data, by assuming that there are two types of data points: those that are always zero, which occur with probability π, and those that follow a Poisson distribution (and may still be zero by chance), which occur with probability 1 − π.
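The two-process idea can be illustrated with a short simulation. This is only a sketch: the parameter values (π = 0.4, λ = 3), the sample size, and the helper names `sample_poisson` and `sample_zip` are assumptions made for illustration, and the Poisson sampler is Knuth's simple multiplication method rather than anything taken from the references.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for small lam; illustrative only.
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_zip(pi, lam, rng):
    """Draw one ZIP value: an always-zero point with probability pi,
    otherwise a Poisson(lam) count (which may also be zero)."""
    if rng.random() < pi:
        return 0
    return sample_poisson(lam, rng)


# Hypothetical parameter values chosen for illustration.
pi, lam = 0.4, 3.0
rng = random.Random(0)
draws = [sample_zip(pi, lam, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
zero_share = draws.count(0) / len(draws)

print(f"sample mean:     {mean:.3f}")
print(f"sample variance: {variance:.3f}")
print(f"share of zeros:  {zero_share:.3f}")
```

For a plain Poisson distribution the mean and variance are equal, so a sample variance noticeably larger than the sample mean is the overdispersion that the extra zeros produce.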

Zero-inflated Poisson distribution PMF

Figure: ZIP distribution histograms [2].

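The probability mass function (PMF) of the ZIP distribution is [3]

$$
P(X = k) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda}, & k = 0, \\
(1 - \pi)\, \dfrac{\lambda^{k} e^{-\lambda}}{k!}, & k = 1, 2, \dots
\end{cases}
$$

where 0 ≤ π ≤ 1 and λ ≥ 0.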

The π parameter controls how much the zeros are inflated: it is the probability of an extra zero on top of those the Poisson component produces. When π = 0, the zero-inflated Poisson distribution reduces to the Poisson distribution with mean λ.
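As a minimal sketch, the PMF can be written as a small Python function. It assumes the usual mixture form, in which π is the probability of an extra zero and λ is the Poisson mean; the function name `zip_pmf` and the example values are illustrative choices, not part of any library.

```python
import math


def zip_pmf(k, pi, lam):
    """Probability that a ZIP(pi, lam) variable equals k.

    pi  : probability of an extra zero, 0 <= pi <= 1
    lam : mean of the Poisson component, lam >= 0
    """
    if not 0.0 <= pi <= 1.0 or lam < 0.0:
        raise ValueError("need 0 <= pi <= 1 and lam >= 0")
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the extra-zero part and the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Illustrative values: compare ZIP with the plain Poisson case (pi = 0).
for k in range(5):
    print(k, round(zip_pmf(k, 0.3, 2.5), 4), round(zip_pmf(k, 0.0, 2.5), 4))
```

Setting `pi` to 0 gives the ordinary Poisson probabilities, while any positive `pi` moves probability mass from the positive counts to zero.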

Zero-inflated vs. zero-modified distributions

The main difference between a zero-inflated model and a zero-modified model lies in how they handle excess zeros in data. Zero-inflated models treat the zeros as coming from two separate components, while zero-modified models modify the count distribution itself to account for excess zeros.

In the zero-inflated distribution, it is assumed that the data contains two types of zeroes with different generating processes:

  • Structural zeros: the true absence of events (the event could not have occurred),
  • Sampling zeros: a result of chance (the event could have occurred but did not).

The excess zeroes are modeled separately from non-zero counts using a mixture distribution.
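Under the ZIP mixture, the two kinds of zeros can be separated numerically. The sketch below assumes that structural zeros occur with probability π and sampling zeros with probability (1 − π)e^(−λ); the function name `zero_breakdown` and the parameter values are hypothetical.

```python
import math


def zero_breakdown(pi, lam):
    """Split P(X = 0) for a ZIP(pi, lam) variable into its two sources."""
    structural = pi                         # always-zero component
    sampling = (1.0 - pi) * math.exp(-lam)  # Poisson component landing on zero
    total = structural + sampling
    return {
        "structural": structural,
        "sampling": sampling,
        "total": total,
        "structural_share": structural / total,
    }


# Hypothetical values for illustration.
parts = zero_breakdown(0.4, 3.0)
for name, value in parts.items():
    print(f"{name}: {value:.4f}")
```

With a larger λ the Poisson component rarely produces a zero, so almost all observed zeros are structural; with a small λ the two sources are harder to tell apart.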

On the other hand, a zero-modified model assumes that the counts follow a known distribution (such as a Poisson distribution or negative binomial distribution). An additional modification term, which accounts for excess zeroes, could be:

  • An additional mass at zero (i.e., there is a probability of generating a zero that is greater than what the distribution would normally generate), or
  • An additional distribution that generates zero counts.

Which model you choose depends on the type of data you’re working with and the research question being addressed.

Zero-inflated Poisson regression

In recent decades, zero-inflated distributions have become popular in regression analysis. This popularity is perhaps due to Lambert’s influential paper [4], which showed that ZIP regression can fit data with many zeros better than Poisson regression. Zero-inflated models most likely originated in econometrics [5].

The main difference between a zero-inflated Poisson distribution and zero-inflated Poisson regression lies in their application. The zero-inflated Poisson distribution is a probability distribution used to model count data with a significant proportion of zero counts. Zero-inflated Poisson regression, on the other hand, extends standard Poisson regression so that the parameters of the zero-inflated distribution can depend on predictors. Instead of assuming that the count data follow a standard Poisson distribution, the model assumes that the data are generated from a mixture of two processes:

  • One process that generates zero counts
  • One process that generates count data from a Poisson distribution.

Zero-inflated Poisson regression allows for estimation of model parameters as well as testing hypotheses about the significance of predictors.
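To make the regression version concrete, here is a sketch of the log-likelihood for a ZIP regression with a single predictor. It assumes the commonly used link functions, a log link for λ and a logit link for π; the function name `zip_loglik`, the coefficient values, and the small dataset (days absent against number of late arrivals) are all invented for illustration.

```python
import math


def zip_loglik(y, x, beta, gamma):
    """Log-likelihood of a one-predictor ZIP regression.

    Assumed links (an illustrative choice):
      log(lambda_i) = beta[0] + beta[1] * x_i
      logit(pi_i)   = gamma[0] + gamma[1] * x_i
    """
    total = 0.0
    for yi, xi in zip(y, x):
        lam = math.exp(beta[0] + beta[1] * xi)
        pi = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * xi)))
        if yi == 0:
            # A zero can come from either process.
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            # A positive count can only come from the Poisson process.
            total += (math.log(1.0 - pi) - lam
                      + yi * math.log(lam) - math.lgamma(yi + 1))
    return total


# Made-up data: days absent (y) and number of late arrivals (x).
y = [0, 0, 3, 0, 1, 4, 0, 2]
x = [0.0, 1.0, 4.0, 0.0, 2.0, 5.0, 1.0, 3.0]

# Hypothetical coefficient values, not fitted estimates.
print(zip_loglik(y, x, beta=(0.1, 0.25), gamma=(0.5, -0.6)))
```

Fitting the model means choosing `beta` and `gamma` to maximize this function, typically with a numerical optimizer; statistical packages that offer ZIP regression handle that step and also report standard errors for hypothesis tests on the predictors.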

References

[1] Böhning, D. (1998) Zero-inflated Poisson models and C.A.MAN: a tutorial collection of evidence. Biometrical Journal, 40(7), 833–843. Zbl 0914.62091

[2] Synergy42, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[3] Beckett, S. et al. (2014) Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural calamities. Involve, a Journal of Mathematics, 7(6).

[4] Lambert, D. (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.

[5] Ridout, M. Zero-inflated models. Retrieved May 4, 2023 from: https://www.kent.ac.uk/smsas/personal/msr/webfiles/zip/zip.html

