
## What is Poisson Regression?

Poisson regression is used to model response variables (Y-values) that are counts. It tells you which explanatory variables (X-values) have a statistically significant effect on the response variable. It’s best suited to rare events, as counts of these tend to follow a Poisson distribution (as opposed to counts of more common events, which tend to be approximately normally distributed). For example:

- Number of colds contracted on airplanes.
- Number of bacteria found in a petri dish.
- Counts of catastrophic computer failures at a large tech firm in a calendar year.
- Number of 911 calls that end in the death of a suspect.

For large means, the normal distribution is a good approximation for the Poisson distribution. Therefore, Poisson regression is more suited to cases where the response variable is a small integer.
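The approximation can be checked numerically. The sketch below is only an illustration: the function names and the choice of means are assumptions, not part of any package. It compares the Poisson probabilities with a normal density that has the same mean and variance, and reports the largest gap relative to the height of the Poisson peak.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(Y = k) for a Poisson variable with the given mean."""
    # Work on the log scale so large means do not overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def normal_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def max_abs_gap(mean: float) -> float:
    """Largest |Poisson pmf - normal density| over the counts, divided by the pmf's peak."""
    upper = int(mean + 10 * math.sqrt(mean)) + 10  # far enough into the right tail
    ks = range(upper + 1)
    peak = max(poisson_pmf(k, mean) for k in ks)
    gap = max(abs(poisson_pmf(k, mean) - normal_pdf(k, mean, mean)) for k in ks)
    return gap / peak

# Illustrative means, chosen arbitrarily.
for mean in (1, 5, 50, 500):
    print(f"mean = {mean:>3}: relative gap = {max_abs_gap(mean):.3f}")
```

The relative gap shrinks as the mean grows, which is why the distinction between the two distributions matters most for small counts.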

In its basic form, Poisson regression is used with numerical explanatory variables. The same technique can also be used with categorical explanatory variables, or to model the counts in the cells of a contingency table. When used in this way, the models are called **loglinear models**.
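The name reflects the form of the model: the logarithm of the expected count is a linear function of the explanatory variables. The notation below is an assumption made for illustration, since the article does not define symbols: $Y_i$ is the count for observation $i$, $\mu_i$ is its expected value, and $x_{i1}, \dots, x_{ip}$ are the explanatory variables.

$$
Y_i \sim \text{Poisson}(\mu_i), \qquad \log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
$$

Under this form, a one-unit increase in $x_{ij}$ multiplies the expected count by $e^{\beta_j}$, holding the other variables fixed.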

## Assumptions

The assumptions for Poisson regression are:

- **Y-values are counts.** If your response variable isn’t a count, Poisson regression is not a good method to use.
- **Counts must be non-negative integers** (i.e. whole numbers: 0, 1, 2, 3, …). The technique will not work with fractions or negative numbers, because the Poisson distribution is a discrete distribution.
- **Counts must follow a Poisson distribution.** Therefore, the mean and variance should be the same.
- **Explanatory variables must be continuous, dichotomous or ordinal.**
- **Observations must be independent.**
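Two of these assumptions are easy to screen for before fitting anything. The sketch below uses hypothetical data and helper names invented for illustration. It checks that every value is a valid count and computes the ratio of the sample variance to the sample mean. This raw ratio ignores the explanatory variables, so treat it as a rough first look: a value far above 1 suggests overdispersion, not proof of it.

```python
from statistics import mean, variance

def check_counts(values):
    """Raise ValueError unless every value is a non-negative whole number."""
    for v in values:
        if v < 0 or v != int(v):
            raise ValueError(f"not a valid count: {v!r}")

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

# Hypothetical data: failures recorded in each of 12 months.
failures = [0, 2, 1, 0, 3, 1, 1, 0, 2, 1, 0, 1]

check_counts(failures)
print(f"variance / mean = {dispersion_ratio(failures):.2f}")
```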

## Running the Test

Poisson regression involves estimating the regression coefficients using maximum likelihood. These complex calculations aren’t usually performed by hand, but most statistical packages include a procedure.
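To give a sense of what such a procedure does internally, here is a minimal sketch of maximum likelihood estimation by Newton-Raphson for a model with a single explanatory variable, log(mu) = b0 + b1 * x. The function name `fit_poisson` and the data are hypothetical, and this is not how any particular package is implemented; real software handles many variables, convergence problems and standard errors.

```python
import math

def fit_poisson(x, y, iterations=25, tol=1e-10):
    """Maximum likelihood fit of log(mu) = b0 + b1 * x by Newton-Raphson."""
    # Start from the intercept-only fit: b0 = log(mean of y), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve (information) * step = score.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: x is an exposure level, y is the observed count.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 1, 2, 2, 4, 5, 8, 12]

b0, b1 = fit_poisson(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"each unit of x multiplies the expected count by {math.exp(b1):.3f}")
```

At the maximum likelihood estimate the score is zero, so the fitted means sum to the observed total count. That gives a quick sanity check on any fit.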

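**R**: Classical Poisson regression is a generalized linear model (GLM); fit it with the glm() function in the stats package, using family = poisson. If the counts are overdispersed (variance larger than the mean), the glm.nb() function in the MASS package fits a negative binomial model instead.

**Stata**: Use the poisson command. From the menu: Statistics > Count outcomes > Poisson regression.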

**Reference**:

Zeileis, A. Regression Models for Count Data in R. Retrieved September 9, 2016 from: https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
