Generalized Linear Model (GLZ): An Overview

Regression analysis > Generalized Linear Model

What is the Generalized Linear Model?

The generalized linear model (GLZ) is a way to make predictions from sets of data. It takes the idea of a general linear model (for example, a linear regression equation) a step further. A general linear model (GLM) is the type of model you probably came across in elementary statistics. Ordinary least squares regression is one example of a GLM. They are also found in ANOVA and T Tests. The generalized linear model on the other hand, is much more complex, drawing from an array of different probability distributions to find the “best fit” model. The model uses, among other techniques, Bayesian hypothesis testing to predict outcomes.

Why is the Generalized Linear Model Needed?

Regular linear regression predicts that a constant change in one variable (x) will lead to a constant change in another variable (y). When the data fits a normal distribution, this type of model works well. Unfortunately, many different types of data to not fit this simple model very well at all. Here’s an example:

Your model predicts that in a certain city, for every degree difference in temperature, 100 more ice-cream cones are sold. At 80 degrees, 1,000 people in the city buy ice cream.

This sounds reasonable. If 1,000 buy ice cream when it’s 80 degrees out, you can certainly see 2,000 people buying ice cream when it’s 90 degrees. But going the other way on the model: when it’s 60 degrees out, -1,000 people buy ice creams. That doesn’t make any sense at all. A more logical model would show an increase in sales over a certain temperature and a decrease below a certain point.

The Generalized Linear Model and Probability Distributions.

The generalized linear model extends simple linear regression by allowing each outcome of the dependent variable (y) to come from a large range of probability distributions. These include:

Elements of the Generalized Linear Model.

Three elements make up the generalized linear model:

A probability distribution from the exponential family (as outlined above).
A linear predictor η = Xβ . The linear predictor gives you information about the model’s independent variables.
The link function relates the linear predictor to the expected value.