Statistics How To

Residual Plot: Definition and Examples


What is a Residual Plot?

A residual is a measure of how far a regression line vertically misses a data point: the observed value minus the value the line predicts. A regression line is the best-fitting line through a set of data. You can think of the line as a kind of average; a few data points will fall on the line and the others will miss it. A residual plot shows the residual values on the vertical axis and the independent variable on the horizontal axis.
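As a minimal sketch of that definition, the following computes residuals for a small made-up data set. The numbers are purely illustrative (they are not from this article), and NumPy's `polyfit` is just one convenient way to get a least-squares line.

```python
import numpy as np

# Made-up illustrative data (an assumption, not taken from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a least-squares line: y_hat = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual = observed value minus fitted value (the vertical miss).
residuals = y - y_hat

for xi, ri in zip(x, residuals):
    print(f"x = {xi:.0f}, residual = {ri:+.3f}")
```

Plotting `residuals` against `x` would give the residual plot described above. Points above the line have positive residuals and points below it have negative ones.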


A residual plot is typically used to find problems with regression. Some data sets are not good candidates for regression, including:

  • Heteroscedastic data (the spread of the points around the line changes across the range of the data).
  • Data that is non-linearly associated.
  • Data sets with outliers.

These problems are more easily seen with a residual plot than by looking at a plot of the original data set. Ideally, residual values should be scattered randomly around the horizontal axis (the zero line), with roughly the same spread from one end of the plot to the other.
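A rough numerical counterpart to eyeballing the spread is to compare how widely the residuals scatter on the left and right halves of the plot. The sketch below uses synthetic data whose noise deliberately grows with x (an assumption made for illustration); the name `spread_ratio` is hypothetical, and this is a crude check rather than a formal test for heteroscedasticity.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic data (an assumption for illustration): the noise grows with x.
x = np.linspace(1.0, 10.0, 200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size) * 0.5 * x

# Fit a straight line and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Compare the residual spread on the left and right halves of the x range.
left = residuals[x <= np.median(x)]
right = residuals[x > np.median(x)]
spread_ratio = right.std() / left.std()

print(f"spread ratio (right / left): {spread_ratio:.2f}")
```

A ratio near 1 is consistent with an even spread. A ratio well above or below 1 suggests the cone shape that signals heteroscedastic data.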

If your plot looks like any of the following images, then your data set is probably not a good fit for regression.


This plot of absolute residuals vs Y-hat clearly shows a heteroscedastic (cone-shaped) pattern. Image: UCLA





The outlier is clearly apparent in this residual plot. Image: PSU.edu





A non-linear pattern. Image: OregonState.

The residual plot itself shouldn't have any predictive value (it isn't a regression line). If you look at your plot of residuals and can predict where residuals that aren't shown would fall, that's a sign you need to rethink your model. For example, in the non-linear plot above, the curved (quadratic) pattern lets you predict where other data points might fall. For residual plots, that's not a good thing.

If your plot indicates a problem, there can be several reasons why your regression model isn't suitable. It doesn't always mean throwing out your model completely; it could be something simple, like:

  • Missing higher-order terms (such as a squared term) that would explain a non-linear pattern.
  • A missing interaction between terms already in your model.
  • Missing variables.
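To illustrate the first item in the list, here is a sketch on synthetic data that is curved by construction (an assumption made for illustration). A straight-line fit leaves a curved pattern in the residuals, measured here by their correlation with a squared term; adding the squared term to the model removes it. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Synthetic data with a genuine quadratic component plus a little noise.
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.3, size=x.size)

# Model 1: straight line only (the squared term is missing).
res_linear = y - np.polyval(np.polyfit(x, y, deg=1), x)

# Model 2: the same model with the squared term added.
res_quadratic = y - np.polyval(np.polyfit(x, y, deg=2), x)

# A curved residual pattern shows up as correlation with a squared term.
curve = (x - x.mean()) ** 2
corr_linear = np.corrcoef(res_linear, curve)[0, 1]
corr_quadratic = np.corrcoef(res_quadratic, curve)[0, 1]

print(f"straight-line residuals vs. squared term: {corr_linear:.3f}")
print(f"quadratic residuals vs. squared term:     {corr_quadratic:.3f}")
```

The strong correlation in the straight-line residuals is the numerical version of the curved pattern in a residual plot. Once the squared term is in the model, the residuals have no remaining linear relationship with it.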
------------------------------------------------------------------------------


Residual Plot: Definition and Examples was last modified: October 12th, 2017 by Stephanie Glen
