What is a Residual Plot?
A residual is the vertical distance between a data point and the regression line: the observed value minus the value the line predicts. A regression line is the best fit for a set of data. You can think of the line as an average; a few data points will fall on the line and the rest will miss it. A residual plot shows the residual values on the vertical axis and the independent variable on the horizontal axis.
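As a minimal sketch of that definition, the snippet below fits a least-squares line to a small made-up data set and computes each residual as the observed value minus the fitted value. The data and the helper names `fit_line` and `residuals` are illustrative assumptions, not part of the original article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys):
    """Residual = observed y minus the y predicted by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

res = residuals(xs, ys)
for x, r in zip(xs, res):
    # Each (x, residual) pair is one point on the residual plot.
    print(f"x={x:.0f}  residual={r:+.2f}")
```

Plotting `xs` against `res` (for example with a scatter plot and a horizontal line at zero) gives the residual plot. For a least-squares line with an intercept, the residuals always sum to zero, which is why they scatter on both sides of the horizontal axis.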
A residual plot is typically used to find problems with regression. Some data sets are not good candidates for regression, including:
- Heteroscedastic data (the spread of the points around the line changes across the range of the independent variable).
- Data that is non-linearly associated.
- Data sets with outliers.
These problems are more easily seen with a residual plot than by looking at a plot of the original data set. Ideally, residual values should be equally and randomly spaced around the horizontal axis.
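One rough way to check the "equally spaced" part numerically is to compare the typical residual size for small and large values of the independent variable. The sketch below does this on toy data whose deviation from the line grows with x. The data are an assumption made for illustration, and the deviations alternate in sign deterministically as a stand-in for real noise.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs = [float(x) for x in range(1, 21)]
# Toy heteroscedastic data: the deviation from 3 + 2x is 0.2 * x in size,
# so it grows with x; the sign alternates as a deterministic stand-in for noise.
ys = [3.0 + 2.0 * x + 0.2 * x * (-1) ** int(x) for x in xs]

intercept, slope = fit_line(xs, ys)
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Compare the mean absolute residual in the lower and upper halves of x.
half = len(xs) // 2
spread_low = sum(abs(r) for r in res[:half]) / half
spread_high = sum(abs(r) for r in res[half:]) / (len(res) - half)

print(f"mean |residual| for small x: {spread_low:.2f}")
print(f"mean |residual| for large x: {spread_high:.2f}")
```

A clearly larger spread in one half corresponds to the fan or funnel shape that heteroscedastic data produces on a residual plot. This is only an informal check, not a formal test.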
If your plot looks like any of the following images, then your data set is probably not a good fit for regression.
The residual plot itself doesn’t have predictive value (it isn’t a regression line), so if you look at your plot of residuals and can predict residual values that aren’t shown, that’s a sign you need to rethink your model. For example, in the image above, the quadratic function enables you to predict where other data points might fall. For residual plots, that’s not a good thing.
If your plot indicates a problem, there can be several reasons why regression isn’t suitable. It doesn’t always mean throwing out your model completely; it could be something simple, like:
- Missing higher-order variable terms that explain a non-linear pattern.
- Missing interaction between terms in your existing model.
- Missing variables.
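To illustrate the first item, the sketch below fits noise-free curved toy data twice: once with a straight line and once with an added squared term. The data and the helpers `solve` and `polyfit_residuals` are hypothetical, written here with the standard library only; in practice a statistics package would normally do the fitting.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def polyfit_residuals(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns the residuals."""
    cols = [[x ** p for x in xs] for p in range(degree + 1)]  # columns 1, x, x^2, ...
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    coef = solve(a, b)
    return [y - sum(c * x ** p for p, c in enumerate(coef)) for x, y in zip(xs, ys)]


xs = [float(x) for x in range(11)]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]  # curved, noise-free toy data

linear_res = polyfit_residuals(xs, ys, degree=1)     # straight line only
quadratic_res = polyfit_residuals(xs, ys, degree=2)  # adds the x^2 term

print("linear:   ", [round(r, 2) for r in linear_res])
print("quadratic:", [round(r, 2) for r in quadratic_res])
```

The straight-line residuals are positive at both ends and negative in the middle, which is the predictable U-shaped pattern a residual plot would reveal. Once the squared term is included, the residuals shrink to essentially zero on this toy data; with real, noisy data they would instead scatter randomly around the horizontal axis.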