Statistics How To

Regression Analysis: Linear Regression, Scatter Plots


[Image: regression analysis. Credit: Columbia University]

Regression analysis is used in statistics to find trends in data. For example, you might guess that there’s a connection between how much you eat and how much you weigh; regression analysis can help you quantify that. Regression analysis gives you an equation for a graph so that you can make predictions about your data. For example, if you’ve been putting on weight over the last few years, it can predict how much you’ll weigh in ten years’ time if you continue to put on weight at the same rate. It will also give you a slew of statistics (including a p-value and a correlation coefficient) that tell you how well your model fits the data. Most elementary stats courses cover very basic techniques, like making scatter plots and performing linear regression. However, you may come across more advanced techniques like multiple regression.

Regression Analysis: An Introduction

In statistics, it’s hard to stare at a set of random numbers in a table and make any sense of them. For example, global warming may be reducing average snowfall in your town, and you are asked to predict how much snow will fall this year. Looking at the following table, you might guess somewhere around 10-20 inches. That’s a good guess, but you can make a better one by using regression.
[Table: annual snowfall in inches, by year]

Essentially, regression is the “best guess” at using a set of data to make some kind of prediction: it fits a line (or curve) to a set of data points. There’s a whole host of tools that can run regression for you, including Excel, which I used here to help make sense of that snowfall data:
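Excel’s linear trendline is an ordinary least squares fit: it chooses the line that makes the sum of the squared vertical distances from the points to the line as small as possible. Here is a minimal Python sketch of that calculation using the textbook slope and intercept formulas. The function name `least_squares_line` and the toy data are illustrative assumptions, not the snowfall figures.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(least_squares_line(xs, ys))  # (2.0, 1.0)
```

With real, noisy data the points won’t sit on the line, but the same two formulas still give the line with the smallest total squared error.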
[Chart: snowfall data with a linear regression line, made in Excel]
Just by looking at the regression line running down through the data, you can fine-tune your best guess a bit. You can see that the original guess (20 inches or so) was way off. For 2015, it looks like the line will be somewhere between 5 and 10 inches! That might be “good enough”, but regression also gives you a useful equation, which for this chart is:
y = -2.2923x + 4624.4.
What that means is you can plug in an x value (the year) and get a pretty good estimate of snowfall for any year. For example, 2005:
y = -2.2923(2005) + 4624.4 = 28.3385 inches, which is pretty close to the actual figure of 30 inches for that year.

Best of all, you can use the equation to make predictions. For example, how much snow will fall in 2017?
y = -2.2923(2017) + 4624.4 = 0.8 inches.
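Both calculations above amount to evaluating the fitted line at a given year. A small Python sketch, with the coefficients copied from the Excel equation (the function name `predicted_snowfall` is a hypothetical choice):

```python
# Coefficients of the Excel trendline for the snowfall data: y = -2.2923x + 4624.4
SLOPE = -2.2923
INTERCEPT = 4624.4


def predicted_snowfall(year):
    """Snowfall in inches predicted by the fitted line for the given year."""
    return SLOPE * year + INTERCEPT


for year in (2005, 2015, 2017, 2018):
    print(year, round(predicted_snowfall(year), 4))
```

This reproduces 28.3385 inches for 2005, about 5.4 inches for 2015 (inside the 5 to 10 inch range read off the chart) and about 0.83 inches for 2017. For 2018 the same line gives roughly -1.46 inches, a reminder that a straight-line trend can’t be extrapolated indefinitely.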

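Regression also gives you an R squared value, which for this graph is 0.702. This number tells you how well the line fits the data: it is the proportion of the variation in snowfall that the line explains. The values range from 0 to 1, with 0 meaning the line explains none of the variation (a terrible model) and 1 meaning it explains all of it (a perfect model). An R squared of 0.7 means the trend line accounts for about 70% of the variation, which is a fairly decent fit, so you can be reasonably confident in your weather prediction!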
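R squared compares the variation left over around the line (the residual sum of squares) with the total variation of y around its mean: R squared = 1 - SS_res / SS_tot. For a least squares line with an intercept, this value lies between 0 and 1. A minimal Python sketch; the function name `r_squared` and the three toy points are illustrative, not the snowfall data.

```python
def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for the line y = slope*x + intercept."""
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: variation the line fails to explain
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: variation of y around its mean
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y has no variation, so R squared is undefined")
    return 1 - ss_res / ss_tot


# Toy data whose least squares line is y = 1.5x + 0
print(r_squared([1, 2, 3], [2, 2, 5], 1.5, 0.0))  # 0.75
```

Here the line leaves 1.5 units of squared error out of a total of 6, so it explains 75% of the variation in y.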

Regression Analysis How-To Articles

  1. How to Construct a Scatter Plot.
  2. How to Calculate Pearson’s Correlation Coefficients.
  3. How to Compute a Linear Regression Test Value.
  4. How to Find the Coefficient of Determination.
  5. Chow Test for Split Data Sets.
  6. How to Find a Linear Regression Equation.
  7. How to Find a Regression Slope Intercept.
  8. How to Find a Linear Regression Slope.
  9. How to Find the Standard Error of Regression Slope.
  10. Validity Coefficient: What it is and how to find it.
  11. Quadratic Regression.
  12. Stepwise Regression.

Definitions

  1. Assumptions and Conditions for Regression.
  2. Betas / Standardized Coefficients.
  3. What is a Beta Weight?
  4. The Breusch-Pagan-Godfrey Test.
  5. What is the Correlation Coefficient Formula?
  6. Cook’s Distance.
  7. What is a Covariate?
  8. Detrend Data.
  9. What is the General Linear Model?
  10. What is the Generalized Linear Model?
  11. What is the Hausman Test?
  12. What is Homoscedasticity?
  13. What is an Instrumental Variable?
  14. Lasso Regression.
  15. What is a Linear Relationship?
  16. What is the Line of best fit?
  17. What is Logistic Regression?
  18. Model Misspecification.
  19. Multinomial Logistic Regression.
  20. What is Multiple Regression analysis?
  21. What is Nonlinear Regression?
  22. What is Ordinary Least Squares Regression?
  23. Overfitting.
  24. Parsimonious Models.
  25. What is Pearson’s Correlation Coefficient?
  26. Poisson Regression.
  27. Probit Model.
  28. What is a Prediction Interval?
  29. What is Regularization?
  30. What are Residual Plots?
  31. Reverse Causality.
  32. Root Mean Square Error.
  33. Simultaneity Bias.
  34. Simultaneous Equations Model.
  35. What is Spurious Correlation?
  36. What are Tolerance Intervals?
  37. What is Weighted Least Squares Regression?

Check out our YouTube channel for hundreds of videos on elementary statistics, including regression analysis using a variety of tools like Excel and the TI-83.

Regression Analysis: Linear Regression, Scatter Plots was last modified: January 26th, 2017 by Andale

5 thoughts on “Regression Analysis: Linear Regression, Scatter Plots”

  1. Sekeli Maboshe

    Please show me how to detrend this data, as (iii) below requires.
    The total annual fertilizer consumption in thousands of tonnes during 1995-2001 in XYZ Province of Zambia was recorded as given in the table below.
    Year                             1995  1996  1997  1998  1999  2000  2001
    Consumption (thousand tonnes)      50    56    60    68    70    75    78
    (i) Fit a straight line trend by the method of least squares and compute the trend quantities.
    (ii) What has been the annual increase in fertiliser consumption?
    (iii) Eliminate the trend variation from the fertilizer consumption data.

  2. Andale Post author

    Hi Sekeli, just calculate the least squares regression line, then subtract each year’s trend value from the observed value. What remains is the detrended data.

  3. Sekeli Maboshe

    Hi Andale,
    I’m failing to understand the improvement that a three-variable linear regression analysis makes over the two-variable case. Please explain.

  4. Andale Post author

    If you have three predictor variables you should be running multiple regression to take into account all of the independent variables in your model. If you only choose two, you’re leaving out info.

  5. FM

    Really helpful website, thank you for simplifying “things”, well explained… makes all the difference!
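Sekeli’s fertilizer question can be worked through using the approach in Andale’s reply. The sketch below fits the least squares trend, reports the slope as the annual increase, and subtracts the trend from each observation. It assumes an additive model (observed = trend + remainder), which is what subtracting implies; the helper name `fit_line` is hypothetical.

```python
years = [1995, 1996, 1997, 1998, 1999, 2000, 2001]
consumption = [50, 56, 60, 68, 70, 75, 78]  # thousand tonnes


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (i) Straight-line trend and the trend value for each year
slope, intercept = fit_line(years, consumption)
trend = [slope * x + intercept for x in years]

# (iii) Detrended series: observed minus trend (additive model assumed)
detrended = [y - t for y, t in zip(consumption, trend)]

# (ii) The slope is the average annual increase
print(f"annual increase: {slope:.4f} thousand tonnes/year")
for x, y, t, d in zip(years, consumption, trend, detrended):
    print(x, y, round(t, 4), round(d, 4))
```

The slope comes out at 33/7, about 4.71 thousand tonnes per year, which answers (ii). Since the mean year is 1998, the trend can also be written as 65.29 + 4.71(year - 1998), giving trend values from about 51.14 (1995) to 79.43 (2001). The detrended values sum to zero, as least squares residuals always do when the line has an intercept.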
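On the three-variable question: a regression of y on two predictors (three variables in all) estimates each predictor’s effect while holding the other fixed, which a two-variable regression cannot do. The pure-Python sketch below solves the normal equations (X^T X) b = X^T y for y = b0 + b1*x1 + b2*x2. The data are made up so that the true coefficients are 1, 2 and 3, and the helper names `solve` and `multiple_regression` are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
print(multiple_regression(x1, x2, y))  # approximately [1.0, 2.0, 3.0]
```

Regressing y on x1 alone for the same made-up data gives a slope of about 3.11 instead of 2, because x1 partly stands in for the omitted x2. In practice, a library routine such as NumPy’s `numpy.linalg.lstsq` is the more robust choice.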
