**Contents:**

- How to Find a Linear Regression Equation: Overview.
- Find a Linear Regression Equation in Excel.
- Leverage in Linear Regression.
- TI83 Linear Regression.
- How to Find the Regression Coefficient.

## How to Find a Linear Regression Equation: Overview

**Regression analysis** is used to find equations that fit data. Once we have the regression equation, we can use the model to make predictions. One type of regression analysis is linear regression. When a **correlation coefficient** shows that two variables are strongly related and a scatter plot of the data appears to form a straight line, you can use simple linear regression to find a predictive function. If you recall from elementary algebra, the equation for a line is **y = mx + b**. This article shows you how to take data, calculate linear regression, and find the equation **y’ = a + bx**. Be careful with the letters: in y = mx + b the slope is m and the intercept is b, while in y’ = a + bx the intercept is a and the slope is b. **Note**: If you’re taking AP statistics, you may see the equation written as b_{0} + b_{1}x, which is the same thing (you’re just using the variables b_{0} and b_{1} instead of a and b).

Read the steps below to find a linear regression equation by hand, or skip ahead to the Excel section if you would prefer to use Excel.

## The Linear Regression Equation

Linear regression is a way to model the relationship between two variables. You might also recognize the equation as the **slope-intercept form** of a line from algebra. The equation has the form Y = a + bX, where Y is the dependent variable (the variable that goes on the Y axis), X is the independent variable (the variable plotted on the X axis), b is the slope of the line, and a is the y-intercept.

The first step in finding a linear regression equation is to determine if there is a relationship between the two variables. This is often a judgment call for the researcher. You’ll also need a list of your data in x-y format (i.e. two columns of data — independent and dependent variables).

**Warnings:**

- Just because two variables are related, it does not mean that one *causes* the other. For example, although there is a relationship between high GRE scores and better performance in grad school, it doesn’t mean that high GRE scores **cause** good grad school performance.
- If you try to find a linear regression equation for a set of data (especially through an automated program like Excel or a TI-83), you *will* find one, but it does not necessarily mean the equation is a good fit for your data. One technique is to make a scatter plot first, to see if the data roughly fits a line *before* you try to find a linear regression equation.

## How to Find a Linear Regression Equation: Steps

**Step 1:** *Make a chart of your data, filling in the columns in the same way as you would fill in the chart if you were finding the Pearson’s Correlation Coefficient.*

Subject | Age x | Glucose Level y | xy | x^{2} | y^{2} |
---|---|---|---|---|---|
1 | 43 | 99 | 4257 | 1849 | 9801 |
2 | 21 | 65 | 1365 | 441 | 4225 |
3 | 25 | 79 | 1975 | 625 | 6241 |
4 | 42 | 75 | 3150 | 1764 | 5625 |
5 | 57 | 87 | 4959 | 3249 | 7569 |
6 | 59 | 81 | 4779 | 3481 | 6561 |
Σ | 247 | 486 | 20485 | 11409 | 40022 |

From the above table, Σx = 247, Σy = 486, Σxy = 20485, Σx^{2} = 11409, Σy^{2} = 40022. n is the sample size (6, in our case).

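**Step 2:** Use the following equations to find a and b.

$$a = \frac{(\Sigma y)(\Sigma x^{2}) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^{2}) - (\Sigma x)^{2}}$$

$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^{2}) - (\Sigma x)^{2}}$$

a = **65.1416**

b = **.385225**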

Step-by-step arithmetic for these formulas:

**Find a**:

- ((486 × 11,409) – (247 × 20,485)) / (6(11,409) – 247^{2})
- 484,979 / 7,445
- = **65.14**

**Find b**:

- (6(20,485) – (247 × 486)) / (6(11,409) – 247^{2})
- (122,910 – 120,042) / (68,454 – 247^{2})
- 2,868 / 7,445
- = **.385225**

**Step 3:** *Insert the values into the equation*.

y’ = a + bx

**y’ = 65.14 + .385225x**

*That’s how to find a linear regression equation by hand!*
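The arithmetic above can be checked with a short Python sketch. This is only an illustration, not part of the original method: the function name `fit_line_from_sums` is made up for this example, and the data are the six age and glucose pairs from the table.

```python
# Age (x) and glucose level (y) for the six subjects in the table above.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def fit_line_from_sums(xs, ys):
    """Return (a, b) for y' = a + bx, using the column sums from the table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # 7445 for this data
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b


a, b = fit_line_from_sums(ages, glucose)
print(f"y' = {a:.4f} + {b:.6f}x")  # y' = 65.1416 + 0.385225x
```

Both values share the same denominator, so a slip in that one term will throw off a and b together.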

**Note**: this example has a fairly low correlation coefficient (r ≈ 0.53), so the equation wouldn’t be very good at predicting anything.
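To see how weak the correlation is, Pearson's r can be computed from the same column sums, this time including Σy^{2}, which the line equation itself does not need. The following is a minimal sketch; `pearson_r` is a hypothetical helper name chosen for this example.

```python
import math

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(ages, glucose)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

For this data r comes out at roughly 0.53, so r squared is about 0.28: the line accounts for only a little over a quarter of the variation in glucose level.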

## Find a Linear Regression Equation in Excel

## Linear Regression Equation Microsoft Excel: Steps

Step 1: **Install the Data Analysis Toolpak**, if it isn’t already installed. In Excel’s add-ins list it appears as “Analysis ToolPak.”

Step 2: **Type your data into two columns in Excel.** For example, type your “x” data into column A and your “y” data into column B. Do not leave any blank cells between your entries.

Step 3: **Click the “Data” tab** on the Excel ribbon, then click “Data Analysis.”

Step 4: **Click “Regression”** in the pop-up window and then click “OK.”

Step 5: **Select your input Y range.** You can do this two ways: either select the data in the worksheet or type the location of your data into the “Input Y Range” box. For example, if your Y data is in B2 through B10, type “B2:B10” into the Input Y Range box.

Step 6: **Select your input X range** by selecting the data in the worksheet or typing the location of your data into the “Input X Range” box.

Step 7: **Select the location where you want your output** to go by selecting a blank area in the worksheet or typing the location into the “Output Range” box.

Step 8: **Click “OK”.** Excel will calculate the linear regression and populate your worksheet with the results.

Tip: The linear regression equation information is given in the last output set (the “Coefficients” column). The entry in the “Intercept” row is “a” (the y-intercept) and the entry in the “X Variable 1” row (or the row with your column label, if you ticked “Labels”) is “b” (the slope).

## Leverage in Linear Regression

Data points that have leverage have the potential to move a linear regression line. They tend to be outliers. An outlier is a point that is either an extremely high or extremely low value.

### Influential Points

If the parameter estimates (such as the slope and intercept of the regression line) change significantly when an outlier is removed, that data point is called *influential*.

The more a data point’s x-value differs from the mean of the other x-values, the more *leverage* it has. The more leverage a point has, the higher the probability that the point will be *influential* (i.e. it could change the parameter estimates).
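For simple linear regression, leverage is commonly quantified as h_i = 1/n + (x_i – x̄)^{2} / Σ(x_j – x̄)^{2}, where x̄ is the mean of all the x-values. That formula is not given in this article, but it matches the description above: the farther an x-value sits from the mean, the larger its leverage. The sketch below uses made-up x-values and a hypothetical helper name, `leverages`.

```python
def leverages(xs):
    """Leverage of each x: 1/n + (x - mean)^2 / sum of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Hypothetical x-values: five evenly spaced points plus one far from the rest.
xs = [10, 20, 30, 40, 50, 120]
hs = leverages(xs)

for x, h in zip(xs, hs):
    print(f"x = {x:>3}  leverage = {h:.3f}")
```

The point at x = 120 gets a leverage of about 0.89, far above the others. In simple linear regression the leverages always sum to 2, so one point taking that much leaves very little for the rest.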

## Leverage in Linear Regression: How it Affects Graphs

In linear regression, the influential point (outlier) will try to pull the linear regression line toward itself. The graph below shows what happens to a linear regression line when outlier A is included:

Outliers with **extreme X values** (values that aren’t within the range of the other data points) have more leverage in linear regression than points with less extreme x values. In other words, **extreme x-value outliers will move the line more** than less extreme values.

The following graph shows a data point outside of the range of the other values. The values range from 0 to about 70,000. This one point has an x-value of about 80,000 which is outside the range. It affects the regression line a lot more than the point in the first image above, which was inside the range of the other values.

In general, outliers that have values close to the mean of x will have less leverage than outliers towards the edges of the range. Outliers with values of x outside of the range will have more leverage. Values that are extreme on the y-axis (compared to the other values) will have more influence than values closer to the other y-values.
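The effect described here can be demonstrated numerically. The sketch below uses made-up data: five points lying exactly on y = 1 + 2x, plus one extra point placed 10 units off that line, first at the mean of x and then well outside the x-range. `fit_line` is a hypothetical helper, not something from this article.

```python
def fit_line(points):
    """Least-squares fit; returns (intercept, slope) for y' = a + bx."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data lying exactly on y = 1 + 2x.
base = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
near_mean = base + [(3, 17)]   # 10 above the line, at the mean of x
extreme_x = base + [(15, 21)]  # 10 below the line, far outside the x-range

a_base, b_base = fit_line(base)
a_mid, b_mid = fit_line(near_mean)
a_far, b_far = fit_line(extreme_x)

print(f"no outlier:        y' = {a_base:.2f} + {b_base:.2f}x")
print(f"outlier at mean x: y' = {a_mid:.2f} + {b_mid:.2f}x")
print(f"outlier at x = 15: y' = {a_far:.2f} + {b_far:.2f}x")
```

The outlier at the mean of x shifts the whole line up (the intercept changes) but leaves the slope at 2, while the outlier at x = 15 drags the slope down to about 1.23.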

## TI83 Linear Regression

## TI 83 Linear Regression: Overview

**Linear regression** is tedious and prone to errors when done by hand, but on a TI 83 you can perform it in the time it takes to enter your data into two lists. Linear regression will only give you a reasonable result if your data looks like a line on a scatter plot, so before you find the equation for a **linear regression line** you may want to view the data on a scatter plot first.

## TI 83 Linear Regression: Steps

Sample problem: Find a linear regression equation (of the form y = ax + b) for x-values of 1, 2, 3, 4, 5 and y-values of 3, 9, 27, 64, and 102.

**Step 1:** Press STAT, then press ENTER to enter the lists screen. If you already have data in L1 or L2, clear the data: move the cursor onto L1, press CLEAR and then ENTER. Repeat for L2.

**Step 2:** *Enter your x-variables, one at a time.* Follow each number by pressing the ENTER key. For our list, you would enter:

1 ENTER

2 ENTER

3 ENTER

4 ENTER

5 ENTER

**Step 3:** Use the arrow keys to scroll across to the next column, L2.

**Step 4:** Enter your y-variables, one at a time. Follow each number by pressing the enter key. For our list, you would enter:

3 ENTER

9 ENTER

27 ENTER

64 ENTER

102 ENTER

**Step 5:** Press the STAT button, then use the scroll key to highlight “CALC.”

**Step 6:** Press 4 to choose “LinReg(ax+b)”. Press ENTER and then ENTER again. The TI 83 will return the values needed for the equation. Just insert the given values (a, b) into the equation for linear regression (y = ax + b). Note that on the calculator a is the slope and b is the y-intercept, the opposite of the y’ = a + bx form used earlier on this page. For the above data, this is **y = 25.3x – 34.9**.

That’s how to perform TI 83 Linear Regression!
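As a quick check of the calculator result, the same least-squares fit can be reproduced in a few lines of Python. This is a sketch for verification only. The variable names are chosen for this example and follow the calculator's convention, where a is the slope and b is the intercept.

```python
xs = [1, 2, 3, 4, 5]       # contents of L1
ys = [3, 9, 27, 64, 102]   # contents of L2

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Calculator convention y = ax + b: a is the slope, b is the intercept.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(f"y = {a:.1f}x + ({b:.1f})")  # y = 25.3x + (-34.9)
```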

## How to Find the Regression Coefficient

A regression coefficient is the same thing as the **slope of the line of the regression equation**. The equation for the regression coefficient that you’ll find on the AP Statistics test is:

$$B_{1} = b_{1} = \frac{\sum (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sum (x_{i} - \bar{x})^{2}}$$

In this equation, $\bar{y}$ is the mean of y and $\bar{x}$ is the mean of x.
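The formula translates directly into code. The sketch below, with a hypothetical helper name `regression_coefficient`, applies it to the age and glucose data from the by-hand example at the top of the page, where it should reproduce the slope of about .385225.

```python
def regression_coefficient(xs, ys):
    """Slope b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

b1 = regression_coefficient(ages, glucose)
print(f"b1 = {b1:.6f}")  # b1 = 0.385225
```

This deviation-from-the-mean form and the raw-sum form used in the by-hand steps are algebraically the same, so they give the same slope.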

You could find the regression coefficient by hand (as outlined in the section at the top of this page).

However, you won’t have to calculate the regression coefficient by hand on the AP test; you’ll use your TI-83 calculator. Why? Calculating linear regression by hand is very time consuming (allow yourself about 30 minutes to do the calculations and check them), and because of the *huge* number of calculations you have to make, you’re very likely to make mathematical errors. When you find a linear regression equation on the TI-83, you get the regression coefficient as part of the answer.

**Sample problem**: Find the regression coefficient for the following set of data:

x: 1, 2, 3, 4, 5.

y: 3, 9, 27, 64, 102.

**Step 1:** Press STAT, then press ENTER to enter LISTS. You may need to clear data if you already have numbers in L1 or L2. To clear the data: move the cursor onto L1, press CLEAR and then ENTER. Repeat for L2 if you need to.

**Step 2:** *Enter your x-data into a list.* Press the ENTER key after each entry.

1 ENTER

2 ENTER

3 ENTER

4 ENTER

5 ENTER

**Step 3:** Scroll across to the next column, L2, using the arrow keys at the top right of the keypad.

**Step 4:** Enter the y-data:

3 ENTER

9 ENTER

27 ENTER

64 ENTER

102 ENTER

**Step 5:** Press the STAT button, then scroll to highlight “CALC.”

**Step 6:** Press 4 to choose “LinReg(ax+b)”. Press ENTER. The TI 83 will return the values needed for the linear regression equation. The value you’re looking for, the regression coefficient, is the slope. Because the calculator uses the form y = ax + b, the slope is reported as a, which is **25.3** for this set of data (b = –34.9 is the y-intercept).

*That’s it!*

Hei!

Let’s say I’ve plotted y in a scatter plot to make some prediction. I know that it’s possible to find functions indicating the “upper and lower” bound for y for a given x with a given percent certainty.

I’ve done some statistics in Norwegian, but I’m not sure about the English term for this. Hope someone understood what I’m trying to do.

Thanks!

Kimble, what you are describing is linear regression; I think the term you’re looking for is confidence interval (or prediction interval, for the range of a single new y-value). Stephanie

Your explanation was very helpful. I kept messing up when I put it into the calculator. I had to redo it over and over to get the answer in the example, then follow that process in my problem…

These questions were somewhat difficult but I tried to do my best and hopefully I will understand them as I take my test. Some of this information is not sinking in my head for some reason. I know I will never take a class like this again on line.

I have to agree with Lisa that I wouldn’t take another math class online. Math is not my strong suit. And, I realize now that I should have been in a classroom setting where I could ask questions face to face. This blog has been very helpful. But, unfortunately it doesn’t make up for my lack of math skills.

I was having trouble in chapter nine with the correlation coefficient. I couldn’t come up with the same answer as Math Zone using the formula to solve for r. The problems don’t show a square root in the denominator but the link to the text does. I don’t know if my skills are lacking so much that I just didn’t know that, but why would it show it in the text?

There is a square root: Mathzone is missing it in a couple of the questions.

That’s what it is. Mathzone doesn’t explain that to solve for r, you need to take the square root. Now I’m getting it, but before I was so confused. Thanks for this!

I found this information very helpful and easy to follow. Compared to chapter 8, so far this has been a breeze.

Thank you for posting it on the main blackboard page that the square root was missing. This saved a lot of time and headache.

On this one I found it easier (yes I know lazier) to put it into my calculator. It took me a while to remember how to use my calculator, but once I remembered, I got it down. I’m not sure if there is a part on this site that would give some directions on how to use our calculators for these problems?

This blog was very helpful and very easy to follow. I agree with the others that taking math online is way more difficult, but thanks to this blog it’s a little bit easier.

I found this aspect fairly easy once I got the equation down. However, I also realized that you can use the linreg function on the calculator and get the same answer.

so i tried using the linreg2 function: as in, A=linreg2(x,y), given both x and y.

sooo what should that tell me. coz im running this in matlab and it doesnt seem to know what it is?!?!?

Your calculations for a are wrong.

I did your example as well as I used the trend line on MS Excel and I found on both cases that a=65.14

:)

You are absolutely correct! I’ve updated the page: thanks!

Stephanie

how to make sure which one is X and which one is y ?

Hello!

Can you post this question on the forum? One of our moderators will be glad to help :)

Ok, what am I missing? I never had statistics until college and I don’t remember us discussing this. I compute the same linear regression on my TI-84 Plus. I plug this equation into my y = table, then set my calculator to ask for the independent variable, and when I plug in a value of 43, I get 81.705. I can also see in my head that 38% of 43 added to 65 is not going to be equal to 99. I think what is missing in this discussion is that linear regression does its best to draw a straight line through the data and form the “best” equation to make predictions. You can see that with any two points in the table above, computing the slope and using point-slope form, you will get a different equation. You can take the y=a + bx equation into your calculator as an estimate of y for a “new” given value of x, but do not expect your original x-values to return the same y-values (in some cases, this function will return a value less than the original y and in some cases greater than the original y). I think it is also very confusing to use the form y=a+bx when algebra students are so used to y=ax + b.

Also another question is in general for a given set of data, how does one know whether to use a linear, quadratic, cubic, quartic, or other form of regression. The only way I have been able to eliminate a linear regression so far, for example, is if my data contains the point (0,0) and my linear regression calculation gives a non-zero b-value then I try the next higher regression until I find an equation where the constant term is 0 so that I get a true statement i.e., 0=0 when I plug in 0 for x.

Thanks for your thoughts, Doug. This is an article about basic steps — I leave the discussions for the (bloated) textbooks. Stronger algebra students will see that y=a+bx is equal to what they are seeing. Sure, we could stick to the same equation throughout college, but why not expand minds a little? ;)

Doug,

Sometimes you don’t know — it’s a guess and check. That said, if you have the data points in a TI-89 you can guess and check pretty easily (plot both the points and the resulting graph on the same screen and see which equation fits the data best).

Stephanie

Sheit statiscs is hard!

Lol! Not really :)

this site has been very helpful, i am so glad that i found something very helpful for my Statistics test, i never knew statistics would be easy, unless you understand the basics, this site is Awesome!!!!!!

Thanks! Glad it helps :)

thx so much…i had done my homework..

Thank you so much, I have learned a lot! To those who didn’t understand, try to analyze those Σxy’s. It’s pretty easy! I learned a lot :)

Is the linear regression slope value related to the tan(theta) value of the experimental line?

U.Muralikrishna,

Time constraints prevent me from answering stats questions in the comments…but post on our forums and our mod will be happy to help :)

Stephanie

whatz the standard form of the equation? y = a + bx or ax + by = c ??

The standard form is usually ax + by=c.

Stephanie

Hi,

I have one doubt. For the perfect Linear Graph y=mx+c, the m value should be close to 1 and c value should be close to 0.

In the same way, for the perfect Linear Regression equation y=a+bx, the b value should be close to 1 and a value should be close to 0.

Is this right?

Hello, Pramod,

I’m not sure what you mean by the “Perfect” linear graph or linear regression equation. For the equation y=mx+c, any m value would still result in a linear graph. Could you explain what you mean by perfect?

Regards,

Stephanie

This was very helpful.

how in the hell do you do this

hello :3

This video was incredibly helpful and easy to follow! Thank you so much for this information! This is extremely valuable to a college student who works full time and often has to miss lectures!

This video was incredibly helpful and easy to follow! Thank you so much for this information and simple teaching!

Thanks, Taylor! Good luck with the rest of your class.

How about computing for:

Y = a + b1X1 + b2X2 + … + bnXn + e

I need to compute whether a particular independent variable (X) is significant or not, just like SPSS’s p-value (Sig.). Thank you.

Hi,

What criteria can I use to judge whether a regression is good or not? I mean, how can I tell from R squared whether the variables are correlated?

In the given video R squared = 0.99 and it was considered highly correlated, so I want to understand at what point I can consider the values not correlated.

Shrikant,

R squared (the coefficient of determination) gives you a rough idea of how well your model fits the points. I’m not exactly sure what video you were looking at that showed an r-squared of 0.99, but that would mean the model is a very good fit. The closer you get to 1, the more closely the points fit the line. If you get close to zero, the line explains almost none of the variation in y. I would say anything above .7 is a pretty good model. Others may disagree.

Does that answer your question? If not, could you clarify a bit more about what you need?

Regards,

S.

Don’t know if my previous comment/question was actually posted. (It was sent btwn 1:30PM & 2:00PM today.)

If so, you can ignore it – I found the answer. Bill.

Hey, I found this page truly useful and I wanted to say thank you.

Love this!

thanks

I found these videos so helpful,

but I wonder how we could predict the threshold values (minimum and maximum)?

I am looking forward to your reply.

thanks again

What are you trying to find threshold values for? a and b? residuals? By “threshold values” do you mean confidence interval, or prediction interval?

we are looking for the threshold value ” prediction interval”

for example:

x value (feed intake): 40-50-80-90-60-42-82

y value (feed deposition): 25-30-26-28-39-30-42

prediction of maximum value for “y” that consider a maximum threshold value using regression (by Levenberg-Marquardt algorithm within spss) or any other suitable method.

thanks and appreciate your reply

Follow the instructions at the bottom of this article: Prediction intervals.