
Previous article: Excel 2013 Regression Analysis How To

Watch the video or read the steps below:

## Excel Regression Analysis Output Explained

In the previous article, I explained how to perform Excel regression analysis. After you’ve gone through the steps, Excel will spit out your results, which will look something like this:

## Excel Regression Analysis Output Explained: Multiple Regression

Here’s a breakdown of what each piece of information in the output means:

## EXCEL REGRESSION ANALYSIS OUTPUT PART ONE: REGRESSION STATISTICS

These are the “Goodness of Fit” measures. They tell you how well the calculated linear regression equation fits your data.

- **Multiple R.** This is the correlation coefficient. It tells you how strong the linear relationship is. For example, a value of 1 means a perfect positive relationship and a value of zero means no linear relationship at all. It is the square root of R squared (see the next item).
- **R squared.** This is r^{2}, the Coefficient of Determination. It tells you what proportion of the variation in the y-values is explained by the model. For example, 80% means that 80% of the variation of y-values around the mean is explained by the x-values. It does *not* tell you how many points fall on the regression line.
- **Adjusted R square.** The adjusted R-square adjusts for the number of terms in a model. You’ll want to use this instead of R squared if you have more than one x variable.
- **Standard Error of the regression.** An estimate of the standard deviation of the error term (the residuals). This is *not* the same as the standard error in descriptive statistics! It measures the typical distance between the observed y-values and the regression line, in the units of y; the smaller it is, the closer the fit. The standard errors of the individual coefficients appear separately, in the coefficients table in part three.
- **Observations.** Number of observations in the sample.
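To see how these five figures relate to one another, here is a minimal Python sketch that computes them by hand for a simple one-variable regression. The data are made up purely for illustration and are not the data behind the output in this article. The variable names are my own, not Excel's.

```python
import math

# Made-up sample data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)  # "Observations"
k = 1       # number of x variables in the model

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept for a single x variable.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual and total sums of squares.
predicted = [intercept + slope * xi for xi in x]
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - ss_residual / ss_total                             # "R Square"
multiple_r = math.sqrt(r_squared)                                  # "Multiple R"
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # "Adjusted R Square"
standard_error = math.sqrt(ss_residual / (n - k - 1))              # "Standard Error"

print(round(multiple_r, 4), round(r_squared, 4),
      round(adjusted_r_squared, 4), round(standard_error, 4), n)
```

Notice that the adjusted R square is lower than R squared here: with only five observations, the penalty for each extra term is large.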

## EXCEL REGRESSION ANALYSIS OUTPUT EXPLAINED PART TWO: ANOVA

- SS = Sum of Squares.
- Regression MS = Regression SS / Regression degrees of freedom.
- Residual MS = mean squared error (Residual SS / Residual degrees of freedom).
- F: Overall F test for the null hypothesis that all of the slope coefficients are zero (F = Regression MS / Residual MS).
- Significance F: The p-value associated with the overall F test.

The second part of output you get in Excel is rarely used, compared to the regression output above. It splits the sum of squares into individual components (see: Residual sum of squares), so it can be harder to use the statistics in any meaningful way. If you’re just doing basic linear regression (and have no desire to delve into individual components) then you can skip this section of the output.

For example, to calculate R^{2} from this table, you would use the following formula:

R^{2} = 1 – residual sum of squares (SS Residual) / Total sum of squares (SS Total).

In the above table, residual sum of squares = 0.0366 and the total sum of squares is 0.75, so:

R^{2} = 1 – 0.0366/0.75 ≈ 0.9512
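The ANOVA arithmetic can be sketched in a few lines of Python. The function name `anova_table` and the input values below are illustrative assumptions, not values read from the output in this article. Significance F is left out because it needs the F distribution, which is not in Python's standard library.

```python
def anova_table(ss_residual, ss_total, n_observations, n_predictors):
    """Rebuild the main ANOVA quantities from two sums of squares.

    Hypothetical helper for illustration; not part of Excel or any library.
    """
    ss_regression = ss_total - ss_residual
    df_regression = n_predictors
    df_residual = n_observations - n_predictors - 1

    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual  # the mean squared error

    return {
        "ss_regression": ss_regression,
        "df_regression": df_regression,
        "df_residual": df_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f": ms_regression / ms_residual,
        "r_squared": 1 - ss_residual / ss_total,
    }


# Illustrative inputs: 5 observations, 2 x variables.
table = anova_table(ss_residual=0.3950, ss_total=2.0,
                    n_observations=5, n_predictors=2)
for name, value in table.items():
    print(name, round(value, 4))
```

With these inputs the R^{2} works out to 0.8025, which is simply the share of the total sum of squares that is not left over in the residuals.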

## EXCEL REGRESSION ANALYSIS PART THREE: INTERPRET REGRESSION COEFFICIENTS

This section of the table gives you very specific information about the components you chose to put into your data analysis. Therefore the first column (in this case, House / Square Feet) will say something different, according to what data you put into the worksheet. For example, it might say “height”, “income” or whatever variables you chose.

The columns are:

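- Coefficient: Gives you the least squares estimate of the intercept and of the slope for each x variable.
- Standard Error: The estimated standard error of each least squares coefficient.
- T Statistic: The coefficient divided by its standard error. It tests the null hypothesis that the coefficient is zero vs. the alternate hypothesis that it is not.
- P Value: Gives you the p-value for that hypothesis test.
- Lower 95%: The lower boundary for the 95% confidence interval for the coefficient.
- Upper 95%: The upper boundary for the 95% confidence interval for the coefficient.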
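As a rough sketch of where these columns come from, the Python below computes the coefficient, standard error, t statistic and 95% confidence interval by hand for a one-variable regression. The data are made up for illustration, and the critical value 3.182 is the two-sided 95% value for 3 degrees of freedom taken from a t table. The p-value column is omitted because it needs the t distribution, which is not in Python's standard library.

```python
import math

# Made-up sample data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# "Coefficient" column: least squares estimates.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the regression, with n - 2 residual degrees of freedom.
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(ss_residual / (n - 2))

# "Standard Error" column.
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

# "T Statistic" column: coefficient divided by its standard error.
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

# "Lower 95%" and "Upper 95%" columns for the slope.
T_CRITICAL = 3.182  # assumption: looked up in a t table for 3 degrees of freedom
lower_95 = slope - T_CRITICAL * se_slope
upper_95 = slope + T_CRITICAL * se_slope

print("slope", round(slope, 4), round(se_slope, 4), round(t_slope, 4),
      round(lower_95, 4), round(upper_95, 4))
print("intercept", round(intercept, 4), round(se_intercept, 4), round(t_intercept, 4))
```

In this made-up example the confidence interval for the slope includes zero, so you could not conclude that the slope is different from zero at the 5% level.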

The most useful part of this section is that it gives you the linear regression equation:

y = mx + b.

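y = slope * x + intercept.

With more than one x variable, each x gets its own slope: y = intercept + slope1 * X1 + slope2 * X2.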

For the above table, the equation would be approximately:

y = 3.14 – 0.65X1 + 0.024X2.
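Once you have the equation, using it is just plugging in values. Here is a minimal Python sketch of the equation above. The function name `predict` and the argument names are my own, and the assumption that X1 is the House column and X2 is the Square Feet column is taken from the column labels mentioned earlier.

```python
def predict(house, square_feet):
    """Evaluate y = 3.14 - 0.65*X1 + 0.024*X2.

    Assumes X1 = house and X2 = square_feet (illustrative names).
    """
    intercept = 3.14
    slope_house = -0.65
    slope_square_feet = 0.024
    return intercept + slope_house * house + slope_square_feet * square_feet


# Example call with made-up input values.
print(predict(house=1, square_feet=100))
```

The coefficients are rounded, so predictions from this function will differ slightly from ones made with the full-precision values in the Excel output.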

**Reference:** http://cameron.econ.ucdavis.edu/excel/ex61multipleregression.html


This article should clearly credit the source it is based on.

It also introduces additional errors, particularly;

“… and the total sum of squares is 1.6050, so:

R2 = 1 – 0.3950 – 1.6050 = 0.8025.”

Should read;

“… and the total sum of squares is 2, so:

R2 = 1 – 0.3950 / 2 = 0.8025.”

Thanks for spotting the error with the sum of squares. It’s now fixed.

I added credit to the article.

Regards,

S

Hi stepahnie

I have more than 2 variables; I have 6. Please tell me how to calculate the regression equation for more variables. I am in urgent need.

Thanks

Irfan

y doesn’t equal slope + intercept * x

it equals slope * x + intercept

also, while i can’t see your original data, it appears there are two variables, so the equation should read:

y = 3.14 – 0.65 X1 + 0.024 X2

where X1 = House

and X2 = sqft

Whoops. Thanks for spotting that. Fixed!

How do we interpret the Significance F when interpreting the regression output from Excel?

I’m not quite understanding your question. Can you give me more information? i.e. are you asking what the F-value is?

Hi Stefanie,

in your video tutorial above you say

“The coefficient of determination tells you how many points, percentage wise, fall on the regression line.“

This is absolutely not the case!!! :-(

The number of points on the regression line is, in fact, *unrelated* to R². Not a single point can be on the regression line and still R² can be close to 1! Conversely, 99% of all points can be exactly on the line; with only one point far off the resulting R² will be very low.

R² is the percentage of explained variance, i.e. the percentage of variance of y that stems from the regression line. For a visualization, draw, for each data point, a vertical line to the regression line; also draw a horizontal line for the mean of y. For each vertical line, take the section between the horizontal line and the regression line. The sum of squares of these sections is the explained variance.

Cheers,

Hans

Another visualization is that

Hi, Hans,

Thanks for your response. I was trying to word it for beginning statistics students who don’t have a clue what variance on a regression line means. In other words, in simple terms. I do agree that the wording as it is may be misleading. I think it would be better stated as “The coefficient of determination gives you an idea of how many points fall on the regression line.” For example, if ALL the points WERE on the line, that would have a perfect coefficient of determination, right? And if the dots were scattered to the wind (with respect to the line), then there would be an insignificant CoD. I’ve corrected that typo. Thanks for your comment :)

Very good information.

Told me everything I need to know about multiple regression analysis output.

Suggestion: Do you have any articles explaining the t-test output or ANOVA output?

I am learning to use MLRA to study variation of wavelength upon some solvent parameters. But when I increase the number of independent variables, #NUM! appears in the F, Significance F and P value columns. Also, the predicted and experimental values remain the same, giving an R square value exactly equal to 1. I am not a statistics student and I am puzzled. If someone can help and mail me regarding this, I shall be highly obliged.

Regards

Pallavi

Check your inputs. Something, somewhere on the worksheet (i.e. a non-numerical value) is causing that #NUM to appear.

I have 10 responses to be worked out from 5 input variables.

I have a database for 18 runs.

Please tell me how to proceed for regression analysis.

Also I want to prepare mathematical equations for 10 output responses.

Hello, Shraddha,

It would be much easier to answer your question if you could show the data (a screenshot?). Please post it on our help forum. One of our mods will be happy to help.

Hi! How will I know if there is a significant difference? Like for instance, I got 0.402 as my significance F. What does it mean?

This should help: What is the F Statistic?

Great video. It’s nice to have this information in one spot. Also like how you highlighted the results.

Thanks, Andy! Glad you found it helpful.

Below are the results for a 3rd order polynomial regression and a logarithmic regression using the same data:

Data (very small sample only to illustrate variable position in Excel)

X1 X2 X3 Y

14.8 26 36.7 20.8

14.5 26 116.4 22.998

Polynomial = Linest(y, x^{1,2,3},,true) shift + ctrl + enter

2.635E-09 0.0561 -1.4218 25.584 (1st row of stats output)

Logarithmic = Linest(y, Ln(x),,true) shift + ctrl + enter

8.384 112.62 -28.17 -293.56

Can you assist me to obtain the relevant equations for the regression models please?

Ian

http://www.statisticshowto.com/how-to-find-a-linear-regression-equation/

GREAT VIDEO! I wish you were my Analytics Teacher.

Hi, thanks for this explanation, but I need more on how to interpret the output in words.

Hi, Patrice,

There’s a lot of information in the output! Can you give me an idea of where you would like to see more of an explanation?

Please, I want to calculate the t-test, the F-test and the ordinary least squares method. How do I do it in Excel?

Hi, Grace,

You can search for Excel articles using the search box at the top right. For example, here’s the link to t test in Excel.
