
## What is Multiple Regression Analysis?

Multiple regression analysis is used to see if there is a statistically significant relationship between one dependent variable and two or more independent variables. It’s used to find trends in those sets of data.

Multiple regression analysis is *almost* the same as simple linear regression. The only difference between simple linear regression and multiple regression is in the number of predictors (“x” variables) used in the regression.

- Simple regression analysis uses a single “x” variable for each dependent “y” variable. For example: $(x_1, Y_1)$.
- Multiple regression uses multiple “x” variables for each dependent “y” variable: $((x_1)_1, (x_2)_1, (x_3)_1, Y_1)$.

In one-variable linear regression, you would input one independent variable (e.g. “sales”) against a dependent variable (e.g. “profit”). But you might be interested in how **different types** of sales affect the regression. You could set your $X_1$ as one type of sales, your $X_2$ as another type of sales and so on.
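As a minimal sketch of that idea, the snippet below fits profit first against one type of sales and then against two types at once. The figures, the variable names (`online_sales`, `store_sales`) and the helper `fit_ols` are all made up for illustration, and the profit values are built without noise so the fitted coefficients are easy to check.

```python
import numpy as np

# Hypothetical monthly figures (invented for illustration only).
online_sales = np.array([10.0, 12.0, 15.0, 11.0, 18.0, 20.0])  # X1
store_sales = np.array([30.0, 28.0, 35.0, 40.0, 33.0, 45.0])   # X2
# Profit is constructed as an exact linear function of both sales types.
profit = 2.0 + 0.5 * online_sales + 0.3 * store_sales          # Y


def fit_ols(predictors, y):
    """Return [b0, b1, ..., bk] from an ordinary least-squares fit."""
    # A leading column of ones lets the model estimate the intercept b0.
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


simple = fit_ols([online_sales], profit)                 # one "x" variable
multiple = fit_ols([online_sales, store_sales], profit)  # two "x" variables

print("simple regression coefficients:  ", simple)
print("multiple regression coefficients:", multiple)
```

The only change between the two fits is the number of predictor columns; with both sales types included, the fit recovers the intercept and the two slopes used to build the data.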

## When to Use Multiple Regression Analysis.

Ordinary linear regression usually isn’t enough to take into account all of the real-life factors that have an effect on an outcome. For example, the following graph plots a single variable (number of doctors) against another variable (life-expectancy of women).

From this graph it might appear there is a relationship between the life expectancy of women and the number of doctors in the population. That’s probably true, and you could say the fix is simple: put more doctors into the population to increase life expectancy. In reality, you would have to look at other factors, like the possibility that doctors in rural areas might have less education or experience, or that they lack access to medical facilities like trauma centers.

Taking those extra factors into account means adding more independent variables to your regression analysis, which turns it into a multiple regression analysis model.
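To see why a left-out factor matters, here is a small sketch with invented numbers. It assumes a made-up world in which each extra doctor adds one year of life expectancy and each point of a hypothetical facility-access score adds two years, with access tending to rise alongside the number of doctors. None of these values come from real data.

```python
import numpy as np

# Hypothetical regions: doctors per 1,000 people and a facility-access score.
doctors = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
access = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0])
# Invented outcome: +1 year per doctor, +2 years per access point.
life_expectancy = 70.0 + 1.0 * doctors + 2.0 * access

ones = np.ones(len(doctors))

# Simple regression: life expectancy on doctors only.
X_simple = np.column_stack([ones, doctors])
b_simple, *_ = np.linalg.lstsq(X_simple, life_expectancy, rcond=None)

# Multiple regression: doctors and access together.
X_multi = np.column_stack([ones, doctors, access])
b_multi, *_ = np.linalg.lstsq(X_multi, life_expectancy, rcond=None)

print("slope on doctors, simple regression:  ", b_simple[1])
print("slope on doctors, multiple regression:", b_multi[1])
print("slope on access, multiple regression: ", b_multi[2])
```

In this toy setup the simple regression credits doctors with more than two years each, because it absorbs the effect of the omitted access score. Adding the second predictor separates the two effects.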

## Multiple Regression Analysis Output.

Regression analysis is almost always performed in software, like Excel or SPSS. The output differs according to how many variables you have, but it’s essentially the same type of output you would find in a simple linear regression. There’s just more of it:

- Simple regression: $Y = b_0 + b_1 x$.
- Multiple regression: $Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$.
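Once the software has estimated the coefficients, both equations are evaluated the same way: start from the intercept and add each coefficient times its “x” value. The helper below is a hypothetical sketch (the name `predict` and the coefficient values are assumptions, not output from any particular package).

```python
def predict(coefficients, x_values):
    """Evaluate Y = b0 + b1*x1 + ... + bn*xn for one observation.

    coefficients: [b0, b1, ..., bn], intercept first.
    x_values:     [x1, ..., xn], one value per slope coefficient.
    """
    b0, *slopes = coefficients
    if len(slopes) != len(x_values):
        raise ValueError("need exactly one x value per slope coefficient")
    return b0 + sum(b * x for b, x in zip(slopes, x_values))


# Simple regression: one slope, one x value.
simple_y = predict([2.0, 0.5], [10.0])
# Multiple regression: the same formula with more terms.
multiple_y = predict([2.0, 0.5, 0.3], [10.0, 20.0])

print(simple_y, multiple_y)
```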

The output would include a summary, similar to a summary for simple linear regression, that includes R (the multiple correlation coefficient), R squared (the coefficient of determination), adjusted R-squared, and the standard error of the estimate to help you determine how well a regression model fits the data. The ANOVA table in the output would give you the p-value and F-statistic.
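For readers who want to see where those summary numbers come from, the sketch below computes them by hand with NumPy using the standard textbook definitions. The function name `regression_summary` and the small data set are invented for illustration; real packages may present or round these values differently.

```python
import numpy as np


def regression_summary(X, y):
    """Fit y on the columns of X (plus an intercept) and return fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                        # n observations, k predictors
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    residuals = y - design @ coef
    sse = float(residuals @ residuals)            # unexplained variation
    sst = float(((y - y.mean()) ** 2).sum())      # total variation
    ssr = sst - sse                               # explained variation
    df_model, df_resid = k, n - k - 1

    r2 = 1.0 - sse / sst
    return {
        "R": float(np.sqrt(max(r2, 0.0))),
        "R_squared": r2,
        "adjusted_R_squared": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "std_error_of_estimate": float(np.sqrt(sse / df_resid)),
        "F": (ssr / df_model) / (sse / df_resid),
    }


# Made-up data: two predictors, eight observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [5.1, 5.9, 9.2, 9.8, 13.1, 13.9, 17.2, 17.8]

summary = regression_summary(X, y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The p-value in the ANOVA table comes from comparing F against an F distribution with k and n − k − 1 degrees of freedom. In Python that is commonly done with `scipy.stats.f.sf`, which is left out here to keep the sketch to a single dependency.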

## Minimum Sample Size

“The answer to the sample size question appears to depend in part on the objectives of the researcher, the research questions that are being addressed, and the type of model being utilized. Although there are several research articles and textbooks giving recommendations for minimum sample sizes for multiple regression, few agree on how large is large enough and not many address the prediction side of MLR.” ~ Gregory T. Knofczynski

If you’re concerned with finding accurate values for the squared multiple correlation coefficient, minimizing the shrinkage of the squared multiple correlation coefficient, or have another specific goal, Gregory Knofczynski’s paper is a worthwhile read and comes with lots of references for further study. That said, many people just want to run MLR to get a general idea of trends and they don’t need very specific estimates. If that’s the case, you can use a **rule of thumb**. It’s widely stated in the literature that you should have more than 100 items in your sample. While this is sometimes adequate, you’ll be on the safer side if you have at least 200 observations or, better yet, more than 400.
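As a rough illustration of why small samples are a concern, consider the standard adjusted R-squared formula, where n is the sample size and k is the number of predictors. This is a general textbook adjustment used here as an assumption, not necessarily the estimator studied in Knofczynski’s paper, and the values R² = 0.50 and k = 5 are hypothetical.

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
$$

$$
\begin{aligned}
n = 20:&\quad 1 - 0.50 \times \tfrac{19}{14} \approx 0.321 \\
n = 100:&\quad 1 - 0.50 \times \tfrac{99}{94} \approx 0.473 \\
n = 400:&\quad 1 - 0.50 \times \tfrac{399}{394} \approx 0.494
\end{aligned}
$$

With only 20 observations the adjusted value falls well below the raw 0.50, while at 400 observations the two are nearly identical, which is consistent with the advice to prefer larger samples.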
