
Contents:

- How to Find Pearson’s Correlation Coefficients
- How to test correlation coefficients
- What Does the Correlation Coefficient Mean?
- Cramer’s V Correlation
- Where did the Correlation Coefficient Come From?

## How to Find Pearson’s Correlation Coefficients

**Correlation coefficients** are used in statistics to measure how strong the relationship is between two variables. There are several types of correlation coefficient. Pearson’s correlation (also called the Pearson correlation or Pearson’s r) is the **correlation coefficient** most commonly used in linear regression; it measures the strength and direction of the linear relationship between two variables.

**Sample question**: Find the value of the correlation coefficient from the following table:

| Subject | Age x | Glucose Level y |
|---|---|---|
| 1 | 43 | 99 |
| 2 | 21 | 65 |
| 3 | 25 | 79 |
| 4 | 42 | 75 |
| 5 | 57 | 87 |
| 6 | 59 | 81 |

**Step 1:** *Make a chart.* Use the given data, and add three more columns: xy, x², and y².

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | | | |
| 2 | 21 | 65 | | | |
| 3 | 25 | 79 | | | |
| 4 | 42 | 75 | | | |
| 5 | 57 | 87 | | | |
| 6 | 59 | 81 | | | |

**Step 2:** *Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.*

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | | |
| 2 | 21 | 65 | 1365 | | |
| 3 | 25 | 79 | 1975 | | |
| 4 | 42 | 75 | 3150 | | |
| 5 | 57 | 87 | 4959 | | |
| 6 | 59 | 81 | 4779 | | |

**Step 3:** *Take the square of the numbers in the x column, and put the result in the x² column.*

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | |
| 2 | 21 | 65 | 1365 | 441 | |
| 3 | 25 | 79 | 1975 | 625 | |
| 4 | 42 | 75 | 3150 | 1764 | |
| 5 | 57 | 87 | 4959 | 3249 | |
| 6 | 59 | 81 | 4779 | 3481 | |

**Step 4:** *Take the square of the numbers in the y column, and put the result in the y² column.*

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |

**Step 5:** *Add up all of the numbers in each column and put the result at the bottom of the column.* The Greek letter sigma (Σ) is a short way of saying “sum of.”

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |
| Σ | 247 | 486 | 20485 | 11409 | 40022 |
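If you would rather not fill in the chart by hand, Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration using the six subjects from the sample question; the variable names (`ages`, `glucose`, `rows`, `sums`) are made up for this sketch.

```python
# Data from the sample question: age (x) and glucose level (y) per subject.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

# Steps 2-4: one row per subject, with xy, x^2 and y^2 added.
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in zip(ages, glucose)]

# Step 5: add up each column (the sigma row at the bottom of the chart).
sums = tuple(sum(column) for column in zip(*rows))

row_format = "{:>7}" * 5
print(row_format.format("x", "y", "xy", "x^2", "y^2"))
for row in rows:
    print(row_format.format(*row))
print(row_format.format(*sums))
```

The last printed line should match the Σ row of the chart: 247, 486, 20485, 11409, 40022.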

**Step 6:** *Use the following correlation coefficient formula.*

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

The answer is: **2868 / 5413.27 = 0.529809**

From our table:

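- Σx = 247
- Σy = 486
- Σxy = 20,485
- Σx² = 11,409
- Σy² = 40,022
- n is the sample size, in our case n = 6

The correlation coefficient =

$$ r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^{2}\right] \times \left[6(40{,}022) - 486^{2}\right]}} = \frac{2868}{\sqrt{7445 \times 3936}} = \frac{2868}{5413.27} \approx 0.5298 $$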

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation.
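To check the arithmetic, the formula from Step 6 can be written as a short Python function. This is a minimal sketch, not a library routine: the name `pearson_r` is made up here, and it applies the sum-based formula shown above directly to the sample data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the sum-based formula from Step 6."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The square root covers the whole product in the denominator.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
r = pearson_r(ages, glucose)
print(round(r, 4))  # 0.5298
```

Forgetting the square root in the denominator is the most common slip with this formula, and it produces a tiny number (about 0.0001) instead of 0.5298. In Python 3.10 and later, `statistics.correlation(ages, glucose)` from the standard library should give the same value.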


## How to test correlation coefficients

If you can read a table, you can **test a correlation coefficient** for significance.

**Sample problem**: Test the significance of the correlation coefficient r = 0.565 using a table of critical values for the Pearson product-moment correlation (PPMC). Test at α = 0.01 for a sample size of 9.

**Step 1:** *Subtract two from the sample size to get df, degrees of freedom*.

9 – 2 = 7

**Step 2:** *Look the values up in the PPMC Table.* With df = 7 and α = 0.01, the table value is **0.798**.

**Step 3:** *Compare r with the table value. Drawing a graph of the “reject” regions can make the relationship easier to see.*

r = 0.565 does not fall into the “reject” region (above 0.798), so there isn’t enough evidence to state that a significant linear relationship exists in the data.
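The three steps above can be sketched in Python. The critical value 0.798 is the table value quoted in Step 2. The constant `T_CRIT = 3.499` is an assumption that is not in the original example: it is the two-tailed critical t value for df = 7 at α = 0.01 from a standard t table, included only to show that the PPMC table value and the usual t test agree. The function names are made up for this sketch.

```python
from math import sqrt

def critical_r_from_t(t_crit, df):
    """Convert a critical t value into the equivalent critical |r|."""
    return t_crit / sqrt(t_crit ** 2 + df)

def t_statistic(r, n):
    """t statistic for testing 'no linear correlation', with n - 2 df."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2))

r, n = 0.565, 9
df = n - 2                  # Step 1: 9 - 2 = 7
table_r = 0.798             # Step 2: PPMC table, df = 7, alpha = 0.01
T_CRIT = 3.499              # assumption: two-tailed t table, df = 7, alpha = 0.01

reject = abs(r) > table_r   # Step 3: is r in the "reject" region?
print(df, reject)           # 7 False

# Cross-check: the t-table route leads to the same decision.
print(round(critical_r_from_t(T_CRIT, df), 3))
print(abs(t_statistic(r, n)) > T_CRIT)
```

Comparing |r| with the table value and comparing |t| with the critical t are two views of the same test, so both comparisons come out False here.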

## What Does the Correlation Coefficient Mean?

Pearson’s Correlation Coefficient returns a value between -1 and +1. A value of -1 means there is a perfect negative linear correlation, +1 means there is a perfect positive linear correlation, and 0 means there is no linear correlation. This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson’s Correlation coefficients. They note that these are “**crude estimates**” for interpreting strengths of correlations using Pearson’s Correlation:

| r value | Interpretation |
|---|---|
| +.70 or higher | Very strong positive relationship |
| +.40 to +.69 | Strong positive relationship |
| +.30 to +.39 | Moderate positive relationship |
| +.20 to +.29 | Weak positive relationship |
| +.01 to +.19 | No or negligible relationship |
| 0 | No relationship |
| -.01 to -.19 | No or negligible relationship |
| -.20 to -.29 | Weak negative relationship |
| -.30 to -.39 | Moderate negative relationship |
| -.40 to -.69 | Strong negative relationship |
| -.70 or lower | Very strong negative relationship |
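As a rough illustration, the table can be turned into a small lookup function. The name `describe_r` is made up for this sketch, and it makes one assumption the table does not state: values that fall between two listed ranges (for example 0.695) are placed in the lower band.

```python
def describe_r(r):
    """Map r to the crude labels in the table above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No relationship"
    size = abs(r)
    if size < 0.20:
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} relationship"

print(describe_r(0.75))   # Very strong positive relationship
print(describe_r(-0.25))  # Weak negative relationship
```

Because the cutoffs are only crude estimates, a label from a function like this is a starting point for interpretation, not a conclusion.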

Graphically, a strong negative correlation means that the scatter plot has a downward slope from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation means that the plot has an upward slope from left to right: as the x-values increase, the y-values get larger.


### Cramer’s V Correlation

Cramer’s V Correlation is similar to the Pearson Correlation coefficient. While the Pearson correlation is used to test the strength of linear relationships between two numeric variables, Cramer’s V is used to measure the association between categorical variables in tables with more than 2 x 2 columns and rows. Cramer’s V correlation varies between 0 and 1. A value close to 0 means that there is very little association between the variables. A Cramer’s V of close to 1 indicates a very strong association.

| Cramer’s V | Interpretation |
|---|---|
| .25 or higher | Very strong relationship |
| .15 to .25 | Strong relationship |
| .11 to .15 | Moderate relationship |
| .06 to .10 | Weak relationship |
| .01 to .05 | No or negligible relationship |
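The article does not show how Cramer’s V is calculated, so the following is only a sketch based on the standard textbook formula, V = √(χ² / (n × min(rows − 1, columns − 1))), where χ² is the chi-square statistic of the table. The function name `cramers_v` and both example tables are hypothetical, and the sketch assumes every row and column has at least one observation.

```python
from math import sqrt

def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))

# Hypothetical 3 x 3 tables: one with a perfect association, one with none.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
independent = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(round(cramers_v(perfect), 3))      # 1.0
print(round(cramers_v(independent), 3))  # 0.0
```

The two example tables mark the ends of the scale described above: a value of 1 when each row category pairs with exactly one column category, and 0 when the counts show no association at all.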

## Where did the Correlation Coefficient Come From?

A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn’t the original inventor of the term correlation but his use of it became one of the most popular ways to measure correlation.

Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed “co-relation,” which actually makes sense considering you’re studying the relationship between a couple of different variables. In Co-Relations and Their Measurement, he said “The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation … is different in the different cases.” It’s worth noting though that Galton mentioned in his paper that he had borrowed the term from biology, where “Co-relation and correlation of structure” was being used but until the time of his paper it hadn’t been properly defined.

In 1892, British statistician Francis Ysidro Edgeworth published a paper called “Correlated Averages,” Philosophical Magazine, 5th Series, 34, 190-204 where he used the term “Coefficient of Correlation.” It wasn’t until 1896 that British mathematician Karl Pearson used “Coefficient of Correlation” in two papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

