# Correlation Coefficients: Find Pearson’s Correlation Coefficient


## How to Find Pearson’s Correlation Coefficients

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient; Pearson’s correlation (also called the Pearson correlation coefficient) is the one commonly used in linear regression.

Sample question: Find the value of the correlation coefficient from the following table:

| Subject | Age x | Glucose Level y |
|---|---|---|
| 1 | 43 | 99 |
| 2 | 21 | 65 |
| 3 | 25 | 79 |
| 4 | 42 | 75 |
| 5 | 57 | 87 |
| 6 | 59 | 81 |

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

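| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | | | |
| 2 | 21 | 65 | | | |
| 3 | 25 | 79 | | | |
| 4 | 42 | 75 | | | |
| 5 | 57 | 87 | | | |
| 6 | 59 | 81 | | | |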

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

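| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | | |
| 2 | 21 | 65 | 1365 | | |
| 3 | 25 | 79 | 1975 | | |
| 4 | 42 | 75 | 3150 | | |
| 5 | 57 | 87 | 4959 | | |
| 6 | 59 | 81 | 4779 | | |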

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

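| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | |
| 2 | 21 | 65 | 1365 | 441 | |
| 3 | 25 | 79 | 1975 | 625 | |
| 4 | 42 | 75 | 3150 | 1764 | |
| 5 | 57 | 87 | 4959 | 3249 | |
| 6 | 59 | 81 | 4779 | 3481 | |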

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

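| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |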

Step 5: Add up all of the numbers in each column and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum of.”

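| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |
| Σ | 247 | 486 | 20485 | 11409 | 40022 |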
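The column-building in Steps 1 through 5 can be reproduced with a short Python sketch. It is only an illustration: the lists `ages` and `glucose` are the x and y columns from the table above, and the function name `column_sums` is hypothetical.

```python
def column_sums(xs, ys):
    """Return the five totals needed for Pearson's r: Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Step 2: the xy column
    sum_x2 = sum(x * x for x in xs)              # Step 3: the x² column
    sum_y2 = sum(y * y for y in ys)              # Step 4: the y² column
    return sum_x, sum_y, sum_xy, sum_x2, sum_y2  # Step 5: the column totals


# x = age, y = glucose level, copied from the sample table
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

for x, y in zip(ages, glucose):
    print(x, y, x * y, x * x, y * y)  # one row of the chart

print("Σ", *column_sums(ages, glucose))  # Σ 247 486 20485 11409 40022
```

The last line printed matches the Σ row of the table.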

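Step 6: Use the following correlation coefficient formula:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$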

The answer is: 2868 / 5413.27 = 0.529809


From our table:

• Σx = 247
• Σy = 486
• Σxy = 20,485
• Σx² = 11,409
• Σy² = 40,022
• n is the sample size; in our case, n = 6

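Plugging these values into the formula, the correlation coefficient is:

$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right]\left[6(40{,}022) - 486^2\right]}} = \frac{122{,}910 - 120{,}042}{\sqrt{(68{,}454 - 61{,}009)(240{,}132 - 236{,}196)}} = \frac{2868}{\sqrt{7445 \times 3936}} \approx 0.5298$$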
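The Step 6 formula can be sketched in Python as below. This is a minimal illustration of the raw-sums formula, not a library routine; the function name `pearson_r` is hypothetical. Note that the square root is taken over the whole denominator before dividing.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r using the raw-sums formula from Step 6."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # Square root of the product of the two bracketed terms
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
print(round(pearson_r(ages, glucose), 4))  # 0.5298
```

Each bracketed term in the denominator equals n times a sum of squared deviations from the mean, so it can never be negative for real data. A negative value under the square root therefore points to an arithmetic slip, for example subtracting Σx instead of (Σx)².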

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation.


You can also find the correlation coefficient in Minitab.

## Correlation Coefficient Hypothesis Test

If you can read a table, you can test a correlation coefficient for significance. Note that correlations should be calculated over the entire range of the data: if you restrict the range, r will usually be weakened.

Sample problem: Test the significance of the correlation coefficient r = 0.565 using the table of critical values for the Pearson product-moment correlation coefficient (PPMC). Test at α = 0.01 for a sample size of 9.

Step 1: Subtract two from the sample size to get df, the degrees of freedom.
9 – 2 = 7

Step 2: Look the values up in the PPMC table. With df = 7 and α = 0.01, the table value is 0.798.

Step 3: Draw a graph, so you can more easily see the relationship.

r = 0.565 does not fall into the “reject” region (above 0.798), so there isn’t enough evidence at α = 0.01 to conclude that a significant linear relationship exists in the data.
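The three steps above can be sketched in Python. This is an illustration, not a replacement for the table: the critical value is passed in after looking it up (0.798 for df = 7 and α = 0.01, as in Step 2), and the function name `ppmc_test` is hypothetical. As an added cross-check that is not part of the steps above, the sketch also returns the equivalent t statistic, t = r√(n − 2) / √(1 − r²); PPMC critical values correspond to critical values of t with n − 2 degrees of freedom.

```python
import math


def ppmc_test(r, n, critical_r):
    """Compare |r| with a critical value looked up in a PPMC table.

    critical_r must come from a table for df = n - 2 and the chosen alpha;
    this function does not compute it.
    """
    if n <= 2:
        raise ValueError("need a sample size greater than 2")
    df = n - 2                        # Step 1: degrees of freedom
    reject_h0 = abs(r) > critical_r   # Step 3: does r fall in the reject region?
    # Equivalent t statistic (same df), for comparison with a t table
    if abs(r) >= 1:
        t_stat = math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    else:
        t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return df, reject_h0, t_stat


df, reject_h0, t_stat = ppmc_test(r=0.565, n=9, critical_r=0.798)
print(df, reject_h0, round(t_stat, 2))  # 7 False 1.81
```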

## Meaning of the Linear Correlation Coefficient.

Pearson’s correlation coefficient is a linear correlation coefficient that returns a value between -1 and +1. A -1 means there is a perfect negative linear correlation, and +1 means there is a perfect positive linear correlation. A 0 means there is no linear correlation.

This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson’s Correlation coefficients. They note that these are “crude estimates” for interpreting strengths of correlations using Pearson’s Correlation:

| r value | Interpretation |
|---|---|
| +.70 or higher | Very strong positive relationship |
| +.40 to +.69 | Strong positive relationship |
| +.30 to +.39 | Moderate positive relationship |
| +.20 to +.29 | Weak positive relationship |
| +.01 to +.19 | No or negligible relationship |
| 0 | No relationship [zero order correlation] |
| -.01 to -.19 | No or negligible relationship |
| -.20 to -.29 | Weak negative relationship |
| -.30 to -.39 | Moderate negative relationship |
| -.40 to -.69 | Strong negative relationship |
| -.70 or lower | Very strong negative relationship |
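As a sketch, the Quinnipiac list can be turned into a lookup function. The function name `describe_r` is hypothetical, and so is the handling of values that fall between the listed bands (for example 0.695, which is assigned here to the lower band because each band is treated as half-open); these are assumptions, not part of the original list.

```python
def describe_r(r):
    """Map r to the Quinnipiac 'crude estimate' labels (bands treated as half-open)."""
    if not -1 <= r <= 1:
        raise ValueError("Pearson's r must lie between -1 and +1")
    if r == 0:
        return "No relationship"
    size = abs(r)  # strength depends on magnitude; the sign gives direction
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    elif size >= 0.20:
        strength = "Weak"
    else:
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


for r in (0.85, -0.35, 0.25, 0.1):
    print(r, describe_r(r))
```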

It may be helpful to see graphically what these correlations look like:

[Figure: graphs showing a correlation of -1 (a negative correlation), 0, and +1 (a positive correlation).]

The images show that a strong negative correlation means that the graph has a downward slope from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation means that the graph has an upward slope from left to right: as the x-values increase, the y-values get larger.

### Cramer’s V Correlation

Cramer’s V is similar in spirit to the Pearson correlation coefficient, but it applies to a different kind of data. While the Pearson correlation measures the strength of a linear relationship between two numeric variables, Cramer’s V measures the strength of association between two categorical variables, and it is typically used for contingency tables larger than 2 × 2. Cramer’s V varies between 0 and 1. A value close to 0 means there is very little association between the variables; a value close to 1 indicates a very strong association.

| Cramer’s V | Interpretation |
|---|---|
| .25 or higher | Very strong relationship |
| .15 to .25 | Strong relationship |
| .11 to .15 | Moderate relationship |
| .06 to .10 | Weak relationship |
| .01 to .05 | No or negligible relationship |
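As a sketch, assuming the standard definition V = √(χ² / (n(k − 1))), where χ² is the Pearson chi-square statistic of the table, n is the total count, and k is the smaller of the number of rows and columns, Cramer’s V can be computed in plain Python. The function name `cramers_v` is hypothetical, and the sketch assumes every row and column total is greater than zero.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes at least 2 rows and 2 columns and no all-zero row or column.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    k = min(len(row_totals), len(col_totals))  # smaller of rows, columns
    if k < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2 x 2 table with no empty row or column")

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))


perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # each row falls in one column
independent = [[1, 2, 3], [2, 4, 6]]            # both rows have the same proportions
print(round(cramers_v(perfect), 3))      # 1.0
print(round(cramers_v(independent), 3))  # 0.0
```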

## Where did the Correlation Coefficient Come From?

A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson didn’t coin the term correlation, but his coefficient became one of the most popular ways to measure it.

Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed “co-relation,” which actually makes sense considering you’re studying the relationship between a couple of different variables. In Co-Relations and Their Measurement, he said “The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son, ... and so on; but the index of co-relation … is different in the different cases.” It’s worth noting, though, that Galton mentioned in his paper that he had borrowed the term from biology, where “Co-relation and correlation of structure” was being used, but until the time of his paper it hadn’t been properly defined.

In 1892, British statistician Francis Ysidro Edgeworth published a paper called “Correlated Averages” (Philosophical Magazine, 5th Series, 34, 190–204), in which he used the term “Coefficient of Correlation.” It wasn’t until 1896 that British mathematician Karl Pearson used “Coefficient of Correlation” in two papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

The Pearson product-moment correlation equation:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
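The deviation form of Pearson’s r divides the sum of cross-products of deviations from the means by the square root of the product of the two sums of squared deviations. It is algebraically equivalent to the raw-sums formula used in the worked example, because nΣxy − (Σx)(Σy) is n times the cross-product sum and each bracketed denominator term is n times a sum of squared deviations. The sketch below, with a hypothetical function name, applies the deviation form to the age and glucose data and gets the same 0.5298.

```python
import math


def pearson_r_deviations(xs, ys):
    """Pearson's r from deviations about the means (product-moment form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    s_xx = sum(a * a for a in dx)              # sum of squared x deviations
    s_yy = sum(b * b for b in dy)              # sum of squared y deviations
    return s_xy / math.sqrt(s_xx * s_yy)


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
print(round(pearson_r_deviations(ages, glucose), 4))  # 0.5298
```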


Correlation Coefficients: Find Pearson’s Correlation Coefficient was last modified: October 14th, 2017 by

# 51 thoughts on “Correlation Coefficients: Find Pearson’s Correlation Coefficient”

1. Bill Bryan

I think this is the part of the course where you can feel your brain growing larger. The correlation coefficient equation is a long process; if only there were a way to shorten the problem.

2. Donna Allen

I too wish there was a shorter way to do this problem. I’m just thankful that I actually understand how to work the problem. Your explanation was helpful and easy to follow. Thank you!

3. Vanessa

This example was really helpful, and I understand how to calculate the problem and how to do all of the steps, but the only problem I am having is how you got the final answer, which in your example says 1.44281… In MathZone I did the whole problem like it said, and I even saw the example and it was right, but I don't know how they got to the final answer. I got 2.14866937E-4, but the answer was 0.947. Please help me. Am I missing something?

4. Vanessa

I understand; I just figured out the right answer now. I didn't know I had to take the square root of the bottom part. Even though this helped me a lot, I used Google, and it helped me figure out the last part by explaining everything step by step, which, unfortunately, is what I need.

5. Alison Bryant

I have found that it is easiest, and you get the same answer, if you use the LinReg function on the calculator; it gives you the correlation coefficient as well as the coefficient of determination.

6. Tony

Excellent example. A couple of mistakes though! 6×11409 = 68454
Also you must take the square-root of the denominator. I make the answer 0.5298

7. Stephanie

Thanks for spotting the error in the formula! An update is on the way for the long step from the book. In the meantime, this page has been updated with the correct answer (thanks, Tony!).

8. Seema Dessai

I personally found this explanation to be the best after going through many explanations based on the same formula. Thank you very much for such a simple and understandable explanation of Pearson’s Correlation Coefficients.

9. paresh

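| | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| | 10.8 | 49.9 | 99.65 | 200.8 | 1.004 | 5.005 | 10.004 | 50.01 |
| | 10.8 | 49.9 | 99.65 | 200.8 | 1.004 | 5.005 | 10.004 | 50.01 |
| | 10.8 | 49.9 | 99.65 | 200.8 | 1.004 | 5.005 | 10.004 | 50.01 |
| Avg | 10.8 | 49.9 | 99.65 | 200.8 | 1.004 | 5.005 | 10.004 | 50.01 |

How do I calculate the correlation coefficient of the above?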

10. jonathan D Hantapat

Thank you so much for the easy, self-explanatory examples. I really appreciate it. I was given an assignment on statistics for analytical chemistry, and I'm so thankful that this website helped me solve 75% of my assignment.

11. Andale

Great! That’s what the site is here for. It’s always nice to hear it helped out :)

Stephanie

12. Genardo_27

Will someone help us solve genetic correlation problems? It is our report, yet we're not ready… we are totally dead to our very own teacher… S O S..

thank you!

13. Erika P

Super easy to follow, loved the whole “step-by-step” thing! I’m seriously mathematically challenged, so I was super happy to have found such a helpful website!

14. Pixie

Must bookmark this site! Extremely helpful in taking online classes and trying to teach myself statistics. I kept getting numbers like 38 for r until I read this article v.v

15. kobby maloiso

Please help me answer this; it's urgent: A researcher correlated the MTAI scores of a group of 100 experienced secondary school teachers with the number of students each teacher failed in a year. He obtained an r of -0.39. He concluded that teachers tend to fail students because they do not have “accepting” attitudes towards students. Comment on the researcher’s methods and conclusions.

16. CW

Remember to get the square root of the denominator before dividing!

The formula as shown should read:

$$\frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = 0.5298$$

Note the square root over the whole denominator!

17. MARIA

Can someone help me understand the level of significance at 0.05/0.01 from the chi-square table, with reference to the hypotheses H0/HA? Thx

18. Ria

I am very new to stats but need to grasp it quickly to analyze data in my thesis.

I want to find out the nature of the relationship between pop love song themes and imagined interactions (Imagined Interaction Theory). The instrument to measure imagined interactions is a 7-point interval scale ranging from strongly disagree to strongly agree.

Can I use Pearson’s r to test the correlation between my variables: love song themes and imagined interactions?

19. Justin

Can anyone help me do this stat homework?

1) Watson & Watson Repair Inc. provides maintenance service for a large apartment complex in downtown Saint Petersburg, Florida. W & W managers are evaluating the possibility of hiring another maintenance person because it seems maintenance calls are increasing. Rafael Roddick and Andy Nadal are currently responsible for maintenance tasks. To investigate “what” drives Repair Time, the managers hire you as statistician to conduct a regression analysis. The table below provides data from a randomly selected sample of 10 maintenance calls.
a. (1pt) How would you include “responsible for maintenance” in your regression? (How would you define it?)
STEP 1: use the dummy variable REPAIRPERSON = 1 IF responsible = RAFAEL
REPAIRPERSON = 0 IF responsible = ANDY
A regression model is set up using ONLY repairperson as variable to explain REPAIRTIME
a. (1pt) Comment on the “correlation” between Repairperson and Repairtime.
b. (1pt) Comment on goodness of fit of the model

| Maintenance Call | Repair Time (hours) | Months Since Last Service | Responsible for maintenance |
|---|---|---|---|
| 1 | 2.9 | 3 | Rafael Roddick |
| 2 | 3 | 3.9 | Rafael Roddick |
| 4 | 1.8 | 3 | Rafael Roddick |
| 5 | 2.9 | 2 | Rafael Roddick |
| 10 | 4.5 | 6 | Rafael Roddick |
Correlations

| Pearson Correlation | Repairtime | Repairperson |
|---|---|---|
| Repairtime | 1.000 | -.783 |
| Repairperson | -.783 | 1.000 |
Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .783(a) | .614 | .565 | .70071 |

a. Predictors: (Constant), Repairperson
b. Dependent Variable: Repairtime2
c. (2pt) Report the statistical significance of the coefficients.
STEP 2: Use MONTHS SINCE LAST SERVICE AND REPAIRPERSON in a regression to explain REPAIRTIME
a. (1pt) Comment on the scatter diagram for Months-since-last-service and Repairtime.
b. (2pt) Comment on goodness of fit of the model. Do you find any difference with respect to the goodness of fit of the model in STEP 1?
c. (1pt) Comment on the normality assumptions of the model.
Coefficients(a)

| Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 4.600 | .313 | | 14.679 | .000 |
| Repairperson | -1.580 | .443 | -.783 | -3.565 | .007 |

a. Dependent Variable: Repairtime
Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .839(a) | .705 | .620 | .65498 |

a. Predictors: (Constant), monthslastservice, Repairperson
b. Dependent Variable: Repairtime3
d. (3pt) Report the statistical significance of the coefficients.

e. (1pt) Why do you think the statistical significance of the coefficient for repairperson has changed from step 1 to step 2?
In Step 1, repairperson was the only variable explaining repair time. It seems that when this variable is combined with months since last service, repairperson loses explanatory power, which is reflected in the statistical significance of the coefficient.
f. (1pt) Write down the estimated regression equation.

g. (2pt) Interpret the intercept for this model
h. (2pt) Provide an interpretation for the slope coefficients of the model.
Coefficients(a)

| | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 3.195 | 1.001 | | 3.192 | .015 |
| Repairperson | -.860 | .642 | -.426 | -1.340 | .222 |
| monthslastservice | .191 | .130 | .467 | 1.468 | .18 |
STEP 3: Use MONTHS SINCE LAST SERVICE to capture the curvature explaining REPAIRTIME
1) (2pt) Of all the models below, which do you think is best?
Model Summary and Parameter Estimates
Dependent Variable: Repairtime

| Equation | R Square | F | df1 | df2 | Sig. | Constant | b1 | b2 | b3 |
|---|---|---|---|---|---|---|---|---|---|
| Linear | .629 | 13.558 | 1 | 8 | .006 | 2.036 | .325 | | |
| Quadratic | .709 | 8.531 | 2 | 7 | .013 | .213 | 1.130 | -.072 | |
| Cubic | .765 | 6.515 | 3 | 6 | .026 | 3.639 | -1.227 | .405 | -.029 |

The independent variable is monthslastservice.
The cubic model has a good fit (R Square = 76.5%), so it represents a better fit for the model.
2) (10pt) Given the following estimated regression equation and SPSS output from regression, fill in the

ANOVA

| Model | Sum of Squares | df | Mean Square | F |
|---|---|---|---|---|
| 1 Regression | | | | |
| Residual | | | | |
| Total | 25.5 | 7 | | |
Coefficients
Model
Unstandardized Coefficients
B Std. Error t
1 (Constant) 83.23 1.574 52.882
X1 0.304
X2 1.301 0.321 4.057

20. anum

I am unable to find the correct coefficient of correlation when I get a negative value inside the square root.

21. lilian richard

Can you assist me in choosing the test statistics for analyzing hypotheses such as the following?

1. There is a relationship between m-pesa and the economic and social outcomes in the society.

2. There is a relationship between the strategies and approaches used by m-pesa and customer satisfaction.

3. There is a relationship between transaction cost and the extent of use of m-pesa.

22. Derek

Thank you so much for the step-by-step approach. Now if only I could get my college professors to explain things this way!

23. Sherry

When constructing the data table, do you use the percent or decimal? For example, x = the number of jobs in a particular state and y = the percent of poverty in that state. Would y = 15.2% or would y = 0.152 for the calculation? Thanks

24. Andale

Sherry,

Use decimal. That makes multiplication possible. For example, if you were to multiply 10% by 10%, you would first have to convert them to decimals anyway (.1 * .1).

Regards,
Stephanie

25. deno

Folks,

We are in a group project for our research class in medical informatics. We are to present the correlation data in class. Any suggestions on how to present this data graphically? Any software, anything? Can Excel do it?

Regards,
DenO

26. Ben Rois

Thank you so much for the lesson; it was very easy to understand and to prepare notes from. I am sure that you are a great teacher. Thank you again…

27. Andale

It’s not possible to have a pearson’s correlation coefficient of 2.71. Recheck your calculations.

28. louie

Hi to all, can you please help me with my homework in statistical analysis?

Can you please give me 2 problems using Pearson correlation/Pearson r?

thanks. good afternoon