Statistics How To

Correlation Coefficients: Find Pearson’s Correlation Coefficient


Contents:

  1. How to Find Pearson’s Correlation Coefficients.
  2. How to test a correlation coefficient.
  3. What Does the Correlation Coefficient Mean?
  4. Cramer’s V Correlation
  5. Where did the Correlation Coefficient Come From?

How to Find Pearson’s Correlation Coefficients

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient; Pearson’s correlation (also called Pearson’s r) measures the strength of a linear relationship and is the one most commonly used in linear regression.

Sample question: Find the value of the correlation coefficient from the following table:

Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561

Step 5: Add up all of the numbers in each column and put the result at the bottom of that column. The Greek letter sigma (Σ) is a short way of saying “sum of.”

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022
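
Steps 1 through 5 can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original worked example; the variable names (ages, glucose, sums) are made up for illustration.

```python
# Ages (x) and glucose levels (y) for the six subjects in the table.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

# Steps 2-4: build the xy, x², and y² columns row by row.
xy = [x * y for x, y in zip(ages, glucose)]
x_squared = [x * x for x in ages]
y_squared = [y * y for y in glucose]

# Step 5: sum every column (the Σ row of the table).
sums = {
    "x": sum(ages),
    "y": sum(glucose),
    "xy": sum(xy),
    "x2": sum(x_squared),
    "y2": sum(y_squared),
}

print(sums)
# {'x': 247, 'y': 486, 'xy': 20485, 'x2': 11409, 'y2': 40022}
```

The printed sums match the Σ row, which is a quick way to catch arithmetic slips before moving on to the formula.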

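Step 6: Use the following correlation coefficient formula.

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$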

The answer is: 2868 / 5413.27 = 0.529809


From our table:

  • Σx = 247
  • Σy = 486
  • Σxy = 20,485
  • Σx2 = 11,409
  • Σy2 = 40,022
  • n is the sample size, in our case = 6

The correlation coefficient =

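$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = \frac{2868}{\sqrt{7445 \times 3936}} = \frac{2868}{5413.27} \approx 0.5298$$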

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation. (A correlation coefficient is not a percentage, so it should not be read as 52.98%.)
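
The whole calculation can be wrapped in a small function. This is a sketch in Python, assuming the data arrive as two equal-length lists; the function name pearson_r is made up for illustration and is not from any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the sums formula used in the worked example."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # The square root covers the whole product in the denominator.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when every x (or every y) is the same value.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
r = pearson_r(ages, glucose)
print(round(r, 4))  # 0.5298
```

Forgetting the square root in the denominator is the most common slip with this formula; it gives 2868 / 29,303,520, a value close to zero, instead of 0.5298.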

How to test correlation coefficients


If you can read a table, you can test a correlation coefficient for significance.

Sample problem: Test the significance of the correlation coefficient r = 0.565 using the table of critical values for the Pearson product-moment correlation (PPMC). Test at α = 0.01 for a sample size of 9.

Step 1: Subtract two from the sample size to get df, degrees of freedom.
9 – 2 = 7

Step 2: Look the values up in the PPMC table. With df = 7 and α = 0.01, the table value is 0.798.

Step 3: Draw a graph, so you can more easily see the relationship.

r = 0.565 does not fall into the “reject” region (above 0.798), so there isn’t enough evidence to state that a significant linear relationship exists in the data.
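
The table lookup can be cross-checked with the t distribution. The sketch below is in Python and rests on two assumptions that the article does not state: that the PPMC table is two-tailed, and that the critical t value for df = 7 at α = 0.01 is 3.499 (read from a standard t table). The function names are made up for illustration.

```python
import math


def critical_r(t_critical, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def t_statistic(r, df):
    """t statistic for testing the null hypothesis of no linear correlation."""
    return r * math.sqrt(df / (1 - r ** 2))


r = 0.565
n = 9
df = n - 2  # 9 - 2 = 7

# Assumed value: two-tailed critical t for df = 7 at alpha = 0.01,
# taken from a standard t table rather than computed here.
T_CRITICAL = 3.499

r_crit = critical_r(T_CRITICAL, df)
t = t_statistic(r, df)
significant = abs(r) > r_crit

print(round(r_crit, 3))  # 0.798, matching the PPMC table value
print(significant)       # False
```

Comparing |r| with the critical r is the same test as comparing |t| with the critical t, so either route gives the same decision.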

What Does the Correlation Coefficient Mean?

Pearson’s Correlation Coefficient returns a value between -1 and +1. A value of -1 means there is a perfect negative correlation and +1 means there is a perfect positive correlation; values close to either extreme indicate a strong correlation, and values near 0 indicate little or no linear relationship. This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson’s Correlation coefficients. They note that these are “crude estimates” for interpreting strengths of correlations using Pearson’s Correlation:

r value            Meaning
+.70 or higher     Very strong positive relationship
+.40 to +.69       Strong positive relationship
+.30 to +.39       Moderate positive relationship
+.20 to +.29       Weak positive relationship
+.01 to +.19       No or negligible relationship
0                  No relationship
-.01 to -.19       No or negligible relationship
-.20 to -.29       Weak negative relationship
-.30 to -.39       Moderate negative relationship
-.40 to -.69       Strong negative relationship
-.70 or lower      Very strong negative relationship
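
The list above can be turned into a small lookup function. This is a Python sketch with a made-up function name; it assumes that values falling between two listed bands (for example 0.395) belong to the lower band, which the list itself does not specify.

```python
def describe_r(r):
    """Label r using the rule-of-thumb cut-offs from the list above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No relationship"

    size = abs(r)
    if size < 0.20:
        return "No or negligible relationship"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"

    # The sign of r gives the direction; its size gives the strength.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(describe_r(-0.39))  # Moderate negative relationship
print(describe_r(0.75))   # Very strong positive relationship
```

Because the cut-offs are only crude estimates, the labels are best treated as a rough description rather than a formal result.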

It may be helpful to see graphically what these correlations look like:

Graphs showing a correlation of -1 (a negative correlation), 0 and +1 (a positive correlation)

The images show that a strong negative correlation means that the graph has a downward slope from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation means that the graph has an upward slope from left to right: as the x-values increase, the y-values get larger.

Cramer’s V Correlation

Cramer’s V Correlation is similar to the Pearson Correlation coefficient. While the Pearson correlation is used to test the strength of linear relationships between two numeric variables, Cramer’s V is used to measure the association between categorical variables in contingency tables larger than 2 x 2 (more than two rows or more than two columns). Cramer’s V correlation varies between 0 and 1. A value close to 0 means that there is very little association between the variables. A Cramer’s V of close to 1 indicates a very strong association.

Cramer’s V
.25 or higher Very strong relationship
.15 to .25 Strong relationship
.11 to .15 Moderate relationship
.06 to .10 weak relationship
.01 to .05 No or negligible relationship
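
The article does not give a formula for Cramer’s V. The Python sketch below assumes the standard definition, V = √(χ² / (n × (k − 1))), where χ² is the chi-square statistic of the table, n is the total count, and k is the smaller of the number of rows and columns. The function name and both example tables are invented for illustration.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square statistic: sum of (observed - expected)^2 / expected.
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected

    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square / (n * smaller_dim))


# Hypothetical 3 x 3 tables of counts, invented for illustration.
independent = [[10, 20, 30], [20, 40, 60], [30, 60, 90]]
perfect = [[25, 0, 0], [0, 25, 0], [0, 0, 25]]

print(cramers_v(independent))  # 0.0 (rows are proportional: no association)
print(cramers_v(perfect))      # approximately 1 (each row maps to one column)
```

The two tables show the extremes: proportional rows give a V of 0, and a table where each row category pairs with exactly one column category gives a V of about 1.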


Where did the Correlation Coefficient Come From?

A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn’t the original inventor of the term correlation but his use of it became one of the most popular ways to measure correlation.

Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed “co-relation,” which actually makes sense considering you’re studying the relationship between a couple of different variables. In Co-Relations and Their Measurement, he said “The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation … is different in the different cases.” It’s worth noting though that Galton mentioned in his paper that he had borrowed the term from biology, where “Co-relation and correlation of structure” was being used but until the time of his paper it hadn’t been properly defined.

In 1892, British statistician Francis Ysidro Edgeworth published a paper called “Correlated Averages,” Philosophical Magazine, 5th Series, 34, 190-204 where he used the term “Coefficient of Correlation.” It wasn’t until 1896 that British mathematician Karl Pearson used “Coefficient of Correlation” in two papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

The Pearson Product-Moment Correlation equation.


65 thoughts on “Correlation Coefficients: Find Pearson’s Correlation Coefficient”

  1. Bill Bryan

    I think this is the part of the course that you can feel your brain growing larger. The Correlation Coefficient equation is a long process, if only there was a way to shorten the problem.

  2. Donna Allen

    I too wish there was a shorter way to do this problem. I’m just thankful that I actually understand how to work the problem. Your explanation was helpful and easy to follow. Thank you!

  3. Vanessa

    This example was really helpful and I understand how to calculate the problem and how to do all of the steps but the only problem I am having is how did you get the final answer which in your example it says 1.44281 … in mathzone I did the whole problem like it said and I even saw the example and it was right but the final answer I dont know how they got to. i got 2.14866937 E -4 , but the answer was 0.947. please help me .. im I missing something?

  4. Vanessa

    I understand, I just figured out the right answer now. I didnt know I had to square root the bottom part, and even though this helped me alot, i used google and they helped me figure out the last part by explaining everything step by step and unfortunately thats what i need.

  5. Alison Bryant

    I have found that it is easiest, and you get the same answer by going through the Linreg function on the calculator, it gives you the correlation coefficent as well as the correlation of determination.

  6. Tony

    Excellent example. A couple of mistakes though! 6×11409 = 68454
    Also you must take the square-root of the denominator. I make the answer 0.5298

  7. Ronak

    On this page you showed that r’s denominator is a square root

    http://www.statisticshowto.com/articles/how-to-compute-pearsons-correlation-coefficients/

    but on this page, you didn’t do it.

    http://www.statisticshowto.com/help-with-statistics-equations/

    Also, step 5 is wrong.
    6 * 11409 = 68,454 not 66,294

    In step 7, you used 68,454 which is correct but when you subtracted 61009 from it you got the incorrect value of 5,285.

    68,454 – 61,009 = 7,445

    In step 11, it becomes 7,445 * 3936 = 29303520.

    The final answer should be 2896 / 29303520 = 9.78722e-05 = 0.000097822

    Regards,

  8. Stephanie

    Thanks for spotting the error in the formula! An update is on the way for the long step from the book. In the meantime, this page has been updated with the correct answer (thanks, Tony!).

  9. Seema Dessai

    This explanation i personally found to be the best after going through many explanation based on the same formula.Thank you very much for such a simple and understanding method of explanation of Pearson’s Correlation Coefficients.

  10. paresh

    A B C D E F G H
    10.8 49.9 99.65 200.8 1.004 5.005 10.004 50.01
    10.8 49.9 99.65 200.8 1.004 5.005 10.004 50.01
    10.8 49.9 99.65 200.8 1.004 5.005 10.004 50.01
    AVg
    10.8 49.9 99.65 200.8 1.004 5.005 10.004 50.01
    How to calculate corelation coefficient of above .

  11. jonathan D Hantapat

    Thank you so much for the easy self explanatory examples given. I really appreciate it. I was given an assignment for analytical chemistry on statistics and so thankful that this website help solve my 75% of my assignment.

  12. Andale

    Great! That’s what the site is here for. It’s always nice to hear it helped out :)

    Stephanie

  13. Genardo_27

    will some one help us how to solve genetic correlation problems? it is our report yet were not ready>> we are totally dead to our very owned teacher… S O S..

    pls. add link to my f.b page. genard_perias27@yahoo.com

    thank you!

  14. Erika P

    Super easy to follow, loved the whole “step-by-step” thing! I’m seriously mathematically challenged, so I was super happy to have found such a helpful website!

  15. Pixie

    Must bookmark this site! Extremely helpful in taking online classes and trying to teach myself statistics. I kept getting numbers like 38 for r until I read this article v.v

  16. kobby maloiso

    help me to answer this please its urgent:(a researcher correlated the MTAI scores of a group of 100 experienced secondary school teachers with the number of students each teacher failed in a year.He obtained an r of -0.39.He concluded that teachers tend to fail students because they do not have “accepting”attitudes towards students. Comment on the researcher’s methods and conclusions.

  17. CW

    Remember to get the square root of the denominator before dividing!

    The formala as shown should show:

    6(20,485) – (247 × 486) /

    [ [6(11,409) - (247²)] × [6(40,022) - 486²]] <– square root!

    =0.5298

  18. MARIA

    Can someone help me tell about level of significance at 0.05/0.01 from Chai square table,with reference to hopothesis H0/HA ?? thx

  19. Ria

    Hi everyone, help please.
    I am very new to stats but need to grasp it quickly to analyze data in my thesis.

    I want to find out the nature of relationship between pop love songs themes and imagined interactions (Imagined Interaction Theory. The instrument to measure imagined interactions is a 7point interval scale ranging from strongly disagree to strongly agree.

    Can I use Pearson’s R to test the coefficient between my variables: love songs themes and imagined interactions?

    Thanks in advance for help

  20. Justin

    can anyone help me do this stat homework?
    1) Watson & Watson Repair Inc. provides maintenance service for a large apartment complex in downtown
    Saint Petersburg, Florida. W & W managers are evaluating the possibility of hiring another maintenance
    person because it seems maintenance calls are increasing. Rafael Roddick and Andy Nadal are currently
    responsible for maintenance tasks. To investigate “what” drives Repair Time, the managers hire you as
    statistician to conduct a regression analysis. The table below provides data from a random selected sample of
    10 maintenance calls.
    a. (1pt)How would you include “responsible for maintenance” in your regression? (How would you define
    it?)
    STEP 1: use the dummy variable REPAIRPERSON = 1 IF responsible = RAFAEL
    REPAIRPERSON = 0 IF responsible = ANDY
    A regression model is set up using ONLY repairperson as variable to explain REPAIRTIME
    a. (1pt) Comment on the “correlation” between Repairperson and Repairtime.
    b. (1pt) Comment on goodness of fit of the model

    Maintenance
    Call
    Repair
    Time
    (hours)
    Months
    Since
    Last Service
    Responsible for
    maintenance
    1 2.9 3 Rafael Roddick
    2 3 3.9 Rafael Roddick
    3 4.8 8.2 Andy Nadal
    4 1.8 3 Rafael Roddick
    5 2.9 2 Rafael Roddick
    6 4.9 7 Andy Nadal
    7 4.4 9 Andy Nadal
    8 4.5 8.5 Andy Nadal
    9 4.4 4 Andy Nadal
    10 4.5 6 Rafael Roddick
    Correlations
    Repairtime Repair
    Pearson
    Correlation
    Repairtime 1.000 -.783
    Repairperson -.783 1.000
    Model Summary
    b
    Model R R Square
    Adjusted R
    Square
    Std. Error of the
    Estimate
    1 .783
    a
    .614 .565 .70071
    a. Predictors: (Constant), Repairperson
    b. Dependent Variable: Repairtime2
    c. (2pt) Report the statistical significance of the coefficients.
    STEP 2: Use MONTHS SINCE LAST SERVICE AND REPAIRPERSON in a regression to explain REPAIRTIME
    a. (1pt) Comment on the scatter diagram for Months-since-last-service and Repairtime.
    b. (2pt) Comment on goodness of fit of the model. Do you find any difference with respect to the goodness of
    fit of the model in STEP 1?
    c. (1pt) Comment on the normality assumptions of the model.
    Coefficients
    a
    Model
    Unstandardized Coefficients
    Standardized
    Coefficients
    B Std. Error Beta t Sig.
    1 (Constant) 4.600 .313 14.679 .000
    Repairperson -1.580 .443 -.783 -3.565 .007
    a. Dependent Variable: Repairtime
    Model Summary
    b
    Model R R Square Adjusted R Square
    Std. Error of the
    Estimate
    1 .839
    a
    .705 .620 .65498
    a. Predictors: (Constant), monthslastservice, Repairperson
    b. Dependent Variable: Repairtime3
    d. (3pt) Report the statistical significance of the coefficients.

    e. (1pt) Why do you think the statistical significance of the coefficient for repairperson has changed from step 1
    to step 2?
    In Step 1, repairperson was the only variable explaining repair time. It seems that the
    combining this variable with months since last service the, repairperson loses explanatory
    power, which is reflected in the SS of the coefficient.
    f. (1pt) Write down the estimated regression equation.

    g. (2pt) Interpret the intercept for this model
    h. (2pt) Provide an interpretation for the slope coefficients of the model.
    Coefficients
    a
    Unstandardized Coefficients
    Standardized
    Coefficients
    B Std. Error Beta t Sig.
    (Constant) 3.195 1.001 3.192 .015
    Repairperson -.860 .642 -.426 -1.340 .222
    monthslastservice .191 .130 .467 1.468 .18
    STEP 3: Use MONTHS SINCE LAST SERVICE to capture the curvature explaining REPAIRTIME
    1) (2pt) From all models bellow, which you think is best?
    Model Summary and Parameter Estimates
    Dependent Variable:Repairtime
    Equation
    Model Summary Parameter Estimates
    R Square F df1 df2 Sig. Constant b1 b2 b3
    Linear .629 13.558 1 8 .006 2.036 .325
    Quadratic .709 8.531 2 7 .013 .213 1.130 -.072
    Cubic .765 6.515 3 6 .026 3.639 -1.227 .405 -.029
    The independent variable is monthslastservice.
    The cubic model has a good fit as 76.5% so it represents a
    better fit for the model
    2) (10pt) Given the following estimated regression equation and SPSS output from regression, fill in the
    missing values. Show your calculations.

    ANOVA
    Model Sum of Squares df Mean Square F
    1 Regression
    Residual
    Total 25.5 7
    Coefficients
    Model
    Unstandardized Coefficients
    B Std. Error t
    1 (Constant) 83.23 1.574 52.882
    X1 0.304
    X2 1.301 0.321 4.057

  21. Andale

    Hi, Justin,

    Please post your question on the forums. One of our mods will be able to help you (but please post one question at a time :) ).

    Regards,
    Stephanie

  22. anum

    i am unable to find the correct coefficient of correlation when it gives the negative value in the square root.

  23. lilian richard

    can you assist me in choosing the test statistic tools in analyzing my hypotheses such as follows,

    1.there is a relationship between m-pesa and the economic and social outcomes in the society.

    2.there is a relationship between strategies and approaches used by m-pesa and customer satisfaction.

    3.there is relationship between transaction cost and the extent of use of m-pesa

  24. Andale

    Lilian,

    Time constraints prevent me from answering stats questions in the comments…but post on our forums and our mod will be happy to help :)

    Stephanie

  25. Derek

    Thank you so much for the step by step approach. Now if only I get get my college professors to explain things this way!

  26. Sherry

    When constructing the data table, do you use the percent or decimal? For example, x = the number of jobs in a particular state and y = the percent of poverty in that state. Would y = 15.2% or would y = 0.152 for the calculation? Thanks

  27. Andale

    Sherry,

    Use decimal. That makes multiplication possible. For example, if you were to multiply 10% by 10%, you would first have to convert them to decimals anyway (.1 * .1).

    Regards,
    Stephanie

  28. deno

    Folks,

    We are in a grp project for our research class in medical informatics. We are to present the coorelation data in class , any suggestions on how to present this data graphically ? Any software any thing ? Can excel do it ?

    Thanks in advance for your help

    regards
    DenO

  29. Andale

    Hi, Joshua,
    Thank you for your question. Unfortunately, time constraints prevent me from answering math questions in the comments. Could you post your question on our forums? One of our mods would be glad to help.
    Stephanie

  30. Andale

    Hi, Guarang,
    Thank you for your question. Unfortunately, time constraints prevent me from answering math questions in the comments. Could you post your question on our forums? One of our mods would be glad to help.
    Stephanie