Statistics How To

Correlation in Statistics: Correlation Analysis Explained

Contents:
What is Correlation?
The Correlation Coefficient
Correlation in Excel

Definition

Correlation is used to test relationships between quantitative variables or categorical variables. In other words, it’s a measure of how things are related. The study of how variables are correlated is called correlation analysis.

Some examples of data that have a high correlation:

  • Your caloric intake and your weight.
  • Your eye color and your relatives’ eye colors.
  • The amount of time you study and your GPA.

Some examples of data that have a low correlation (or none at all):

  • Your sexual preference and the type of cereal you eat.
  • A dog’s name and the type of dog biscuit they prefer.
  • The cost of a car wash and how long it takes to buy a soda inside the station.

Correlations are useful because if you can find out what relationship variables have, you can make predictions about future behavior. Being able to anticipate future behavior is very important in fields like government and healthcare. Businesses also use these statistics for budgets and business plans.

The Correlation Coefficient

A correlation coefficient is a way to put a numerical value on the relationship. Correlation coefficients have a value between -1 and 1. A “0” means there is no linear relationship between the variables, while -1 or 1 means that there is a perfect negative or positive correlation (negative or positive correlation here refers to the direction of the line the relationship produces on a graph).


Graphs showing a correlation of -1, 0 and +1
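
To make the three cases concrete, here is a minimal Python sketch that computes a Pearson-style correlation coefficient from its usual definition (sum of products of deviations, divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the sample data are made up for illustration; they are not from any particular library.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data chosen so the three cases come out cleanly.
x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * v + 1 for v in x])    # points on a rising line
r_neg = pearson_r(x, [-3 * v + 4 for v in x])   # points on a falling line
r_zero = pearson_r(x, [v ** 2 for v in x])      # symmetric curve, no linear trend

print(r_pos, r_neg, r_zero)
```

The third data set is worth a second look: y is completely determined by x, yet the coefficient is 0, because this coefficient only picks up straight-line trends.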

Types

The most common correlation coefficient is the Pearson Correlation Coefficient. It’s used to test for linear relationships between data. In AP stats or elementary stats, the Pearson is likely the only one you’ll be working with. However, you may come across others, depending upon the type of data you are working with. For example, Goodman and Kruskal’s lambda coefficient is a fairly common coefficient for categorical data. It can be symmetric, where you do not have to specify which variable is dependent, or asymmetric, where the dependent variable is specified.

Goodman and Kruskal lambda coefficient: λ = (ε1 − ε2) / ε1

ε1 is the overall non-modal frequency and ε2 is the sum of the non-modal frequencies for each value of the independent variable.
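
As a rough illustration of those two quantities, the sketch below computes the asymmetric lambda, taken here as (ε1 − ε2) / ε1, from a small table of counts. The function name, the table layout (rows for the independent variable, columns for the dependent variable) and the counts themselves are assumptions made for this example.

```python
def goodman_kruskal_lambda(table):
    """Asymmetric lambda for a table of counts.

    Assumed layout: each row is one value of the independent variable,
    each column is one category of the dependent variable.
    """
    total = sum(sum(row) for row in table)
    # Column totals: overall frequency of each dependent category.
    col_totals = [sum(col) for col in zip(*table)]
    # e1: overall non-modal frequency (everything outside the modal category).
    e1 = total - max(col_totals)
    # e2: non-modal frequencies, summed over each value of the independent variable.
    e2 = sum(sum(row) - max(row) for row in table)
    return (e1 - e2) / e1

# Hypothetical counts: 2 values of the independent variable, 2 dependent categories.
counts = [
    [30, 10],
    [10, 50],
]
lam = goodman_kruskal_lambda(counts)
print(lam)  # 0.5: here e1 = 40 and e2 = 20
```

Note that the sketch does not guard against ε1 being 0 (all observations in a single dependent category), in which case lambda is undefined.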

Correlation in Excel

Finding Pearson’s correlation coefficients by hand is ugly and involves a lot of lengthy math. However, Excel can make those calculations for you in a fraction of a second. You have two options in Excel (2013 and later): the CORREL function or the Data Analysis Toolpak.

If you’re familiar with entering functions in Excel, you could enter the CORREL function:
=CORREL(array1, array2)
For example, =CORREL(A2:A6,B2:B6)

However, the Data Analysis Toolpak is much easier overall, because you don’t have to remember (or hunt for) an array of functions; they are all listed in the Data Analysis list. If Data Analysis isn’t showing at the far right of the Data tab, make sure you have loaded the Data Analysis Toolpak. The Data Analysis Toolpak is an optional add-in to Excel that gives you access to many functions, including correlation. To run a correlation with it:

Step 1: Type your data into a worksheet in Excel. The best format is two columns. Place your x-values in column A and your y-values in column B.

Step 2: Click the “Data” tab and then click “Data Analysis.”

Step 3: Click “Correlation” and then click “OK.”

Step 4: Type the location for your x-y variables in the Input Range box. Or, use your cursor to highlight the area where your variables are located.

Step 5: Click either the “columns” or “rows” option to let Excel know how your data is laid out. In most cases, you’ll click “columns” as that’s the standard way to lay out data in Excel.

Step 6: Check the “Labels in first row” box if you have column headers.

Step 7: Click the “Output Range” text box and then select an area on the worksheet where you want your output to go.

That’s it!
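
The Correlation tool writes its result as a table of coefficients, one for each pair of columns in the input range. If you want to cross-check that kind of output outside Excel, the Python sketch below builds a comparable table. The column names and numbers are invented for illustration, and the helper functions are hypothetical, not part of Excel or any library.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Pairwise coefficients for a dict mapping column label -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical worksheet: labels in the first row, data laid out in columns.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 61, 70, 75],
    "absences": [8, 6, 6, 3, 1],
}
matrix = correlation_matrix(data)

# Print one row per column, rounded for readability.
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

Each column correlates perfectly with itself, so the diagonal of the table is 1, and the table is symmetric: the coefficient for hours and score is the same as the one for score and hours.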


Correlation in Statistics: Correlation Analysis Explained was last modified: August 14th, 2018 by Stephanie