## What is Correlation?

Correlation is used to test relationships between quantitative or categorical variables. In other words, it’s a measure of how things are related. The study of how variables are correlated is called **correlation analysis.**

Some examples of data that have a **high correlation:**

- Your caloric intake and your weight.
- Your eye color and your relatives’ eye colors.
- The amount of time you study and your GPA.

Some examples of data that have a **low correlation** (or none at all):

- Your sexual preference and the type of cereal you eat.
- A dog’s name and the type of dog biscuit they prefer.
- The cost of a car wash and how long it takes to buy a soda inside the station.

Correlations are useful because if you know how variables are related, you can make **predictions about future behavior**. Such predictions matter in the social sciences and in fields like government and healthcare. Businesses also use these statistics for budgets and business plans.

### What is Correlation: The Correlation Coefficient.

A correlation coefficient puts a numerical value on the relationship. Correlation coefficients range from -1 to 1. A “0” means the coefficient detects **no relationship** between the variables, while -1 or 1 means that there is a **perfect negative or positive correlation**. Negative and positive here refer to the direction of the relationship on a graph: with a positive correlation, one variable rises as the other rises; with a negative correlation, one variable falls as the other rises.
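To make the -1 to 1 scale concrete, here is a minimal sketch that computes the Pearson correlation coefficient (the most common type, described in the next section) from scratch. The function name `pearson_r` and the small data sets are made up for illustration. The last data set shows a curved, U-shaped pattern where the coefficient comes out as 0 even though the variables are clearly related, because Pearson only picks up straight-line relationships.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of the products of the deviations from each mean.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / sqrt(ss_x * ss_y)


# Illustrative (made-up) data sets.
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # y falls in step with x
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2, a U shape

print(r_pos)   # 1.0
print(r_neg)   # -1.0
print(r_zero)  # 0.0
```

Real data will almost never land exactly on -1, 0, or 1; values in between indicate weaker or stronger tendencies in one direction or the other.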

### What is Correlation: Types of correlation coefficients.

The most common correlation coefficient is the Pearson Correlation Coefficient. It’s used to test for linear relationships between data. In AP stats or elementary stats, the Pearson is likely the only one you’ll be working with. However, you may come across others, depending upon the type of data you are working with.

For example, the **Goodman and Kruskal lambda coefficient** is a fairly common coefficient for categorical data. It can be symmetric, where you do not have to specify which variable is dependent, or asymmetric, where the dependent variable is specified.

Lambda is calculated as λ = (ε_{1} − ε_{2}) / ε_{1}, where ε_{1} is the overall non-modal frequency and ε_{2} is the sum of the non-modal frequencies for each value of the independent variable.
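As a rough illustration of how ε_{1} and ε_{2} fit together, the sketch below computes the asymmetric lambda using the standard form λ = (ε_{1} − ε_{2}) / ε_{1}. It assumes the data arrive as a table of counts with one row per value of the independent variable and one column per value of the dependent variable. The function name `goodman_kruskal_lambda` and the counts are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Asymmetric Goodman and Kruskal lambda.

    `table` is a list of rows of counts: each row is one value of the
    independent variable, each column one value of the dependent variable.
    """
    total = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]

    # eps1: overall non-modal frequency, i.e. every observation that is
    # not in the most common category of the dependent variable.
    eps1 = total - max(column_totals)

    # eps2: non-modal frequencies within each row, summed over all rows.
    eps2 = sum(sum(row) - max(row) for row in table)

    if eps1 == 0:
        raise ValueError("lambda is undefined when all observations share one category")
    return (eps1 - eps2) / eps1


# Hypothetical counts: rows are two groups, columns are two outcomes.
counts = [
    [30, 10],
    [10, 30],
]
lam = goodman_kruskal_lambda(counts)
print(lam)  # 0.5
```

Here ε_{1} is 40 (80 observations minus the 40 in the most common outcome) and ε_{2} is 20 (10 non-modal observations in each row), so knowing the group cuts the prediction errors in half.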