Statistics How To

What is Correlation in Statistics?

What is Correlation?

Correlation is used to test relationships between quantitative variables or categorical variables. In other words, it’s a measure of how things are related. Some examples of data that have a high correlation:

  • Your caloric intake and your weight.
  • Your eye color and your relatives’ eye colors.
  • The amount of time you study and your GPA.

Some examples of data that have a low correlation (or none at all):

  • Your sexual preference and the type of cereal you eat.
  • A dog’s name and the type of dog biscuit they prefer.
  • The cost of a car wash and how long it takes to buy a soda inside the station.

Correlations are useful because once you know how variables are related, you can make predictions about future behavior. Being able to anticipate future behavior is important in the social sciences and in fields like government and healthcare. Businesses also use these statistics for budgets and business plans.

What is Correlation: The Correlation Coefficient.

A correlation coefficient is a way to put a value on the relationship. Correlation coefficients have a value between -1 and 1. A “0” means there is no relationship of the kind the coefficient measures (for the Pearson coefficient, no linear relationship), while -1 or 1 means that there is a perfect negative or positive correlation. Negative or positive here refers to the direction of the graph the relationship produces: downward sloping for negative, upward sloping for positive.

Graphs showing a correlation of -1, 0 and +1
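The three cases in the graphs can be reproduced with a short calculation. The sketch below is a minimal illustration, not a reference implementation: it assumes the Pearson coefficient (covered in the next section), and the function name `pearson_r` and the three small data sets are made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    ss_x = sum(a * a for a in dx)               # sum of squares for x
    ss_y = sum(b * b for b in dy)               # sum of squares for y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data chosen to hit the three values in the graphs.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y = 2x, a perfect upward line
falling = [10, 8, 6, 4, 2]   # y = 12 - 2x, a perfect downward line
curved = [4, 1, 0, 1, 4]     # y = (x - 3)^2, a curve with no linear trend

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)

print(r_rising, r_falling, r_curved)
```

With these data sets the three results are 1, -1 and 0. The curved data set is included to show that a coefficient of 0 here reflects the absence of a linear trend, since the Pearson coefficient measures linear relationships.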

What is Correlation: Types of correlation coefficients.

The most common correlation coefficient is the Pearson Correlation Coefficient. It’s used to test for linear relationships between data. In AP stats or elementary stats, the Pearson is likely the only one you’ll be working with. However, you may come across others, depending upon the type of data you are working with. For example, the Goodman and Kruskal lambda coefficient is a fairly common coefficient for categorical (nominal) data. It can be symmetric, where you do not have to specify which variable is dependent, or asymmetric, where the dependent variable is specified.

Goodman and Kruskal lambda coefficient:

λ = (ε1 − ε2) / ε1

where ε1 is the overall non-modal frequency and ε2 is the sum of the non-modal frequencies for each value of the independent variable.
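The definitions of ε1 and ε2 can be turned into a small calculation. The sketch below assumes the commonly cited asymmetric form λ = (ε1 − ε2) / ε1 and a contingency table laid out with one row per value of the independent variable and one column per value of the dependent variable. The function name `goodman_kruskal_lambda` and the counts are hypothetical, chosen only for illustration.

```python
def goodman_kruskal_lambda(table):
    """Asymmetric lambda for a table of counts.

    Assumed layout: rows = values of the independent variable,
    columns = values of the dependent variable.
    """
    total = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    # eps1: overall non-modal frequency, i.e. every observation that is
    # not in the most common category of the dependent variable.
    eps1 = total - max(column_totals)
    # eps2: non-modal frequencies summed over each value (row) of the
    # independent variable.
    eps2 = sum(sum(row) - max(row) for row in table)
    if eps1 == 0:
        raise ValueError("lambda is undefined when all observations share one category")
    return (eps1 - eps2) / eps1


# Hypothetical counts: two groups (rows) and two outcomes (columns).
related = [
    [40, 10],
    [10, 40],
]
unrelated = [
    [25, 25],
    [25, 25],
]

lambda_related = goodman_kruskal_lambda(related)
lambda_unrelated = goodman_kruskal_lambda(unrelated)

print(lambda_related, lambda_unrelated)
```

For the first table, ε1 = 100 − 50 = 50 and ε2 = 10 + 10 = 20, so λ = 30 / 50 = 0.6: knowing the group cuts prediction errors for the outcome by 60 percent. For the second table ε1 and ε2 are both 50, so λ = 0.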
