Statistics How To

Covariance in Statistics: What is it? Example


Covariance is a measure of how much two random variables vary together. It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.

[Image: Covariance. Image from U of Wisconsin.]

The Covariance Formula

The formula is:
Cov(X,Y) = Σ (X-μ)(Y-ν) / (n-1) where:
X and Y are random variables
E(X) = μ is the expected value (the mean) of the random variable X
E(Y) = ν is the expected value (the mean) of the random variable Y
n = the number of items in the data set

Example


Calculate covariance for the following data set:
x: 2.1, 2.5, 3.6, 4.0 (mean = 3.05)
y: 8, 10, 12, 14 (mean = 11)

Substitute the values into the formula and solve:
Cov(X,Y) = Σ (X-μ)(Y-ν) / (n-1)
= [(2.1-3.05)(8-11) + (2.5-3.05)(10-11) + (3.6-3.05)(12-11) + (4.0-3.05)(14-11)] / (4-1)
= [(-0.95)(-3) + (-0.55)(-1) + (0.55)(1) + (0.95)(3)] / 3
= (2.85 + 0.55 + 0.55 + 2.85) / 3
= 6.8 / 3
= 2.267

The result is positive, meaning that the variables are positively related.
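
The worked example can be checked with a few lines of code. What follows is a minimal Python sketch, not a reference implementation: the function name sample_covariance is an illustrative choice, and the means are computed from the data rather than typed in by hand.

```python
def sample_covariance(xs, ys):
    """Sum of the products of deviations from the mean, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair of deviations and add the products up.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Data set from the example above.
x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

cov_xy = sample_covariance(x, y)
print(round(cov_xy, 3))  # about 2.267, i.e. 6.8 / 3
```

The sign of the result is what carries the interpretation here: a positive value means that x and y tend to move in the same direction.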

Note on dividing by n or n-1:
When dealing with samples, there are n-1 terms that have the freedom to vary (see: Degrees of Freedom), so you divide by n-1. If your data covers the entire population rather than a sample, just divide by n.
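
To see how much the choice of divisor matters, the two versions can be computed side by side. This is a rough Python sketch; the function name covariance and its sample flag are made up for illustration. Dividing by n gives what is commonly called the population covariance.

```python
def covariance(xs, ys, sample=True):
    """Covariance with a divisor of n - 1 (sample=True) or n (sample=False)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    divisor = n - 1 if sample else n
    return total / divisor


# Data set from the example above.
x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

sample_cov = covariance(x, y, sample=True)       # 6.8 / 3
population_cov = covariance(x, y, sample=False)  # 6.8 / 4
print(round(sample_cov, 3), round(population_cov, 3))
```

With only four data points the gap between the two results is large. As n grows, n and n-1 become nearly the same and the two values converge.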

Problems with Interpretation

A large covariance can mean a strong relationship between variables. However, you can’t compare covariances over data sets with different scales (like pounds and inches). A weak covariance in one data set may be a strong one in a different data set with different scales.

The main problem with covariance is that the wide range of values it can take on makes it hard to interpret. For example, your data set could return a value of 3, or 3,000. This wide range is caused by a simple fact: the larger the scale of the X and Y values, the larger the covariance tends to be. A value of 300 tells us that the variables are positively related, but unlike the correlation coefficient, that number doesn’t tell us how strong the relationship is. The problem can be fixed by dividing the covariance by the product of the two standard deviations to get the correlation coefficient.
Corr(X,Y) = Cov(X,Y) / (σX σY)
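
The correlation formula can be tried out on the data set from the earlier example. The Python sketch below is only an illustration, and the helper names sample_covariance and correlation are invented for it. It relies on the fact that the variance of a variable is its covariance with itself, so the standard deviation is the square root of that value.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with a divisor of n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # Var(X) = Cov(X, X)
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

r = correlation(x, y)
print(round(r, 3))  # about 0.979
```

A covariance of 2.267 says little on its own, but a correlation close to 1 shows that the positive relationship in this data set is a strong one.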

Advantages of the Correlation Coefficient

The Correlation Coefficient has several advantages over covariance for determining strengths of relationships:

  • Covariance can take on practically any number while a correlation is limited: -1 to +1.
  • Because of its numerical limits, correlation is more useful for determining how strong the relationship is between the two variables.
  • Correlation does not have units. Covariance carries the units of X multiplied by the units of Y.
  • Correlation isn’t affected by changes in the center (i.e. mean) or scale of the variables.
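
The last point in the list is easy to check numerically. In the Python sketch below (helper names cov and corr are illustrative), x is shifted and rescaled, as if it had been converted to a different unit. The factor of 100 and the offset of 50 are arbitrary values picked for the demonstration.

```python
import math


def cov(xs, ys):
    """Sample covariance with a divisor of n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Correlation coefficient: covariance over the product of standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

# Arbitrary change of center and scale for x.
x_rescaled = [100 * v + 50 for v in x]

cov_before, cov_after = cov(x, y), cov(x_rescaled, y)
corr_before, corr_after = corr(x, y), corr(x_rescaled, y)

print(cov_before, cov_after)    # covariance grows by the scale factor of 100
print(corr_before, corr_after)  # correlation is unchanged, up to rounding error
```

The shift of 50 has no effect on either measure, while the factor of 100 multiplies the covariance and leaves the correlation alone. One caveat: this holds for positive scale factors. Multiplying one variable by a negative number flips the sign of both the covariance and the correlation.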

Questions? Post a comment and I’ll do my best to help!


Covariance in Statistics: What is it? Example was last modified: June 25th, 2017 by Andale

16 thoughts on “Covariance in Statistics: What is it? Example”

  1. Claudio Delpino

    detail (although it’s clear):

    = (2.1-3.1)(8-11)+(2.5-3.1)(10-11)+(3.6-3.1)(12-11)+(4.0-3.1)(14-11) /4-1

    is missing a parenthesis on the denominator (4-1)

  2. Dave

    Can covariance or correlation be calculated using a binary variable and a numerical variable?

  3. Andale Post author

    Possibly. I’d need some more info about exactly what your variables are and what they represent. For example, you might be able to use eta or Cramer’s V to find correlation.
    Can you post more detail about your data?

  4. Jet

    Could anybody help with Cov(X – Y), where Cov is covariance?
    Cov(X – Y) not Cov(X,Y). That is covariance of difference of two random variables like we have as Var(X – Y) = Var(X) + Var(Y) – 2Cov(X,Y)

  5. Andale Post author

    Cov(X-Y) doesn’t make any sense. Could there be a typo in your question (where you got the X-Y from)?

  6. Ramadhani Adam Chumbi

    If X is chosen at random from among 1, 2, 3, 4 and Y is chosen from among those greater than X, find the covariance.

  7. Ramadhani Adam Chumbi

    If a number X is chosen at random from among the integers 1, 2, 3, 4 and a number Y is chosen from among those at least as large as X, prove that Cov(X,Y) = 5/8.

  8. Ramadhani Adam Chumbi

    I am stuck on how to find X, Y and N, and on how this compares with the formula cov(X, Y) = sum (x-x’)(y-y’)/N, where x’ and y’ are the means.

  9. Ogolla George

    I didn’t know how to compute the covariance of x and y, and now I know through your examples. Thanks a lot.
