What is Variance?
Variance measures how far a data set is spread out. The technical definition is “the average of the squared differences from the mean,” but all it really does is give you a very general idea of the spread of your data. A value of zero means that there is no variability: all the numbers in the data set are the same.
- The data set 12, 12, 12, 12, 12, 12 has a variance of zero (the numbers are identical).
- The data set 12, 12, 12, 12, 12, 13 has a sample variance of 0.167; a small change in the numbers equals a very small variance.
- The data set 12, 12, 12, 12, 12, 13013 has a sample variance of about 28,171,000; a large change in the numbers equals a very large variance.
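Figures like these can be checked with Python's standard `statistics` module. The sketch below assumes each data set has six values (five 12s plus one final value) and uses the sample variance (dividing by n - 1), since that is the combination that reproduces 0.167 and 28,171,000. The variable names are illustrative only.

```python
import statistics

# Assumption: six values per data set (five 12s plus one final value).
identical = [12, 12, 12, 12, 12, 12]
small_change = [12, 12, 12, 12, 12, 13]
large_change = [12, 12, 12, 12, 12, 13013]

# statistics.variance computes the sample variance (divides by n - 1).
var_identical = statistics.variance(identical)
var_small = statistics.variance(small_change)
var_large = statistics.variance(large_change)

print(var_identical)        # no spread at all
print(round(var_small, 3))  # a small change gives a small variance
print(round(var_large))     # one extreme value gives a huge variance
```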
How Much Can Data Vary?
The smallest a variance gets is zero, but there is no upper limit: the more spread out the numbers are, the larger the variance, and it can run into the millions, billions, and beyond.
How do I calculate it?
The variance for a population is calculated by:
- Finding the mean (the average).
- Subtracting the mean from each number in the data set and then squaring the result. The results are squared to make the negatives positive. Otherwise, the negative differences would cancel out the positive ones in the next step. It’s the distance from the mean that’s important, not whether a number falls above or below it.
- Averaging the squared differences.
However, it’s more usual in statistics to find the variance for a sample. For a sample, divide the sum of the squared differences by the sample size minus one (n – 1), instead of by n, in the third step above.
Use our online variance and standard deviation calculator, which shows you the step-by-step calculations for your individual data set.
The square root of the variance is the standard deviation. While variance gives you a rough idea of spread in squared units, the standard deviation is more concrete: it is expressed in the same units as the data, so it can be read as a typical distance from the mean.
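The calculation steps above, the n – 1 adjustment for samples, and the square root for the standard deviation can be sketched in Python as follows. The function names and the example data are hypothetical, chosen only for illustration; the last lines compare the results with the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared differences from the mean (divide by n)."""
    mean = sum(values) / len(values)                   # find the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # subtract the mean, then square
    return sum(squared_diffs) / len(values)            # average the squares


def sample_variance(values):
    """Same steps, but divide by n - 1 instead of n."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (len(values) - 1)


# Hypothetical data set, not taken from the article.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)
samp_var = sample_variance(data)
pop_sd = math.sqrt(pop_var)  # standard deviation = square root of the variance

print(pop_var, samp_var, pop_sd)

# Cross-check against the standard library.
print(math.isclose(pop_var, statistics.pvariance(data)))
print(math.isclose(samp_var, statistics.variance(data)))
```

The sample version always gives a slightly larger value than the population version for the same numbers, because it divides the same sum by a smaller denominator.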
Variance of a Binomial Distribution
A binomial experiment is a simple experiment made up of a fixed number of independent trials, each of which ends in either “success” or “failure” with the same probability of success. The binomial distribution describes the number of successes. For example, choosing a winning lottery ticket could be a binomial experiment (you either win or lose!). Tossing a coin to try and get heads is also binomial (with tossing a heads being a “success” and a tails a “failure”). The formula for the variance of a binomial distribution is n*p*(1-p) or n*p*q, where n is the number of trials and p is the probability of success. The two formulas are equivalent because q = (1-p).
Sample problem: If you flip a coin 50 times and try to get heads, what is the variance of the binomial distribution?
Step 1: Find “p”. The first step to solving this problem is to realize that the probability of getting a heads is 50 percent, or .5. Therefore, “p” (the probability) is .5.
Step 2: Find “q”, or 1-p. These two are equivalent. They are the probability of not getting a heads (in other words, the probability of getting a tails). 1 – 0.5 = 0.5. Therefore, “q” (or 1 – p) = 0.5.
Step 3: Multiply Step 1 (p) by Step 2 (q) by “n” (the number of trials). We are flipping the coin 50 times, so the number of trials is 50 (n = 50).
n * p * q = 50 * .5 * .5 = 12.5.
The variance of the binomial distribution for flipping a coin 50 times is 12.5.
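As a rough check on the worked example, the sketch below computes n*p*q directly and then compares it with a brute-force variance calculated from the binomial probabilities of every possible number of heads. The function names are hypothetical, and the brute-force approach is just one way to confirm the formula, not part of the method described above.

```python
import math


def binomial_variance(n, p):
    """Variance of a binomial distribution: n * p * q, with q = 1 - p."""
    q = 1 - p
    return n * p * q


def binomial_variance_brute_force(n, p):
    """Variance from the definition, summing over every outcome k = 0..n."""
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))


n, p = 50, 0.5  # 50 coin flips, probability of heads 0.5
formula_result = binomial_variance(n, p)
brute_force_result = binomial_variance_brute_force(n, p)

print(formula_result)
print(math.isclose(formula_result, brute_force_result))
```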
OK, so what does the binomial variance mean?
In essence, not a lot! In an introductory class, the variance is mostly used as a stepping stone to the standard deviation. For example, the standard deviation for this particular binomial distribution is:
√12.5 = 3.54.
You’ll use the standard deviation (and therefore the variance) for things like calculating z-scores (this typically comes later in a stats class, after normal distributions). The z-score formula has the standard deviation in the bottom: z = (x – μ) / σ.
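As a rough illustration, the sketch below computes a z-score, z = (x – μ) / σ, for the coin-flip example. The observed count of 30 heads is a hypothetical value chosen for illustration, and the mean of n*p = 25 is the standard binomial mean, which the example above does not state.

```python
import math

n, p = 50, 0.5                   # 50 coin flips, probability of heads 0.5
mean = n * p                     # binomial mean (assumption: not given in the text)
variance = n * p * (1 - p)       # 12.5, from the worked example
std_dev = math.sqrt(variance)    # standard deviation, about 3.54

observed_heads = 30              # hypothetical observation
z = (observed_heads - mean) / std_dev

print(round(std_dev, 2))
print(round(z, 2))
```

A z-score counts how many standard deviations an observation lies from the mean, which is why the variance has to be converted to a standard deviation first.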
The population variance is a type of parameter. If you aren’t sure what a parameter is, you may want to review:
What is the Difference Between a Statistic and a Parameter?
How to find the Population Variance.
Most of the time in statistics, you’ll want to find the sample variance, not the population variance. Why? Because statistics is usually all about making inferences from samples, not populations. If you had all of the data from a population, there would be no need for statistics at all! That said, there really is very little difference between the formula for the population variance, σ² = Σ(X – μ)² / N, and the formula for the sample variance. If you have sample data, you can still use the table method in the sample problem that follows. You’d just need to insert your sample data into the columns instead of your population data. If you prefer to plug the numbers straight into the formula, just make sure you use the sample mean (x̄) and not the population mean (μ). In addition, the most common sample variance formula uses n – 1 in the denominator instead of n.
Sample problem: Find the population variance for the following set of numbers: 28, 29, 30, 31, 32.
Step 1: Draw a table with three columns. Label the columns X, X – μ, and (X – μ)², and then write down your X values (the items in your population) in column 1: 28, 29, 30, 31, 32.
Step 2: Find the mean. The mean for this set of data is (28 + 29 + 30 + 31 + 32) / 5 = 30.
Step 3: Fill in column 2. This column is your X value minus the mean. For example, the first entry is 28 – 30 = -2.
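Step 4: Square the values from Step 3 and place those squares in the third column:
| X | X – μ | (X – μ)² |
| 28 | -2 | 4 |
| 29 | -1 | 1 |
| 30 | 0 | 0 |
| 31 | 1 | 1 |
| 32 | 2 | 4 |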
Step 5: Add up all of the numbers in column 3 (this is the summation Σ part of the formula):
4 + 1 + 0 + 1 + 4 = 10
Step 6: Divide by the number of items in your data set:
10 / 5 = 2
The population variance for this set of data is 2.
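The six steps of this sample problem can be reproduced with a short Python sketch that builds the three table columns and then sums and averages the last one. The variable names are illustrative only.

```python
population = [28, 29, 30, 31, 32]

# Step 2: find the mean.
mean = sum(population) / len(population)

# Steps 1, 3 and 4: one row per item, with columns X, X - mean, (X - mean)^2.
rows = [(x, x - mean, (x - mean) ** 2) for x in population]

print(f"{'X':>4} {'X - mean':>10} {'(X - mean)^2':>14}")
for x, diff, squared in rows:
    print(f"{x:>4} {diff:>10.0f} {squared:>14.0f}")

# Step 5: add up the third column.
sum_of_squares = sum(squared for _, _, squared in rows)

# Step 6: divide by the number of items in the population.
population_variance = sum_of_squares / len(population)

print(sum_of_squares, population_variance)
```

For sample data, the only change would be dividing by `len(population) - 1` in the last step.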