Homoscedasticity / Homogeneity of Variance / Assumption of Equal Variance

Simply put, homoscedasticity means “having the same scatter.” For it to exist in a set of data, the points must be about the same distance from the line, as shown in the picture above. The opposite is *hetero*scedasticity (“different scatter”), where points are at widely varying distances from the regression line.

Note that I said “distance” here and not variance. When viewing a graph, it’s easier to look at the distances from the points to the line to determine if a set of data shows homoscedasticity. Technically, it’s the *variance* that counts, and that’s what you’d use in calculations. However, variance has to be calculated with a formula, so you can’t read it directly off a graph.

As variance is just the standard deviation squared, you might also see homoscedasticity described as a condition where the standard deviation of the points around the line is the same across the whole range of the data.

## In more formal terms

You’re rarely going to come across a set of data where the variances are exactly equal. You’re more likely to see variances ranging anywhere from 0.01 to 101.01. So when is a data set classified as having homoscedasticity? The general rule of thumb^{1} is:

If the ratio of the largest variance to the smallest variance is 1.5 or below, the data is homoscedastic.
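The rule of thumb above is easy to check by hand. Here is a minimal sketch in Python using only the standard library; the three groups are made-up numbers for illustration, and `variance_ratio` is a hypothetical helper name, not something from a statistics package.

```python
from statistics import variance


def variance_ratio(*groups):
    """Return the largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical measurements, invented purely for illustration.
group_a = [4.1, 5.0, 5.9, 4.6, 5.4]
group_b = [4.0, 5.2, 6.0, 4.4, 5.4]
group_c = [1.0, 5.0, 9.0, 3.0, 7.0]  # much more spread out

ratio_ab = variance_ratio(group_a, group_b)
ratio_abc = variance_ratio(group_a, group_b, group_c)

# Apply the 1.5 rule of thumb to each comparison.
print(f"A and B:    ratio = {ratio_ab:.2f}, within 1.5? {ratio_ab <= 1.5}")
print(f"A, B and C: ratio = {ratio_abc:.2f}, within 1.5? {ratio_abc <= 1.5}")
```

Groups A and B have similar scatter, so their ratio stays under the cutoff; adding the widely spread group C pushes the ratio well past it.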

## What is the Assumption of Equal Variance?

The assumption of equal variances (i.e. assumption of homoscedasticity) assumes that the populations your samples are drawn from have the same variance, even if the samples came from different populations. The assumption is found in many statistical tests, including Analysis of Variance (ANOVA) and Student’s T-Test. Other tests, like Welch’s T-Test, don’t require equal variances at all.

Running a test without checking for equal variances can have a significant impact on your results and may even invalidate them completely. How much your results are affected depends on which test you use and how sensitive that test is to unequal variances. For example, while a fixed-factor ANOVA test with equal sample sizes is only affected a tiny amount, an ANOVA with *unequal* sample sizes might give you completely invalid results.
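To see how much the choice of test can matter, here is a rough sketch that computes the Student’s (pooled variance) and Welch’s t statistics by hand for two samples with unequal sizes and very different scatter. The data and the function names `student_t` and `welch_t` are invented for illustration; the formulas are the standard textbook ones, and the sketch stops at the statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical data: a small, widely scattered sample and a larger, tight one.
small_noisy = [2.0, 9.0, 16.0, 5.0]
large_tight = [5.0, 5.5, 4.5, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0]

t_student, df_student = student_t(small_noisy, large_tight)
t_welch, df_welch = welch_t(small_noisy, large_tight)

print(f"Student: t = {t_student:.2f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.2f}, df = {df_welch:.2f}")
```

With these numbers the pooled version reports a larger t statistic on more degrees of freedom than Welch’s version, so it would overstate the evidence for a difference. If you have SciPy available, `scipy.stats.ttest_ind` with `equal_var=False` runs Welch’s test and gives you the p-value as well.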

The assumption of equal variances is also used in linear regression, which assumes that data is homoscedastic. In simple terms, if your data is widely spread about (like the cone shape in the heteroscedastic image above), regression isn’t going to work that well. For more on this topic, see Assumptions & Conditions for Regression.
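One informal way to spot a cone shape without a graph is to fit the line, then compare how much the residuals vary in the lower half of the x values against the upper half. The sketch below does that with plain Python. It is a quick sanity check under made-up data, not a formal test, and `fit_line` and `residual_variance_by_half` are hypothetical helper names.

```python
from statistics import mean, variance


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_variance_by_half(x, y):
    """Fit a line, then return residual variance for the lower and upper half of x."""
    slope, intercept = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order the points by x
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(residuals) // 2
    return variance(residuals[:half]), variance(residuals[half:])


# Hypothetical data: same underlying line, two different noise patterns.
x = list(range(1, 13))
signs = [1 if xi % 2 else -1 for xi in x]
y_even = [2 * xi + 0.5 * s for xi, s in zip(x, signs)]       # constant scatter
y_cone = [2 * xi + 0.3 * xi * s for xi, s in zip(x, signs)]  # scatter grows with x

low_even, high_even = residual_variance_by_half(x, y_even)
low_cone, high_cone = residual_variance_by_half(x, y_cone)

print(f"even scatter: upper/lower variance ratio = {high_even / low_even:.2f}")
print(f"cone shape:   upper/lower variance ratio = {high_cone / low_cone:.2f}")
```

A ratio near 1 suggests the scatter is about the same along the line, while a ratio far from 1 suggests the residuals fan out (or in) as x increases.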

## Testing for Homogeneity of Variance

Tests that you can run to check your data meets this assumption include:
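Two commonly used checks are Levene’s test and Bartlett’s test. The sketch below runs both, assuming you have SciPy installed; `scipy.stats.levene` and `scipy.stats.bartlett` are SciPy’s functions for these tests, while the three groups are made-up numbers for illustration.

```python
from scipy import stats

# Hypothetical measurements, invented purely for illustration.
group_a = [4.1, 5.0, 5.9, 4.6, 5.4]
group_b = [4.0, 5.2, 6.0, 4.4, 5.4]
group_c = [1.0, 5.0, 9.0, 3.0, 7.0]  # much more spread out

# Levene's test, centered on the group medians (less sensitive to non-normal data).
levene_stat, levene_p = stats.levene(group_a, group_b, group_c, center="median")

# Bartlett's test, which assumes the groups are normally distributed.
bartlett_stat, bartlett_p = stats.bartlett(group_a, group_b, group_c)

print(f"Levene:   statistic = {levene_stat:.3f}, p = {levene_p:.4f}")
print(f"Bartlett: statistic = {bartlett_stat:.3f}, p = {bartlett_p:.4f}")
```

For both tests the null hypothesis is that the groups have equal variances, so a small p-value is evidence *against* homoscedasticity.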
