Homoscedasticity / Homogeneity of Variance / Assumption of Equal Variance
Simply put, homoscedasticity means “having the same scatter.” For it to exist in a set of data, the points must be about the same distance from the line, as shown in the picture above. The opposite is heteroscedasticity (“different scatter”), where points are at widely varying distances from the regression line.
Note that I said “distance” here and not variance. When viewing a graph, it’s easier to look at the distances from the points to the line to determine if a set of data shows homoscedasticity. Technically, it’s the variance that counts, and that’s what you’d use in calculations. However, as variance requires a formula, it’s impossible to eyeball on a graph.
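To make the step from "distance" to variance concrete, here is a minimal sketch in Python with NumPy. It fits a straight line, takes the vertical distances from the points to the line (the residuals), and compares their sample variance in the lower and upper halves of the x-range. The data are simulated, and the helper name `residual_variance_by_half` and the split into two halves are illustrative assumptions, not a standard procedure.

```python
import numpy as np

def residual_variance_by_half(x, y):
    """Fit a straight line, then return the sample variance of the
    residuals in the lower and upper halves of the x-range."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line
    residuals = y - (slope * x + intercept)     # vertical distances to the line
    order = np.argsort(x)
    half = len(x) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return np.var(low, ddof=1), np.var(high, ddof=1)

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y_equal = 2 * x + rng.normal(0, 1.0, size=x.size)      # same scatter everywhere
y_cone = 2 * x + rng.normal(0, 1.0, size=x.size) * x   # scatter grows with x

print("equal scatter:", residual_variance_by_half(x, y_equal))
print("cone shape:   ", residual_variance_by_half(x, y_cone))
```

For the equal-scatter data the two variances should come out similar, while for the cone-shaped data the upper half should show a much larger variance than the lower half.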
In more formal terms
You’re rarely going to come across a set of data that has a variance of zero. You’re more likely to see variances ranging anywhere from 0.01 to 101.01. So when is a data set classified as having homoscedasticity? The general rule of thumb [1] is:
What is the Assumption of Equal Variance?
The assumption of equal variances (i.e. assumption of homoscedasticity) assumes that different samples have the same variance, even if they came from different populations. The assumption is found in many statistical tests, including Analysis of Variance (ANOVA) and Student’s T-Test. Other tests, like Welch’s T-Test, don’t require equal variances at all.
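The difference between the two t-tests can be seen in a short sketch using SciPy, where `scipy.stats.ttest_ind` runs Student's t-test when `equal_var=True` and Welch's t-test when `equal_var=False`. The two groups below are simulated with deliberately unequal variances and sample sizes; those numbers are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Simulated groups (illustrative assumption): same mean, very different spread
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=1.0, size=30)
group_b = rng.normal(loc=10.0, scale=5.0, size=10)

# Student's t-test pools the two variances (assumes they are equal)
student = stats.ttest_ind(group_a, group_b, equal_var=True)

# Welch's t-test estimates each variance separately
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.3f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
```

When the smaller group has the larger variance, as here, the pooled standard error is too small, so Student's test tends to report a more extreme t statistic than Welch's.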
Running a test without checking for equal variances can have a significant impact on your results and may even invalidate them completely. How much your results are affected depends on which test you use and how sensitive that test is to unequal variances. For example, while a fixed-factor ANOVA test with equal sample sizes is only affected a tiny amount, an ANOVA with unequal sample sizes might give you completely invalid results.
The assumption of equal variances is also used in linear regression, which assumes that the data are homoscedastic. In simple terms, if the spread of your data around the line changes as you move along it (like the cone shape in the heteroscedastic image above), regression isn’t going to work that well. For more on this topic, see Assumptions & Conditions for Regression.
Testing for Homogeneity of Variance
Tests that you can run to check your data meets this assumption include:
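As a hedged illustration, two widely used checks are Levene's test and Bartlett's test, both available in SciPy as `scipy.stats.levene` and `scipy.stats.bartlett`. The sketch below runs them on three simulated groups, one of which has a much larger spread; the group sizes, the standard deviations, and the 0.05 cutoff are assumptions for the example.

```python
import numpy as np
from scipy import stats

# Simulated groups (illustrative assumption): the third has a larger spread
rng = np.random.default_rng(1)
g1 = rng.normal(0, 1.0, 50)
g2 = rng.normal(0, 1.0, 50)
g3 = rng.normal(0, 4.0, 50)

# Levene's test centered on the median (the Brown-Forsythe variant),
# which is less sensitive to departures from normality
lev_stat, lev_p = stats.levene(g1, g2, g3, center="median")

# Bartlett's test, which assumes the groups are normally distributed
bart_stat, bart_p = stats.bartlett(g1, g2, g3)

print(f"Levene:   W={lev_stat:.3f}, p={lev_p:.4g}")
print(f"Bartlett: T={bart_stat:.3f}, p={bart_p:.4g}")

# Null hypothesis for both tests: all groups have equal variance
alpha = 0.05  # conventional cutoff, an assumption here
print("Reject equal variances (Levene)?  ", lev_p < alpha)
print("Reject equal variances (Bartlett)?", bart_p < alpha)
```

A small p-value is evidence against equal variances; a large one means the test found no evidence of a difference, which is not the same as proving the variances are equal.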