Statistics Definitions > ANCOVA
What is ANCOVA?
ANCOVA is a blend of ANOVA and regression. It’s commonly used to compare two regression lines, although it’s possible to compare more than two. If the slopes of all the lines are the same, ANCOVA can test whether the lines have significantly different Y intercepts.
Like regression analysis, ANCOVA enables you to look at how an independent variable acts on a dependent variable. However, ANCOVA takes this one step further by removing the effect of covariates (confounding variables are one common type of covariate). In other words, ANCOVA will tell you whether the independent variable still has an effect on the dependent variable after the confounding variables have been removed. The steps the procedure performs are:
- Run a regression of the dependent variable on the covariate(s).
- Take the residual values from that regression.
- Perform an ANOVA on the residuals.
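The steps above can be sketched in a few lines of Python. This is a minimal illustration, not production ANCOVA: the data are simulated (a hypothetical covariate `x`, such as baseline severity, and an outcome `y` for three treatment groups), and `scipy.stats.f_oneway` supplies the ANOVA on the residuals.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Hypothetical data: outcome scores (y) for three groups of 30,
# each influenced by a covariate (x, e.g. baseline severity).
n = 30
x = rng.normal(50, 10, size=3 * n)             # covariate
group = np.repeat([0, 1, 2], n)                # group labels
y = 0.8 * x + np.array([0.0, -5.0, -9.0])[group] + rng.normal(0, 4, size=3 * n)

# Step 1: regress the dependent variable on the covariate.
slope, intercept = np.polyfit(x, y, 1)

# Step 2: take the residuals -- the part of y the covariate can't explain.
residuals = y - (slope * x + intercept)

# Step 3: run a one-way ANOVA on the residuals across the groups.
f_stat, p_value = f_oneway(*(residuals[group == g] for g in (0, 1, 2)))
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

A small p-value here indicates the groups still differ after the covariate's influence has been removed. (Dedicated ANCOVA routines, such as those in statsmodels, handle the degrees of freedom and the covariate adjustment more carefully than this residual-based sketch.)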
For example, you might be interested in finding out if a particular therapy works for depressed patients in three treatment groups versus one control group. An ANOVA can tell if if the treatment works; an ANCOVA can control for other factors that might influence the outcome such as family life, job status, or drug use.
ANCOVA is also a useful tool for explaining within-group variance. It takes the unexplained variance from the ANOVA test and tries to explain it with confounding variables (or other covariates). You can enter multiple covariates, but each one you add costs a degree of freedom. What this means for your test is that entering a weak covariate isn’t a good idea: it reduces the statistical power, and the lower the power, the less likely you’ll be able to rely on the results of your test. Strong covariates have the opposite effect: they can increase the power of your test.
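The degrees-of-freedom cost is easy to make concrete. In a one-way ANCOVA the error degrees of freedom are roughly N − k − c (N observations, k groups, c covariates), so each covariate you add removes one. A toy calculation (the function name is illustrative, not a standard API):

```python
def error_df(n_total, n_groups, n_covariates):
    """Error degrees of freedom for a one-way ANCOVA:
    one df is spent per group mean and one per covariate."""
    return n_total - n_groups - n_covariates

# 90 participants in 3 groups:
print(error_df(90, 3, 1))  # one covariate  -> 86
print(error_df(90, 3, 5))  # five covariates -> 82
```

Four extra covariates cost four error degrees of freedom; if they explain little variance, that trade lowers power.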
Assumptions for ANCOVA
Assumptions for ANCOVA are basically the same as assumptions for ANOVA. You should check that the following are true before considering ANCOVA.
- Independent variables should be categorical, with a minimum of two levels (groups).
- The dependent variable and covariate should be continuous variables (measured on an interval scale or ratio scale).
- Observations should be independent. In other words, don’t put participants into more than one group.
The following assumptions can usually be checked with software.
- Normality: the dependent variable should be roughly normally distributed for each category of the independent variable.
- Data should show homogeneity of variance.
- The covariate and dependent variable (at each level of the independent variable) should be linearly related.
- Your data should be homoscedastic: the variance of Y should be roughly the same for each value of X.
- There should be no interaction between the covariate and the independent variable. In other words, there should be homogeneity of regression slopes.