What is Explained Variance?
Explained variance (also called explained variation) measures how much of the variability in the data a model accounts for. In other words, it's the part of the total variance that is attributable to factors actually present in the model, rather than to error variance.
A higher percentage of explained variance indicates a stronger association between variables; it also means you can make better predictions (Rosenthal & Rosenthal, 2011).
r² = R² = η²
Explained variance can be denoted r². In ANOVA it's called eta squared (η²), and in regression analysis it's called the coefficient of determination (R²). The three terms are essentially synonymous, except that R² assumes changes in the dependent variable are due to a linear relationship with the independent variable; η² does not carry this assumption.
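As a sketch of how η² is computed in a one-way ANOVA (using made-up test scores for three hypothetical teaching methods), it is simply the between-groups sum of squares divided by the total sum of squares:

```python
import numpy as np

# Hypothetical one-way ANOVA data: test scores for three teaching methods.
groups = [
    np.array([82.0, 85.0, 88.0, 90.0]),
    np.array([75.0, 78.0, 72.0, 80.0]),
    np.array([91.0, 89.0, 94.0, 92.0]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Between-groups sum of squares: variation of group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Total sum of squares: variation of every score around the grand mean.
ss_total = ((all_scores - grand_mean) ** 2).sum()

# Eta squared: the proportion of total variance explained by group membership.
eta_squared = ss_between / ss_total
print(f"eta squared = {eta_squared:.3f}")
```

Here about 85% of the variance in scores is explained by which teaching method a student received; the rest is within-group (error) variance.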
R² in regression has a similar interpretation: the proportion of variance in Y that can be explained by X (Warner, 2013).
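A quick sketch with made-up data (hours studied vs. exam score) shows the equivalence: in simple linear regression, squaring the Pearson correlation r gives the same number as R² computed from the fitted line's sums of squares:

```python
import numpy as np

# Hypothetical data: hours studied (X) vs. exam score (Y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 66.0, 70.0, 74.0])

# r squared: the squared Pearson correlation between X and Y.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# R squared: 1 - SS_residual / SS_total from a least-squares fitted line.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
R_squared = 1 - ss_res / ss_tot

# With a single predictor, the two quantities coincide.
print(f"r^2 = {r_squared:.4f}, R^2 = {R_squared:.4f}")
```

With more than one predictor, R² generalizes to the squared multiple correlation, while a simple r² no longer applies.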
The Problems with Multiple Predictors
In general, the more predictor variables you add, the higher the explained variance. The amount of overlapping variance (variance explained by more than one predictor) also increases. However, there is a point of diminishing returns: with too many predictors in the model, you can no longer tell which predictor is producing which result. Furthermore, if you add two highly correlated predictors to a model, you introduce the possibility of multicollinearity.
On the other hand, too few predictors can also pose a problem: omitting a predictor variable that could explain some of the variance biases the estimates. A careful balance must therefore be struck between too many predictors and too few.
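The first of these problems can be seen directly in simulation. The sketch below (using simulated data, with one genuine predictor and several pure-noise ones) shows that ordinary R² never decreases when a predictor is added, even when the new predictor is meaningless:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50

# One genuine predictor, plus random error in the outcome.
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Five pure-noise predictors, completely unrelated to y.
junk = rng.normal(size=(n, 5))

# R^2 with the real predictor alone, then adding junk predictors one at a time.
r2_values = [r_squared(np.column_stack([x1[:, None], junk[:, :k]]), y)
             for k in range(6)]
print([round(v, 4) for v in r2_values])
```

The printed sequence only ever creeps upward: each junk predictor soaks up a little sample variance by chance, which is why adjusted R² or an information criterion is usually used to compare models with different numbers of predictors.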
Rosenthal, G. & Rosenthal, J. (2011). Statistics and Data Interpretation for Social Work. Springer Publishing Company.
Warner, R. (2013). Applied Statistics: From Bivariate Through Multivariate Techniques. SAGE.