Statistics How To

Adjusted R2 / Adjusted R-Squared: What is it used for?



Adjusted R2: Overview

Adjusted R2 is a special form of R2, the coefficient of determination.

[Image] The adjusted R2 has many applications in real life. Image: USCG

R2 shows how well data points fit a curve or line. Adjusted R2 also indicates how well the data fit a curve or line, but it adjusts for the number of terms in the model. If you add more and more useless variables to a model, adjusted R2 will decrease; if you add more useful variables, adjusted R2 will increase.
Adjusted R2 will always be less than or equal to R2. You only need the adjustment when working with samples; in other words, it isn't necessary when you have data for an entire population.

The formula is:

Adjusted R2 = 1 − [(1 − R2)(N − 1) / (N − K − 1)]

  • N is the number of points in your data sample.
  • K is the number of independent regressors, i.e. the number of variables in your model, excluding the constant.

If you already know R2, it's a fairly simple formula to work through. However, if you don't already have R2, you probably won't want to calculate this by hand! (If you must, see How to Calculate the Coefficient of Determination.) Many statistical packages can calculate adjusted R2 for you; it is given as part of Excel's regression output. See: Excel regression analysis output explained.
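The formula above is easy to compute once you know R2. A minimal sketch in Python (the function name and the sample numbers are our own illustration, not part of any package):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 from plain R2, sample size n, and k regressors
    (excluding the constant term)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = 0.80 from N = 30 observations and K = 5 predictors
print(round(adjusted_r2(0.80, 30, 5), 4))  # 0.7583
```

Note that the adjusted value (0.7583) is below the raw R2 of 0.80, as it always will be when K ≥ 1.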

Meaning of Adjusted R2

Both R2 and the adjusted R2 give you an idea of how well the data points fit the regression line. However, there is one main difference between them: R2 assumes that every variable in the model explains the variation in the dependent variable. The adjusted R2 tells you the percentage of variation explained by only those independent variables that actually affect the dependent variable.

How Adjusted R2 Penalizes You

The adjusted R2 will penalize you for adding independent variables (K in the equation) that do not fit the model. Why? In regression analysis, it can be tempting to add more variables to the data as you think of them. Some of those variables will be significant, but you can't be sure whether that significance is just by chance. The adjusted R2 compensates for this by penalizing you for those extra variables.

While values are usually positive, they can be negative as well. This could happen if your R2 is zero; after the adjustment, the value can dip below zero. This usually indicates that your model is a poor fit for your data. Other problems with your model can also cause sub-zero values, such as not including a constant term in your model.
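Plugging numbers into the formula shows how a zero R2 produces a negative adjusted value (a small sketch; the function name and the sample sizes are our own illustration):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 from plain R2, sample size n, and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# An R2 of zero with n = 20 observations and k = 3 predictors:
print(adjusted_r2(0.0, 20, 3))  # -0.1875
```

With R2 = 0, the penalty term (N − 1)/(N − K − 1) exceeds 1 whenever K ≥ 1, so the result must fall below zero.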

Problems with R2 that are corrected with an adjusted R2

  1. R2 increases with every predictor added to a model. Because R2 always increases and never decreases, the model can appear to fit better simply because it has more terms. This can be completely misleading.
  2. Similarly, if your model has too many terms and too many high-order polynomial terms, you can run into the problem of over-fitting the data. When you over-fit data, a misleadingly high R2 value can lead to misleading projections.
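The first problem can be demonstrated with a small simulation: adding pure-noise predictors to a nested ordinary least squares fit never lowers R2, while adjusted R2 is penalized for them. This sketch uses NumPy; the variable names, seed, and sample sizes are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)          # y truly depends only on x

def r_squared(X, y):
    """Plain R2 from an OLS fit; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

X = np.column_stack([np.ones(n), x])    # intercept + one real predictor
r2s, adjs = [], []
for _ in range(4):
    k = X.shape[1] - 1                  # regressors, excluding the constant
    r2 = r_squared(X, y)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    r2s.append(r2)
    adjs.append(adj)
    print(f"k={k}  R2={r2:.4f}  adjusted R2={adj:.4f}")
    X = np.column_stack([X, rng.normal(size=n)])  # add a useless predictor
```

R2 creeps upward at every step even though the new predictors are pure noise; the adjusted value, which divides by N − K − 1, resists that creep.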


Adjusted R2 / Adjusted R-Squared: What is it used for? was last modified: August 18th, 2017 by Andale

13 thoughts on “Adjusted R2 / Adjusted R-Squared: What is it used for?”

  1. michael

    Very well presented, simple and clear. And with a great accent! Statistics is more fun to learn when accents are involved. Thanks

  2. hidayah

    Thank you very much for the useful explanation of R squared and adjusted R squared; you made the difference between them clear. I know how to compare adjusted R squared across two different regression models, since the adjustment accounts for degrees of freedom, but I don't understand how to interpret a small difference between R squared and adjusted R squared within a single estimated regression model, without changing any independent variables. tqvm!

  3. twesigye chrispus

    Using the formula for adjusted r-squared, I have obtained a slightly different answer from the one computed by the SPSS package. The difference is negligible, but I don't know why?

  4. Andale Post author

    Without seeing your data, I can’t really say. I’d have a colleague check your inputs.

  5. Fiona

    Hi ! I have a question. I have a set of four variables, and a lot of observations. When I suppress a single extreme observation of a variable, my r square falls for every variables, as the adjusted R squared does.

    What could explain that ?

  6. Andale Post author

    I’d have to look at your data to say for sure. But, I’d say that observation makes for a better model at first glance. But just because the model fits your data really well doesn’t mean that it’s a good model :)

  7. avaneesh Kumar

    So this means (what I infer) that if we add more and more values to the data for fitting, then the linear regression model will adjust the value of the coefficient of determination and it becomes adjusted R^2.

    Am I right?

  8. Andale Post author

    Hello, Avaneesh,
    That’s not correct. The adjustment doesn’t happen automatically. The formulas are different, so you either choose to use r-squared or you choose to use adjusted r-squared.

  9. Haruna Abdul Rauf

    In a two model situation if the coefficient of determination of one model is greater than the other model how will this be interpreted
