Statistics How To

Coefficient of Determination (R Squared): Definition, Calculation


Coefficient of Determination (R Squared)

The coefficient of determination, R2, is used to analyze how differences in one variable can be explained by differences in a second variable. For example, the date a person becomes pregnant is directly related to the date they give birth. The coefficient of determination is closely related to the correlation coefficient, r, which tells you how strong a linear relationship there is between two variables. R squared is simply the square of the correlation coefficient, r (hence the term r squared).

Finding R Squared / The Coefficient of Determination

Step 1: Find the correlation coefficient, r (it may be given to you in the question). For example, r = 0.543.


Step 2: Square the correlation coefficient.
0.543² = 0.295

Step 3: Convert the result to a percentage.
0.295 = 29.5%


That’s it!
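The three steps above can be sketched in a few lines of plain Python (the value r = 0.543 is just the example figure from Step 1):

```python
# Step 1: the correlation coefficient, r (given in the example above).
r = 0.543

# Step 2: square it to get the coefficient of determination.
r_squared = r ** 2          # 0.294849, which rounds to 0.295

# Step 3: convert to a percentage.
percent = r_squared * 100

print(round(r_squared, 3))  # 0.295
print(f"{percent:.1f}%")    # 29.5%
```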

Meaning of the Coefficient of Determination

The coefficient of determination can be thought of as a percent. It tells you what proportion of the variation in the dependent variable is explained by the regression model. The higher the coefficient, the more of the variation in the data the fitted line accounts for. If the coefficient is 0.80, then the regression explains 80% of the variation in the observed values. Values of 1 or 0 would indicate the regression line explains all or none of the variation, respectively. A higher coefficient is an indicator of a better goodness of fit for the observations.
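Behind the "percent of variation explained" idea is the formula R2 = 1 − SSres/SStot, where SSres is the residual sum of squares around the fitted line and SStot is the total sum of squares around the mean. Here is a minimal sketch using a small made-up data set (the numbers are illustrative, not from the article):

```python
# Hypothetical data: xs and ys are made-up illustrative values.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(round(r_squared, 4))  # close to 1: the line explains nearly all the variation
```

Because these points lie almost exactly on a line, R2 comes out very close to 1.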

Usefulness of R2

The usefulness of R2 is in gauging how well the model is likely to predict future observations. The idea is that the higher the coefficient, the more confident you can be that a new data point will fall close to the fitted line.
Even if there is a strong connection between the two variables, determination does not prove causality. For example, a study on birthdays may show a large number of birthdays happen within a time frame of one or two months. This does not mean that the passage of time or the change of seasons causes pregnancy.

Syntax

The coefficient of determination is sometimes written as R2_p. The “p” indicates the number of predictor variables (columns of data), which is useful when comparing the R2 of models fitted with different numbers of predictors.


What is the Adjusted Coefficient of Determination?


The Adjusted Coefficient of Determination (Adjusted R-squared) is a modification of the Coefficient of Determination that takes into account the number of predictor variables in the model. It penalizes you for adding variables that don’t improve the model.

You might be aware that too few values in a data set (a too-small sample size) can lead to misleading statistics, but you may not be aware that too many predictor variables can also lead to problems. Every time you add a predictor variable in regression analysis, R2 will increase; it never decreases. Therefore, the more variables you add, the better the regression will seem to “fit” your data. If your data doesn’t quite fit a line, it can be tempting to keep adding variables until you have a better fit.

Some of the variables you add will be significant (improve the model) and others will not, but R2 doesn’t distinguish between them: the more you add, the higher the coefficient of determination.

The adjusted R2 can be used to choose a more appropriate number of variables, thwarting your temptation to keep on adding variables to your model. The adjusted R2 will increase only if a new variable improves the regression more than you would expect by chance. Adjusted R2 is always lower than (or equal to) R2 and can be negative (although it’s usually positive).
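The standard formula for adjusted R2 is 1 − (1 − R2)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A small sketch (the sample values n = 50 and R2 = 0.80 are hypothetical):

```python
def adjusted_r_squared(r_squared, n, p):
    """Penalize R^2 for the number of predictors p, given n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# With the same R^2, more predictors means a lower adjusted value.
print(round(adjusted_r_squared(0.80, n=50, p=2), 4))   # 0.7915
print(round(adjusted_r_squared(0.80, n=50, p=10), 4))  # 0.7487
```

Note how the penalty grows with p: if extra variables don’t raise R2 enough, the adjusted value falls.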

For more, see: Adjusted R-Squared.

Questions? Post a comment and I’ll do my best to help!


Coefficient of Determination (R Squared): Definition, Calculation was last modified: March 23rd, 2017 by Andale

13 thoughts on “Coefficient of Determination (R Squared): Definition, Calculation”

  1. Michael Goldstein

    I think you are either missing a square root operator on the entire denominator of the equation listed (if you meant to give the equation for the correlation coefficient r) or you are missing a square operator for the entire numerator (if you meant to give the equation for R^2 (R-squared). Please see your link in step 1 for “correlation coefficient”. There, you will find the equation for correlation, which has the same numerator as shown on this page, but the denominator has the square root operator. You just cannot square the entire equation and unsquare the denominator but not square the numerator as in the equation here. I suspect something got dropped somewhere and it is a quick fix.

    In any case, the equation shown on this page is neither the formula for the correlation coefficient (r) nor the coefficient of determination, which is its square (R-squared).

  2. Andale Post author

    No, because the CoD is R-squared. So for example a Pearson of .25 would be .25 * .25.

  3. saif

    If we have r^2 and we need to find the correlation coefficient, what should we do, and how do we find the correct sign for the coefficient?

  4. Andale Post author

    Take the square root.
    Look at a graph of your data to see if it is a positive or negative correlation.

  5. Abeer

    Hello, Please tell me How much should be the acceptable difference between the coefficient of determination value and the adjusted coefficient of determination value? I mean if there is a gap between the two values, what does this mean and what is the solution? ( R squared= 51.37%, Adjusted R-Squared= 44.03%)

  6. Andale Post author

    There’s no “acceptable” difference. It just means you have some useless variables in your regression model…that’s why adjusted r2 is lower. If you keep on adding useless variables, adjusted r2 can get down to zero ;)

  7. Abeer

    Thank you very much Andale
    But please can you give me a reference to cite in my thesis, to prove that there’s no “acceptable” difference?
    All the factors – that have been added to the model – have been previously tested that they are correlated to the dependent variable and so I added them to the model.

  8. Andale Post author

    Abeer,
    Sorry, I do not have a reference in hand that states that explicitly. That’s just what I know…
    Regards,
    S
