Statistics Definitions > Coefficient of Determination


## Coefficient of Determination (R Squared)

The coefficient of determination, R^{2}, is used to analyze how differences in one variable can be explained by differences in a second variable. For example, *when* a person gets pregnant has a direct relation to *when* they give birth.

More specifically, R-squared gives you the percentage of variation in y explained by the x-variables. The range is 0 to 1 (i.e., 0% to 100% of the variation in y can be explained by the x-variables).

The coefficient of determination, R^{2}, is closely related to the **correlation coefficient**, r. The correlation coefficient tells you how strong a linear relationship there is between two variables. R squared is simply the square of the correlation coefficient, *r* (hence the term r squared).

## Finding R Squared / The Coefficient of Determination

**Step 1:** *Find the correlation coefficient, r (it may be given to you in the question).* For example, r = **0.543**.

**Step 2:** *Square the correlation coefficient.*

0.543^{2} = **0.295**

**Step 3:** *Convert R^{2} to a percentage.*

0.295 = **29.5%**

That’s it!
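The three steps above can be sketched in Python. This is just an illustration with made-up data; NumPy's `corrcoef` does the work of finding r in Step 1:

```python
import numpy as np

# Hypothetical paired observations (x, y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Step 1: find the correlation coefficient, r
r = np.corrcoef(x, y)[0, 1]

# Step 2: square it to get the coefficient of determination
r_squared = r ** 2

# Step 3: convert to a percentage
print(f"r = {r:.3f}, R^2 = {r_squared:.3f} ({r_squared:.1%})")
```

Because this toy data lies almost exactly on a line, R^{2} comes out very close to 1.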

## Meaning of the Coefficient of Determination

The coefficient of determination can be thought of as a percent: it tells you how much of the variation in the data is explained by the regression line. The higher the coefficient, the more of the variation the line accounts for. If the coefficient is 0.80, then the regression model explains 80% of the variation in y. Values of 1 or 0 would indicate the regression line explains all or none of the variation, respectively. A higher coefficient is an indicator of a better goodness of fit for the observations.

The CoD can be **negative**, although this usually means that your model is a poor fit for your data: it predicts worse than simply using the mean of y. It can also become negative if you fit the model without an intercept.
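A minimal sketch (with made-up data) of how a negative value can arise: if you compute R^{2} as 1 − SS_res / SS_tot for a deliberately bad model, the residual sum of squares can exceed the total sum of squares and the result goes below zero.

```python
import numpy as np

# Hypothetical data with a clear downward trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.0, 6.0, 4.0, 2.0])

# A deliberately poor model: predict y = x (ignores the trend entirely)
y_pred = x

# R^2 computed as 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)  # negative: the model fits worse than predicting the mean of y
```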

## Usefulness of R^{2}

The usefulness of R^{2} is its ability to find the likelihood of future events falling within the predicted outcomes. The idea is that if more samples are added, the coefficient would show the probability of a new point falling on the line.

Even if there is a strong connection between the two variables, determination does not prove causality. For example, a study on birthdays may show a large number of birthdays happen within a time frame of one or two months. This does not mean that the passage of time or the change of seasons causes pregnancy.

## Syntax

The coefficient of determination is sometimes written as R^{2}_p. The subscript "p" indicates the number of predictor variables in the model, which is useful when comparing the R^{2} of different models.

## What is the Adjusted Coefficient of Determination?

The Adjusted Coefficient of Determination (Adjusted R-squared) is an adjustment for the Coefficient of Determination that takes into account **the number of variables in the model.** It also penalizes you for adding variables that don't improve the model.

You might be aware that too few values in a data set (a too-small sample size) can lead to misleading statistics, but you may not be aware that **too many** variables can also lead to problems. Every time you add a predictor variable in regression analysis, R^{2} will increase or stay the same. **R^{2} never decreases.** Therefore, the more variables you add, the better the regression will seem to "fit" your data. If your data doesn't quite fit a line, it can be tempting to keep on adding variables until you have a better fit.

Some of the variables you add will be significant (improve the model) and others will not. R^{2} doesn't care about the insignificant ones. **The more you add, the higher the coefficient of determination**.
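You can see this for yourself with a small ordinary-least-squares sketch (all data here is randomly generated for illustration): adding a predictor that is pure noise still never lowers R^{2}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])  # include an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

r2_one = r_squared(x.reshape(-1, 1), y)

# Add a predictor that is pure noise, unrelated to y
noise = rng.normal(size=n)
r2_two = r_squared(np.column_stack([x, noise]), y)

print(r2_one, r2_two)  # r2_two is at least as large as r2_one
```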

The adjusted R^{2} can be used to settle on a **more appropriate** number of variables, thwarting the temptation to keep on adding variables to your model. The adjusted R^{2} will increase only if a new variable improves the regression more than you would expect by chance. Adjusted R^{2} is always lower than (or equal to) R^{2} and can be negative (although it's usually positive). Negative values are most likely when R^{2} is close to zero; after the adjustment, the value dips a little below zero.
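As a sketch, the standard adjustment is Adjusted R^{2} = 1 − (1 − R^{2})(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The values below are hypothetical:

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2: penalizes R^2 for the number of predictors p,
    given n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# A hypothetical model: R^2 = 0.5137 with 4 predictors and 30 observations
print(adjusted_r_squared(0.5137, n=30, p=4))
```

Note the adjusted value is always at or below the plain R^{2}, and the gap widens as p grows relative to n.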

For more, see: Adjusted R-Squared.

Check out my YouTube Channel for more stats tips and help!

If you prefer an online interactive environment to learn R and statistics, this *free R Tutorial by Datacamp* is a great way to get started. If you're somewhat comfortable with R and are interested in going deeper into Statistics, try *this Statistics with R track*.


I think you are either missing a square root operator on the entire denominator of the equation listed (if you meant to give the equation for the correlation coefficient r) or you are missing a square operator for the entire numerator (if you meant to give the equation for R^2 (R-squared). Please see your link in step 1 for “correlation coefficient”. There, you will find the equation for correlation, which has the same numerator as shown on this page, but the denominator has the square root operator. You just cannot square the entire equation and unsquare the denominator but not square the numerator as in the equation here. I suspect something got dropped somewhere and it is a quick fix.

In any case, the equation shown on this page is neither the formula for the correlation coefficient (r) nor the coefficient of determination, which is its square (R-squared).

Thanks, Michael. The formula is now fixed (not sure how that square root got dropped!).

The higher the Pearson r, the higher the coefficient of determination? Is this true?

Not exactly, because the CoD is r squared, so it depends on the magnitude of r rather than its sign. For example, a Pearson r of 0.25 gives an R-squared of 0.25 * 0.25 = 0.0625, while r = -0.5 gives a larger R-squared of 0.25.

If we have r^2 and we need to find the correlation coefficient, what should we do, and how do we find the correct sign for the coefficient?

Take the square root.

Look at a graph of your data to see whether the correlation is positive or negative; r takes the same sign as the slope.

Simple and clear definition. Thanks!

All the questions I had were solved…

Hello, please tell me: how much of a difference between the coefficient of determination and the adjusted coefficient of determination is acceptable? I mean, if there is a gap between the two values, what does this mean and what is the solution? (R-squared = 51.37%, Adjusted R-squared = 44.03%)

There’s no “acceptable” difference. It just means you have some useless variables in your regression model…that’s why adjusted r2 is lower. If you keep on adding useless variables, adjusted r2 can get down to zero ;)

Thank you very much Andale

But please, can you give me a reference to cite in my thesis, to prove that there’s no “acceptable” difference?

All the factors that have been added to the model have previously been tested and shown to be correlated with the dependent variable, and so I added them to the model.

Abeer,

Sorry, I do not have a reference in hand that states that explicitly. That’s just what I know…

Regards,

S

thank you very much

Sorry! How can I come up with a model if I have more than 1,000 data points, and how do I use the coefficient of determination in analysis (interpretation)?

Please give an example of how it can be negative.

It’s usually a result of a poor model fit.