Residual Sum of Squares, Total and Explained
The residual sum of squares helps you decide whether a statistical model is a good fit for your data. It measures the overall difference between your data and the values predicted by your estimation model (a “residual” is the vertical distance from a data point to the regression line). The total sum of squares (Total SS) is related to the explained sum and the residual sum by the following formula:
Total SS = Explained SS + Residual Sum of Squares.
The Total SS tells you how much variation there is in the dependent variable.
Total SS = Σ(Yi – mean of Y)².
Note: Sigma (Σ) is the mathematical symbol for summation, or “adding up.” It tells you to evaluate the rest of the expression for every data point and add up the results.
Sum of squares is a measure of how a data set varies around a central number (like the mean). You might realize from the phrase that you’re summing (adding up) squares, but squares of what? You’ll sometimes see this formula:
Sum of squares = Σ(xi – mean of x)²
Other times you might see actual “squares” drawn along a regression line.
Squares of numbers, such as 4² and 10², can be represented with actual geometric squares (image courtesy of UMBC.edu).
So the square shapes you see on regression lines are just representations of square numbers, like 5² or 9². When you’re looking for a sum of squares, use the formula Σ(xi – mean of x)² to find the actual number that represents a sum of squares. A diagram (like a regression line with squares drawn on it) is optional; it simply supplies a visual representation of what you’re calculating.
Example: find the sum of squares for the following numbers: 3, 5, 7.
Step 1: Find the mean by adding the numbers together and dividing by the number of items in the set:
(3 + 5 + 7) / 3 = 15 / 3 = 5
Step 2: Subtract the mean from each of your data items:
3 – 5 = -2
5 – 5 = 0
7 – 5 = 2
Step 3: Square your results from Step 2:
-2 x -2 = 4
0 x 0 = 0
2 x 2 = 4
Step 4: Sum (add up) all of your numbers:
4 + 0 + 4 = 8.
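The four steps above can also be written as a few lines of Python. This is only a minimal sketch: the function name sum_of_squares is made up for illustration, and the comments map each line back to the steps.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (hypothetical helper name)."""
    mean = sum(values) / len(values)          # Step 1: find the mean
    deviations = [v - mean for v in values]   # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]    # Step 3: square each result
    return sum(squares)                       # Step 4: add everything up


print(sum_of_squares([3, 5, 7]))  # prints 8.0
```

The result matches the hand calculation (Python shows 8.0 rather than 8 because the mean is computed as a decimal number).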
Sum of Squares in ANOVA and Regression
As you can probably guess, things get a little more complicated when you’re calculating sum of squares in regression analysis or hypothesis testing. It is rarely calculated by hand; instead, software like Excel or SPSS is usually used to calculate the result for you.
In ANOVA, it is calculated with:
Total SS = treatment sum of squares (SST) + sum of squares of the residual error (SSE).
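To see the ANOVA identity with numbers, here is a rough Python sketch for a one-way layout. The three groups are made-up data, the function names are hypothetical, and the treatment sum of squares is assumed to be the usual between-group term (group size times the squared distance of the group mean from the grand mean).

```python
def mean(xs):
    return sum(xs) / len(xs)


def anova_sums_of_squares(groups):
    """Return (total SS, treatment SS, error SS) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total SS: every observation against the grand mean.
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    # Treatment SS: each group mean against the grand mean, weighted by group size.
    treatment_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error SS: each observation against its own group mean.
    error_ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return total_ss, treatment_ss, error_ss


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up example data
total_ss, treatment_ss, error_ss = anova_sums_of_squares(groups)
print(total_ss, treatment_ss, error_ss)  # prints 60.0 54.0 6.0
```

Here 54 + 6 = 60, so the treatment and error pieces add back up to the total.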
The Explained SS tells you how much of the variation in the dependent variable your model explained.
Explained SS = Σ(Y-Hat – mean of Y)².
The residual sum of squares tells you how much of the dependent variable’s variation your model did not explain. It is the sum of the squared differences between the actual Y and the predicted Y:
Residual Sum of Squares = Σe².
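If you want to check the three regression formulas on a small data set, the sketch below fits a straight line by ordinary least squares in plain Python and then computes each sum. The x and y values are made up, and the function names are hypothetical. Total SS = Explained SS + Residual SS holds here because the line is a least-squares fit that includes an intercept.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_sums_of_squares(x, y):
    """Return (total SS, explained SS, residual SS) for the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]                    # predicted Y
    total_ss = sum((yi - mean_y) ** 2 for yi in y)                  # Σ(Y – mean of Y)²
    explained_ss = sum((yh - mean_y) ** 2 for yh in y_hat)          # Σ(Y-Hat – mean of Y)²
    residual_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Σe²
    return total_ss, explained_ss, residual_ss


x = [1, 2, 3, 4, 5]  # made-up example data
y = [2, 4, 5, 4, 5]
total_ss, explained_ss, residual_ss = regression_sums_of_squares(x, y)
# Rounded to hide tiny floating-point noise: roughly 6, 3.6 and 2.4.
print(round(total_ss, 4), round(explained_ss, 4), round(residual_ss, 4))
```

For this data the explained part (about 3.6) and the residual part (about 2.4) add up to the total of 6.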
If all those formulas look confusing, don’t worry! You will rarely need to work through them yourself. Finding the sums by hand is tedious and time-consuming: it involves a lot of subtracting, squaring and summing, and the calculations are prone to error, so you’re much better off using software like Excel. You won’t even need to enter the formulas, as Excel works them out behind the scenes.
The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the more poorly your model fits your data. A value of zero means your model is a perfect fit. One major use is in finding the coefficient of determination (R²). The coefficient of determination is the ratio of the explained sum of squares to the total sum of squares.
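As a quick illustration of that ratio, here is a small Python sketch. The function name r_squared and the numbers passed to it are made up for the example. Because Total SS = Explained SS + Residual Sum of Squares, the same value can also be found as 1 minus the residual share.

```python
def r_squared(explained_ss, total_ss):
    """Coefficient of determination: explained SS divided by total SS."""
    if total_ss == 0:
        # No variation in the dependent variable, so the ratio is undefined.
        raise ValueError("total sum of squares is zero")
    return explained_ss / total_ss


# Made-up values: total SS = 60, explained SS = 45, residual SS = 15.
print(r_squared(45, 60))   # prints 0.75
print(1 - 15 / 60)         # prints 0.75 (same value via the residual SS)
```

An R² of 0.75 would mean the model explains three quarters of the variation in the dependent variable.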