This tutorial includes the formula and examples for Sum of Squares (SS). Sum of Squares (SS) is a measure of deviation from the mean: square the distance between each data point and the mean, then add the squared distances together.
Formula of Sum of Squares (SS)
The following is the formula for Sum of Squares (SS).
\[ SS= \sum_{i=1}^{n} (x_i-\bar{x})^2 \]
where,
- \( x_i \) is the ith element in the set.
- \( \bar{x} \) is the mean of all the elements in the set.
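The formula above can be sketched directly in Python. This is a minimal illustration; the function name `sum_of_squares` is my own choice, not a standard library function.

```python
def sum_of_squares(values):
    """Square each value's distance from the mean, then add the squares up."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

print(sum_of_squares([1, 2, 3]))  # mean is 2, so SS = 1 + 0 + 1 = 2
```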
Example for Sum of Squares (SS)
Suppose that you have the following set of 5 numbers, which are the sales numbers for City 1. You can calculate the Sum of Squares of this set.
| | Sales |
| --- | --- |
| City 1 | 10 |
| City 1 | 20 |
| City 1 | 30 |
| City 1 | 20 |
| City 1 | 30 |
We need to first calculate the mean, which is 22. Then, subtract the mean from each number in the set. Finally, square the differences and add them up. We get SS = 280.
\[ SS= (10-22)^2+2(20-22)^2+2(30-22)^2=144+8+128=280 \]
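The worked example above can be checked with a few lines of Python; the variable names here are my own choices.

```python
# The five sales figures from the table above
sales = [10, 20, 30, 20, 30]

mean = sum(sales) / len(sales)            # 110 / 5 = 22.0
ss = sum((x - mean) ** 2 for x in sales)  # 144 + 4 + 64 + 4 + 64

print(mean, ss)  # 22.0 280.0
```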
Relationship between Sum of Squares and Sample Variance
Sample variance \( S^2\) is the ratio of the Sum of Squares (SS) to the degrees of freedom. Sample variance measures how spread out a sample is. The following is the formula for sample variance \( S^2\).
\[ S^2= \frac{SS}{n-1}=\frac{ \sum_{i=1}^{n} (x_i-\bar{x})^2 }{n-1}\]
where,
- SS is the Sum of Squares.
- n is the number of observations, and n-1 is the degrees of freedom of the sample variance.
The following is an example of calculating the sample variance, which is 70.
\[ S^2= \frac{ \sum_{i=1}^{n} (x_i-\bar{x})^2 }{n-1} = \frac{280}{5-1}= 70\]
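The sample variance calculation can also be verified in Python, cross-checked against the standard library's `statistics.variance`, which divides by n-1 as in the formula above.

```python
import statistics

sales = [10, 20, 30, 20, 30]

mean = sum(sales) / len(sales)
ss = sum((x - mean) ** 2 for x in sales)
n = len(sales)
s2 = ss / (n - 1)  # divide SS by the degrees of freedom, n - 1

print(s2)                           # 70.0
print(statistics.variance(sales))   # same result from the standard library
```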
Difference between Sum of Squares (SS) and Sum of Squared Residuals (SSR)
Sum of Squares (SS) is a measure of deviation from the mean, whereas Sum of Squared Residuals (SSR) compares estimated values with observed values. I have tutorials on how to use R and Python to calculate SSR.
The key difference is that Sum of Squares (SS) applies to any set of data, regardless of what that set is or what the nature of the data is. In contrast, Sum of Squared Residuals (SSR) compares predicted values with observed values. For instance, in a linear regression model, SSR sums the squared differences between predicted y values and observed y values.
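The contrast can be sketched with a small hypothetical example. The data and the fitted line \( \hat{y} = 2x \) below are illustrative assumptions, not taken from the tutorial.

```python
# Illustrative data: observed y values at points x (assumed, not from the text)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

# SS measures y's deviation from its own mean -- no model is involved.
mean_y = sum(y) / len(y)
ss = sum((yi - mean_y) ** 2 for yi in y)

# SSR measures y's deviation from a model's predictions;
# here the (assumed) fitted regression line is y_hat = 2 * x.
y_hat = [2 * xi for xi in x]
ssr = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

print(ss, ssr)  # SSR is much smaller because the line tracks y closely
```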