Introduction
The sum of squared residuals (SSR) is also known as the residual sum of squares (RSS) or the sum of squared errors (SSE). It is defined by the following formula.
\[ SSR=\sum_{i=1}^{n} (\hat{y_i}-y_i)^2 \]
SSR measures how far the estimated values \(\hat{y}_i\) of a regression model are from the observed values \(y_i\). In R, the core syntax for calculating SSR is as follows, where fit is a fitted model object such as the one returned by lm().
sum(residuals(fit)^2)
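To connect the formula with the one-line syntax, here is a minimal sketch on a small made-up dataset. The vectors x and y and the name toy_fit are hypothetical and are not part of the examples that follow. It computes SSR once directly from the definition and once via residuals().

```r
# hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# fit a simple linear regression
toy_fit <- lm(y ~ x)

# SSR from the formula: sum of (fitted - observed)^2
ssr_formula <- sum((fitted(toy_fit) - y)^2)

# SSR from the core syntax
ssr_syntax <- sum(residuals(toy_fit)^2)

# both approaches give the same value
all.equal(ssr_formula, ssr_syntax)
```

Because the differences are squared, the order of subtraction (fitted minus observed, or observed minus fitted) does not change the result.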
The following two examples show how to calculate SSR for linear regression models in R.
Example 1: Using the mtcars data
Step 1: Calculate model fit
mtcars is a built-in sample dataset in R. We fit a linear regression model with mpg as the dependent variable (DV) and hp as the independent variable (IV), using lm() to estimate the regression coefficients.
# use lm() to estimate regression coefficients
fit <- lm(mpg~hp, data=mtcars)
# print the fit
summary(fit)
Output:
Call:
lm(formula = mpg ~ hp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-5.7121 -2.1122 -0.8854 1.5819 8.2360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
hp -0.06823 0.01012 -6.742 1.79e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
Step 2: Calculate SSR
After getting the fit, we use sum(residuals(fit)^2) to calculate SSR.
# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)
Output:
[1] 447.6743
Thus, the Sum of Squared Residuals (SSR) is 447.67.
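As a cross-check, the SSR can be tied back to the summary() output above. The sketch below assumes the standard relationships for an lm() fit: the residual standard error is sqrt(SSR / residual degrees of freedom), and R-squared is 1 - SSR/SST. The names ssr and sst are illustrative only.

```r
# refit the model from Example 1
fit <- lm(mpg ~ hp, data = mtcars)
ssr <- sum(residuals(fit)^2)

# for an lm fit, deviance() returns the residual sum of squares
deviance(fit)

# residual standard error: sqrt(SSR / residual degrees of freedom)
sqrt(ssr / df.residual(fit))

# R-squared: 1 - SSR / SST, where SST is the total sum of squares of mpg
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - ssr / sst
```

The last two values should agree with the residual standard error (3.863) and the multiple R-squared (0.6024) reported by summary(fit).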
Example 2: Hypothetical data
Step 1: Calculate model fit
The following hypothetical data has cities and stores as the IVs and sales as the DV. We pass them to lm() as a linear model with an interaction term to estimate the regression coefficients.
As in Example 1, once we have the fit, we use sum(residuals(fit)^2) to calculate SSR in Step 2.
x_1 = rep(c('City1','City2'),each=5)
x_2 = rep(c('store1','store2'), 5)
sales=c(10,20,20,50,30,10,5,4,12,4)
df <- data.frame (cities = x_1,
stores = x_2,
sales=sales)
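# use lm() to estimate regression coefficients
# (x_1 and x_2 are the vectors defined above; they hold the same values as df$cities and df$stores)
fit <- lm(sales~x_1*x_2, data=df)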
# print the fit
summary(fit)
Output:
Call:
lm(formula = sales ~ x_1 * x_2, data = df)
Residuals:
Min 1Q Median 3Q Max
-15.000 -3.125 -1.000 3.875 15.000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.000 6.229 3.211 0.0184 *
x_1City2 -11.500 9.850 -1.168 0.2873
x_2store2 15.000 9.850 1.523 0.1786
x_1City2:x_2store2 -17.500 13.929 -1.256 0.2557
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.79 on 6 degrees of freedom
Multiple R-squared: 0.6282, Adjusted R-squared: 0.4422
F-statistic: 3.379 on 3 and 6 DF, p-value: 0.09539
Step 2: Calculate SSR
# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)
Output:
[1] 698.5
Thus, the Sum of Squared Residuals (SSR) is 698.5. This is consistent with my other tutorial about ANOVA: it matches the residual sum of squares in the type 3 ANOVA.
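The ANOVA connection can be checked in base R. This is a sketch and not the method from the other tutorial: it uses base R's anova(), which reports sequential sums of squares. It relies on the fact that the Residuals row comes from the full model and so does not depend on how the effects are partitioned. Because the model includes the interaction, each fitted value is simply its city-by-store cell mean, so SSR can also be computed with ave(). The name cell_means is illustrative.

```r
# recreate the hypothetical data and model from Example 2
x_1 <- rep(c('City1', 'City2'), each = 5)
x_2 <- rep(c('store1', 'store2'), 5)
sales <- c(10, 20, 20, 50, 30, 10, 5, 4, 12, 4)
fit <- lm(sales ~ x_1 * x_2)

# residual sum of squares taken from the ANOVA table
anova(fit)["Residuals", "Sum Sq"]

# same quantity from the cell means: ave() returns, for each row,
# the mean of sales within its city-by-store combination
cell_means <- ave(sales, x_1, x_2)
sum((sales - cell_means)^2)
```

Both expressions should return 698.5, the same value as sum(residuals(fit)^2).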