Introduction
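The Sum of Squared Residuals (SSR) is also known as the residual sum of squares (RSS) or the sum of squared errors (SSE). The formula is as follows, where \(y_i\) is the observed value, \(\hat{y}_i\) is the value predicted by the model, and \(n\) is the number of observations.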
\[ SSR=\sum_{i=1}^{n} (\hat{y}_i-y_i)^2 \]
SSR can be used to compare the estimated values with the observed values of a regression model: the smaller the SSR, the closer the fitted values are to the data. R can be used to calculate SSR, and the following is the core R syntax.
sum(residuals(fit)^2)
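To connect the syntax to the formula, here is a minimal sketch that computes SSR three ways. The toy vectors `x` and `y` and the name `fit_toy` are hypothetical and only serve as an illustration; they are not part of the examples in this tutorial.

```r
# Hypothetical toy data, used only to illustrate the formula
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit_toy <- lm(y ~ x)

# SSR written out as in the formula: sum of (predicted - observed)^2
y_hat <- fitted(fit_toy)
ssr_formula <- sum((y_hat - y)^2)

# SSR via the shortcut; residuals() returns observed - predicted,
# but the sign disappears after squaring
ssr_shortcut <- sum(residuals(fit_toy)^2)

# For lm objects, deviance() also returns the residual sum of squares
ssr_deviance <- deviance(fit_toy)

# Both comparisons should return TRUE
all.equal(ssr_formula, ssr_shortcut)
all.equal(ssr_formula, ssr_deviance)
```

All three approaches should agree, so the shorter sum(residuals(fit)^2) is used in the rest of this tutorial.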
The following two examples show how to calculate SSR for linear regression models in R.
Example 1: Using the mtcars data
Step 1: Calculate model fit
mtcars is a built-in sample dataset in R. We fit a linear regression model with mpg as the dependent variable (DV) and hp as the independent variable (IV), using lm() to estimate the regression coefficients.
# use lm() to estimate the regression coefficients
fit <- lm(mpg ~ hp, data = mtcars)
# print a summary of the fit
summary(fit)
Output:
Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-5.7121 -2.1122 -0.8854  1.5819  8.2360

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,  Adjusted R-squared:  0.5892
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
Step 2: Calculate SSR
After getting the fit, we use sum(residuals(fit)^2) to calculate SSR.
# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)
Output:
[1] 447.6743
Thus, the Sum of Squared Residuals (SSR) is 447.67.
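As a sanity check, SSR can be tied back to two numbers in the summary output above. The residual standard error is the square root of SSR divided by the residual degrees of freedom, and R-squared is 1 minus SSR divided by the total sum of squares. The sketch below assumes the same mtcars model; the variable names `ssr`, `rse`, `sst`, and `r_squared` are my own.

```r
# Same model as in Step 1
fit <- lm(mpg ~ hp, data = mtcars)
ssr <- sum(residuals(fit)^2)

# Residual standard error = sqrt(SSR / residual degrees of freedom)
rse <- sqrt(ssr / df.residual(fit))

# R-squared = 1 - SSR / SST, where SST is the total sum of squares
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared <- 1 - ssr / sst

# Compare with the values stored in the summary object;
# both comparisons should return TRUE
all.equal(rse, summary(fit)$sigma)
all.equal(r_squared, summary(fit)$r.squared)
```

Rounded, rse and r_squared should match the 3.863 and 0.6024 reported by summary(fit).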
Example 2: Hypothetical data
Step 1: Calculate model fit
The following hypothetical data has cities and stores as the IVs and sales as the DV. We enter them, together with their interaction, in a linear model in lm() to estimate the regression coefficients. After getting the fit, we use sum(residuals(fit)^2) to calculate SSR.
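x_1 <- rep(c('City1', 'City2'), each = 5)
x_2 <- rep(c('store1', 'store2'), 5)
sales <- c(10, 20, 20, 50, 30, 10, 5, 4, 12, 4)
df <- data.frame(cities = x_1,
                 stores = x_2,
                 sales = sales)
# use lm() to estimate the regression coefficients
# note: x_1 and x_2 are not columns of df, so lm() takes them from the
# workspace; they hold the same values as df$cities and df$stores
fit <- lm(sales ~ x_1 * x_2, data = df)
# print a summary of the fit
summary(fit)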
Output:
Call:
lm(formula = sales ~ x_1 * x_2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-15.000  -3.125  -1.000   3.875  15.000

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          20.000      6.229   3.211   0.0184 *
x_1City2            -11.500      9.850  -1.168   0.2873
x_2store2            15.000      9.850   1.523   0.1786
x_1City2:x_2store2  -17.500     13.929  -1.256   0.2557
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.79 on 6 degrees of freedom
Multiple R-squared:  0.6282,  Adjusted R-squared:  0.4422
F-statistic: 3.379 on 3 and 6 DF,  p-value: 0.09539
Step 2: Calculate SSR
# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)
Output:
[1] 698.5
Thus, the Sum of Squared Residuals (SSR) is 698.5. This is consistent with my other tutorial about ANOVA: it equals the residual sum of squares in the Type III ANOVA table.
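The link to ANOVA can be checked in base R. This is a minimal sketch under a few assumptions of mine: it uses anova(), which reports sequential (Type I) sums of squares rather than Type III, but the Residuals row is the residual sum of squares of the full model, so it does not depend on the type. It also refers to the df columns cities and stores instead of x_1 and x_2, which fits the same model with different coefficient labels, and the names `fit2`, `ssr_lm`, `ssr_anova`, and `ssr_cells` are illustrative.

```r
# Same hypothetical data as in Example 2
df <- data.frame(
  cities = rep(c("City1", "City2"), each = 5),
  stores = rep(c("store1", "store2"), 5),
  sales  = c(10, 20, 20, 50, 30, 10, 5, 4, 12, 4)
)
fit2 <- lm(sales ~ cities * stores, data = df)

# SSR from the residuals of the fitted model
ssr_lm <- sum(residuals(fit2)^2)

# Residuals row of the base-R ANOVA table
ssr_anova <- anova(fit2)["Residuals", "Sum Sq"]

# With the interaction included, each fitted value is the mean of its
# city-by-store cell, so SSR is the within-cell sum of squares
cell_means <- ave(df$sales, df$cities, df$stores)
ssr_cells <- sum((df$sales - cell_means)^2)

# The three values should be identical
c(ssr_lm, ssr_anova, ssr_cells)
```

The cell-mean version also shows where the number comes from: SSR here is simply the squared deviation of every sale from the average of its own city and store combination, added up over all 10 observations.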