Calculate Sum of Squared Residuals (SSR) in R

Introduction

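The Sum of Squared Residuals (SSR) is also known as the residual sum of squares (RSS) or the sum of squared errors (SSE). It is defined by the following formula, where y_i is the observed value and \hat{y_i} is the value estimated by the model for observation i.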

\[ SSR=\sum_{i=1}^{n} (\hat{y_i}-y_i)^2 \]

SSR measures how far the estimated values of a regression model are from the observed values: the smaller the SSR, the closer the fit. In R, SSR can be calculated with the following core syntax, where fit is a fitted model object such as the result of lm().

sum(residuals(fit)^2)
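
As a quick check on the core syntax, the sketch below computes SSR in three ways for the same model: from residuals(), by translating the formula directly, and with deviance(), which returns the residual sum of squares for lm objects. The variable names ssr_resid, ssr_manual and ssr_dev are only illustrative.

```r
# Fit a simple linear regression on the built-in mtcars data
fit <- lm(mpg ~ hp, data = mtcars)

# 1. Core syntax: square the residuals and add them up
ssr_resid <- sum(residuals(fit)^2)

# 2. Direct translation of the formula: (fitted value - observed value)^2
ssr_manual <- sum((fitted(fit) - mtcars$mpg)^2)

# 3. deviance() of an lm object is its residual sum of squares
ssr_dev <- deviance(fit)

# All three values should agree
c(ssr_resid, ssr_manual, ssr_dev)
```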

The following two examples show how to calculate SSR for linear regression models in R.

Example 1: Use data of mtcars

Step 1: Calculate model fit

mtcars is a built-in sample dataset in R. We fit a linear regression model with mpg as the dependent variable (DV) and hp as the independent variable (IV), using lm() to estimate the regression coefficients.

# use lm() to estimate regression coefficients
fit <- lm(mpg~hp, data=mtcars)

# print the fit
summary(fit)

Output:

Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,	Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

Step 2: Calculate SSR

After getting the fit, we use sum(residuals(fit)^2) to calculate SSR.

# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)

Output:

[1] 447.6743

Thus, the Sum of Squared Residuals (SSR) is 447.67.
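
The SSR also links back to the summary(fit) output in Step 1: the residual standard error is the square root of SSR divided by the residual degrees of freedom. The following is a minimal sketch of that relationship; the names ssr and rse are only illustrative.

```r
# Refit the model from Example 1
fit <- lm(mpg ~ hp, data = mtcars)

# Sum of Squared Residuals
ssr <- sum(residuals(fit)^2)

# Residual standard error = sqrt(SSR / residual degrees of freedom)
rse <- sqrt(ssr / df.residual(fit))

# Compare with the "Residual standard error" line of summary(fit)
rse
sigma(fit)
```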

Example 2: Hypothetical data

Step 1: Calculate model fit

The following hypothetical data has cities and stores as the IVs and sales as the DV. We enter both IVs and their interaction in a linear model in lm() to estimate the regression coefficients. In Step 2, we then use sum(residuals(fit)^2) to calculate SSR.

x_1 = rep(c('City1','City2'),each=5)
x_2 = rep(c('store1','store2'), 5)
sales=c(10,20,20,50,30,10,5,4,12,4)

df <- data.frame (cities  = x_1,
                  stores = x_2,
                  sales=sales)

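# use lm() to estimate regression coefficients
# (x_1 and x_2 are not columns of df, so lm() takes them from the workspace;
#  they hold the same values as df$cities and df$stores)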
fit <- lm(sales~x_1*x_2, data=df)

# print the fit
summary(fit)

Output:

Call:
lm(formula = sales ~ x_1 * x_2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.000  -3.125  -1.000   3.875  15.000 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)          20.000      6.229   3.211   0.0184 *
x_1City2            -11.500      9.850  -1.168   0.2873  
x_2store2            15.000      9.850   1.523   0.1786  
x_1City2:x_2store2  -17.500     13.929  -1.256   0.2557  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.79 on 6 degrees of freedom
Multiple R-squared:  0.6282,	Adjusted R-squared:  0.4422 
F-statistic: 3.379 on 3 and 6 DF,  p-value: 0.09539

Step 2: Calculate SSR

# calculate Sum of Squared Residuals (SSR)
sum(residuals(fit)^2)

Output:

[1] 698.5

Thus, the Sum of Squared Residuals (SSR) is 698.5. This is consistent with my other tutorial about ANOVA: it matches the residual sum of squares in the type 3 ANOVA.
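
The link to ANOVA can be checked in base R. The sketch below uses anova(), which gives sequential (type 1) tests rather than the type 3 table mentioned above, but the residual sum of squares depends only on the full model, so the Residuals row is the same. Because the model includes the interaction, the fitted values are simply the city-by-store cell means, so the same number also comes from ave(). The names ssr_anova and ssr_cells are only illustrative.

```r
# Recreate the hypothetical data from Example 2
x_1 <- rep(c('City1', 'City2'), each = 5)
x_2 <- rep(c('store1', 'store2'), 5)
sales <- c(10, 20, 20, 50, 30, 10, 5, 4, 12, 4)

fit <- lm(sales ~ x_1 * x_2)

# SSR from the residuals
ssr <- sum(residuals(fit)^2)

# Residual sum of squares from the ANOVA table
ssr_anova <- anova(fit)["Residuals", "Sum Sq"]

# Squared deviations of each sale from its city-by-store cell mean
cell_means <- ave(sales, x_1, x_2)
ssr_cells <- sum((sales - cell_means)^2)

c(ssr, ssr_anova, ssr_cells)
```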


Further Reading