How to Calculate Sum of Squared Residuals in Python

This tutorial shows how to calculate the sum of squared residuals (SSR) in Python, step by step. SSR is also known as the residual sum of squares (RSS) or the sum of squared errors (SSE).

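The following is the formula for SSR. SSR can be used to compare the estimated values of a regression model with the observed values: the smaller the SSR, the closer the estimates are to the data. In the formula, \(\hat{y}_i\) is the estimated value and \(y_i\) is the observed value for observation \(i\), and \(n\) is the number of observations.

\[ SSR=\sum_{i=1}^{n} (\hat{y}_i-y_i)^2 \]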
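To make the formula concrete, here is a minimal sketch that applies it by hand in plain Python. The function name sum_of_squared_residuals and the three data points are made up for illustration; they are not part of statsmodels or of the penguins data used later.

```python
def sum_of_squared_residuals(observed, predicted):
    """Return the sum of (predicted - observed)^2 over paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((p - o) ** 2 for o, p in zip(observed, predicted))


# Made-up values for illustration only
y = [3.0, 5.0, 7.0]        # observed values
y_hat = [2.5, 5.5, 7.0]    # estimated values

ssr = sum_of_squared_residuals(y, y_hat)
print(ssr)  # 0.5, i.e. 0.25 + 0.25 + 0.0
```

Because each difference is squared, it does not matter whether a residual is written as estimated minus observed or observed minus estimated.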

The following is the core syntax. It uses statsmodels to estimate the regression coefficients, where DV is the dependent variable and IVs are the independent variables. The fitted results object stores SSR in its ssr attribute, so we can print it out directly.

# import the statsmodels API
import statsmodels.api as sm

# fit the proposed model
model = sm.OLS(DV, IVs).fit()

# print out sum of squared residuals (SSR)
print(model.ssr)

4 steps to calculate Sum of Squared Residuals in Python using statsmodels

Step 1: Prepare Data

We are going to use the penguins dataset that ships with seaborn. Below, we load the data, dummy code the sex column, drop rows with missing values, and print the result.

Note that if your data is already prepared, you can skip Step 1 and go directly to Step 2.

# Built-in sample dataset of penguins in seaborn
import seaborn as sns

# load the penguins dataset from seaborn
penguins = sns.load_dataset("penguins")

# dummy code the sex column (1 = Male, 0 = anything else)
penguins['sex_dummy']=penguins['sex'].apply(lambda x: 1 if x=='Male' else 0)

# drop rows with missing values (this also removes rows where sex is missing)
penguins = penguins.dropna()

# print out the updated dataframe of penguins
print(penguins)

Output:

    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
0    Adelie  Torgersen            39.1           18.7              181.0   
1    Adelie  Torgersen            39.5           17.4              186.0   
2    Adelie  Torgersen            40.3           18.0              195.0   
4    Adelie  Torgersen            36.7           19.3              193.0   
5    Adelie  Torgersen            39.3           20.6              190.0   
..      ...        ...             ...            ...                ...   
338  Gentoo     Biscoe            47.2           13.7              214.0   
340  Gentoo     Biscoe            46.8           14.3              215.0   
341  Gentoo     Biscoe            50.4           15.7              222.0   
342  Gentoo     Biscoe            45.2           14.8              212.0   
343  Gentoo     Biscoe            49.9           16.1              213.0   

     body_mass_g     sex  sex_dummy  
0         3750.0    Male          1  
1         3800.0  Female          0  
2         3250.0  Female          0  
4         3450.0  Female          0  
5         3650.0    Male          1  
..           ...     ...        ...  
338       4925.0  Female          0  
340       4850.0  Female          0  
341       5750.0    Male          1  
342       5200.0  Female          0  
343       5400.0    Male          1  

[333 rows x 8 columns]
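One detail in the preparation code is worth spelling out: the dummy coding turns a missing sex value into 0, and it is the later dropna() call that removes those rows. The sketch below illustrates this on a tiny, hypothetical DataFrame named toy (not the real penguins data).

```python
import pandas as pd

# Hypothetical three-row frame; the middle row has a missing sex value
toy = pd.DataFrame({
    "sex": ["Male", None, "Female"],
    "body_mass_g": [3750.0, 3400.0, 3800.0],
})

# Same dummy coding as above: anything that is not 'Male' becomes 0,
# including the missing value
toy["sex_dummy"] = toy["sex"].apply(lambda x: 1 if x == "Male" else 0)
print(toy["sex_dummy"].tolist())  # [1, 0, 0]

# dropna() removes the middle row because its 'sex' value is still missing
toy = toy.dropna()
print(toy["sex_dummy"].tolist())  # [1, 0]
```

If dropna() were skipped, penguins with an unknown sex would silently be treated as female in the model.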

Step 2: Select the IVs and DV

The following is the linear model, in which we use flipper length and sex as the independent variables (IVs) to predict penguins' body weight, the dependent variable (DV).

\[ \text{Body Weight} = b_0 + b_1 \cdot \text{flipper length} + b_2 \cdot \text{sex} \]

The second step is to select the IVs and the DV. The code below selects the relevant columns and adds a constant column to the IVs so that the model includes an intercept (b0).

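# import statsmodels.api (needed here for add_constant)
import statsmodels.api as sm

# select the columns 'flipper_length_mm' and 'sex_dummy' as the IVs
IVs = penguins[['flipper_length_mm','sex_dummy']]

# add a constant column to the IVs so the model has an intercept
IVs = sm.add_constant(IVs)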

# select the column of 'body_mass_g' as the DV
DV = penguins['body_mass_g']

Step 3: Fit the Model

The third step is to fit the model using statsmodels. The following code fits the model and prints out the model summary.

# import statsmodels.api
import statsmodels.api as sm

# fit the proposed model
model = sm.OLS(DV, IVs).fit()

# print out the model summary  
print(model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:            body_mass_g   R-squared:                       0.806
Model:                            OLS   Adj. R-squared:                  0.805
Method:                 Least Squares   F-statistic:                     684.8
Date:                Thu, 09 Jun 2022   Prob (F-statistic):          3.53e-118
Time:                        17:36:24   Log-Likelihood:                -2427.2
No. Observations:                 333   AIC:                             4860.
Df Residuals:                     330   BIC:                             4872.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const             -5410.3002    285.798    -18.931      0.000   -5972.515   -4848.085
flipper_length_mm    46.9822      1.441     32.598      0.000      44.147      49.817
sex_dummy           347.8503     40.342      8.623      0.000     268.491     427.209
==============================================================================
Omnibus:                        0.262   Durbin-Watson:                   1.701
Prob(Omnibus):                  0.877   Jarque-Bera (JB):                0.376
Skew:                           0.051   Prob(JB):                        0.829
Kurtosis:                       2.870   Cond. No.                     2.95e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.95e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Step 4: Print Out the SSR

The last step is to print out the SSR, which is stored in the ssr attribute of the fitted model.

# print out sum of squared residuals (SSR)
print(model.ssr)

Output:

41795373.64945871

Based on the output above, we can conclude that the SSR is approximately 41795373.65. Because body_mass_g is measured in grams, this value is in squared grams.
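If you want to double-check an SSR value without statsmodels, one option is to fit the same kind of model with NumPy and apply the formula directly. The sketch below is an alternative approach rather than part of the steps above, and it uses a small made-up dataset instead of the penguins data.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])

# Design matrix: a column of ones (intercept) plus the predictor
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit; the second return value is the
# sum of squared residuals reported by NumPy
coef, residual_ss, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Apply the SSR formula by hand: sum of (estimated - observed)^2
y_hat = X @ coef
ssr_manual = float(np.sum((y_hat - y) ** 2))

print(round(ssr_manual, 6))             # 1.8
print(round(float(residual_ss[0]), 6))  # 1.8
```

Here the fitted line is 1.3 + 0.8x, so the residuals are 0.3, -0.9, 0.9 and -0.3, and their squares sum to 1.8.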

Conclusion

This tutorial showed 4 steps to calculate the sum of squared residuals in Python using statsmodels. If your data is already prepared, you can skip Step 1 and go straight to Steps 2 to 4.

