This tutorial shows how to calculate the Sum of Squared Residuals (SSR) in Python, step by step. SSR is also known as the residual sum of squares (RSS) or the sum of squared errors (SSE).
The formula for SSR is given below. For a regression model, SSR measures how far the estimated values are from the observed values: the smaller the SSR, the closer the fit.
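\[ SSR=\sum_{i=1}^{n} (\hat{y}_i-y_i)^2 \]
Here \(\hat{y}_i\) is the value the model estimates for observation \(i\), \(y_i\) is the observed value, and \(n\) is the number of observations.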
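To make the formula concrete, here is a minimal sketch that applies it by hand in plain Python. The two lists are hypothetical toy values chosen only for illustration, and `sum_squared_residuals` is a helper name introduced here, not part of any library.

```python
# Hypothetical toy values, chosen only to illustrate the formula
observed = [3.0, 5.0, 7.5, 9.0]    # y_i
predicted = [2.5, 5.5, 7.0, 9.5]   # y_hat_i


def sum_squared_residuals(y_hat, y):
    """Return the sum of (y_hat_i - y_i)^2 over all paired observations."""
    if len(y_hat) != len(y):
        raise ValueError("y_hat and y must have the same length")
    return sum((p - o) ** 2 for p, o in zip(y_hat, y))


ssr = sum_squared_residuals(predicted, observed)
print(ssr)
```

Every prediction in this toy example is off by 0.5, so each observation contributes 0.25 and the four contributions add up to 1.0.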
The following is the core syntax. It uses statsmodels to estimate the regression coefficients. The fitted results object stores the SSR in its ssr attribute, so we can print it directly.
# import statsmodels
import statsmodels.api as sm
# fit the proposed model (DV: dependent variable; IVs: independent variables plus a constant column)
model = sm.OLS(DV, IVs).fit()
# print out sum of squared residuals (SSR)
print(model.ssr)
4 steps to calculate Sum of Squared Residuals in Python using statsmodels
Step 1: Prepare Data
We are going to use the Penguins data from seaborn (for more information, see this post). Below, we load the data and print it out.
If your data is already prepared, you can skip Step 1 and go directly to Step 2.
# Built-in sample dataset of penguins in seaborn
import seaborn as sns
# load the penguins dataset from seaborn
penguins = sns.load_dataset("penguins")
# dummy coding the column of sex (Male = 1, otherwise 0)
penguins['sex_dummy'] = penguins['sex'].apply(lambda x: 1 if x == 'Male' else 0)
# drop rows with missing values (nan)
penguins = penguins.dropna()
# print out the updated dataframe of penguins
print(penguins)
Output:
     species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
0     Adelie  Torgersen            39.1           18.7              181.0
1     Adelie  Torgersen            39.5           17.4              186.0
2     Adelie  Torgersen            40.3           18.0              195.0
4     Adelie  Torgersen            36.7           19.3              193.0
5     Adelie  Torgersen            39.3           20.6              190.0
..       ...        ...             ...            ...                ...
338   Gentoo     Biscoe            47.2           13.7              214.0
340   Gentoo     Biscoe            46.8           14.3              215.0
341   Gentoo     Biscoe            50.4           15.7              222.0
342   Gentoo     Biscoe            45.2           14.8              212.0
343   Gentoo     Biscoe            49.9           16.1              213.0

     body_mass_g     sex  sex_dummy
0         3750.0    Male          1
1         3800.0  Female          0
2         3250.0  Female          0
4         3450.0  Female          0
5         3650.0    Male          1
..           ...     ...        ...
338       4925.0  Female          0
340       4850.0  Female          0
341       5750.0    Male          1
342       5200.0  Female          0
343       5400.0    Male          1

[333 rows x 8 columns]
Step 2: Select the IVs and DV
The following is the linear model, in which we use flipper length and sex as the independent variables (IVs) to predict penguins’ body weight, the dependent variable (DV).
\[ \text{Body Weight} = b_0 + b_1 \times \text{flipper length} + b_2 \times \text{sex} \]
The second step is to select the IVs and the DV. The code below selects the relevant columns and adds a constant column to the IVs so that the model can estimate the intercept b0. Because add_constant comes from statsmodels, we import statsmodels here.
# import statsmodels.api (needed for add_constant)
import statsmodels.api as sm
# select the columns of 'flipper_length_mm','sex_dummy' as the IVs
IVs = penguins[['flipper_length_mm', 'sex_dummy']]
# adding a constant column to IVs
IVs = sm.add_constant(IVs)
# select the column of 'body_mass_g' as the DV
DV = penguins['body_mass_g']
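To see what adding a constant column does, here is a minimal NumPy sketch, offered as an illustration rather than as statsmodels' own implementation. It assumes the constant is simply a column of ones placed in front of the IVs, which matches the position of `const` in the regression output. The three rows are copied from the first rows of the penguins printout in Step 1.

```python
import numpy as np

# First three rows of the penguins printout: (flipper_length_mm, sex_dummy)
X = np.array([
    [181.0, 1.0],
    [186.0, 0.0],
    [195.0, 0.0],
])

# Prepend a column of ones; its coefficient becomes the intercept b0
X_with_const = np.column_stack([np.ones(len(X)), X])
print(X_with_const)
```

Without this column, the fitted line would be forced through the origin, because OLS in statsmodels does not add an intercept on its own.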
Step 3: Fit the Model
The third step is to fit the model using statsmodels. The following code fits the model and prints out the model summary.
# import statsmodels.api
import statsmodels.api as sm
# fit the proposed model
model = sm.OLS(DV, IVs).fit()
# print out the model summary
print(model.summary())
Output:
                            OLS Regression Results
==============================================================================
Dep. Variable:            body_mass_g   R-squared:                       0.806
Model:                            OLS   Adj. R-squared:                  0.805
Method:                 Least Squares   F-statistic:                     684.8
Date:                Thu, 09 Jun 2022   Prob (F-statistic):          3.53e-118
Time:                        17:36:24   Log-Likelihood:                -2427.2
No. Observations:                 333   AIC:                             4860.
Df Residuals:                     330   BIC:                             4872.
Df Model:                           2
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const             -5410.3002    285.798    -18.931      0.000   -5972.515   -4848.085
flipper_length_mm    46.9822      1.441     32.598      0.000      44.147      49.817
sex_dummy           347.8503     40.342      8.623      0.000     268.491     427.209
==============================================================================
Omnibus:                        0.262   Durbin-Watson:                   1.701
Prob(Omnibus):                  0.877   Jarque-Bera (JB):                0.376
Skew:                           0.051   Prob(JB):                        0.829
Kurtosis:                       2.870   Cond. No.                     2.95e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.95e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
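As a worked example of how these coefficients relate to SSR, the sketch below plugs them into the model equation for a single penguin and computes that penguin's squared residual. The coefficients are the rounded values from the summary, and the data row is row 0 of the penguins printout in Step 1, so the result is approximate.

```python
# Coefficients copied (rounded) from the model summary
b0 = -5410.3002   # const
b1 = 46.9822      # flipper_length_mm
b2 = 347.8503     # sex_dummy

# Row 0 of the penguins data: flipper length 181.0 mm, male, body mass 3750.0 g
flipper_length_mm, sex_dummy, body_mass_g = 181.0, 1, 3750.0

# Estimated body mass from the fitted equation
predicted = b0 + b1 * flipper_length_mm + b2 * sex_dummy

# Residual: observed value minus estimated value
residual = body_mass_g - predicted

print(round(predicted, 2))       # estimated body mass in grams
print(round(residual ** 2, 2))   # this penguin's contribution to SSR
```

SSR is the total of such squared residuals across all 333 penguins in the data.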
Step 4: Print Out the SSR
The last step is to print out the SSR, which the fitted model stores in its ssr attribute.
# print out sum of squared residuals (SSR)
print(model.ssr)
Output:
41795373.64945871
Based on the output above, the SSR of the fitted model is approximately 41,795,373.65. Because body mass is measured in grams, this value is in squared grams.
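If you want to confirm what this number represents, you can recompute it from the residuals. With statsmodels, `(model.resid ** 2).sum()` should agree with `model.ssr`. The sketch below shows the same check without statsmodels, swapping in NumPy's least-squares solver and synthetic data. The data, seed, and variable names are hypothetical and are not the penguins dataset.

```python
import numpy as np

# Synthetic data (hypothetical, NOT the penguins dataset)
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(170, 230, size=n)   # stand-in for flipper length
d = rng.integers(0, 2, size=n)      # stand-in for a 0/1 dummy variable
y = -5400 + 47 * x + 350 * d + rng.normal(0, 300, size=n)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x, d])

# Ordinary least squares; lstsq also returns the sum of squared residuals
coef, lstsq_ssr, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Recompute SSR directly from the formula
residuals = y - X @ coef
manual_ssr = float(np.sum(residuals ** 2))

print(manual_ssr)
print(np.isclose(manual_ssr, lstsq_ssr[0]))
```

The two values agree because least squares chooses the coefficients that minimize SSR, so any other set of coefficients gives a larger value on the same data.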
Conclusion
This tutorial showed 4 steps to calculate the Sum of Squared Residuals in Python using statsmodels. If your data is already prepared, you can skip Step 1 and go straight to Steps 2 to 4.