This tutorial shows how to use sklearn to calculate SSR, which stands for Sum of Squared Residuals. SSR is also known as residual sum of squares (RSS) or sum of squared errors (SSE).
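\[ SSR=\sum_{i=1}^{n} (\hat{y}_i-y_i)^2 \]
Here \(\hat{y}_i\) is the value predicted by the model for observation \(i\), \(y_i\) is the observed value, and \(n\) is the number of observations.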
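As a quick check of the formula, here is a minimal sketch with three made-up observations. The values are purely illustrative and are not taken from the penguins data used later.

```python
# Hypothetical observed and predicted values, chosen only for illustration
y_observed = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.5, 7.0]

# Residual for each observation: predicted minus observed
residuals = [y_hat - y for y_hat, y in zip(y_predicted, y_observed)]

# SSR is the sum of the squared residuals: 0.25 + 0.25 + 0.0
ssr = sum(r ** 2 for r in residuals)
print(ssr)  # 0.5
```

Because each residual is squared, the sign convention (predicted minus observed, or the reverse) does not change the result.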
Steps of Using sklearn to Calculate SSR in Python
Step 1: Prepare data
We will use the penguins example dataset, loaded through seaborn. Note that load_dataset() downloads the data on first use, so an internet connection is needed.
import seaborn as sns
penguins = sns.load_dataset("penguins")
# dummy-code the sex column: 1 for Male, 0 otherwise
penguins['sex_dummy']=penguins['sex'].apply(lambda x: 1 if x=='Male' else 0)
# drop rows with missing values (this also removes rows whose sex is missing)
penguins = penguins.dropna()
# print out the final data
print(penguins)
Output:
    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \
0    Adelie  Torgersen            39.1           18.7              181.0
1    Adelie  Torgersen            39.5           17.4              186.0
2    Adelie  Torgersen            40.3           18.0              195.0
4    Adelie  Torgersen            36.7           19.3              193.0
5    Adelie  Torgersen            39.3           20.6              190.0
..      ...        ...             ...            ...                ...
338  Gentoo     Biscoe            47.2           13.7              214.0
340  Gentoo     Biscoe            46.8           14.3              215.0
341  Gentoo     Biscoe            50.4           15.7              222.0
342  Gentoo     Biscoe            45.2           14.8              212.0
343  Gentoo     Biscoe            49.9           16.1              213.0

     body_mass_g     sex  sex_dummy
0         3750.0    Male          1
1         3800.0  Female          0
2         3250.0  Female          0
4         3450.0  Female          0
5         3650.0    Male          1
..           ...     ...        ...
338       4925.0  Female          0
340       4850.0  Female          0
341       5750.0    Male          1
342       5200.0  Female          0
343       5400.0    Male          1
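The dummy coding in Step 1 runs before the rows with missing values are dropped, so a missing sex is briefly coded as 0. The sketch below, on a small made-up DataFrame (the name toy and its values are hypothetical), suggests why this is harmless: the original sex column is still missing in those rows, so dropna() removes them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in 'sex'
toy = pd.DataFrame({
    "sex": ["Male", "Female", np.nan],
    "body_mass_g": [3750.0, 3800.0, 3250.0],
})

# Same dummy coding as in Step 1: 1 for Male, 0 for everything else (including NaN)
toy["sex_dummy"] = toy["sex"].apply(lambda x: 1 if x == "Male" else 0)

# dropna() removes the row whose 'sex' is missing,
# so its placeholder 0 never reaches the model
toy = toy.dropna()
print(toy["sex_dummy"].tolist())  # [1, 0]
```

Dropping the missing rows first and coding afterwards would give the same final data and may be easier to read.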
Step 2: Determine IVs and DV
The independent variables (IVs) will be flipper length and sex, and the dependent variable (DV) will be the penguin's body weight.
Body Weight = b0 + b1 × flipper length + b2 × sex
# select the columns 'flipper_length_mm' and 'sex_dummy' as the IVs
IVs = penguins[['flipper_length_mm','sex_dummy']]
# select the column of 'body_mass_g' as the DV
DV = penguins['body_mass_g']
Step 3: Apply LinearRegression() from sklearn
# import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
# save it to result
result = lm.fit(IVs, DV)
print("Result is as follows:")
print("Intercept:\n",result.intercept_)
print("Regression Coefficients:\n", result.coef_)
Output:
Result is as follows:
Intercept:
 -5410.300224143296
Regression Coefficients:
 [ 46.98217525 347.85025373]
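To see what these numbers mean, the fitted equation from Step 2 can be evaluated by hand. The sketch below copies the intercept and coefficients from the output above and applies them to a hypothetical male penguin with a 200 mm flipper; that penguin is an illustrative assumption, not a row from the dataset.

```python
# Intercept and coefficients copied from the regression output
b0 = -5410.300224143296
b1 = 46.98217525    # coefficient for flipper_length_mm
b2 = 347.85025373   # coefficient for sex_dummy (1 = Male, 0 = Female)

# Hypothetical penguin: male, 200 mm flipper
flipper_length_mm = 200.0
sex_dummy = 1

# Body Weight = b0 + b1 * flipper length + b2 * sex
predicted_mass = b0 + b1 * flipper_length_mm + b2 * sex_dummy
print(round(predicted_mass, 2))  # about 4333.99 (grams)
```

This is the same arithmetic that predict() performs for each row when the estimated values are computed.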
Step 4: Calculate SSR
# Use the result to calculate the estimated Y values
Y_estimated = result.predict(IVs)
# combine observed and estimated Y into the same dataframe
df = pd.DataFrame({'Observed': DV, 'Estimated':Y_estimated})
# calculate SSR
print('SSR :\n ', np.sum(np.square(df['Estimated'] - df['Observed'])))
Output:
SSR :
  41795373.64945871
Thus, the Sum of Squared Residuals (SSR) is 41795373.65.
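As a cross-check, SSR can also be obtained from sklearn's own metrics, since the mean squared error is SSR divided by the number of observations. The following is a minimal sketch on made-up toy data (X and y are hypothetical and used only so the example runs without downloading anything); the same two lines should apply to the observed and estimated values from Step 4.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical toy data, not the penguins dataset
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.1, 5.9, 8.2])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Direct definition: sum of squared residuals
ssr_direct = np.sum((y_pred - y) ** 2)

# Equivalent route: mean squared error times the number of observations
ssr_from_mse = mean_squared_error(y, y_pred) * len(y)

print(np.isclose(ssr_direct, ssr_from_mse))  # True
```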