MSE stands for Mean Squared Error. It measures how far a model's estimated values are from the observed values. The formula for MSE is as follows, where SSR is the sum of squared residuals, n is the number of observations, and p is the number of predictors.
\[ MSE=\frac{SSR}{n-p-1}=\frac{\sum_{i=1}^{n} (\hat{y}_i-y_i)^2 }{n-p-1}\]
How to Calculate MSE in R
R can be used to calculate Mean Squared Error (MSE). The following is the core syntax, which divides the sum of the squared residuals by the residual degrees of freedom.
sum(residuals(fit)^2)/fit$df.residual
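As a rough cross-check of this one-liner against the formula above, the sketch below computes MSE by hand from n and p. It assumes an illustrative model on the built-in mtcars data; the names n, p, mse_manual, and mse_short are made up for this example.

```r
# Illustrative model: mpg regressed on hp and wt (two predictors, so p = 2)
fit <- lm(mpg ~ hp + wt, data = mtcars)

n <- nrow(mtcars)             # number of observations
p <- length(coef(fit)) - 1    # number of predictors, excluding the intercept

# MSE from the formula: sum of squared residuals over n - p - 1
mse_manual <- sum((fitted(fit) - mtcars$mpg)^2) / (n - p - 1)

# MSE from the one-line syntax
mse_short <- sum(residuals(fit)^2) / fit$df.residual

# Compare the two values (allowing for floating-point tolerance)
all.equal(mse_manual, mse_short)
```

The two values agree because fit$df.residual stores n - p - 1 for a model with an intercept and no dropped coefficients.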
The following two examples show how to calculate MSE for linear regression models in R.
Example 1: Use data of mtcars
mtcars is a built-in sample dataset in R. We can fit a linear regression model with mpg as the dependent variable (DV) and hp as the independent variable (IV), using lm() to estimate the regression coefficients. After getting the fit, we use sum(residuals(fit)^2)/fit$df.residual to calculate MSE.
# use lm() to estimate regression coefficients
fit <- lm(mpg~hp, data=mtcars)
# calculate Mean Squared Error (MSE)
sum(residuals(fit)^2)/fit$df.residual
Output:
[1] 14.92248
Thus, the Mean Squared Error (MSE) for the regression model is 14.92.
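The same value can be recovered from other accessors in base R, which may be handy as a sanity check. The sketch below assumes the same mtcars model; the variable name mse is illustrative.

```r
fit <- lm(mpg ~ hp, data = mtcars)
mse <- sum(residuals(fit)^2) / fit$df.residual

# sigma() returns the residual standard error, i.e. the square root of MSE
sigma(fit)^2

# deviance() returns the residual sum of squares for an lm fit
deviance(fit) / df.residual(fit)
```

Both expressions should return the same value as mse, up to floating-point rounding.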
Example 2: Hypothetical data
The following hypothetical data has cities and stores as the IVs and sales as the DV. We write them as a linear model in lm() to estimate the regression coefficients. After getting the fit, we use sum(residuals(fit)^2)/fit$df.residual to calculate MSE.
x_1 <- rep(c('City1','City2'), each=5)
x_2 <- rep(c('store1','store2'), 5)
sales <- c(10,20,20,50,30,10,5,4,12,4)
df <- data.frame(cities = x_1,
                 stores = x_2,
                 sales = sales)
# use lm() to estimate regression coefficients
# (refer to the data frame columns cities and stores, including their interaction)
fit <- lm(sales~cities*stores, data=df)
# calculate Mean Squared Error (MSE)
sum(residuals(fit)^2)/fit$df.residual
Output:
[1] 116.4167
Thus, the Mean Squared Error (MSE) for the regression model is 116.42.
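To see where the denominator comes from in this example, the sketch below rebuilds the same hypothetical data, counts the estimated coefficients, and compares the result with the residual "Mean Sq" entry of the ANOVA table. The names n, k, mse, and mse_anova are illustrative.

```r
df <- data.frame(
  cities = rep(c("City1", "City2"), each = 5),
  stores = rep(c("store1", "store2"), 5),
  sales  = c(10, 20, 20, 50, 30, 10, 5, 4, 12, 4)
)
fit <- lm(sales ~ cities * stores, data = df)

n <- nrow(df)            # 10 observations
k <- length(coef(fit))   # intercept + cities + stores + interaction
n - k                    # residual degrees of freedom, same as fit$df.residual

mse <- sum(residuals(fit)^2) / fit$df.residual

# The residual row of the ANOVA table reports the same mean square
mse_anova <- anova(fit)["Residuals", "Mean Sq"]
```

Here the interaction model estimates four coefficients, so p = 3 and the residual degrees of freedom are 10 - 3 - 1 = 6.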
MSE denominator: n vs. n-p-1
Note that some people define MSE using n rather than n-p-1 in the denominator. To better understand the nuanced difference, please refer to my other post on this topic (link below).
In that post, I also explain the difference and connection between MSE (Mean Squared Error) and MSR (Mean Squared Residuals). You might find it useful as well.
I also have a post (link below) showing how to calculate both biased MSE and unbiased MSE in Python.
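For readers staying in R, the two denominators can be compared directly. The following is a minimal sketch on the mtcars model from Example 1; the names ssr, mse_unbiased, and mse_biased are illustrative and follow the "biased" and "unbiased" wording used above.

```r
fit <- lm(mpg ~ hp, data = mtcars)

ssr <- sum(residuals(fit)^2)   # sum of squared residuals
n   <- nobs(fit)               # number of observations used in the fit

# Denominator n - p - 1 (residual degrees of freedom)
mse_unbiased <- ssr / fit$df.residual

# Denominator n; equivalent to mean(residuals(fit)^2)
mse_biased <- ssr / n

c(unbiased = mse_unbiased, biased = mse_biased)
```

Because n is larger than n-p-1, the version that divides by n is always the smaller of the two for the same fit.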
Reference
- Mean squared error and the residual sum of squares function (Stack Exchange)
- R – Confused on Residual Terminology (Stack Exchange)