To use maximum likelihood estimation (MLE) in linear regression in R, you can use either optim() or mle() function. The following provides detailed R code examples.
Data and Model
Suppose we want to test the effect of teaching method (new vs. old) on students’ test scores. The following is the data. In the R code, 0 represents ‘old’ and 1 represents ‘new’ for Teaching_method.
Student ID | Teaching Method | Test Score |
---|---|---|
1 | Old | 90 |
2 | Old | 90 |
3 | New | 100 |
4 | Old | 70 |
5 | New | 90 |
6 | New | 95 |
7 | New | 75 |
8 | New | 75 |
9 | Old | 60 |
10 | Old | 65 |
11 | Old | 65 |
12 | New | 70 |
The following is the model statement.
Test Score = β0 + β1 * Teaching Method + ϵ
where:
- β0 and β1 are the coefficients representing the intercept and the effect of the teaching method, respectively.
- ϵ is the error term.
Note that, we typically assume ε to follow a normal distribution with mean 0, namely N(0, σ2). We can extend this as follows.
ϵ ∼ N(0, σ2)
Based on that, we can get the following.
Test Score ∼ N(β0 + β1 * Teaching Method, σ2)
Method 1: optim()
In method 1, We first define a function to calculate negative log likelihood. Then, we use the optim() function to optimize it by minimizing the negative log likelihood.
# input data for x and y
Teaching_Method <- c(0,0,1,0,1,1,1,1,0,0,0,1)
Test_Score <- c(90,90,100,70,90,95,75,75,60,65,65,70)
# Define the model using the dnorm() function
lm_MLE <- function(par, y, x1, x2){
beta_0 <- par[1]
beta_1 <- par[2]
sigma <- par[3]
-sum(dnorm(Test_Score, beta_0+beta_1*Teaching_Method, sigma, log = TRUE))
}
# use optim() for MLE in linear regression model
result_0<-optim(lm_MLE,par=c(beta_0 = 1, beta_1=1, sigma = 19) )
# print out the estimated parameters
result_0$par
# value output from optim()
result_0$value
# Explain the meaning of "value" from optim() output
-sum(dnorm(Test_Score, result_0$par["beta_0"]+result_0$par["beta_1"]*Teaching_Method, result_0$par["sigma"], log = TRUE))
# print out if it is converged: 0 indicates successful completion
result_0$convergence
The following is the output, including the estimated parameters, calculated value, and convergence status. For the value from optim(), it is the final value defined in the function(). Convergence from optim() indicates if the iteration is converged, and 0 indicates successful completion.
# print out the estimated parameters > result_0$par beta_0 beta_1 sigma 73.33608 10.82952 11.73821 # value output from optim() > result_0$value [1] 46.58414 # Explain the meaning of "value" from optim() output > -sum(dnorm(Test_Score, result_0$par["beta_0"]+result_0$par["beta_1"]*Teaching_Method, result_0$par["sigma"], log = TRUE)) [1] 46.58414 # print out if it is converged: 0 indicates successful completion > result_0$convergence [1] 0
Method 2: mle()
We can also use mle() to estimate regression coefficients and residual variance. The R code includes 2 parts.
First, define a function using function()
. Second, use mle()
function. We need to manually calculate the square of sigma to get the estimated variance.
# input data for x and y
Teaching_Method <- c(0,0,1,0,1,1,1,1,0,0,0,1)
Test_Score <- c(90,90,100,70,90,95,75,75,60,65,65,70)
# Define the model using the dnorm() function
LL <- function(beta_0, beta_1, sigma){
-sum(dnorm(Test_Score,(beta_0+beta_1*Teaching_Method), sigma, log = TRUE))
}
# use mle() function to calculate regression coefficients
library(stats4)
result_1 <- mle(LL, start = list(beta_0 = 1, beta_1=1, sigma = 19))
summary(result_1)
# calculate sigma squared
(result_1@coef[3])^2
We can see that for linear regression, optim() and mle() produce similar results. In particular, When using optim() (i.e., in method 1), the estimated parameters are β0 (73.34), β1(10.83), and σ (11.73). When using mle(), the estimated parameters are β0 (73.43), β1(10.30), and σ (11.93), see below.
> summary(result_1) Maximum likelihood estimation Call: mle(minuslogl = LL, start = list(beta_0 = 1, beta_1 = 1, sigma = 19)) Coefficients: Estimate Std. Error beta_0 73.42516 4.871012 beta_1 10.30336 6.892004 sigma 11.93112 2.495832 -2 log L: 93.18286 > # calculate sigma squared > (result_1@coef[3])^2 sigma 142.3517
Further Reading
If you have questions regarding how to derive the normal distribution for Y in MLE, you can refer my following 2 tutorials.
Further, you can also refer to the following 2 pages. The firstone is about optim() function and the second one provides a tutorial for it.