The inverse of the Hessian matrix returned by optim() can be used as the asymptotic covariance matrix of the estimated parameters. The name "asymptotic" comes from directly replacing \( \sigma^2\) in the variance formulas for the estimated parameters with its estimate, namely \( \hat{\sigma}^2\). If the sample size is large enough, the t-distribution is very close to a normal distribution, which is why it is called "asymptotic."
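More formally, if \( \ell(\theta)\) is the log-likelihood and optim() minimizes the negative log-likelihood, the Hessian it returns is \( H = -\nabla^2 \ell(\hat{\theta})\) evaluated at the MLE, and the asymptotic covariance matrix is approximated by its inverse:
\( \widehat{Cov}(\hat{\theta}) \approx H^{-1} = \left[-\nabla^2 \ell(\hat{\theta})\right]^{-1}\)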
Example of Hessian in R
The following uses the R function optim() to estimate the parameters, namely \( \beta_0\), \( \beta_1\), and \( \sigma\). As you can see, the output includes not only the estimated parameters but also the Hessian matrix.
# input data for x (Teaching_Method) and y (Test_Score)
Teaching_Method <- c(0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1)
Test_Score <- c(90, 90, 100, 70, 90, 95, 75, 75, 60, 65, 65, 70)
# Define the negative log-likelihood using the dnorm() function
lm_MLE <- function(par) {
  beta_0 <- par[1]
  beta_1 <- par[2]
  sigma  <- par[3]
  -sum(dnorm(Test_Score, mean = beta_0 + beta_1 * Teaching_Method, sd = sigma, log = TRUE))
}
# use optim() for MLE in the linear regression model
result_0 <- optim(par = c(beta_0 = 1, beta_1 = 1, sigma = 19), fn = lm_MLE, hessian = TRUE)
# print out the result, including the Hessian
print(result_0)
> print(result_0)
$par
  beta_0   beta_1    sigma 
73.33608 10.82952 11.73821 

$value
[1] 46.58414

$counts
function gradient 
     166       NA 

$convergence
[1] 0

$message
NULL

$hessian
              beta_0        beta_1         sigma
beta_0  8.709186e-02  4.354593e-02 -1.250555e-05
beta_1  4.354593e-02  4.354593e-02  7.894130e-06
sigma  -1.250555e-05  7.894130e-06  1.743007e-01
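As a quick cross-check (not part of the original output), you can fit the same model with lm(). For a normal linear model, the OLS coefficients coincide with the MLEs of \( \beta_0\) and \( \beta_1\), so the coefficients should be roughly 73.33 and 10.83. Note that the standard errors from summary() use the unbiased variance estimate with an \( n-2\) denominator, so they differ slightly from the asymptotic SEs computed later in this tutorial.
# cross-check with lm(): OLS coefficients equal the MLEs of beta_0 and beta_1
fit_lm <- lm(Test_Score ~ Teaching_Method)
coef(fit_lm)                   # compare with result_0$par["beta_0"] and result_0$par["beta_1"]
summary(fit_lm)$coefficients   # SEs here use the n - 2 denominator, not the MLE of sigma^2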
Meaning of Hessian matrix from optim()
The following are the formulas for the variances of \( \beta_0\) and \( \beta_1\); for their derivations, refer to my other tutorial.
\( \begin{aligned} Var (\hat{\beta}_1) &= \frac{\sigma^2 }{ \sum_{i=1}^n (x_i-\bar{x})^2 } \\ Var(\hat{\beta}_0) &= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{ n \sum_{i=1}^n (x_i-\bar{x})^2 } \end{aligned}\)
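As a quick worked check with this data set, using \( \hat{\sigma} = 11.73821\) from the output above: Teaching_Method contains six 0s and six 1s, so \( \bar{x} = 0.5\), \( \sum_{i=1}^n (x_i-\bar{x})^2 = 3\), \( \sum_{i=1}^n x_i^2 = 6\), and \( n = 12\). Plugging these in gives
\( \begin{aligned} Var(\hat{\beta}_1) &\approx \frac{11.73821^2}{3} \approx 45.93 \\ Var(\hat{\beta}_0) &\approx \frac{11.73821^2 \times 6}{12 \times 3} \approx 22.96 \end{aligned}\)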
To understand the meaning of the Hessian matrix, the following R code first calculates the inverse of the Hessian matrix and then calculates the asymptotic variances manually. In the manual calculation, \( \sigma^2\) is simply replaced with its estimate, namely \( \hat{\sigma}^2\).
As we can see, the two approaches give exactly the same numbers: the asymptotic variance for \( \beta_0 \) is 22.96, whereas that for \( \beta_1 \) is 45.93.
# inverse of the Hessian matrix
solve(result_0$hessian)
# Manually calculate asymptotic variance for beta 0
(result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12))  # 12 = sample size n
# Manually calculate asymptotic variance for beta 1
(result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2)
> # inverse of the Hessian matrix
> solve(result_0$hessian)
             beta_0        beta_1        sigma
beta_0  22.964258291 -22.964257842  0.002687675
beta_1 -22.964257842  45.928515550 -0.003727733
sigma    0.002687675  -0.003727733  5.737212614

# Manually calculate asymptotic variance for beta 0
22.96426
# Manually calculate asymptotic variance for beta 1
45.92852
You can then take the square root of the asymptotic variance to get the asymptotic standard error (SE) for \( \beta_0 \) and \( \beta_1\). In the following, you can see that the asymptotic SE for \( \beta_0\) is 4.79, whereas that for \( \beta_1\) is 6.78.
# manually calculate asymptotic standard error for beta 0
sqrt((result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12)))
# manually calculate asymptotic standard error for beta 1
sqrt((result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2))
# manually calculate asymptotic standard error for beta 0
4.792104
# manually calculate asymptotic standard error for beta 1
6.777058
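Equivalently, you can obtain all the asymptotic standard errors at once from the diagonal of the inverse Hessian; this is just a compact restatement of the calculations above, and it should return approximately 4.79, 6.78, and 2.40 for \( \beta_0\), \( \beta_1\), and \( \sigma\), respectively.
# asymptotic SEs for beta_0, beta_1, and sigma from the inverse Hessian
sqrt(diag(solve(result_0$hessian)))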
Reference and Further Reading
The following post provides a good discussion on the Hessian matrix and Fisher Information.
Further, you can refer to my other tutorials on the topic of maximum likelihood estimation in linear regression.