Meaning of Hessian Matrix from optim() in R

The inverse of the Hessian matrix returned by optim() can be used as the asymptotic covariance matrix of the estimated parameters. The name “asymptotic” comes from directly replacing \( \sigma^2 \) in the variance formulas of the estimated parameters with its estimate, namely \( \hat{\sigma}^2 \). If the sample size is large enough, the t-distribution is very close to a normal distribution, and that is why the result is called “asymptotic.”
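
In practice the pattern looks like the short sketch below. The function name nll and the starting-value vector start are placeholders, not objects defined in this post.

# sketch of the general pattern (placeholder names)
# nll(par) should return the negative log-likelihood at the parameter vector par
fit <- optim(par = start, fn = nll, hessian = TRUE)
vcov_hat <- solve(fit$hessian)   # inverse Hessian as asymptotic covariance matrix
se_hat <- sqrt(diag(vcov_hat))   # asymptotic standard errors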

Example of Hessian in R

The following uses the R function optim() to estimate the parameters, namely \( \beta_0 \), \( \beta_1 \), and \( \sigma \). As you can see, the output includes not only the estimated parameters but also the Hessian matrix.

# input data for x and y
Teaching_Method <- c(0,0,1,0,1,1,1,1,0,0,0,1)
Test_Score <- c(90,90,100,70,90,95,75,75,60,65,65,70)

# Define the negative log-likelihood using the dnorm() function
lm_MLE <- function(par, y, x){
  
  beta_0 <- par[1]
  beta_1 <- par[2]
  sigma  <- par[3]
  
  # negative log-likelihood of y ~ N(beta_0 + beta_1 * x, sigma)
  -sum(dnorm(y, mean = beta_0 + beta_1 * x, sd = sigma, log = TRUE))
}

# use optim() for MLE in linear regression model
result_0 <- optim(par = c(beta_0 = 1, beta_1 = 1, sigma = 19), fn = lm_MLE,
                  y = Test_Score, x = Teaching_Method, hessian = TRUE)

# print out result, including Hessian 
print(result_0)
> print(result_0)
$par
  beta_0   beta_1    sigma 
73.33608 10.82952 11.73821 

$value
[1] 46.58414

$counts
function gradient 
     166       NA 

$convergence
[1] 0

$message
NULL

$hessian
              beta_0       beta_1         sigma
beta_0  8.709186e-02 4.354593e-02 -1.250555e-05
beta_1  4.354593e-02 4.354593e-02  7.894130e-06
sigma  -1.250555e-05 7.894130e-06  1.743007e-01
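
As a rough sanity check (my own addition, based on the standard second derivatives of the normal negative log-likelihood), the \( \beta \) block of this Hessian should be approximately \( n/\hat{\sigma}^2 \), \( \sum_{i=1}^n x_i/\hat{\sigma}^2 \), and \( \sum_{i=1}^n x_i^2/\hat{\sigma}^2 \):

# optional check of the beta block of the Hessian
sigma_hat <- result_0$par["sigma"]
n <- length(Test_Score)
n / sigma_hat^2                       # ~ 0.0871, the (beta_0, beta_0) entry
sum(Teaching_Method) / sigma_hat^2    # ~ 0.0435, the (beta_0, beta_1) entry
sum(Teaching_Method^2) / sigma_hat^2  # ~ 0.0435, the (beta_1, beta_1) entry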

Meaning of Hessian matrix from optim()

The following are the formulas for the variances of \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \); for their derivation, refer to my other tutorial.

\( \begin{aligned} Var (\hat{\beta}_1) &= \frac{\sigma^2 }{ \sum_{i=1}^n (x_i-\bar{x})^2 } \\ Var(\hat{\beta}_0) &= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{ n \sum_{i=1}^n (x_i-\bar{x})^2 } \end{aligned}\)

To understand the meaning of the Hessian matrix, the following R code first calculates the inverse of the Hessian matrix. Then, it manually calculates the asymptotic variances from the formulas above, with \( \sigma^2 \) directly replaced by its estimate, namely \( \hat{\sigma}^2 \).

As we can see, the numbers are exactly the same. That is, the asymptotic variance for \( \hat{\beta}_0 \) is 22.96, whereas that for \( \hat{\beta}_1 \) is 45.93.

# inverse of the Hessian matrix
solve(result_0$hessian)

# Manually calculate asymptotic variance for beta 0
(result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12))

# Manually calculate asymptotic variance for beta 1
(result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2)
> # inverse of the Hessian matrix
> solve(result_0$hessian)
              beta_0        beta_1        sigma
beta_0  22.964258291 -22.964257842  0.002687675
beta_1 -22.964257842  45.928515550 -0.003727733
sigma    0.002687675  -0.003727733  5.737212614

# Manually calculate asymptotic variance for beta 0
22.96426 

# Manually calculate asymptotic variance for beta 1
45.92852 
Note: To be honest, I do not know the formula to manually calculate the SE for \( \hat{\sigma} \). In case you have any ideas, please feel free to comment down below.
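
One possible lead (my own hedged note, not from the original post): for a normal likelihood, the Fisher information for \( \sigma \) is \( 2n/\sigma^2 \), so the asymptotic variance of \( \hat{\sigma} \) should be roughly \( \hat{\sigma}^2/(2n) \), which is close to the \( (\sigma, \sigma) \) entry of the inverse Hessian above:

# hedged check: asymptotic variance of sigma_hat under a normal likelihood
(result_0$par["sigma"])^2 / (2 * length(Test_Score))  # ~ 5.74, close to 5.737 above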

You can also take the square root of the asymptotic variance to get the asymptotic standard error (SE) for \( \beta_0 \) and \( \beta_1 \). In the following, you can see the asymptotic SE for \( \beta_0 \) is 4.79, whereas that for \( \beta_1 \) is 6.78.

# manually calculate asymptotic standard error for beta 0
sqrt((result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12)))

# manually calculate asymptotic standard error for beta 1
sqrt((result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2))
# manually calculate asymptotic standard error for beta 0
4.792104 

# manually calculate asymptotic standard error for beta 1
6.777058 
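
Instead of typing the formulas by hand, you can also read all the asymptotic SEs off the inverse Hessian in one line (a small convenience sketch, not part of the original code):

# asymptotic SEs for beta_0, beta_1 and sigma straight from the Hessian
sqrt(diag(solve(result_0$hessian)))
# beta_0 ~ 4.79, beta_1 ~ 6.78, sigma ~ 2.40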

Reference and Further Reading

The following post provides a good discussion on the Hessian matrix and Fisher Information.

Further, you can refer to my other tutorials on the topic of maximum likelihood estimation in linear regression.
