The inverse of the Hessian matrix returned by optim() can be used as the asymptotic covariance matrix of the estimated parameters. (When optim() minimizes a negative log-likelihood, the Hessian at the minimum is the observed Fisher information, and its inverse estimates the covariance matrix of the MLEs.) The estimator is called "asymptotic" because the unknown \( \sigma^2 \) in the variance formulas for the estimated parameters is replaced directly with its estimate, namely \( \hat{\sigma}^2 \). If the sample size is large enough, the resulting t-distribution will be very close to a normal distribution, and that is why it is called "asymptotic."
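Before walking through the regression example below, here is a minimal toy sketch of the recipe (the simulated data and variable names here are my own, not part of the example that follows): minimize a negative log-likelihood with optim(), request the Hessian, and invert it.

```
# Toy sketch: MLE of a normal mean and sd, with the Hessian requested
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)
negloglik <- function(par) -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
fit <- optim(par = c(mu = mean(y), sigma = sd(y)), fn = negloglik, hessian = TRUE)
solve(fit$hessian)               # asymptotic covariance matrix
sqrt(diag(solve(fit$hessian)))   # asymptotic standard errors
```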

## Example of Hessian in R

The following uses the R function optim() to estimate the parameters, namely \( \beta_0 \), \( \beta_1 \), and \( \sigma \). As you can see, the output includes not only the estimated parameters but also the Hessian matrix.

```
# input data for x and y
Teaching_Method <- c(0,0,1,0,1,1,1,1,0,0,0,1)
Test_Score <- c(90,90,100,70,90,95,75,75,60,65,65,70)
# Define the negative log-likelihood using the dnorm() function
lm_MLE <- function(par, y, x){
  beta_0 <- par[1]
  beta_1 <- par[2]
  sigma <- par[3]
  # negative log-likelihood of the normal linear regression model
  -sum(dnorm(y, mean = beta_0 + beta_1*x, sd = sigma, log = TRUE))
}
# use optim() for MLE in the linear regression model
result_0 <- optim(par = c(beta_0 = 1, beta_1 = 1, sigma = 19), fn = lm_MLE,
                  y = Test_Score, x = Teaching_Method, hessian = TRUE)
# print out the result, including the Hessian
print(result_0)
```

```
> print(result_0)
$par
   beta_0    beta_1     sigma 
 73.33608  10.82952  11.73821 

$value
[1] 46.58414

$counts
function gradient 
     166       NA 

$convergence
[1] 0

$message
NULL

$hessian
              beta_0       beta_1         sigma
beta_0  8.709186e-02 4.354593e-02 -1.250555e-05
beta_1  4.354593e-02 4.354593e-02  7.894130e-06
sigma  -1.250555e-05 7.894130e-06  1.743007e-01
```
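As a sanity check (an addition of mine, not part of the original output), the point estimates can be compared against ordinary least squares, since OLS and MLE coincide for the coefficients in a normal linear model. Note that summary() on an lm() fit reports standard errors based on the unbiased variance estimate (dividing by n - 2), so they will be somewhat larger than the asymptotic ones derived below.

```
# Sanity check: OLS gives (essentially) the same coefficient estimates
ols_fit <- lm(Test_Score ~ Teaching_Method)
coef(ols_fit)      # compare with result_0$par["beta_0"] and ["beta_1"]
summary(ols_fit)   # SEs here use the unbiased variance estimate, so they differ
```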

## Meaning of Hessian matrix from optim()

The following are the formulas for the variances of \( \beta_0 \) and \( \beta_1 \); for their derivations, refer to my other tutorial.

\( \begin{aligned} Var (\hat{\beta}_1) &= \frac{\sigma^2 }{ \sum_{i=1}^n (x_i-\bar{x})^2 } \\ Var(\hat{\beta}_0) &= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{ n \sum_{i=1}^n (x_i-\bar{x})^2 } \end{aligned}\)
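To make these formulas concrete with the data above: \( x \) consists of six 0s and six 1s, so \( n = 12 \), \( \bar{x} = 0.5 \), \( \sum_{i=1}^n x_i^2 = 6 \), and \( \sum_{i=1}^n (x_i-\bar{x})^2 = 3 \). Plugging in \( \hat{\sigma}^2 = 11.73821^2 \approx 137.786 \) gives:

\( \begin{aligned} \widehat{Var} (\hat{\beta}_1) &= \frac{137.786}{3} \approx 45.93 \\ \widehat{Var}(\hat{\beta}_0) &= \frac{137.786 \times 6}{12 \times 3} \approx 22.96 \end{aligned}\)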

To understand the meaning of the Hessian matrix, the following R code first calculates the inverse of the Hessian matrix. Then, it manually calculates the asymptotic variance. Basically, \( \sigma^2\) is directly replaced with the estimated one, namely \( \hat{\sigma}^2\).

As we can see, the numbers are exactly the same. That is, the asymptotic variance for \( \beta_0 \) is 22.96, whereas that for \( \beta_1 \) is 45.93.

```
# inverse of the Hessian matrix
solve(result_0$hessian)
# Manually calculate asymptotic variance for beta 0 (n = 12 observations)
(result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12))
# Manually calculate asymptotic variance for beta 1
(result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2)
```

```
> # inverse of the Hessian matrix
> solve(result_0$hessian)
             beta_0        beta_1        sigma
beta_0  22.964258291 -22.964257842  0.002687675
beta_1 -22.964257842  45.928515550 -0.003727733
sigma    0.002687675  -0.003727733  5.737212614
> # Manually calculate asymptotic variance for beta 0
22.96426
> # Manually calculate asymptotic variance for beta 1
45.92852
```
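Notice also the bottom-right entry of the inverse Hessian, 5.74, which is the asymptotic variance for \( \sigma \). As a quick check (my addition, using the standard asymptotic result \( Var(\hat{\sigma}) \approx \sigma^2/(2n) \) for a normal likelihood), the manual calculation comes out very close:

```
# Quick check: asymptotic variance of sigma is approximately sigma^2/(2n)
(result_0$par["sigma"])^2/(2*12)   # about 5.74, close to the 5.737 from the Hessian
```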

You can take the square root of the asymptotic variance to get the asymptotic standard error (SE) for \( \beta_0 \) and \( \beta_1 \). In the following, you can see that the asymptotic SE for \( \beta_0 \) is 4.79, whereas that for \( \beta_1 \) is 6.78.

```
# manually calculate asymptotic standard error for beta 0
sqrt((result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12)))
# manually calculate asymptotic standard error for beta 1
sqrt((result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2))
```

```
> # manually calculate asymptotic standard error for beta 0
4.792104
> # manually calculate asymptotic standard error for beta 1
6.777058
```
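With the asymptotic SEs in hand, a natural next step (a sketch of my own, not from the original post) is to form Wald z-statistics and approximate 95% confidence intervals for all three parameters at once:

```
# Wald z-statistics and approximate 95% confidence intervals
se <- sqrt(diag(solve(result_0$hessian)))
z  <- result_0$par/se
cbind(estimate = result_0$par,
      se       = se,
      z        = z,
      lower_95 = result_0$par - 1.96*se,
      upper_95 = result_0$par + 1.96*se)
```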

## Reference and Further Reading

The following post provides a good discussion on the Hessian matrix and Fisher Information.

Further, you can refer to my other tutorials on the topic of maximum likelihood estimation in linear regression.