# Meaning of Hessian Matrix from optim() in R

The inverse of the Hessian matrix from optim() can be used as the asymptotic covariance matrix for estimated parameters. The name “asymptotic” is due to the direct replace the σ2 in the variance of estimated parameters with the estimated one, namely $$\hat{\sigma}^2$$. If the sample size is large enough, t-distribution will be very close to a normal distribution, and that is why it is called “asymptotic.”

## Example of Hessian in R

The following use the R function of optim() to estimate the parameters, including β and σ2. As you can see, the output includes not only the estimated parameters, but also the Hessian matrix.

# input data for x and y
Teaching_Method <- c(0,0,1,0,1,1,1,1,0,0,0,1)
Test_Score <- c(90,90,100,70,90,95,75,75,60,65,65,70)

# Define the model using the dnorm() function
lm_MLE <- function(par, y, x1, x2){

beta_0 <- par[1]
beta_1 <- par[2]
sigma <- par[3]

-sum(dnorm(Test_Score, beta_0+beta_1*Teaching_Method, sigma, log = TRUE))
}

# use optim() for MLE in linear regression model
result_0<-optim(lm_MLE,par=c(beta_0 = 1, beta_1=1, sigma = 19),hessian=T )

# print out result, including Hessian
print(result_0)
> print(result_0)
$par beta_0 beta_1 sigma 73.33608 10.82952 11.73821$value
[1] 46.58414

$counts function gradient 166 NA$convergence
[1] 0

$message NULL$hessian
beta_0       beta_1         sigma
beta_0  8.709186e-02 4.354593e-02 -1.250555e-05
beta_1  4.354593e-02 4.354593e-02  7.894130e-06
sigma  -1.250555e-05 7.894130e-06  1.743007e-01


## Meaning of Hessian matrix from optim()

The following are the formulas for variances for $$\beta_0$$ and $$\beta_1$$, refer to my other tutorial.

\begin{aligned} Var (\hat{\beta}_1) &= \frac{\sigma^2 }{ \sum_{i=1}^n (x_i-\bar{x})^2 } \\ Var(\hat{\beta}_0) &= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{ n \sum_{i=1}^n (x_i-\bar{x})^2 } \end{aligned}

To understand the meaning of the Hessian matrix, the following R code first calculates the inverse of the Hessian matrix. Then, it manually calculates the asymptotic variance. Basically, $$\sigma^2$$ is directly replaced with the estimated one, namely $$\hat{\sigma}^2$$.

As we can see, the numbers are exactly the same. That is, the asymptotic variance for $$\beta_0$$ is 22.96, whereas for $$\beta_1$$ is 45.93.

# square root of inversed Hessian matrix
solve(result_0$hessian) # Manually calculate asymptotic variance for beta 0 (result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12))

# Manually calculate asymptotic variance for beta 1
(result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2) > # square root of inversed Hessian matrix > solve(result_0$hessian)
beta_0        beta_1        sigma
beta_0  22.964258291 -22.964257842  0.002687675
beta_1 -22.964257842  45.928515550 -0.003727733
sigma    0.002687675  -0.003727733  5.737212614

# Manually calculate asymptotic variance for beta 0
22.96426

# Manually calculate asymptotic variance for beta 1
45.92852 
Note: To be honest, I do not how to formula to manually calculate the SE for the $$\hat{\sigma}^2$$. In case you have any ideas, please feel free to comment down below.

You can even square the root of the asymptotic variance to get the asymptotic standard error (SE) for $$\beta_0$$ and $$\beta_1$$. In the following, you can see the asymptotic SE for $$\beta_0$$ is 4.79, whereas for $$\beta_1$$ is 6.78.

# manually calculate asymptotic standard error for beta 0
sqrt((result_0$par["sigma"])^2*(sum((Teaching_Method)^2)/(sum((Teaching_Method-mean(Teaching_Method))^2)*12))) # manually calculate asymptotic standard error for beta 1 sqrt((result_0$par["sigma"])^2/sum((Teaching_Method-mean(Teaching_Method))^2))
# manually calculate asymptotic standard error for beta 0
4.792104

# manually calculate asymptotic standard error for beta 1
6.777058