Simulate Data for Poisson Regression in R

This tutorial shows how to simulate a dataset for Poisson regression in R.

Step 1: Determine the model

Suppose the model below has known population parameters: an intercept of 0.2 and regression coefficients of 0.2 for M and 0.08 for K. In practice, of course, such parameters are usually unknown and must be estimated from data.

\[ Y \sim \text{Poisson}(\mu), \qquad \log(\mu) = 0.2 + 0.2 \times M + 0.08 \times K \]
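As a worked example of how the model turns the IVs into an expected count, the sketch below evaluates it for one hypothetical observation with M = 2 and K = 5 (values chosen only for illustration; they happen to be the means used in Step 2). The linear predictor is 0.2 + 0.2 × 2 + 0.08 × 5 = 1.0, and exponentiating it gives an expected count of exp(1), about 2.718.

```r
# Population parameters from the model above
b0 <- 0.2   # intercept
b1 <- 0.2   # coefficient for M
b2 <- 0.08  # coefficient for K

# One hypothetical observation (illustrative values, not from the tutorial's data)
M0 <- 2
K0 <- 5

# Linear predictor on the log scale
eta0 <- b0 + b1 * M0 + b2 * K0
# Expected count: apply exp(), the inverse of the log link
mu0 <- exp(eta0)

eta0  # equals 1.0 up to floating-point rounding
mu0   # equals exp(1)
```

Each one-unit increase in M multiplies the expected count by exp(0.2), and each one-unit increase in K multiplies it by exp(0.08).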

Step 2: Simulate Independent Variables (IVs) in the known model

We randomly generate two normally distributed variables, M and K. Other distributions work as well; for instance, M and/or K could be binary.
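For the binary case, a minimal sketch could draw M with rbinom() instead of rnorm(). The success probability of 0.5 is an assumption made here for illustration; K is generated as in the tutorial. With a binary M, its coefficient is the log of the ratio of expected counts between the M = 1 and M = 0 groups, holding K fixed.

```r
n <- 500
set.seed(123)

# Binary IV: each value is 1 with probability 0.5 (assumed), otherwise 0
M <- rbinom(n, size = 1, prob = 0.5)
# Continuous IV, generated as in the tutorial
K <- rnorm(n, mean = 5, sd = 4)

# Counts of 0s and 1s in M
table(M)
```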

# set the sample size
n <- 500

# set the seed for reproducibility
set.seed(123)

# generate M and K from normal distributions
M <- rnorm(n, mean = 2, sd = 3)
K <- rnorm(n, mean = 5, sd = 4)

# print the first 6 values of M and K
head(M)
head(K)

Step 3: Simulate dependent variable (DV) in the known model

Poisson regression uses the log link, so the linear predictor built from the IVs (M and K) must be exponentiated to give the expected count of the DV (Y). We then use rpois() to draw Y from a Poisson distribution with that expected count.
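The exponentiation is not optional: the linear predictor can be any real number, while rpois() needs a non-negative lambda (a negative lambda yields NA with a warning). The sketch below regenerates M and K exactly as in Step 2 and compares the ranges before and after applying exp(); whether the linear predictor actually dips below zero depends on the draws.

```r
n <- 500
set.seed(123)
M <- rnorm(n, mean = 2, sd = 3)
K <- rnorm(n, mean = 5, sd = 4)

# Linear predictor on the log scale: unrestricted real numbers
eta <- 0.2 + 0.2 * M + 0.08 * K
# exp() maps the linear predictor into (0, Inf), a valid Poisson mean
lambda <- exp(eta)

range(eta)     # may include negative values
range(lambda)  # always strictly positive
```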

# apply the inverse of the log link: expected count for each observation
mu_1 <- exp(0.2 + 0.2 * M + 0.08 * K)
Y <- rpois(n, lambda = mu_1)

# combine them into a data frame and print the first 6 rows
data <- data.frame(M = M, K = K, Y = Y)
head(data)

The following is the output:

> head(data)
          M         K Y
1 0.3185731  2.592429 0
2 1.3094675  1.025206 0
3 6.6761249  9.107140 6
4 2.2115252  8.004245 4
5 2.3878632 -1.036666 2
6 7.1451950  4.619410 8
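If you want to rerun Steps 2 and 3 with different sample sizes or coefficients, they can be wrapped in a function. The name simulate_poisson_data() and its arguments are hypothetical, introduced here for illustration. Because it calls set.seed(), rnorm(), rnorm() and rpois() in the same order as the tutorial, the default arguments should reproduce the data frame printed above.

```r
# Hypothetical helper wrapping Steps 2 and 3
simulate_poisson_data <- function(n, b0 = 0.2, b1 = 0.2, b2 = 0.08, seed = 123) {
  set.seed(seed)
  M <- rnorm(n, mean = 2, sd = 3)       # IV 1 (normal)
  K <- rnorm(n, mean = 5, sd = 4)       # IV 2 (normal)
  mu <- exp(b0 + b1 * M + b2 * K)       # inverse of the log link
  Y <- rpois(n, lambda = mu)            # Poisson-distributed DV
  data.frame(M = M, K = K, Y = Y)
}

sim <- simulate_poisson_data(500)
head(sim)
```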

Step 4: Use glm() to check whether the data were simulated correctly

We can use glm() to see if the regression coefficients are close to those in the known model.

# fit a Poisson regression with a log link to the simulated data
result_Poisson <- glm(Y ~ M + K, data = data, family = poisson(link = log))
result_Poisson

The output is shown below. The estimated coefficients are 0.20044 for M and 0.07496 for K, with an intercept of 0.24061. These are close to the population values in Step 1 (0.2 for M, 0.08 for K, and 0.2 for the intercept), which suggests that the data were simulated correctly for Poisson regression.

> result_Poisson

Call:  glm(formula = Y ~ M + K, family = poisson(link = log), data = data)

Coefficients:
(Intercept)            M            K  
    0.24061      0.20044      0.07496  

Degrees of Freedom: 499 Total (i.e. Null);  497 Residual
Null Deviance:	    1307 
Residual Deviance: 545.3 	AIC: 1884
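A single dataset can land close to the true values by chance, so a stronger check is to repeat the simulation many times and average the estimates. The sketch below does this with 200 replications and seed 2024, both arbitrary choices; if the simulation is correct, the average estimates should sit very close to 0.2, 0.2 and 0.08.

```r
n_reps <- 200
n <- 500
true_beta <- c(0.2, 0.2, 0.08)  # intercept, M, K from Step 1

set.seed(2024)  # arbitrary seed for the replications
# Each replication: simulate new data, fit glm(), keep the coefficients
estimates <- t(replicate(n_reps, {
  M <- rnorm(n, mean = 2, sd = 3)
  K <- rnorm(n, mean = 5, sd = 4)
  Y <- rpois(n, lambda = exp(0.2 + 0.2 * M + 0.08 * K))
  coef(glm(Y ~ M + K, family = poisson(link = log)))
}))

# Average estimate across replications next to the population values
round(rbind(mean_estimate = colMeans(estimates), true = true_beta), 4)
```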
