Warning:
- The correctness and quality of the R code presented in this tutorial are not guaranteed.
- This tutorial is inspired by a paper called “Accommodating binary and count variables in mediation: A case for conditional indirect effects“. However, the author of this tutorial is not one of the authors of that paper.
- The author of this tutorial cannot guarantee that the R code in this tutorial correctly reflects the idea of that paper. Please do read the paper by yourself and make your own judgment.
- Please do NOT cite this tutorial as a reference, if you are writing an academic paper. Further, the author of this tutorial does not provide any consulting service. But, you can leave a comment down below, and I will try my best to help.
Introduction
This tutorial shows how to do serial mediation for count data. Complete R code is provided.
The following are the theoretical models and conceptual framework, assuming IV, M1, M2 are close to normally distributed, and DV is the count data.
\( IV \sim N(4, 5) \)
\( M1 = 0.3+0.5 \times IV \)
\( M2 = 0.1+0.7 \times IV +0.5 \times M1 \)
\( log(E(DV | IV)) = 0.2+0.3 \times M1 +0.3 \times M2 + 0.01 \times IV \)

Step 1 (Optional): Data Simulation
The following is the code to generate the data based on the models stated above. Note that, if you already have data ready, you do not need this simulation step.
# set sample size n=1000 # set seed set.seed(123) # simulate x (normal distribution), namely IV X <- rnorm(n, 5, 4) # simulate residual for M_1 residual_1<-rnorm(n,0,1) # simulate M_1 M_1<-0.3+0.5*X+residual_1 # simulate residual for M_2 # Note that, residuals for M_1 and M_2 should be different. # Otherwise, there will be problem of singularities. residual_2<-rnorm(n,0,1.1) # simulate M_2 M_2<-0.1+0.7*X+0.5*M_1+residual_2 # mu for Poisson regression via a log link mu_1 <- exp(0.2 + 0.3*M_1+0.3*M_2+0.01*X) # use rpois to generate Y Y <- rpois(n, lambda=mu_1) # combine into a dataframe and print out the first 6 rows data <- data.frame(X=X, M_1=M_1,M_2=M_2, Y=Y)
Step 2: R code for serial mediation for count data
The following is the R code that can be used to calculate the indirect effect.
library(boot) Serial_Mediation_poisson<-function(data_used,i,x_predetermined=0) { # Sample a data data_temp=data_used[i,] # Deciding which X value to use if(x_predetermined==0){x_predetermined=mean(data_temp$X)} else if (x_predetermined==-1){x_predetermined=mean(data_temp$X)-sd(data_temp$X)} else(x_predetermined=mean(data_temp$X)+sd(data_temp$X)) # a path result_a<-lm(M_1~X, data = data_temp) a_0<-result_a$coefficients[1] a_1<-result_a$coefficients[2] # d path result_d<-lm(M_2~X+M_1, data = data_temp) d_0<-result_d$coefficients[1] d_1<-result_d$coefficients[2] d_2<-result_d$coefficients[3] # b path result_b<-glm(Y~M_1+M_2+X, data = data_temp, family = quasipoisson) b_0<-result_b$coefficients[1] b_1<-result_b$coefficients[2] b_2<-result_b$coefficients[3] c_1_apostrophe<-result_b$coefficients[4] #calculating the indirect effect M_1_estimated=a_0+a_1*x_predetermined M_2_estimated=d_0+d_1*x_predetermined+d_2*M_1_estimated indirect_effect<-a_1*d_2*b_2*exp(b_0+b_1*M_1_estimated+b_2*M_2_estimated+c_1_apostrophe*x_predetermined) return(indirect_effect) } # use boot() to do bootstrapping mediation analysis boot_mediation <- boot(data, Serial_Mediation_poisson, R=1000) boot_mediation # print out confidence intervals boot.ci(boot.out = boot_mediation, type = c("norm", "perc"))
The following is the output:
> boot_mediation ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = data, statistic = Serial_Mediation_poisson, R = 1000) Bootstrap Statistics : original bias t1* 1.1202 0.006848951 std. error t1* 0.1039129 > boot.ci(boot.out = boot_mediation, type = c("norm", "perc")) BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = boot_mediation, type = c("norm", "perc")) Intervals : Level Normal Percentile 95% ( 0.910, 1.317 ) ( 0.928, 1.341 ) Calculations and Intervals on Original Scale
Step 3 (Optional): Check the model
We can also check the theoretical value of the indirect effect using the following R code. The value is 1.028849. It is not exactly the same as the the value of 1.1202. This is due to three randomness in the data simulation, 2 in the residuals in normal distribution and 1 in the simulation of Poisson data.
x_mean=mean(data$X) M_1=0.3+0.5*x_mean M_2=0.1+0.7*x_mean+0.5*M_1 0.5*0.5*0.3*exp(0.2 + 0.3*M_1+0.3*M_2+0.01*x_mean)
Appendix: Raw Code in GitHub
You can click this link for complete raw R code stored in GitHub.