Poisson Regression in R

To fit a Poisson regression in R, set family = poisson in the glm() function.

glm(model_statement, family = poisson, data = data_file_name)

Data Example

This tutorial uses a dataset of households to illustrate Poisson regression. Its key variables are listed below.

  • location = where the house is located
  • age = the age of the head of household
  • total = the number of people in the household other than the head
  • numLT5 = the number in the household under 5 years of age
  • roof = the type of roof in the household

We are going to see if age can predict the number of people in a household (i.e., total).

First, read the data from GitHub.

data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")

The following prints the first few rows of the data frame read from GitHub.

> head(data_HH)
  X     location age total numLT5                          roof
1 1 CentralLuzon  65     0      0 Predominantly Strong Material
2 2  MetroManila  75     3      0 Predominantly Strong Material
3 3  DavaoRegion  54     4      0 Predominantly Strong Material
4 4      Visayas  49     3      0 Predominantly Strong Material
5 5  MetroManila  74     3      0 Predominantly Strong Material
6 6      Visayas  59     6      0 Predominantly Strong Material

R Code

The following is the key R code to do the Poisson regression.

result_1 <- glm(total ~ age, family = poisson, data = data_HH)

Based on the output below, the p-value for age is below 0.05, so age is a statistically significant predictor of household size.

> summary(result_1)

Call:
glm(formula = total ~ age, family = poisson, data = data_HH)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9079  -0.9637  -0.2155   0.6092   4.9561  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.5499422  0.0502754  30.829  < 2e-16 ***
age         -0.0047059  0.0009363  -5.026 5.01e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2362.5  on 1499  degrees of freedom
Residual deviance: 2337.1  on 1498  degrees of freedom
AIC: 6714

Number of Fisher Scoring iterations: 5
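
The z value and p-value reported for age can be reproduced by hand from the estimate and its standard error. The sketch below assumes the Wald z test that summary() reports for glm fits and hard-codes the two numbers from the output above; the variable names are illustrative and not part of the original code.

```r
# Estimate and standard error for age, copied from the summary() output
est_age <- -0.0047059
se_age  <- 0.0009363

# Wald z statistic: estimate divided by its standard error
z_age <- est_age / se_age

# Two-sided p-value from the standard normal distribution
p_age <- 2 * pnorm(-abs(z_age))

print(round(z_age, 3))
print(signif(p_age, 3))
```

With the fitted model in memory, coef(summary(result_1)) returns the same coefficient table as a matrix, so these values can be read from it directly.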

Interpretation

Based on the output above, we can write the following Poisson regression equation, where \( \hat{\lambda} \) is the estimated mean household size.

\[ \log(\hat{\lambda}) = b_0 + b_1 Age = 1.55 - 0.0047 Age \]

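Subtracting the equation at Age from the equation at Age + 1 gives \( \log(\lambda_{Age+1}) - \log(\lambda_{Age}) = b_1 \). Exponentiating both sides gives the following.

\[ \frac{\lambda_{Age+1}}{\lambda_{Age}} = e^{b_1} = e^{-0.0047} = 0.995 \]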
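
The ratio can be checked numerically by exponentiating the age coefficient. This is a minimal sketch that hard-codes the estimate from the summary() output; b1, rate_ratio, and pct_change are illustrative names.

```r
# Age coefficient copied from the summary() output
b1 <- -0.0047059

# Multiplicative change in the mean count for a one-year increase in age
rate_ratio <- exp(b1)

# The same quantity expressed as a percent change in the mean
pct_change <- (rate_ratio - 1) * 100

print(round(rate_ratio, 4))
print(round(pct_change, 2))
```

With the fitted model available, exp(coef(result_1)) applies the same transformation to every coefficient at once.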

We can further make some transformations and get the following.

\[ \lambda_{Age+1} =0.995 \lambda_{Age} \]

\[ \lambda_{Age+1} - \lambda_{Age} = 0.995 \lambda_{Age} - \lambda_{Age} = -0.005 \lambda_{Age} \]

Thus, a one-unit increase in age changes the mean household size by \( -0.005 \lambda_{Age} \).

Take the change from an 80-year-old to an 81-year-old head of household as an example. Here \( \lambda_{Age} \) is the estimated mean household size at age 80, not the age itself: \( \lambda_{80} = e^{1.55 - 0.0047 \times 80} \approx 3.23 \). The expected change in household size is therefore about \( -0.005 \times 3.23 \approx -0.016 \) people.
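
The fitted means at ages 80 and 81 can also be computed directly from the regression equation. The sketch below hard-codes the unrounded coefficients from the summary() output, so the result differs slightly from what the rounded factor -0.005 would give; fitted_mean is a hypothetical helper name introduced only for this illustration.

```r
# Coefficients copied from the summary() output
b0 <- 1.5499422
b1 <- -0.0047059

# Hypothetical helper: fitted mean household size for a given age of the head
fitted_mean <- function(age) exp(b0 + b1 * age)

lambda_80 <- fitted_mean(80)
lambda_81 <- fitted_mean(81)

# Expected change in household size when age goes from 80 to 81
change <- lambda_81 - lambda_80

print(round(c(lambda_80 = lambda_80, lambda_81 = lambda_81, change = change), 4))
```

With the fitted model in memory, predict(result_1, newdata = data.frame(age = c(80, 81)), type = "response") should return the same two means without retyping the coefficients.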