You can set family = poisson in the glm() function to do Poisson regression in R.
glm(model_statement, family = poisson, data = data_file_name)
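As a minimal sketch of this syntax, the example below fits a Poisson regression to simulated data. The names x, y, sim_data, and fit, as well as the true coefficients 0.5 and 0.1, are made up for illustration and are not part of the dataset used in this tutorial.

```r
set.seed(1)                                  # make the simulated data reproducible
n <- 500
x <- runif(n, 0, 10)                         # hypothetical predictor
y <- rpois(n, lambda = exp(0.5 + 0.1 * x))   # counts whose log-mean is linear in x

sim_data <- data.frame(x = x, y = y)

# model_statement is y ~ x; data_file_name is sim_data
fit <- glm(y ~ x, family = poisson, data = sim_data)
coef(fit)                                    # estimates should land near 0.5 and 0.1
```

Because the data are simulated from a known model, the estimated coefficients can be compared directly with the values used to generate the counts.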
Data Example
This tutorial will use a dataset for Poisson regression. The following shows the key variables in this dataset.
location = where the house is located
age = the age of the head of household
total = the number of people in the household other than the head
numLT5 = the number in the household under 5 years of age
roof = the type of roof in the household
We are going to see if age can predict the number of people in a household (i.e., total).
First, we read the data from GitHub.
data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")
The following prints the first few rows of the data frame that we read from GitHub.
> head(data_HH)
  X     location age total numLT5                          roof
1 1 CentralLuzon  65     0      0 Predominantly Strong Material
2 2  MetroManila  75     3      0 Predominantly Strong Material
3 3  DavaoRegion  54     4      0 Predominantly Strong Material
4 4      Visayas  49     3      0 Predominantly Strong Material
5 5  MetroManila  74     3      0 Predominantly Strong Material
6 6      Visayas  59     6      0 Predominantly Strong Material
R Code
The following is the key R code to do the Poisson regression.
result_1 = glm(total ~ age, family = poisson, data = data_HH)
In the output below, the p-value for age is below 0.05 (p = 5.01e-07), so age is a statistically significant predictor of household size.
> summary(result_1)

Call:
glm(formula = total ~ age, family = poisson, data = data_HH)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.9079  -0.9637  -0.2155   0.6092   4.9561

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.5499422  0.0502754  30.829  < 2e-16 ***
age         -0.0047059  0.0009363  -5.026 5.01e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2362.5  on 1499  degrees of freedom
Residual deviance: 2337.1  on 1498  degrees of freedom
AIC: 6714

Number of Fisher Scoring iterations: 5
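To see where the z value and p-value for age come from, the sketch below recomputes them from the estimate and standard error printed in the Coefficients table. The numbers are copied from the output above. The variable names are illustrative, and because the inputs are rounded, the result may differ slightly from what R prints.

```r
# Values copied from the "age" row of the Coefficients table
estimate  <- -0.0047059
std_error <- 0.0009363

z_value <- estimate / std_error        # Wald z statistic
p_value <- 2 * pnorm(-abs(z_value))    # two-sided p-value from the standard normal

round(z_value, 3)                      # -5.026
signif(p_value, 3)                     # on the order of 5e-07, in line with the reported value
```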
Interpretation
Based on the output above, we can write the following Poisson regression equation, where \( \hat{\lambda} \) is the estimated mean household size.
\[ \log(\hat{\lambda}) = b_0 + b_1 Age = 1.55 - 0.0047 Age \]
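Writing this equation at Age + 1 and at Age and subtracting gives \( \log(\lambda_{Age+1}) - \log(\lambda_{Age}) = b_1 \). Exponentiating both sides gives the following.
\[ \frac{\lambda_{Age+1}}{\lambda_{Age}} = e^{b_1} = e^{-0.0047} = 0.995 \]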
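The ratio can be checked numerically. This is a minimal sketch that hard-codes the slope for age from the summary output. The names b1, rate_ratio, and percent_change are illustrative.

```r
# Slope for age, copied from the summary(result_1) output
b1 <- -0.0047059

rate_ratio <- exp(b1)                     # multiplicative change in the mean per extra year of age
percent_change <- (rate_ratio - 1) * 100  # the same change expressed as a percentage

round(rate_ratio, 3)                      # 0.995
round(percent_change, 2)                  # -0.47
```

With the fitted model object available, exp(coef(result_1)) should give the same ratio without copying numbers by hand.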
Multiplying both sides by \( \lambda_{Age} \), and then subtracting \( \lambda_{Age} \) from both sides, gives the following.
\[ \lambda_{Age+1} = 0.995 \lambda_{Age} \]
\[ \lambda_{Age+1} - \lambda_{Age} = 0.995 \lambda_{Age} - \lambda_{Age} = -0.005 \lambda_{Age} \]
Thus, a one-year increase in age changes the mean household size by \( -0.005 \lambda_{Age} \).
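Let’s use the change from age 80 to age 81 as an example. Here \( \lambda_{Age} \) is the predicted mean household size at age 80, not the age itself: \( \lambda_{80} = e^{1.55 - 0.0047 \times 80} \approx 3.23 \). The equation above then means that, on average, household size changes by about \( -0.005 \times 3.23 \approx -0.016 \) when the head of household is 81 rather than 80 years old (about -0.015 with the unrounded coefficients). Note that age refers to the age of the head of household.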
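The 80-to-81 comparison can be worked through in R by evaluating the fitted equation at both ages. This sketch hard-codes the two coefficients from the summary output rather than using the model object. The helper lambda_hat is a hypothetical name introduced here for illustration.

```r
# Coefficients copied from the summary(result_1) output
b0 <- 1.5499422    # intercept
b1 <- -0.0047059   # slope for age

# Predicted mean household size for a given age of the head of household
lambda_hat <- function(age) exp(b0 + b1 * age)

lambda_80 <- lambda_hat(80)
lambda_81 <- lambda_hat(81)

round(lambda_80, 2)               # 3.23
round(lambda_81 - lambda_80, 3)   # -0.015
```

With the fitted model available, predict(result_1, newdata = data.frame(age = c(80, 81)), type = "response") should return the same two predicted means.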