Chapter 10: Models for the Poisson family

Kjell Nygren

2026-06-21

library(glmbayes)

1. Introductory Discussion

Poisson regression is the canonical GLM for modeling count data (McCullagh and Nelder 1989).
The Poisson likelihood is log‑concave under the log link, making it an ideal setting for envelope‑based Bayesian sampling (Nygren and Nygren 2006).
This chapter mirrors the structure used for the Binomial vignette: we begin with the classical glm() fit, then introduce the Bayesian glmb() version, discuss prior specification, and compare posterior summaries to their classical counterparts.

We use the well‑known randomized controlled trial dataset of (Dobson 1990), which appears in many GLM textbooks and is included in the base R documentation for glm().
The dataset contains counts of subjects classified by treatment group and outcome category.

2. Poisson Likelihood and Model Structure

Poisson regression models count data of the form

\[ Y_i \in \{0,1,2,\ldots\}, \qquad Y_i \sim \text{Poisson}(\mu_i), \]

where \(\mu_i > 0\) is the expected count.
Observation weights \(w_i\) (e.g., exposure times) enter the likelihood through

\[ Y_i \sim \text{Poisson}(w_i \mu_i). \]

The mean is linked to a linear predictor via

\[ \eta_i = x_i^\top \beta, \qquad \mu_i = g^{-1}(\eta_i). \]

2.1 Weighted Poisson Log‑Likelihood

Up to constants, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \log(\mu_i) - w_i \mu_i \right]. \]

Under the log link, the mean is

\[ \mu_i = e^{\eta_i}, \]

and the log‑likelihood simplifies to

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

This is the form used by both glm() and the Bayesian functions glmb() and rglmb().

2.2 Exponential‑Family Representation

The Poisson distribution belongs to the exponential family with canonical parameter

\[ \theta_i = \log(\mu_i), \]

and cumulant function

\[ b(\theta_i) = e^{\theta_i}. \]

The variance is

\[ \mathrm{Var}(Y_i) = \mu_i. \]

Under the log link, the linear predictor equals the canonical parameter:

\[ \eta_i = \theta_i. \]

This identity ensures that the Poisson log‑likelihood is globally log‑concave in the linear predictor, which is ideal for envelope construction and accept–reject sampling in glmbayes.

2.4 Likelihood, Prior, and Posterior (Normal Prior on beta)

With the log link, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

A multivariate normal prior is specified as

\[ \log p(\beta) = -\tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Combining likelihood and prior yields the log‑posterior:

\[ \log p(\beta \mid y) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right] - \tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Because both terms are concave in \(\beta\), the posterior is log‑concave, enabling efficient iid sampling via the envelope‑based accept–reject sampler implemented in glmbayes.

3. Data Setup

We reproduce Dobson’s example exactly.
The data consist of 9 observations: three outcome levels crossed with three treatment groups.

    ## Dobson (1990) Page 93: Randomized Controlled Trial :
    set.seed(333)
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    print(d.AD <- data.frame(treatment, outcome, counts))
#>   treatment outcome counts
#> 1         1       1     18
#> 2         1       2     17
#> 3         1       3     15
#> 4         2       1     20
#> 5         2       2     10
#> 6         2       3     20
#> 7         3       1     25
#> 8         3       2     13
#> 9         3       3     12

The table shows the observed counts for each treatment–outcome combination.

4. Classical Poisson Regression

The classical glm() call uses the Poisson family with the canonical log link:


    glm.D93 <- glm(counts ~ outcome + treatment,
                   family = poisson(link = "log"))
    summary(glm.D93)
#> 
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson(link = "log"))
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.045e+00  1.709e-01  17.815   <2e-16 ***
#> outcome2    -4.543e-01  2.022e-01  -2.247   0.0246 *  
#> outcome3    -2.930e-01  1.927e-01  -1.520   0.1285    
#> treatment2   1.217e-15  2.000e-01   0.000   1.0000    
#> treatment3   8.438e-16  2.000e-01   0.000   1.0000    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 10.5814  on 8  degrees of freedom
#> Residual deviance:  5.1291  on 4  degrees of freedom
#> AIC: 56.761
#> 
#> Number of Fisher Scoring iterations: 4

4.1 Interpretation of the Classical Output

The glm summary reports:

The coefficients represent multiplicative effects on the expected count.
For example, \(\exp(\beta)\) gives the rate ratio associated with a one-unit change in a predictor.

The residual deviance (approximately 5.13) indicates a good fit relative to the null deviance (approximately 10.58), consistent with Dobson’s original analysis.

5. Bayesian Poisson Regression with glmb()

To fit the Bayesian analogue, we must specify a prior distribution for the regression coefficients.
The recommended workflow uses Prior_Setup(), which constructs a Zellner‑type g‑prior aligned with the model matrix.

5.1 Setting the Prior


    ps <- Prior_Setup(counts ~ outcome + treatment,
                      family = poisson())
#> Using default pwt = 0.01 (low-d default).
    mu <- ps$mu
    V  <- ps$Sigma

The printed Prior_Setup() output shows:

With the default pwt = 0.01, the prior is weakly informative.

5.2 Calling glmb()


    glmb.D93 <- glmb(counts ~ outcome + treatment,
                     family = poisson(),
                     pfamily = dNormal(mu = mu, Sigma = V))

This call mirrors glm() but adds the pfamily argument.

6. Printing and Summarizing the Bayesian Output

    print(glmb.D93)
#> 
#> Call:  glmb(formula = counts ~ outcome + treatment, family = poisson(), 
#>     pfamily = dNormal(mu = mu, Sigma = V))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)     outcome2     outcome3   treatment2   treatment3  
#>    3.031406    -0.454932    -0.298435    -0.009492     0.002938  
#> 
#> Effective Number of Parameters: 4.776027 
#> Expected Residual Deviance: 9.954241 
#> DIC: 56.36245
    summary(glmb.D93)
#> Call
#> glmb(formula = counts ~ outcome + treatment, family = poisson(), pfamily = dNormal(mu = mu, Sigma = V)) 
#> 
#> Expected Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -0.9288665 -0.6337924 -0.1207802  0.8731528  1.1425877 
#> 
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#> 
#>              Null Mode Prior Mean   Prior.sd  Max Like. Like.sd
#> (Intercept)  2.813e+00  2.813e+00  1.700e+00  3.045e+00   0.171
#> outcome2     0.000e+00  0.000e+00  2.012e+00 -4.543e-01   0.202
#> outcome3     0.000e+00  0.000e+00  1.918e+00 -2.930e-01   0.193
#> treatment2   0.000e+00  0.000e+00  1.990e+00  1.217e-15   0.200
#> treatment3   0.000e+00  0.000e+00  1.990e+00  8.438e-16   0.200
#> 
#> Bayesian Estimates Based on 1000 iid draws
#> 
#>              Post.Mode  Post.Mean    Post.Sd   MC Error Pr(Null_tail) SE(tail)
#> (Intercept)  3.042e+00  3.031e+00  1.667e-01  5.272e-03     1.010e-01    0.010
#> outcome2    -4.497e-01 -4.549e-01  1.951e-01  6.168e-03     1.000e-02    0.003
#> outcome3    -2.901e-01 -2.984e-01  1.947e-01  6.156e-03     6.500e-02    0.008
#> treatment2  -1.395e-05 -9.492e-03  1.987e-01  6.283e-03     4.790e-01    0.016
#> treatment3   1.209e-06  2.938e-03  2.029e-01  6.416e-03     4.870e-01    0.016
#>             Pr(Prior_tail)  
#> (Intercept)          0.101  
#> outcome2             0.010 *
#> outcome3             0.065 .
#> treatment2           0.479  
#> treatment3           0.487  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Directional Tail Summaries:
#> 
#>                Metric vs Null vs Prior
#>  Mahalanobis Distance  2.3666   2.3666
#>      Tail Probability  0.0120   0.0120
#>   [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#> 
#> 
#> Distribution Percentiles
#> 
#>                  1.0%      2.5%      5.0%    Median     95.0%     97.5%  99.0%
#> (Intercept)  2.645322  2.689869  2.750789  3.035127  3.299951  3.362434  3.426
#> outcome2    -0.884898 -0.829345 -0.769260 -0.455517 -0.145054 -0.083062 -0.001
#> outcome3    -0.751209 -0.676707 -0.621690 -0.303178  0.027935  0.082337  0.144
#> treatment2  -0.469011 -0.419073 -0.341970 -0.007344  0.318454  0.383573  0.461
#> treatment3  -0.455859 -0.400620 -0.335693  0.008109  0.340731  0.391496  0.447
#> 
#> Effective Number of Parameters: 4.776027 
#> Expected Residual Deviance: 9.954241 
#> DIC: 56.36245 
#> 
#> Expected Mean dispersion: 1 
#> Sq.root of Expected Mean dispersion: 1 
#> 
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.875

The summary includes:

6.1 Posterior Mean Coefficients

Posterior means are close to the classical MLEs because:

6.2 Effective Number of Parameters

The pD value is typically slightly below the nominal number of coefficients, reflecting mild shrinkage from the prior.

6.3 DIC

DIC is approximately 56.4, very close to the classical AIC value of 56.76.
This agreement is expected when the prior is weak and the posterior distribution is close to Gaussian.

7. Comparison of Classical and Bayesian Estimates

Quantity Classical glm Bayesian glmb
Point estimate MLE Posterior mean
Uncertainty SE Posterior SD
Model fit Deviance, AIC \(\bar{D}\), \(p_D\), DIC
Interpretation log-rate ratios same, with shrinkage

The Bayesian model provides richer diagnostics (tail probabilities, posterior quantiles) while preserving the familiar GLM structure.

8. Credible Intervals and Posterior Interpretation

Posterior percentiles (2.5%, 50%, 97.5%) provide Bayesian credible intervals for each coefficient.
These intervals closely track the classical Wald intervals because the posterior is nearly normal.

9. Concluding Remarks

This vignette demonstrates that:

In later chapters, we extend these ideas to:


Appendix A. Bayes Rules! companion — equality_index Poisson regression

Book: (Johnson et al. 2022), Chapter 12 — Poisson regression
Data: bayesrules::equality_index (California removed, as in the book)
Model: laws ~ percent_urban + historical (log link)
Priors in this appendix: Prior_Setup() defaults only. The book uses an informative intercept prior plus weak / autoscaling priors on slopes (partially informative specification); we do not mix in the book intercept prior here.

Requires bayesrules. The table below compares book posterior summaries (tidy(equality_model), Ch. 12) to glmb() under glmbayes default priors.

library(bayesrules)
equality <- bayesrules::equality_index
equality <- equality[equality$laws < max(equality$laws), ]

ps_eq <- Prior_Setup(laws ~ percent_urban + historical,
                     family = poisson(), data = equality)
#> Using default pwt = 0.01 (low-d default).

## Bayes Rules! Ch. 12 tidy(equality_model) posterior estimates (log link)
book_br10 <- data.frame(
  parameter = c("(Intercept)", "percent_urban", "historicalgop", "historicalswing"),
  book_mean = c(1.71, 0.0164, -1.52, -0.610),
  book_sd   = c(0.303, 0.00353, 0.134, 0.103),
  check.names = FALSE
)
set.seed(2026)
glmb_eq <- glmb(
  laws ~ percent_urban + historical,
  family  = poisson(),
  pfamily = dNormal(mu = ps_eq$mu, Sigma = ps_eq$Sigma),
  data    = equality,
  n       = 2000
)
#> [glmb_Standardize_Model][WARNING] Posterior Hessian is severely ill-conditioned.
#>   kappa(H) = 335305
#>   Expect sensitivity to rounding and potential instability.
print(glmb_eq)
#> 
#> Call:  glmb(formula = laws ~ percent_urban + historical, family = poisson(), 
#>     pfamily = dNormal(mu = ps_eq$mu, Sigma = ps_eq$Sigma), n = 2000, 
#>     data = equality)
#> 
#> Posterior Mean Coefficients:
#>     (Intercept)    percent_urban    historicalgop  historicalswing  
#>         1.72312          0.01613         -1.49835         -0.60375  
#> 
#> Effective Number of Parameters: 3.922409 
#> Expected Residual Deviance: 188.3049 
#> DIC: 372.6788
br10_compare <- data.frame(
  parameter = book_br10$parameter,
  `Book mean` = book_br10$book_mean,
  `Book SD`   = book_br10$book_sd,
  `glmb Post.Mean` = as.numeric(glmb_eq$coef.means[book_br10$parameter]),
  `glmb Post.Sd`   = sapply(book_br10$parameter, function(p)
    sd(glmb_eq$coefficients[, p, drop = TRUE])),
  check.names = FALSE
)
knitr::kable(br10_compare, digits = 4,
  caption = "Bayes Rules! Ch. 12 (informative + weak book priors) vs. glmb() with Prior_Setup() defaults")
Bayes Rules! Ch. 12 (informative + weak book priors) vs. glmb() with Prior_Setup() defaults
parameter Book mean Book SD glmb Post.Mean glmb Post.Sd
(Intercept) (Intercept) 1.7100 0.3030 1.7231 0.3037
percent_urban percent_urban 0.0164 0.0035 0.0161 0.0036
historicalgop historicalgop -1.5200 0.1340 -1.4983 0.1361
historicalswing historicalswing -0.6100 0.1030 -0.6038 0.1046

Posterior means will differ from the book because the book’s fit combines an informative intercept with weakly informative slope priors, while this fit uses a single weak Prior_Setup() specification throughout.

References

Dobson, A. J. 1990. An Introduction to Generalized Linear Models. Chapman; Hall.
Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. Chapman; Hall.
Nygren, K. N., and L. M. Nygren. 2006. Likelihood Subgradient Densities.” Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.

mirror server hosted at Truenetwork, Russian Federation.