Title: Empirical Bayes Methods for Pharmacovigilance
Version: 0.2.0
Maintainer: Yihao Tan <yihaotan@buffalo.edu>
Description: A suite of empirical Bayes methods to use in pharmacovigilance. Contains various model fitting and post-processing functions. For more details see Tan et al. (2025) <doi:10.48550/arXiv.2502.09816>, <doi:10.48550/arXiv.2512.01057>; Koenker and Mizera (2014) <doi:10.1080/01621459.2013.869224>; Efron (2016) <doi:10.1093/biomet/asv068>.
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.3.2
Depends: R (≥ 3.6.0)
Imports: data.table, ggdist, ggfittext, ggplot2, glue, graphics, magrittr, methods, SobolSequence, stats, wacolors, Rcpp, splines, CVXR
LinkingTo: Rcpp, RcppEigen
LazyData: true
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/YihaoTancn/pvEBayes, https://yihaotancn.github.io/pvEBayes/
BugReports: https://github.com/YihaoTancn/pvEBayes/issues
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2025-12-15 19:17:09 UTC; yihao
Author: Yihao Tan ORCID iD [aut, cre], Marianthi Markatou ORCID iD [aut], Saptarshi Chakraborty ORCID iD [aut], Raktim Mukhopadhyay ORCID iD [aut]
Repository: CRAN
Date/Publication: 2025-12-15 20:10:08 UTC

A suite of empirical Bayes methods to use in pharmacovigilance.

Description

pvEBayes provides a collection of parametric and non-parametric empirical Bayes methods implementation for pharmacovigilance (including signal detection and signal estimation) on spontaneous reporting systems (SRS) data.

An SRS dataset catalogs AE reports on I AE rows across J drug columns. Let N_{ij} denote the number of reported cases for the i-th AE and the j-th drug, where i = 1,..., I and j = 1,..., J. We assume that for each AE-drug pair, N_{ij} \sim \text{Poisson}(\lambda_{ij} E_{ij}), where E_{ij} is expected baseline value measuring the expected count of the AE-drug pair when there is no association between i-th AE and j-th drug. The parameter \lambda_{ij} \geq 0 represents the relative reporting ratio, the signal strength, for the (i, j)-th pair measuring the ratio of the actual expected count arising due to dependence to the null baseline expected count. Current disproportionality analysis mainly focuses on signal detection which seeks to determine whether the observation N_{ij} is substantially greater than the corresponding null baseline E_{ij}. Under the Poisson model, that is to say, its signal strength \lambda_{ij} is significantly greater than 1.

In addition to signal detection, Tan et al. (Stat. in Med., 2025) broaden the role of disproportionality to signal estimation. The use of the flexible non-parametric empirical Bayes models enables more nuanced empirical Bayes posterior inference (parameter estimation and uncertainty quantification) on signal strength parameter \{ \lambda_{ij} \}. This allows researchers to distinguish AE-drug pairs that would appear similar under a binary signal detection framework. For example, the AE-drug pairs with signal strengths of 1.5 and 4.0 could both be significantly greater than 1 and detected as a signal. Such differences in signal strength may have distinct implications in medical and clinical contexts.

The methods included in pvEBayes differ by their assumptions on the prior distribution. Implemented methods include the Gamma-Poisson Shrinker (GPS), Koenker-Mizera (KM) method, Efron’s nonparametric empirical Bayes approach, the K-gamma model, and the general-gamma model.

The GPS model uses two gamma mixture prior by assuming the signal/non-signal structure in SRS data. However, in real-world setting, signal strengths (\lambda_{ij}) are often heterogeneous and thus follows a multi-modal distribution, making it difficult to assume a parametric prior. Non-parametric empirical Bayes models (KM, Efron, K-gamma and general-gamma) address this challenge by utilizing a flexible prior with general mixture form and estimating the prior distribution in a data-driven way.

pvEBayes offers the first implemention of the bi-level Expectation Conditional Maximization (ECM) algorithm proposed by Tan et al. (2025) for efficient parameter estimation in gamma mixture prior based models: GPS K-gamma and general-gamma.

The KM method has an existing implementation in the REBayes package, but it relies on Mosek, a commercial convex optimization solver, which may limit accessibility due to licensing issue. pvEBayes provides a alternative fully open-source implementation of the KM method using CVXR.

Efron’s method also has a general nonparametric empirical Bayes implementation in the deconvolveR package; however, that implementation does not support an exposure or offset parameter in the Poisson model, which corresponds to the expected null value E_{ij}. In pvEBayes, the implementation of the Efron's method is adapted and modified from deconvolveR to support E_{ij} in Poisson model.

For an introduction to pvEBayes, see the vignette.

Author(s)

Yihao Tan, Marianthi Markatou, Saptarshi Chakraborty and Raktim Mukhopadhyay.

Maintainer: Yihao Tan yihaotan@buffalo.edu

References

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Tan Y, Markatou M and Chakraborty S. pvEBayes: An R Package for Empirical Bayes Methods in Pharmacovigilance. arXiv:2512.01057 (stat.AP). https://doi.org/10.48550/arXiv.2512.01057

Koenker R, Mizera I. Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules. Journal of the American Statistical Association 2014; 109(506): 674–685, https://doi.org/10.1080/01621459.2013.869224

Efron B. Empirical Bayes Deconvolution Estimates. Biometrika 2016; 103(1); 1-20, https://doi.org/10.1093/biomet/asv068

DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician. 1999; 1;53(3):177-90.

See Also

Useful links:


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).


Fit an Efron model for a contingency table.

Description

Fit an Efron model for a contingency table.

Usage

.E_fit(N, E, c0 = 1, pDegree = 5, aStart = 1, rel.tol)

Arguments

N

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

E

A matrix of expected counts under the null model for the SRS frequency table.

c0

numeric and greater than 0. It is a hyperparameter in "efron" model.

pDegree

integer greater than or equal to 2. It is a hyperparameter in Efron model.

aStart

initial value for parameter alpha in Efron model.

Details

The implementation of the "efron" model is adapted from the deconvolveR package, developed by Bradley Efron and Balasubramanian Narasimhan. The original implementation in deconvolveR does not support an exposure or offset parameter in the Poisson model, which corresponds to the expected null value (E_{ij}) for each AE-drug combination. To address this, we modified the relevant code to allow for the inclusion of E_{ij} in the Poisson likelihood. In addition, we implemented a method for estimating the degrees of freedom, enabling AIC- or BIC-based hyperparameter selection for the "efron" model (Tan et al. 2025). See pvEBayes_tune for details.

Value

a list of optimizer outputs

References

Narasimhan B, Efron B. deconvolveR: A G-modeling program for deconvolution and empirical Bayes estimation. Journal of Statistical Software. 2020; 2;94:1-20.

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.


Fit a Koenker-Mizera (KM) model for a contingency table.

Description

Fit a Koenker-Mizera (KM) model for a contingency table.

Usage

.KM_fit(N, E, rtol_KM = 1e-06)

Arguments

N

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

E

A matrix of expected counts under the null model for the SRS frequency table.

rtol_KM

The relative tolerance on the duality gap.

Details

Parameter estimation for the "KM" model is formulated as a convex optimization problem. The objective function and constraints used in pvEBayes follow the same construction as in REBayes. Parameter estimation is performed using the open-source convex optimization package CVXR. The grid value generation follows the recommendation of Tan et al. (2025).

Value

a list of CVXR optimizer outputs

References

Koenker R, Gu J. REBayes: an R package for empirical Bayes mixture methods. Journal of Statistical Software. 2017; 4;82:1-26.

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Fu, A, Narasimhan, B, Boyd, S. CVXR: An R Package for Disciplined Convex Optimization. Journal of Statistical Software. 2020; 94;14:1-34.


Fit gamma mixture based empirical Bayes models using ECM algorithm.

Description

Fit gamma mixture based empirical Bayes models using ECM algorithm.

Usage

.NBmix_EM(
  N,
  E,
  dirichlet = TRUE,
  alpha = NULL,
  K = NULL,
  maxi = NULL,
  h = NULL,
  eps = 1e-04
)

Arguments

N

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

E

A matrix of expected counts under the null model for the SRS frequency table.

dirichlet

logical. Used for "general-gamma" model. If is TRUE, a dirichlet hyperprior for weights of gamma mixture prior is applied.

alpha

numeric between 0 and 1. The hyperparameter of "general-gamma" model. It is needed if "general-gamma" model is used.

K

integer greater than or equal to 2. It is needed if "K-gamma" model is used.

maxi

upper limit of iteration for the ECM algorithm.

h

a vector of initialization of parameter h.

eps

a tolerance parameter for ECM algorithm.

Details

This function implements the ECM algorithm proposed by Tan et al. (2025), providing a stable and efficient implementation of Gamma-Poisson Shrinker(GPS), K-gamma and "general-gamma" methods for signal estimation and signal detection in Spontaneous Reporting System (SRS) data table.

Value

a list of optimizer outputs

References

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician. 1999; 1;53(3):177-90.


Obtain Akaike Information Criterion (AIC) for a pvEBayes object

Description

This function defines the S3 AIC method for objects of class pvEBayes. It extracts the Akaike Information Criterion (AIC) from a fitted model.

Usage

## S3 method for class 'pvEBayes'
AIC(object, ..., k = 2)

Arguments

object

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

...

other input parameters. Currently unused.

k

numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC.

Value

numeric, AIC score for the resulting model.

Examples


fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = NULL
)

AIC_score <- AIC(fit)


Obtain Bayesian Information Criterion (BIC) for a pvEBayes object

Description

This function defines the S3 BIC method for objects of class pvEBayes. It extracts the Bayesian Information Criterion (BIC) from a fitted model.

Usage

## S3 method for class 'pvEBayes'
BIC(object, ...)

Arguments

object

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

...

other input parameters. Currently unused.

Value

numeric, BIC score for the resulting model.

Examples


fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = NULL
)

BIC_score <- BIC(fit)


Estimate expected null baseline count based on reference row and column

Description

This function estimates the expected null baseline count (E_{ij}) for each AE-drug combination under the assumption of independence between rows and columns. The expected count is calculated using a reference row (other AEs) and reference column (other drugs). This null baseline is typically used in the empirical Bayes modeling of pvEBayes package for signal detection and estimation in spontaneous reporting system (SRS) data.

Usage

calculate_tilde_e(contin_table)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns). The reference row "Other AEs" and the reference column "Other drugs" need to be the I-th row and J-th column respectively.

Details

This null value estimator is proposed by Tan et al. (2025).

Value

an nrow(contin_table) by ncol(contin_table) matrix.

References

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.


Estimate expected null baseline count based on reference row and column

Description

This function estimates the expected null baseline count (E_{ij}) for each AE-drug combination under the assumption of independence between rows and columns. The expected count is calculated using a reference row (other AEs) and reference column (other drugs). This null baseline is typically used in empirical Bayes modeling of pvEBayes package for signal detection and estimation in spontaneous reporting system (SRS) data.

Usage

estimate_null_expected_count(contin_table)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns). The reference row "Other AEs" and the reference column "Other drugs" need to be the I-th row and J-th column respectively.

Details

This null value estimator is proposed by Tan et al. (2025).

Value

an nrow(contin_table) by ncol(contin_table) matrix.

References

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Examples



estimate_null_expected_count(statin2025_44)


Extract all fitted models from a tuned pvEBayes Object

Description

This function retrieves the list of all fitted models from a pvEBayes_tuned object, which is the output of the pvEBayes_tune() function.

Usage

extract_all_fitted_models(object)

Arguments

object

An object of class pvEBayes_tuned, usually returned by pvEBayes_tune. This function will throw an error if the input is not of the correct class.

Value

A list containing the results of each model fitted during the tuning process.

Examples


valid_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2)
rownames(valid_matrix) <- c("AE_1", "AE_2")
colnames(valid_matrix) <- c("drug_1", "drug_2", "drug_3", "drug_4")

tuned_object <- pvEBayes_tune(valid_matrix,
  model = "general-gamma",
  return_all_fit = TRUE
)
extract_all_fitted_models(tuned_object)


Generate an eyeplot showing the distribution of posterior draws for selected drugs and adverse events

Description

This function creates an eyeplot to visualize the posterior distributions of \lambda_{ij} for selected AEs and drugs. The plot displays posterior median, 90 percent credible interval for each selected AE-drug combination.

Usage

eyeplot_pvEBayes(
  x,
  num_top_AEs = 10,
  num_top_drugs = 8,
  specified_AEs = NULL,
  specified_drugs = NULL,
  N_threshold = 1,
  text_shift = 4,
  x_lim_scalar = 1.3,
  text_size = 3,
  log_scale = FALSE
)

Arguments

x

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

num_top_AEs

a number of most significant AEs appearing in the plot. Default to 10.

num_top_drugs

a number of most significant drugs appearing in the plot. Default to 7.

specified_AEs

a vector of AE names that are specified to appear in the plot. If a vector of AEs is given, argument num_top_AEs will be ignored.

specified_drugs

a vector of drug names that are specified to appear in the plot. If a vector of drugs is given, argument num_top_drugs will be ignored.

N_threshold

a integer greater than 0. Any AE-drug combination with observation smaller than N_threshold will be filtered out.

text_shift

numeric. Controls the relative position of text labels, (e.g., "N = 1", "E = 2"). A larger value shifts the "E = 2" further away from its original position.

x_lim_scalar

numeric. An x-axis range scalar that ensures text labels are appropriately included in the plot.

text_size

numeric. Controls the size of text labels, (e.g., "N = 1", "E = 2").

log_scale

logical. If TRUE, the eye plot displays the posterior distribution of \log(\lambda_{ij}) for the selected AEs and drugs.

Value

a ggplot2 object.

Examples

fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = 1000
)

AE_names <- rownames(statin2025_44)[1:6]
drug_names <- colnames(statin2025_44)[-7]

eyeplot_pvEBayes(
  x = fit
)


FDA GBCA dataset with 1328 adverse events

Description

An adverse event-drug count dataset (contingency table) obtained from the FDA FAERS database for the quarters 2021Q1 - 2024Q4.

Usage

gbca2025

Format

An object of class matrix (inherits from array) with 1328 rows and 8 columns.

Details

Data are stored in the form of a contingency table, with drugs listed across the columns and the 1328 AEs presented across the rows. Each cell in the contingency table represents the total report counts associated with that (AE, drug) pair and detected in the FDA FAERS database during 2021Q1 - 2024Q4.

The dataset catalogs 7 Gadolinium-Based Contrast Agents (GBCAs):

Gadobenate, Gadobutrol, Gadodiamide, Gadopentetate, Gadoterate, Gadoteridol, Gadoxetate

The 1328 adverse events presented across the rows are AEs that contain at least one report for the 7 GBCA drugs during 2021Q1 - 2024Q4.

This dataset is an updated version of gbca from the pvLRT package which collects the same scope of AEs for 7 gbca drugs for quarters 2014Q3 - 2020Q4.

Source

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html


FDA GBCA dataset with 69 adverse events

Description

An adverse event-drug count dataset (contingency table) obtained from the FDA FAERS database for the quarters 2021Q1 - 2024Q4

Usage

gbca2025_69

Format

An object of class matrix (inherits from array) with 70 rows and 8 columns.

Details

Data are stored in the form of a contingency table, with drugs listed across the columns and the 69 AEs presented across the rows. Each cell in the contingency table represents the total report counts associated with that (AE, drug) pair and detected in the FDA FAERS database during 2021Q1 - 2024Q4.

The dataset catalogs 7 Gadolinium-Based Contrast Agents (GBCAs) (across columns):

Gadobenate, Gadobutrol, Gadodiamide, Gadopentetate, Gadoterate, Gadoteridol, Gadoxetate.

The 69 adverse events presented across the rows are selected from 1328 AEs of gbca2025 which are related to the brain or neural system. Other AEs are collapsed to one reference row: "Other AEs".

Source

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html


Generate random contingency tables based on a reference table embedded signals,and possibly with zero inflation

Description

This function generates random contingency tables that resemble a given reference table, with the option to embed signals and zero-inflation.

Usage

generate_contin_table(
  n_table = 1,
  ref_table,
  signal_mat = NULL,
  Variation = FALSE,
  zi_indic_mat = NULL
)

Arguments

n_table

a number of random matrices to generate.

ref_table

a reference table used as the basis for generating random tables.

signal_mat

a numeric matrix of the same dimension as the reference table (ref_table). The entry at position (i, j) in signal_mat represents the signal strength between the i-th adverse event and the j-th drug. By default, each pair is assigned a value of 1, indicating no signal for that pair.

Variation

logical. Include random noises to sig_mat while generating random tables. Default to FALSE. If set to TRUE, n_table of sig_mat incorporating the added noise will also be returned.

zi_indic_mat

logical matrix of the same size as ref_table indicating the positions of structural zero.

Value

A list of length n_table, with each entry being a nrow(ref_table) by ncol(ref_table) matrix.

References

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Examples


set.seed(1)
ref_table <- statin2025_44


# set up signal matrix with one signal at entry (1,1)
sig_mat <- matrix(1, nrow(ref_table), ncol(ref_table))
sig_mat[1, 1] <- 2

# set up structural zero matrix
Z <- matrix(0, nrow(ref_table), ncol(ref_table))
Z[5, 1] <- 1

simu_table <- generate_contin_table(ref_table,
  signal_mat = sig_mat,
  n_table = 1,
  Variation = TRUE,
  zi_indic_mat = Z
)[[1]][[1]]


Generate a heatmap plot visualizing posterior probabilities for selected drugs and adverse events

Description

This function generates a heatmap to visualize the posterior probabilities of being a signal for selected AEs and drugs.

Usage

heatmap_pvEBayes(
  x,
  num_top_AEs = 10,
  num_top_drugs = 8,
  specified_AEs = NULL,
  specified_drugs = NULL,
  cutoff_signal = NULL
)

Arguments

x

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

num_top_AEs

number of most significant AEs appearing in the plot. Default to 10.

num_top_drugs

number of most significant drugs appearing in the plot. Default to 7.

specified_AEs

a vector of AE names that are specified to appear in the plot. If a vector of AEs is given, argument num_top_AEs will be ignored.

specified_drugs

a vector of drug names that are specified to appear in the plot. If a vector of drugs is given, argument num_top_drugs will be ignored.

cutoff_signal

numeric. Threshold for signal detection. An AE-drug combination is classified as a detected signal if its 5th posterior percentile exceeds this threshold.

Value

a ggplot2 object.

Examples


fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = 1000
)


heatmap_pvEBayes(
  x = fit,
  num_top_AEs = 10,
  num_top_drugs = 8,
  specified_AEs = NULL,
  specified_drugs = NULL,
  cutoff_signal = 1.001
)


Extract log marginal likelihood for a pvEBayes object

Description

This function defines the S3 logLik method for objects of class pvEBayes. It extracts the log marginal likelihood from a fitted model.

Usage

## S3 method for class 'pvEBayes'
logLik(object, ...)

Arguments

object

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

...

other input parameters. Currently unused.

Value

returns log marginal likelihood of a pvEBayes object.

Examples


fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = NULL
)

logLik(fit)


Plotting method for a pvEBayes object

Description

This function defines the S3 plot method for objects of class pvEBayes.

Usage

## S3 method for class 'pvEBayes'
plot(x, type = "eyeplot", ...)

Arguments

x

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

type

character string determining the type of plot to show. Available choices are "eyeplot" which calls eyeplot_pvEBayes and "heatmap" which calls heatmap_pvEBayes. Note that the input for 'type' is case-sensitive.

...

additional arguments passed to heatmap_pvEBayes or eyeplot_pvEBayes.

Value

A ggplot object.

Examples


obj <- pvEBayes(statin2025_44, model = "general-gamma", alpha = 0.5)
plot(obj, type = "eyeplot")


Generate posterior draws for each AE-drug combination

Description

This function generates posterior draws from the posterior distribution of \lambda_{ij} for each AE-drug combination, based on a fitted empirical Bayes model. The posterior draws can be used to compute credible intervals, visualize posterior distributions, or support downstream inference.

Usage

posterior_draws(obj, n_posterior_draws = 1000)

Arguments

obj

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

n_posterior_draws

number of posterior draws for each AE-drug combination.

Value

The function returns an S3 object of class pvEBayes with posterior draws.

Examples


fit <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.3, n_posterior_draws = NULL
)

fit_with_draws <- posterior_draws(fit, n_posterior_draws = 1000)


Print method for a pvEBayes object

Description

This function defines the S3 print method for objects of class pvEBayes. It displays a concise description of the fitted model.

Usage

## S3 method for class 'pvEBayes'
print(x, ...)

Arguments

x

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

...

other input parameters. Currently unused.

Value

Invisibly returns the input pvEBayes object.

Examples


obj <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.5, n_posterior_draws = 10000
)

print(obj)

Fit a general-gamma, GPS, K-gamma, KM or efron model for a contingency table.

Description

This function fits a non-parametric empirical Bayes model to an AE-drug contingency table using one of several empirical Bayes approaches with specified hyperparameter, if is required. Supported models include the "general-gamma", "GPS", "K-gamma", "KM", and "efron".

Usage

pvEBayes(
  contin_table,
  model = "general-gamma",
  alpha = NULL,
  K = NULL,
  p = NULL,
  c0 = NULL,
  maxi = NULL,
  tol_ecm = 1e-04,
  rtol_efron = 1e-10,
  rtol_KM = 1e-06,
  n_posterior_draws = 1000,
  E = NULL
)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

model

the model to fit. Available models are "general-gamma", "K-gamma", "GPS", "KM" and "efron". Default to "general-gamma". Note that the input for model is case-sensitive.

alpha

numeric between 0 and 1. The hyperparameter of "general-gamma" model. It is needed if "general-gamma" model is used. Small 'alpha' encourages shrinkage on mixture weights of the estimated prior distribution. See Tan et al. (2025) for further details.

K

a integer greater than or equal to 2 indicating the number of mixture components in the prior distribution. It is needed if "K-gamma" model is used. See Tan et al. (2025) for further details.

p

a integer greater than or equal to 2. It is needed if "efron" mode is used. Larger p leads to smoother estimated prior distribution. See Narasimhan and Efron (2020) for detail.

c0

numeric and greater than 0. It is needed if "efron" mode is used. Large c0 encourage estimated prior distribution shrink toward discrete uniform. See Narasimhan and Efron (2020) for detail.

maxi

a upper limit of iteration for the ECM algorithm.

tol_ecm

a tolerance parameter used for the ECM stopping rule, defined as the absolute change in the joint marginal likelihood between two consecutive iterations. It is used when 'GPS', 'K-gamma' or 'general-gamma' model is fitted. Default to be 1e-4.

rtol_efron

a tolerance parameter used when 'efron' model is fitted. Default to 1e-10. See 'stats::nlminb' for detail.

rtol_KM

a tolerance parameter used when 'KM' model is fitted. Default to be 1e-6. See 'CVXR::solve' for detail.

n_posterior_draws

a number of posterior draws for each AE-drug combination.

E

A matrix of expected counts under the null model for the SRS frequency table. If NULL (default), the expected counts are estimated from the SRS data using 'estimate_null_expected_count()'.

Details

This function implements the ECM algorithm proposed by Tan et al. (2025), providing a stable and efficient implementation of Gamma-Poisson Shrinker(GPS), K-gamma and "general-gamma" methods for signal estimation and signal detection in Spontaneous Reporting System (SRS) data table.

Method "GPS" is proposed by DuMouchel (1999) and it is a parametric empirical Bayes model with a two gamma mixture prior distribution.

Methods "K-gamma" and "general-gamma" are non-parametric empirical Bayes models, introduced by Tan et al. (2025). The number of mixture components "K" needs to be prespecified when fitting a "K-gamma" model. For "general-gamma", the mixture weights are regularized by a Dirichlet hyper prior with hyperparameter 0 \leq \alpha < 1 that controls the shrinkage strength. As "alpha" approaches 0, less non-empty mixture components exist in the fitted model. When \alpha = 0, the Dirichlet distribution is an improper prior still offering a reasonable posterior inference that represents the strongest shrinkage of the "general-gamma" model.

Parameter estimation for the "KM" model is formulated as a convex optimization problem. The objective function and constraints follow the same construction as in REBayes. Parameter estimation is performed using the open-source convex optimization package CVXR.

The implementation of the "efron" model in this package is adapted from the deconvolveR package, developed by Bradley Efron and Balasubramanian Narasimhan. The original implementation in deconvolveR does not support an exposure or offset parameter in the Poisson model, which corresponds to the expected null value (E_{ij}) for each AE-drug combination. To address this, we modified the relevant code to allow for the inclusion of E_{ij} in the Poisson likelihood. In addition, we implemented a method for estimating the degrees of freedom, enabling AIC- or BIC-based hyperparameter selection for the "efron" model (Tan et al. 2025). See pvEBayes_tune for details.

Value

The function returns an S3 object of class pvEBayes containing the estimated model parameters as well as posterior draws for each AE-drug combination if the number of posterior draws is specified.

References

DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician. 1999; 1;53(3):177-90.

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Narasimhan B, Efron B. deconvolveR: A G-modeling program for deconvolution and empirical Bayes estimation. Journal of Statistical Software. 2020; 2;94:1-20.

Koenker R, Gu J. REBayes: an R package for empirical Bayes mixture methods. Journal of Statistical Software. 2017; 4;82:1-26.

Fu, A, Narasimhan, B, Boyd, S. CVXR: An R Package for Disciplined Convex Optimization. Journal of Statistical Software. 2020; 94;14:1-34.

Examples


set.seed(1)
ref_table <- statin2025_44

# set up signal matrix with one signal at entry (1,1)
sig_mat <- matrix(1, nrow(ref_table), ncol(ref_table))
sig_mat[c(1, 5), 1] <- 2


simu_table <- generate_contin_table(
  ref_table = ref_table,
  signal_mat = sig_mat,
  n_table = 1
)[[1]]


# fit general-gamma model with a specified alpha
fit <- pvEBayes(
  contin_table = simu_table,
  model = "general-gamma",
  alpha = 0.3,
  n_posterior_draws = 1000,
  E = NULL,
  maxi = NULL
)

# fit K-gamma model with K = 3
fit_Kgamma <- pvEBayes(
  contin_table = simu_table, model = "K-gamma",
  K = 3, n_posterior_draws = 1000
)


# fit Efron model with specified hyperparameters
# p = 40, c0 = 0.05

## Not run: 
fit_efron <- pvEBayes(
  contin_table = simu_table,
  model = "efron",
  p = 40,
  c0 = 0.05,
  n_posterior_draws = 1000
)

## End(Not run)

# fit GPS model and comapre with 'openEBGM'


fit_gps <- pvEBayes(simu_table, model = "GPS")

## Not run: 

## Optional comparison with openEBGM (only if installed)

## tol_ecm is the absolute tolerance for ECM stopping rule.
## It is set to ensure comparability to `openEBGM`.

fit_gps <- pvEBayes(simu_table, model = "GPS", tol_ecm = 1e-2)

if (requireNamespace("openEBGM", quietly = TRUE)) {
  E <- estimate_null_expected_count(simu_table)
  simu_table_stacked <- as.data.frame(as.table(simu_table))
  simu_table_stacked$E <- as.vector(E)
  colnames(simu_table_stacked) <- c("var1", "var2", "N", "E")
  simu_table_stacked_squash <- openEBGM::autoSquash(simu_table_stacked)

  hyper_estimates <- openEBGM::hyperEM(simu_table_stacked_squash,
    theta_init = c(2, 1, 2, 2, 0.2),
    method = "nlminb",
    N_star = NULL,
    zeroes = TRUE,
    param_upper = Inf,
    LL_tol = 1e-2,
    max_iter = 10000
  )
}

theta_hat <- hyper_estimates$estimates
qn <- openEBGM::Qn(theta_hat,
  N = simu_table_stacked$N,
  E = simu_table_stacked$E
)

simu_table_stacked$q05 <- openEBGM::quantBisect(5,
  theta_hat = theta_hat,
  N = simu_table_stacked$N,
  E = simu_table_stacked$E,
  qn = qn
)

## obtain the detected signal provided by openEBGM
simu_table_stacked %>%
  subset(q05 > 1.001)

## detected signal from pvEBayes presented in the same way as openEBGM
fit_gps %>%
  summary(return = "posterior draws") %>%
  apply(c(2, 3), quantile, prob = 0.05) %>%
  as.table() %>%
  as.data.frame() %>%
  subset(Freq > 1.001)

## End(Not run)


Select hyperparameter and obtain the optimal general-gamma or efron model based on AIC and BIC

Description

This function performs hyperparameter tuning for the general-gamma or Efron model using AIC or BIC. For a given AE-drug contingency table, the function fits a series of models across a grid of candidate hyperparameter values and computes their AIC and BIC. The models with the lowest AIC or BIC values are selected as the optimal fits.

Usage

pvEBayes_tune(
  contin_table,
  model = "general-gamma",
  alpha_vec = NULL,
  p_vec = NULL,
  c0_vec = NULL,
  use_AIC = TRUE,
  n_posterior_draws = 1000,
  return_all_fit = FALSE,
  return_all_AIC = TRUE,
  return_all_BIC = TRUE,
  tol_ecm = 1e-04,
  rtol_efron = 1e-10,
  E = NULL
)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

model

the model to be tuned. Available models are "general-gamma" and "efron". Default to "general-gamma". Note that the input for model is case-sensitive.

alpha_vec

vector of hyperparameter alpha values to be selected. Alpha is a hyperparameter in general-gamma model which is numeric and between 0 and 1. If is NULL, a default set of alpha values (0, 0.1, 0.3, 0.5, 0.7, 0.9) will be used.

p_vec

vector of hyperparameter p values to be selected. p is a hyperparameter in "efron" model which should be a positive integer. If is NULL, a default set of p values (80, 100, 120, 150, 200) will be used.

c0_vec

vector of hyperparameter c0 values to be selected. c0 is a hyperparameter in "efron" model which should be a positive number. If is NULL, a default set of c0 values (0.001, 0.01, 0.1, 0.2, 0.5) will be used.

use_AIC

logical, indicating whether AIC or BIC is used. Default to be TRUE.

n_posterior_draws

number of posterior draws for each AE-drug combination.

return_all_fit

logical, indicating whether to return all the fitted model under the selection. Default to be FALSE.

return_all_AIC

logical, indicating whether to return AIC values for each fitted model under the selection. Default to be TRUE.

return_all_BIC

logical, indicating whether to return BIC values for each fitted model under the selection. Default to be TRUE.

tol_ecm

a tolerance parameter used for the ECM stopping rule, defined as the absolute change in the joint marginal likelihood between two consecutive iterations. It is used when 'GPS', 'K-gamma' or 'general-gamma' model is fitted. Default to be 1e-4.

rtol_efron

a tolerance parameter used when 'efron' model is fitted. Default to 1e-10. See 'stats::nlminb' for detail.

E

A matrix of expected counts under the null model for the SRS frequency table. If NULL (default), the expected counts are estimated from the SRS data using 'estimate_null_expected_count()'.

Value

The function returns an S3 object of class pvEBayes containing the selected estimated model parameters as well as posterior draws for each AE-drug combination if the number of posterior draws is specified.

References

Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 2003; 19(6):716-23.

Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978; 1:461-4.

Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. Statistics in Medicine. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195.

Examples


fit <- pvEBayes_tune(statin2025_44,
  model = "general-gamma",
  alpha_vec = c(0, 0.1, 0.3, 0.5, 0.7, 0.9)
)


FDA statin dataset with 5119 adverse events

Description

An adverse event-drug count dataset (contingency table) obtained from the FDA FAERS database for the quarters 2021Q1 - 2024Q4.

Usage

statin2025

Format

An object of class matrix (inherits from array) with 5119 rows and 7 columns.

Details

The dataset catalogs 6 statin drugs (across columns): Atorvastatin, Fluvastatin, Lovastatin, Pravastatin, Rosuvastatin and Simvastatin.

Data are stored in the form of a contingency table, with drugs listed across the columns and the 5119 AEs presented across the rows. Each cell in the contingency table represents the total report counts associated with that (AE, drug) pair and detected in the FDA FAERS database during 2021Q1 - 2024Q4.

The dataset catalogs 6 statin drugs (across columns):

Atorvastatin, Fluvastatin, Lovastatin, Pravastatin, Rosuvastatin and Simvastatin.

The 5119 adverse events presented across the rows are AEs that contain at least one report for 6 statin drugs during 2021Q1 - 2024Q4.

This dataset is an updated version of statin from the pvLRT package which collects the same scope of AEs for 6 statin drugs for quarters 2014Q3 - 2020Q4.

Source

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html


FDA statin dataset with 44 adverse events

Description

An adverse event-drug count dataset (contingency table) obtained from the FDA FAERS database for the quarters 2021Q1 - 2024Q4.

Usage

statin2025_44

Format

An object of class matrix (inherits from array) with 45 rows and 7 columns.

Details

Data are stored in the form of a contingency table, with drugs listed across the columns and the 44 AEs presented across the rows. Each cell in the contingency table represents the total report counts associated with that (AE, drug) pair and detected in the FDA FAERS database during 2021Q1 - 2024Q4.

The dataset catalogs 6 statin drugs (across columns):

Atorvastatin, Fluvastatin, Lovastatin, Pravastatin, Rosuvastatin and Simvastatin.

The 44 adverse events presented across the rows are considered significant by FDA.

This dataset is an updated version of statin46 from the pvLRT package which collect the same scope of AEs for 6 statin drugs for quarters 2014Q3 - 2020Q4.

During 2021Q1 - 2024Q4, there was no AE report for "BLOOD CREATINE PHOSPHOKINASE MM INCREASED" and "MYOGLOBIN BLOOD PRESENT". Therefore, these two AEs are not presented in the statin2025_44 dataset.

Source

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html


FDA statin dataset with 42 adverse events

Description

An adverse event-drug count dataset (contingency table) obtained from the FDA FAERS database for the quarters 2014Q3 - 2020Q4.

Usage

statin42

Format

An object of class matrix (inherits from array) with 43 rows and 7 columns.

Details

Data are stored in the form of a contingency table, with drugs listed across the columns and the 42 AEs presented across the rows. Each cell in the contingency table represents the total report counts associated with that (AE, drug) pair and detected in the FDA FAERS database during 2014Q3 - 2020Q4.

The dataset catalogs 6 statin drugs (across columns):

Atorvastatin, Fluvastatin, Lovastatin, Pravastatin, Rosuvastatin and Simvastatin.

This dataset is derived from the statin46 dataset in the pvLRT package, with four AEs removed.

The excluded AEs are: "Blood Creatine Phosphokinase Mm Increased", "Myoglobin Blood Present", "Myoglobin Urine Present", and "Myoglobinaemia".

Source

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html


Summary method for a pvEBayes object

Description

This function defines the S3 summary method for objects of class pvEBayes. It provides a detailed summary of the fitted model.

Usage

## S3 method for class 'pvEBayes'
summary(object, return = NULL, ...)

Arguments

object

a pvEBayes object, which is the output of the function pvEBayes or pvEBayes_tune.

return

a character string specifying which component the summary function should return.Valid options include: "prior parameters", "likelihood", "detected signal" and "posterior draws". If set to NULL (default), all components will be returned in a list. Note that the input for 'return' is case-sensitive.

...

other input parameters. Currently unused.

Value

a list including estimated prior parameters, log_marginal_likelihood, indicator matrix of detected signal and posterior_draws for each AE-drug pair.

Examples


obj <- pvEBayes(
  contin_table = statin2025_44, model = "general-gamma",
  alpha = 0.5, n_posterior_draws = 10000
)

summary(obj)


Select hyperparameter (p, c0) and obtain the optimal efron model based on AIC and BIC

Description

Select hyperparameter (p, c0) and obtain the optimal efron model based on AIC and BIC

Usage

tuning_efron(
  contin_table,
  p_vec = NULL,
  c0_vec = NULL,
  return_all_fit = FALSE,
  return_all_AIC = TRUE,
  return_all_BIC = TRUE,
  rtol_efron = 1e-10
)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

p_vec

vector of hyperparameter p values to be selected. p is a hyperparameter in "efron" model which should be a positive integer. If is NULL, a default set of p values (80, 100, 120, 150, 200) will be used.

c0_vec

vector of hyperparameter c0 values to be selected. c0 is a hyperparameter in "efron" model which should be a positive number. If is NULL, a default set of c0 values (0.001, 0.01, 0.1, 0.2, 0.5) will be used.

rtol_efron

a tolerance parameter used when 'efron' model is fitted. Default to 1e-10. See 'stats::nlminb' for detail.

Value

a list of fitted models with hyperparameter alpha selected by AIC or BIC.

References

Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 2003; 19(6):716-23.

Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978; 1:461-4.


Select hyperparameter alpha and obtain the optimal general-gamma model based on AIC and BIC

Description

Select hyperparameter alpha and obtain the optimal general-gamma model based on AIC and BIC

Usage

tuning_general_gamma(
  contin_table,
  alpha_vec = NULL,
  return_all_fit = FALSE,
  return_all_AIC = TRUE,
  return_all_BIC = TRUE,
  tol_ecm = 1e-04
)

Arguments

contin_table

an IxJ contingency table showing pairwise counts of adverse events for I AEs (along the rows) and J drugs (along the columns).

alpha_vec

vector of hyperparameter alpha values to be selected. Alpha is hyperparameter in general-gamma model which is numeric and between 0 and 1. If is NULL, a default set of alpha values (0, 0.1, 0.3, 0.5, 0.7, 0.9) will be used.

tol_ecm

a tolerance parameter used for the ECM stopping rule, defined as the absolute change in the joint marginal likelihood between two consecutive iterations. It is used when 'GPS', 'K-gamma' or 'general-gamma' model is fitted. Default to be 1e-4.

Value

a list of fitted models with hyperparameter alpha selected by AIC or BIC.

References

Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 2003; 19(6):716-23.

Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978; 1:461-4.

mirror server hosted at Truenetwork, Russian Federation.