Help for package multibias

Type:

Package

Title:

Multiple Bias Analysis in Causal Inference

Version:

1.7.2

Date:

2025-06-15

Maintainer:

Paul Brendel <pcbrendel@gmail.com>

Description:

Quantify the causal effect of a binary exposure on a binary outcome with adjustment for multiple biases. The functions can simultaneously adjust for any combination of uncontrolled confounding, exposure/outcome misclassification, and selection bias. The underlying method generalizes the concept of combining inverse probability of selection weighting with predictive value weighting. Simultaneous multi-bias analysis can be used to enhance the validity and transparency of real-world evidence obtained from observational, longitudinal studies. Based on the work from Paul Brendel, Aracelis Torres, and Onyebuchi Arah (2023) <doi:10.1093/ije/dyad001>.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 4.2.0)

RoxygenNote:

7.3.2

Imports:

dplyr (≥ 1.1.3), lifecycle (≥ 1.0.3), magrittr (≥ 2.0.3), rlang (≥ 1.1.1), broom (≥ 1.0.5), purrr (≥ 1.0.0), ggplot2 (≥ 3.5.0)

Suggests:

knitr, rmarkdown, MASS, testthat (≥ 3.0.0), vdiffr (≥ 1.0.5)

URL:

https://github.com/pcbrendel/multibias, http://www.paulbrendel.com/multibias/

BugReports:

https://github.com/pcbrendel/multibias/issues

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-06-15 18:14:08 UTC; pbrendel

Author:

Paul Brendel [aut, cre, cph]

Repository:

CRAN

Date/Publication:

2025-06-15 18:40:02 UTC

multibias: Multiple Bias Analysis in Causal Inference

Description

Author(s)

Maintainer: Paul Brendel pcbrendel@gmail.com [copyright holder]

Represent bias parameters

Description

bias_params is one of two different options to represent bias assumptions for bias adjustment. The multibias_adjust() function will apply the assumptions from these models and use them to adjust for biases in the observed data. It takes one input, a list, where each item in the list corresponds to the necessary models for bias adjustment. See below for bias models.

For each of the following bias models, the variables are defined:

X = True exposure
X* = Misclassified exposure
Y = True outcome
Y* = Misclassified outcome
C = Known confounder
j = Number of known confounders
U = Uncontrolled confounder
S = Selection indicator

Uncontrolled confounding: logit(P(U=1)) = α₀ + α₁X + α₂Y + α_{2+j}C_j
Exposure misclassification: logit(P(X=1)) = δ₀ + δ₁X* + δ₂Y + δ_{2+j}C_j
Outcome misclassification: logit(P(Y=1)) = δ₀ + δ₁X + δ₂Y* + δ_{2+j}C_j
Selection bias: logit(P(S=1)) = β₀ + β₁X + β₂Y
Uncontrolled Confounding & Exposure Misclassification (Option 1): logit(P(U=1)) = α₀ + α₁X + α₂Y
logit(P(X=1)) = δ₀ + δ₁X* + δ₂Y + δ_{2+j}C_j
Uncontrolled Confounding & Exposure Misclassification (Option 2): log(P(X=1,U=0)/P(X=0,U=0)) = γ_{1,0} + γ_{1,1}X* + γ_{1,2}Y + γ_{1,2+j}C_j
log(P(X=0,U=1)/P(X=0,U=0)) = γ_{2,0} + γ_{2,1}X* + γ_{2,2}Y + γ_{2,2+j}C_j
log(P(X=1,U=1)/P(X=0,U=0)) = γ_{3,0} + γ_{3,1}X* + γ_{3,2}Y + γ_{3,2+j}C_j
Uncontrolled Confounding & Outcome Misclassification (Option 1): logit(P(U=1)) = α₀ + α₁X + α₂Y
logit(P(Y=1)) = δ₀ + δ₁X + δ₂Y* + δ_{2+j}C_j
Uncontrolled Confounding & Outcome Misclassification (Option 2): log(P(U=1,Y=0)/P(U=0,Y=0)) = γ_{1,0} + γ_{1,1}X + γ_{1,2}Y* + γ_{1,2+j}C_j
log(P(U=0,Y=1)/P(U=0,Y=0)) = γ_{2,0} + γ_{2,1}X + γ_{2,2}Y* + γ_{2,2+j}C_j
log(P(U=1,Y=1)/P(U=0,Y=0)) = γ_{3,0} + γ_{3,1}X + γ_{3,2}Y* + γ_{3,2+j}C_j
Uncontrolled Confounding & Selection Bias: logit(P(U=1)) = α₀ + α₁X + α₂Y + α_{2+j}C_j
logit(P(S=1)) = β₀ + β₁X + β₂Y
Exposure Misclassification & Outcome Misclassification (Option 1): logit(P(X=1)) = δ₀ + δ₁X* + δ₂Y* + δ_{2+j}C_j
logit(P(Y=1)) = β₀ + β₁X + β₂Y* + β_{2+j}C_j
Exposure Misclassification & Outcome Misclassification (Option 2): log(P(X=1,Y=0)/P(X=0,Y=0)) = γ_{1,0} + γ_{1,1}X* + γ_{1,2}Y* + γ_{1,2+j}C_j
log(P(X=0,Y=1)/P(X=0,Y=0)) = γ_{2,0} + γ_{2,1}X* + γ_{2,2}Y* + γ_{2,2+j}C_j
log(P(X=1,Y=1)/P(X=0,Y=0)) = γ_{3,0} + γ_{3,1}X* + γ_{3,2}Y* + γ_{3,2+j}C_j
Exposure Misclassification & Selection Bias: logit(P(X=1)) = δ₀ + δ₁X* + δ₂Y + δ_{2+j}C_j
logit(P(S=1)) = β₀ + β₁X* + β₂Y + β_{2+j}C_j
Outcome Misclassification & Selection Bias: logit(P(Y=1)) = δ₀ + δ₁X + δ₂Y* + δ_{2+j}C_j
logit(P(S=1)) = β₀ + β₁X + β₂Y* + β_{2+j}C_j
Uncontrolled Confounding, Exposure Misclassification, and Selection Bias (Option 1): logit(P(U=1)) = α₀ + α₁X + α₂Y
logit(P(X=1)) = δ₀ + δ₁X* + δ₂Y + δ_{2+j}C_j
logit(P(S=1)) = β₀ + β₁X* + β₂Y + β_{2+j}C_j
Uncontrolled Confounding, Exposure Misclassification, and Selection Bias (Option 2): log(P(X=1,U=0)/P(X=0,U=0)) = γ_{1,0} + γ_{1,1}X* + γ_{1,2}Y + γ_{1,2+j}C_j
log(P(X=0,U=1)/P(X=0,U=0)) = γ_{2,0} + γ_{2,1}X* + γ_{2,2}Y + γ_{2,2+j}C_j
log(P(X=1,U=1)/P(X=0,U=0)) = γ_{3,0} + γ_{3,1}X* + γ_{3,2}Y + γ_{3,2+j}C_j
logit(P(S=1)) = β₀ + β₁X* + β₂Y + β_{2+j}C_j
Uncontrolled Confounding, Outcome Misclassification, and Selection Bias (Option 1): logit(P(U=1)) = α₀ + α₁X + α₂Y
logit(P(Y=1)) = δ₀ + δ₁X + δ₂Y* + δ_{2+j}C_j
logit(P(S=1)) = β₀ + β₁X + β₂Y* + β_{2+j}C_j
Uncontrolled Confounding, Outcome Misclassification, and Selection Bias (Option 2): log(P(U=1,Y=0)/P(U=0,Y=0)) = γ_{1,0} + γ_{1,1}X + γ_{1,2}Y* + γ_{1,2+j}C_j
log(P(U=0,Y=1)/P(U=0,Y=0)) = γ_{2,0} + γ_{2,1}X + γ_{2,2}Y* + γ_{2,2+j}C_j
log(P(U=1,Y=1)/P(U=0,Y=0)) = γ_{3,0} + γ_{3,1}X + γ_{3,2}Y* + γ_{3,2+j}C_j
logit(P(S=1)) = β₀ + β₁X + β₂Y* + β_{2+j}C_j
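
As a minimal sketch of how one of the logit models above turns coefficients into a probability, the base R snippet below evaluates the uncontrolled confounding model for a few covariate patterns. The coefficient values are the ones used for "u" in the Examples section; the assumed coefficient order (intercept, X, Y, C1, C2, C3) and the helper name predict_u are illustrative assumptions and are not part of the package.

```r
# Coefficients for logit(P(U=1)) = a0 + a1*X + a2*Y + a3*C1 + a4*C2 + a5*C3
# (assumed order: intercept, X, Y, C1, C2, C3)
alpha <- c(-0.19, 0.61, 0.70, -0.09, 0.10, -0.15)

# Hypothetical helper: returns P(U = 1) for each row of binary covariates
predict_u <- function(x, y, c1, c2, c3, coefs) {
  design <- cbind(1, x, y, c1, c2, c3)        # column of 1s for the intercept
  linear_predictor <- drop(design %*% coefs)  # one value per row
  plogis(linear_predictor)                    # inverse logit
}

# Three covariate patterns: (X=0, Y=0), (X=1, Y=0), (X=1, Y=1), all C = 0
p_u <- predict_u(
  x = c(0, 1, 1), y = c(0, 0, 1),
  c1 = c(0, 0, 0), c2 = c(0, 0, 0), c3 = c(0, 0, 0),
  coefs = alpha
)
round(p_u, 3)
```

Because the X and Y coefficients are positive here, the predicted probability of U = 1 rises across the three patterns.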

Usage

bias_params(coef_list)

Arguments

coef_list

List of coefficient values from the above options of models. Each item of the list is an equation. The left side of the equation identifies the model (e.g., "u" for the model predicting the uncontrolled confounder). For the multinomial models, specify the value here based on the numerator (e.g., "x1u0", "x0u1", "x1u1" for the three multinomial models in Uncontrolled Confounding & Exposure Misclassification, Option 2). The right side of the equation is the vector of values corresponding to the model coefficients (from left to right).
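
For the multinomial (Option 2) models, each named vector gives the log-ratio of one category against the reference category. The base R sketch below, which is an illustration and not the package's internal code, converts the three vectors for Exposure Misclassification & Outcome Misclassification into the four joint probabilities of (X, Y). The helper name joint_probs and the assumed coefficient order (intercept, X*, Y*, C1) are assumptions.

```r
# Named by numerator category; reference category is X = 0, Y = 0
coef_list <- list(
  x1y0 = c(-2.18, 1.63, 0.23, 0.36),
  x0y1 = c(-3.17, 0.22, 1.60, 0.40),
  x1y1 = c(-4.76, 1.82, 1.83, 0.72)
)

# Hypothetical helper: joint probabilities of (X, Y) for one covariate pattern
joint_probs <- function(coef_list, xstar, ystar, c1) {
  covariates <- c(1, xstar, ystar, c1)
  # log(P(category) / P(X=0, Y=0)) for the three non-reference categories
  log_ratios <- vapply(coef_list, function(b) sum(b * covariates), numeric(1))
  # The reference category has ratio exp(0) = 1; normalise so all four sum to 1
  unnormalised <- c(x0y0 = 1, exp(log_ratios))
  unnormalised / sum(unnormalised)
}

probs <- joint_probs(coef_list, xstar = 1, ystar = 1, c1 = 0)
round(probs, 3)
```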

Examples

list_for_uc <- list(
  u = c(-0.19, 0.61, 0.70, -0.09, 0.10, -0.15)
)

bp_uc <- bias_params(coef_list = list_for_uc)

list_for_em_om <- list(
  x1y0 = c(-2.18, 1.63, 0.23, 0.36),
  x0y1 = c(-3.17, 0.22, 1.60, 0.40),
  x1y1 = c(-4.76, 1.82, 1.83, 0.72)
)

bp_em_om <- bias_params(coef_list = list_for_em_om)

Represent observed causal data

Description

data_observed combines the observed dataframe with specific identification of the columns corresponding to the exposure, outcome, and confounders. It is an essential input of the multibias_adjust() function.

Usage

data_observed(data, bias, exposure, outcome, confounders = NULL)

Arguments

data

Dataframe for bias analysis.

bias

String type(s) of bias distorting the effect of the exposure on the outcome. Can choose from a subset of the following: "uc", "em", "om", "sel". These correspond to uncontrolled confounding, exposure misclassification, outcome misclassification, and selection bias, respectively.

exposure

String name of the column in data corresponding to the exposure variable.

outcome

String name of the column in data corresponding to the outcome variable.

confounders

String name(s) of the column(s) in data corresponding to the confounding variable(s).

Value

An object of class data_observed containing:

data

A dataframe with the selected columns

bias

The type(s) of bias present

exposure

The name of the exposure variable

outcome

The name of the outcome variable

confounders

The name(s) of the confounder variable(s)

Examples

df <- data_observed(
  data = df_sel,
  bias = "sel",
  exposure = "X",
  outcome = "Y",
  confounders = c("C1", "C2", "C3")
)

Represent validation causal data

Description

data_validation is one of two different options to represent bias assumptions for bias adjustment. It combines the validation dataframe with specific identification of the appropriate columns for bias adjustment, including: true exposure, true outcome, confounders, misclassified exposure, misclassified outcome, and selection. The purpose of validation data is to use an external data source to transport the necessary causal relationships that are missing in the observed data.

Usage

data_validation(
  data,
  true_exposure,
  true_outcome,
  confounders = NULL,
  misclassified_exposure = NULL,
  misclassified_outcome = NULL,
  selection = NULL
)

Arguments

data

Dataframe of validation data

true_exposure

String name of the column in data corresponding to the true exposure.

true_outcome

String name of the column in data corresponding to the true outcome.

confounders

String name(s) of the column(s) in data corresponding to the confounding variable(s).

misclassified_exposure

String name of the column in data corresponding to the misclassified exposure.

misclassified_outcome

String name of the column in data corresponding to the misclassified outcome.

selection

String name of the column in data corresponding to the selection indicator.

Value

An object of class data_validation containing:

data

A dataframe with the selected columns

true_exposure

The name of the true exposure variable

true_outcome

The name of the true outcome variable

confounders

The name(s) of the confounder variable(s)

misclassified_exposure

The name of the misclassified exposure variable

misclassified_outcome

The name of the misclassified outcome variable

selection

The name of the selection indicator variable

Examples

df <- data_validation(
  data = df_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = c("C1", "C2", "C3"),
  selection = "S"
)

Simulated data with exposure misclassification

Description

Data containing one source of bias, three known confounders, and 100,000 observations. This data is obtained from df_em_source by removing the column X. The resulting data corresponds to what a researcher would see in the real world: a misclassified exposure, Xstar, and no data on the true exposure. As seen in df_em_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em

Format

A dataframe with 100,000 rows and 5 columns:

Xstar: misclassified exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Simulated data with exposure misclassification and outcome misclassification

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained from df_em_om_source by removing the columns X and Y. The resulting data corresponds to what a researcher would see in the real world: a misclassified exposure, Xstar, and a misclassified outcome, Ystar. As seen in df_em_om_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em_om

Format

A dataframe with 100,000 rows and 5 columns:

Xstar: misclassified exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_em_om`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_em_om and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_em_om. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em_om_source

Format

A dataframe with 100,000 rows and 7 columns:

X: true exposure, 1 = present and 0 = absent
Y: true outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
Xstar: misclassified exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent

Simulated data with exposure misclassification and selection bias

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_em_sel_source, then removing the columns X and S. The resulting data corresponds to what a researcher would see in the real world: a misclassified exposure, Xstar, and missing data for those not selected into the study (S=0). As seen in df_em_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em_sel

Format

A dataframe with 100,000 rows and 5 columns:

Xstar: misclassified exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_em_sel`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_em_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_em_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em_sel_source

Format

A dataframe with 100,000 rows and 7 columns:

X: true exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
Xstar: misclassified exposure, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Data source for `df_em`

Description

Data with complete information on one source of bias, three known confounders, and 100,000 observations. This data is used to derive df_em and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_em. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_em_source

Format

A dataframe with 100,000 rows and 6 columns:

X: true exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
Xstar: misclassified exposure, 1 = present and 0 = absent
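
One way to obtain bias parameters from a source dataset of this shape is to fit the exposure misclassification model, logit(P(X=1)) on Xstar, Y, and the confounders, and read off the coefficients. The sketch below does this on simulated stand-in data (not the package's df_em_source); all data-generating values are arbitrary assumptions chosen for illustration.

```r
set.seed(1234)
n <- 20000

# Simulated stand-in for a source dataset (not the package data)
c1 <- rbinom(n, 1, 0.5)
x <- rbinom(n, 1, plogis(-0.5 + 0.5 * c1))
y <- rbinom(n, 1, plogis(-1 + log(2) * x + 0.5 * c1))
xstar <- rbinom(n, 1, plogis(-2 + 3 * x + 0.3 * y))  # misclassified exposure
source_df <- data.frame(X = x, Y = y, C1 = c1, Xstar = xstar)

# Bias model: logit(P(X=1)) = d0 + d1*Xstar + d2*Y + d3*C1
x_model <- glm(X ~ Xstar + Y + C1,
               family = binomial(link = "logit"),
               data = source_df)

# Coefficients in left-to-right model order, stored under the name "x"
x_coefs <- unname(coef(x_model))
coef_list <- list(x = round(x_coefs, 2))
coef_list
```

A list built this way has the same shape as the coef_list argument of bias_params().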

Simulated data with outcome misclassification

Description

Data containing one source of bias, three known confounders, and 100,000 observations. This data is obtained from df_om_source by removing the column Y. The resulting data corresponds to what a researcher would see in the real world: a misclassified outcome, Ystar, and no data on the true outcome. As seen in df_om_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_om

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Simulated data with outcome misclassification and selection bias

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_om_sel_source, then removing the columns Y and S. The resulting data corresponds to what a researcher would see in the real world: a misclassified outcome, Ystar, and missing data for those not selected into the study (S=0). As seen in df_om_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_om_sel

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_om_sel`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_om_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_om_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_om_sel_source

Format

A dataframe with 100,000 rows and 7 columns:

X: exposure, 1 = present and 0 = absent
Y: true outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Data source for `df_om`

Description

Data with complete information on one source of bias, three known confounders, and 100,000 observations. This data is used to derive df_om and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_om. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_om_source

Format

A dataframe with 100,000 rows and 6 columns:

X: exposure, 1 = present and 0 = absent
Y: true outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent

Simulated data with selection bias

Description

Data containing one source of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_sel_source, then removing the S column. The resulting data corresponds to what a researcher would see in the real world: missing data for those not selected into the study (S=0). As seen in df_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_sel

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
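
The derivation described for this dataset can be sketched in base R. The snippet below uses simulated stand-in data rather than the package's df_sel_source, and it reads "sampling with replacement with probability = S" as passing the 0/1 selection indicator as row weights to sample(); that reading, and every data-generating value, is an assumption.

```r
set.seed(42)
n <- 10000

# Simulated stand-in for a source dataset with a selection indicator S
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + log(2) * x))
s <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + 0.8 * y))
source_df <- data.frame(X = x, Y = y, S = s)

# Draw n rows with replacement, weighting each row by S,
# so rows with S = 0 have zero weight and are never drawn
rows <- sample(seq_len(n), size = n, replace = TRUE, prob = source_df$S)

# Drop the S column, as a researcher would not observe it
observed_df <- source_df[rows, c("X", "Y")]
dim(observed_df)
```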

Data source for `df_sel`

Description

Data with complete information on study selection, three known confounders, and 100,000 observations. This data is used to derive df_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_sel_source

Format

A dataframe with 100,000 rows and 6 columns:

X: exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Simulated data with uncontrolled confounding

Description

Data containing one source of bias, three known confounders, and 100,000 observations. This data is obtained from df_uc_source by removing the column U. The resulting data corresponds to what a researcher would see in the real world: information on known confounders (C1, C2, and C3), but not on confounder U. As seen in df_uc_source, the true, unbiased exposure-outcome effect estimate = 2.

Usage

df_uc

Format

A dataframe with 100,000 rows and 7 columns:

X_bi: binary exposure, 1 = present and 0 = absent
X_cont: continuous exposure
Y_bi: binary outcome corresponding to exposure X_bi, 1 = present and 0 = absent
Y_cont: continuous outcome corresponding to exposure X_cont
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Simulated data with uncontrolled confounding and exposure misclassification

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained from df_uc_em_source by removing the columns X and U. The resulting data corresponds to what a researcher would see in the real world: a misclassified exposure, Xstar, and missing data on a confounder U. As seen in df_uc_em_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_em

Format

A dataframe with 100,000 rows and 5 columns:

Xstar: misclassified exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Simulated data with uncontrolled confounding, exposure misclassification, and selection bias

Description

Data containing three sources of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_uc_em_sel_source, then removing the columns X, U, and S. The resulting data corresponds to what a researcher would see in the real world: a misclassified exposure, Xstar; missing data on a confounder U; and missing data for those not selected into the study (S=0). As seen in df_uc_em_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_em_sel

Format

A dataframe with 100,000 rows and 5 columns:

Xstar: misclassified exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_uc_em_sel`

Description

Data with complete information on the three sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc_em_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc_em_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 + α₅U shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_em_sel_source

Format

A dataframe with 100,000 rows and 8 columns:

X: true exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: unmeasured confounder, 1 = present and 0 = absent
Xstar: misclassified exposure, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Data source for `df_uc_em`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc_em and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc_em. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃U shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_em_source

Format

A dataframe with 100,000 rows and 7 columns:

X: true exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: unmeasured confounder, 1 = present and 0 = absent
Xstar: misclassified exposure, 1 = present and 0 = absent

Simulated data with uncontrolled confounding and outcome misclassification

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained from df_uc_om_source by removing the columns Y and U. The resulting data corresponds to what a researcher would see in the real world: a misclassified outcome, Ystar, and missing data on the binary confounder U. As seen in df_uc_om_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_om

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Simulated data with uncontrolled confounding, outcome misclassification, and selection bias

Description

Data containing three sources of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_uc_om_sel_source, then removing the columns Y, U, and S. The resulting data corresponds to what a researcher would see in the real world: a misclassified outcome, Ystar; missing data on a confounder U; and missing data for those not selected into the study (S=0). As seen in df_uc_om_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_om_sel

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_uc_om_sel`

Description

Data with complete information on the three sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc_om_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc_om_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 + α₅U shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_om_sel_source

Format

A dataframe with 100,000 rows and 8 columns:

X: exposure, 1 = present and 0 = absent
Y: true outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: unmeasured confounder, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Data source for `df_uc_om`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc_om and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc_om. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃U shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_om_source

Format

A dataframe with 100,000 rows and 7 columns:

X: exposure, 1 = present and 0 = absent
Y: true outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: unmeasured confounder, 1 = present and 0 = absent
Ystar: misclassified outcome, 1 = present and 0 = absent

Simulated data with uncontrolled confounding and selection bias

Description

Data containing two sources of bias, three known confounders, and 100,000 observations. This data is obtained by sampling with replacement with probability = S from df_uc_sel_source, then removing the columns U and S. The resulting data corresponds to what a researcher would see in the real world: missing data on confounder U and missing data for those not selected into the study (S=0). As seen in df_uc_sel_source, the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_sel

Format

A dataframe with 100,000 rows and 5 columns:

X: exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent

Data source for `df_uc_sel`

Description

Data with complete information on the two sources of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc_sel and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc_sel. With this source data, the fitted regression logit(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 + α₅U shows that the true, unbiased exposure-outcome odds ratio = 2.

Usage

df_uc_sel_source

Format

A dataframe with 100,000 rows and 7 columns:

X: exposure, 1 = present and 0 = absent
Y: outcome, 1 = present and 0 = absent
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: unmeasured confounder, 1 = present and 0 = absent
S: selection, 1 = selected into the study and 0 = not selected into the study

Data source for `df_uc`

Description

Data with complete information on one source of bias, three known confounders, and 100,000 observations. This data is used to derive df_uc and can be used to obtain bias parameters for purposes of validating the simultaneous multi-bias adjustment method with df_uc. With this source data, the fitted regression g(P(Y=1)) = α₀ + α₁X + α₂C1 + α₃C2 + α₄C3 + α₅U shows that the true, unbiased exposure-outcome effect estimate = 2 when:

g = logit, Y = Y_bi, and X = X_bi or
g = identity, Y = Y_cont, X = X_cont.

Usage

df_uc_source

Format

A dataframe with 100,000 rows and 8 columns:

X_bi: binary exposure, 1 = present and 0 = absent
X_cont: continuous exposure
Y_bi: binary outcome corresponding to exposure X_bi, 1 = present and 0 = absent
Y_cont: continuous outcome corresponding to exposure X_cont
C1: 1st confounder, 1 = present and 0 = absent
C2: 2nd confounder, 1 = present and 0 = absent
C3: 3rd confounder, 1 = present and 0 = absent
U: uncontrolled confounder, 1 = present and 0 = absent
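
The role of U in the g = logit case can be illustrated with simulated stand-in data (not the package's df_uc_source). In the sketch below, the conditional exposure-outcome odds ratio is set to 2 by construction; fitting the outcome model with U should recover a value near 2, while dropping U gives a confounded estimate. All data-generating values are arbitrary assumptions.

```r
set.seed(2024)
n <- 100000

# U raises the probability of both exposure and outcome
u <- rbinom(n, 1, 0.5)
x <- rbinom(n, 1, plogis(-1 + 1.5 * u))
y <- rbinom(n, 1, plogis(-2 + log(2) * x + 1.5 * u))
sim_df <- data.frame(X = x, Y = y, U = u)

# Outcome model with and without the confounder
fit_full <- glm(Y ~ X + U, family = binomial, data = sim_df)
fit_no_u <- glm(Y ~ X, family = binomial, data = sim_df)

or_adjusted <- unname(exp(coef(fit_full)["X"]))
or_crude <- unname(exp(coef(fit_no_u)["X"]))
c(adjusted = or_adjusted, crude = or_crude)
```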

Simultaneously adjust for multiple biases

Description

multibias_adjust returns the exposure-outcome odds ratio and confidence interval, adjusted for one or more biases.

Usage

multibias_adjust(
  data_observed,
  data_validation = NULL,
  bias_params = NULL,
  bootstrap = FALSE,
  bootstrap_reps = 100,
  level = 0.95
)

Arguments

data_observed

Object of class data_observed corresponding to the data to perform bias analysis on.

data_validation

Object of class data_validation corresponding to the validation data used to adjust for bias in the observed data. The validation data should have data for the same variables as in data_observed, plus data for the missing variables leading to bias.

bias_params

Object of class 'bias_params' corresponding to the bias parameters used to adjust for bias in the observed data. There must be parameters corresponding to the bias or biases specified in data_observed.

bootstrap

Boolean for whether to perform bootstrapping to obtain the estimate and confidence interval.

bootstrap_reps

Integer number of bootstrap samples to run in bootstrapping.

level

Value between 0 and 1 giving the confidence level of the interval (e.g., 0.95 for a 95% confidence interval). Default is 0.95.

Details

Bias adjustment can be performed by inputting either a validation dataset or the necessary bias parameters. Values for the bias parameters can be applied as fixed values or as single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)). The latter has the advantage of allowing the researcher to capture the uncertainty in the bias parameter estimates. To incorporate this uncertainty in the estimate and confidence interval, this function should be run in a loop across bootstrap samples of the dataframe for analysis. The estimate and confidence interval would then be obtained from the median and quantiles of the distribution of odds ratio estimates.
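
The loop described above can be sketched as follows. To keep the snippet self-contained in base R, the hypothetical function adjust_once() stands in for a call to multibias_adjust(): it applies a simple inverse probability of selection weighting and is not the package's implementation. The simulated data and the means and standard deviations of the bias parameter draws are arbitrary assumptions.

```r
set.seed(99)
n <- 5000

# Simulated observed data (stand-in for a dataframe such as df_sel)
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + log(2) * x))
obs_df <- data.frame(X = x, Y = y)

# Hypothetical stand-in for one bias-adjusted fit: weight each row by the
# inverse of its modelled selection probability, then fit the outcome model
adjust_once <- function(df, s_coefs) {
  p_select <- plogis(s_coefs[1] + s_coefs[2] * df$X + s_coefs[3] * df$Y)
  df$w <- 1 / p_select
  # Non-integer weights trigger a harmless binomial warning, so suppress it
  fit <- suppressWarnings(
    glm(Y ~ X, family = binomial, data = df, weights = w)
  )
  unname(exp(coef(fit)["X"]))
}

reps <- 200
or_draws <- numeric(reps)
for (i in seq_len(reps)) {
  # Bootstrap sample of the observed data
  boot_df <- obs_df[sample(seq_len(n), n, replace = TRUE), ]
  # One fresh draw per bias parameter in each iteration
  s_coefs <- c(
    rnorm(1, mean = 0.0, sd = 0.1),
    rnorm(1, mean = 0.7, sd = 0.1),
    rnorm(1, mean = 0.2, sd = 0.1)
  )
  or_draws[i] <- adjust_once(boot_df, s_coefs)
}

# Point estimate and 95% interval from the distribution of estimates
estimate <- median(or_draws)
ci <- quantile(or_draws, probs = c(0.025, 0.975))
```

Drawing the bias parameters inside the loop means the interval reflects both sampling variability and uncertainty in the bias parameters.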

Value

A list containing the bias-adjusted effect estimate of the exposure on the outcome, the standard error, and the confidence interval as a vector (lower bound, upper bound).

Examples

# Adjust for exposure misclassification -------------------------------------
df_observed <- data_observed(
  data = df_em,
  bias = "em",
  exposure = "Xstar",
  outcome = "Y",
  confounders = "C1"
)

# Using validation data
df_validation <- data_validation(
  data = df_em_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = "C1",
  misclassified_exposure = "Xstar"
)

multibias_adjust(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using bias_params
bp <- bias_params(coef_list = list(x = c(-2.10, 1.62, 0.63, 0.35)))

multibias_adjust(
  data_observed = df_observed,
  bias_params = bp
)

# Adjust for three biases ---------------------------------------------------
df_observed <- data_observed(
  data = df_uc_om_sel,
  bias = c("uc", "om", "sel"),
  exposure = "X",
  outcome = "Ystar",
  confounders = c("C1", "C2", "C3")
)

# Using validation data
df_validation <- data_validation(
  data = df_uc_om_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = c("C1", "C2", "C3", "U"),
  misclassified_outcome = "Ystar",
  selection = "S"
)

multibias_adjust(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using bias_params
bp1 <- bias_params(
  coef_list = list(
    u = c(-0.32, 0.59, 0.69),
    y = c(-2.85, 0.71, 1.63, 0.40, -0.85, 0.22),
    s = c(0.00, 0.74, 0.19, 0.02, -0.06, 0.02)
  )
)

multibias_adjust(
  data_observed = df_observed,
  bias_params = bp1
)

bp2 <- bias_params(
  coef_list = list(
    u1y0 = c(-0.20, 0.62, 0.01, -0.08, 0.10, -0.15),
    u0y1 = c(-3.28, 0.63, 1.65, 0.42, -0.85, 0.26),
    u1y1 = c(-2.70, 1.22, 1.64, 0.32, -0.77, 0.09),
    s = c(0.00, 0.74, 0.19, 0.02, -0.06, 0.02)
  )
)

# with bootstrapping
## Not run: 
multibias_adjust(
  data_observed = df_observed,
  bias_params = bp2,
  bootstrap = TRUE,
  bootstrap_reps = 1000
)

## End(Not run)

Create a Forest Plot comparing observed and adjusted effect estimates

Description

This function generates a forest plot comparing the observed effect estimate with adjusted effect estimates from sensitivity analyses. The plot includes point estimates and confidence intervals for each analysis.

Usage

multibias_plot(data_observed, multibias_result_list, log_scale = FALSE)

Arguments

data_observed

Object of class data_observed representing the observed causal data and effect of interest.

multibias_result_list

A named list of sensitivity analysis results. Each element should be a result from multibias_adjust().

log_scale

Boolean indicating whether to display the x-axis on the log scale. Default is FALSE.
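
A short base R sketch of why a log-scale axis can be useful for odds ratios: effects of equal strength in opposite directions sit at unequal distances from 1 on the linear scale but at equal distances on the log scale. The values are illustrative only and do not come from the package.

```r
# Odds ratios of 0.5 and 2 are equally strong effects in opposite directions
or_values <- c(0.5, 1, 2)

# Distance from the null value of 1 on the linear scale: unequal
linear_distance <- abs(or_values - 1)

# Distance from the null on the log scale: equal for 0.5 and 2
log_distance <- abs(log(or_values))

data.frame(or = or_values, linear_distance, log_distance)
```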

Value

A ggplot object showing a forest plot with:

Point estimates (blue dots)
Confidence intervals (gray horizontal lines)
A vertical reference line at x=1 (dashed)
Appropriate labels and title

Examples

## Not run: 
df_observed <- data_observed(
  data = df_em,
  bias = "em",
  exposure = "Xstar",
  outcome = "Y",
  confounders = "C1"
)

bp1 <- bias_params(coef_list = list(x = c(-2.10, 1.62, 0.63, 0.35)))
bp2 <- bias_params(coef_list = list(x = c(-2.10 * 2, 1.62 * 2, 0.63 * 2, 0.35 * 2)))

result1 <- multibias_adjust(
  data_observed = df_observed,
  bias_params = bp1
)
result2 <- multibias_adjust(
  data_observed = df_observed,
  bias_params = bp2
)

multibias_plot(
  data_observed = df_observed,
  multibias_result_list = list(
    "Adjusted with bias params" = result1,
    "Adjusted with bias params doubled" = result2
  )
)

## End(Not run)

Print method for data_observed objects

Description

Prints a formatted summary of a data_observed object, including:

The types of biases present
The exposure, outcome, and confounder variables
A preview of the first 5 rows of data

Usage

## S3 method for class 'data_observed'
print(x, ...)

Arguments

x

A data_observed object

...

Additional arguments passed to print

Value

The input object invisibly, allowing for method chaining

Print method for data_validation objects

Description

Prints a formatted summary of a data_validation object, including:

The true exposure and outcome variables
Any confounders, misclassified variables, or selection indicators
A preview of the first 5 rows of data

Usage

## S3 method for class 'data_validation'
print(x, ...)

Arguments

x

A data_validation object

...

Additional arguments passed to print

Value

The input object invisibly

Summary method for data_observed objects

Description

Provides a statistical summary of the observed data by fitting either:

A logistic regression model for binary outcomes
A linear regression model for continuous outcomes

The model includes the exposure and all confounders as predictors. For binary outcomes, estimates are exponentiated to show odds ratios.

Usage

## S3 method for class 'data_observed'
summary(object, ...)

Arguments

object

A data_observed object

...

Additional arguments passed to summary

Value

A data frame containing model coefficients, standard errors, confidence intervals, and p-values. For binary outcomes, coefficients are exponentiated to show odds ratios.
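
For the binary-outcome case, a comparable summary can be sketched in base R as below. This is an illustration of the computation described above, not the package's internal code: the data are simulated stand-ins, the column names of or_table are chosen for this example, and Wald limits from confint.default are an assumption about the interval type.

```r
set.seed(42)

# Simulated stand-in data (hypothetical; not one of the package datasets)
n <- 2000
C1 <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, plogis(-0.5 + 0.5 * C1))
Y <- rbinom(n, 1, plogis(-1 + 0.7 * X + 0.4 * C1))
df <- data.frame(X, Y, C1)

# Logistic regression of the outcome on the exposure and confounder
fit <- glm(Y ~ X + C1, family = binomial(link = "logit"), data = df)
coefs <- summary(fit)$coefficients

# Wald confidence limits on the log-odds scale
ci <- confint.default(fit, level = 0.95)

# Exponentiate estimates and limits to report odds ratios;
# the standard error is left on the log-odds scale
or_table <- data.frame(
  term = rownames(coefs),
  estimate = exp(coefs[, "Estimate"]),
  std.error = coefs[, "Std. Error"],
  conf.low = exp(ci[, 1]),
  conf.high = exp(ci[, 2]),
  p.value = coefs[, "Pr(>|z|)"],
  row.names = NULL
)
or_table
```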
