Type: Package
Title: Bayesian Mediation Analysis Using BART
Version: 2.0
Date: 2025-06-26
Depends: R (≥ 2.14.1), BART, survival, gplots
Imports: lattice, methods
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
Encoding: UTF-8
Description: Performs Bayesian mediation analysis based on Bayesian Additive Regression Trees (BART). The analysis method is described in Yu and Li (2025), "Mediation Analysis with Bayesian Additive Regression Trees", submitted for publication.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
URL: https://www.r-project.org, https://publichealth.lsuhsc.edu/Faculty_pages/qyu/index.html
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-06-26 18:28:44 UTC; qyu
Author: Qingzhao Yu [aut, cre], Bin Li [aut]
Maintainer: Qingzhao Yu <qyu@lsuhsc.edu>
Repository: CRAN
Date/Publication: 2025-06-26 18:50:02 UTC

Bayesian Mediation Analysis Using Bayesian Additive Regression Trees

Description

Performs Bayesian mediation analysis based on Bayesian Additive Regression Trees (BART). The analysis method is described in Yu and Li (2025), "Mediation Analysis with Bayesian Additive Regression Trees", submitted for publication.

Details

Fit BART models using the BART package and perform the Bayesian mediation analysis.

Author(s)

Qingzhao Yu and Bin Li

Maintainer: Qingzhao Yu <qyu@lsuhsc.edu>

References

Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.


Bayesian Mediation Analysis Using Bayesian Additive Regression Trees

Description

Fit BART models using the BART package and perform the Bayesian mediation analysis.

Usage

bma.bart(pred, m, y, refy = rep(NA, ncol(data.frame(y))), 
         predref = rep(NA, ncol(data.frame(pred))), deltap = NA, 
         deltam = NA, mref = rep(NA, ncol(data.frame(m))), cova = NULL, 
         cova.ref = list(), mcov = NULL, mcov.ref = list(), mclist = NULL, 
         complete = FALSE, ntree = 200L, numcut = 100L, ndpost = 1000L, 
         nskip = 100L, keepevery = 1L, nkeeptrain = ndpost, nkeeptest = ndpost,
         nkeeptestmean = ndpost, nkeeptreedraws = ndpost, printevery = 100L, 
         seed = sample(1:1e+06, 1))

Arguments

pred

The vector or matrix of the exposure (predictor) variable(s).

m

The data frame of all potential mediators.

y

The vector/matrix of the outcome(s).

refy

The reference groups of y when the corresponding outcome is binary or categorical.

predref

The reference groups of pred when the corresponding predictor is binary or categorical.

deltap

A vector whose length equals the number of exposures. Each entry is the increment in pred used when calculating the rate of change with respect to pred. If not set, the increment is 1 for a categorical predictor and one tenth of the standard deviation of the predictor if it is continuous.

deltam

A vector whose length equals the number of mediators. The ith entry is the increment in the ith mediator used when calculating the rate of change with respect to that mediator. If not set, the increment is 1 for categorical mediators and one tenth of the standard deviation of the mediator if it is continuous.

mref

The reference groups of mediators when the corresponding mediator is binary or categorical.

cova

The covariate data for y.

cova.ref

The reference group for the binary or categorical covariates in cova.

mcov

The covariate data for the mediators.

mcov.ref

The reference group for the binary or categorical covariates in mcov.

mclist

If mclist is NULL but mcov is not, mcov is applied to all mediators. If both mcov and mclist are not NULL, the first item of mclist lists all mediators that use a specific subset of mcov, and the following items give the mcov columns for those mediators in order, with NA if no mcov is to be used. For example, with mclist=list(c(1,2,4),l1=1,l2=NA,l4=c(1,3)), mediator 1, m[,1], uses mcov[,1], mediator 2 uses no covariates, mediator 4 uses mcov[,c(1,3)], and all other mediators use all of mcov. Variable names can also be used in place of column numbers in mclist.

complete

Set complete=TRUE to use only complete cases in the analysis.

ntree

As in the BART package, the number of trees in the sum.

numcut

See the BART package. The number of possible values of c (see usequants). If a single number is given, this is used for all variables. Otherwise, a vector with length equal to ncol(x.train) is required, where the ith element gives the number of c used for the ith variable in x.train. If usequants is false, numcut equally spaced cutoffs are used, covering the range of values in the corresponding column of x.train. If usequants is true, then min(numcut, the number of unique values in the corresponding column of x.train - 1) c values are used.

ndpost

As in the BART package, the number of posterior draws returned.

nskip

As in the BART package, number of MCMC iterations to be treated as burn in.

keepevery

As in the BART package, every keepevery-th draw is kept and returned to the user.

nkeeptrain

As in the BART package, number of MCMC iterations to be returned for train data.

nkeeptest

As in the BART package, number of MCMC iterations to be returned for test data.

nkeeptestmean

As in the BART package, number of MCMC iterations to be returned for test mean.

nkeeptreedraws

As in the BART package, number of MCMC iterations to be returned for tree draws.

printevery

As in the BART package, as the MCMC runs, a message is printed every printevery draws.

seed

A seed number to keep the results repeatable.
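
As a rough picture of the mclist convention described above, the specification can be expanded into an indicator matrix with one row per mediator and one column per column of mcov. The helper expand_mclist below is hypothetical and written only for illustration; it is not a function of this package, and the package's internal handling may differ.

```r
# Hypothetical helper (not part of the package): expands an mclist-style
# specification into a mediator-by-covariate indicator matrix.
expand_mclist <- function(mclist, n_med, n_mcov) {
  # By default every mediator uses every column of mcov.
  mind <- matrix(TRUE, nrow = n_med, ncol = n_mcov)
  listed <- mclist[[1]]            # mediators that get their own covariate set
  for (k in seq_along(listed)) {
    cols <- mclist[[k + 1]]        # covariate columns for the k-th listed mediator
    mind[listed[k], ] <- FALSE
    if (!all(is.na(cols))) mind[listed[k], cols] <- TRUE
  }
  mind
}

# The example from the argument description, with 5 mediators and 3 mcov columns.
mclist <- list(c(1, 2, 4), l1 = 1, l2 = NA, l4 = c(1, 3))
mind <- expand_mclist(mclist, n_med = 5, n_mcov = 3)
print(mind)
```

Rows 3 and 5 stay all TRUE because those mediators are not listed in the first item and therefore use every column of mcov.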

Details

Please refer to the reference for the details of model fitting and inferences of mediation effects.

Value

aieX

posterior samples of the average indirect effects using method X, where X is 2 for the partial differences, 3 for the G-computation, and 4 for the G-computation with the non-parametric method (binary exposures only).

adeX

posterior samples of average direct effects using method X.

ateX

posterior samples of average total effects using method X.

ieX, deX, teX

posterior samples of indirect effects, direct effects and total effects using method X.

apart.ie

posterior samples of the a-part: the rate of change of the mediators with respect to pred, using method 2.

bpart.ie

posterior samples of the b-part: the rate of change of the outcomes with respect to the mediators, using method 2.

data0

the output from data_org.

y.type

the type of outcomes.

y.model

the BART model of outcomes.

m.models

the BART model of each mediator.

DIC

the estimated DIC, deviances, D_bar, Var_D, and p_D.

Note

data_org is run automatically in bma.bart. No need to run it separately.
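
The DIC component returned above lists D_bar, Var_D, and p_D without stating how they combine. The sketch below shows one common convention, in which the effective number of parameters is half the posterior variance of the deviance. That convention is an assumption here, not something the source confirms, and the deviance draws are made-up numbers.

```r
# Illustrative deviance draws, one per posterior sample (made-up numbers).
deviance_draws <- c(210.4, 208.9, 212.1, 209.7, 211.3)

D_bar <- mean(deviance_draws)   # posterior mean deviance
Var_D <- var(deviance_draws)    # posterior variance of the deviance
p_D   <- Var_D / 2              # ASSUMPTION: variance-based effective number of parameters
DIC   <- D_bar + p_D            # lower values suggest a better fit/complexity trade-off

print(c(D_bar = D_bar, Var_D = Var_D, p_D = p_D, DIC = DIC))
```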

Author(s)

Qingzhao Yu and Bin Li

References

Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.

Examples

data(weight_behavior)
#The number of MCMC iterations is set very low to reduce run time; increase it to a reasonable number in practice.
#binary predictor
try0= bma.bart(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:14)], 
               y=weight_behavior[,15], refy = 0, predref = "F",nskip=0,ndpost=2)
summary(try0)

#add covariate for mediators
try1= bma.bart(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:13)], 
               mcov=weight_behavior[,14], mclist=append(list(var=1:10),rep(NA,10)), 
               #"sweat" is used as a covariate for "exercises" only
               y=weight_behavior[,15], refy = 0, predref = "F",nskip=0,ndpost=2)  
summary(try1)
summary(try1,trim=0)
#multiple predictors
try2= bma.bart(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,15], refy = 0, predref = "OTHER",nskip=0,ndpost=2)
summary(try2)
try3= bma.bart(pred=weight_behavior[,c(1,4)], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,15], refy = 0, predref = "OTHER",nskip=0,ndpost=2)
summary(try3)
#continuous y
try4= bma.bart(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5)], 
               y=weight_behavior[,1], refy = 0, predref = "OTHER",nskip=0,ndpost=2)
summary(try4)
#categorical y
try5= bma.bart(pred=weight_behavior[,1], m=weight_behavior[,c(2:3,5)], 
               y=weight_behavior[,4], refy = "",nskip=0,ndpost=2)
summary(try5)
#add covariates for y and for mediators
try6= bma.bart(pred=weight_behavior[,4], m=weight_behavior[,c(5:12)], 
               cova=weight_behavior[,2:3], mcov=weight_behavior[,13:14], 
               mclist=c(list(var=1:7),rep(NA,6),list(1)),
               y=weight_behavior[,1], refy = 0, predref = "OTHER",nskip=0,ndpost=2)
#cova and mcov need to be binarized and numeric
summary(try6)

##Surv class outcome (survival analysis)
data(cgd1)       #a dataset in the survival package
x=cgd1[,c(4:5,7:12)]
pred=cgd1[,6]
status<-ifelse(is.na(cgd1$etime1),0,1)
y=Surv(cgd1$futime,status)          
#for continuous predictor
try7= bma.bart(pred=pred,m=x,y=y,nskip=0,ndpost=3)
#summary(try7)


cgd1 Data Set

Description

This data set was obtained from the survival package and contains time-to-event data.

Usage

data(cgd1)

Format

The data set contains many variables.

Examples

data(cgd1)
names(cgd1)

Prepare Variables for Bayesian Mediation Analysis with BART

Description

Read in exposure, mediators, outcome, and covariates, and transform them into formats fit for BART fitting.

Usage

data_org(pred, m, y, refy = rep(NA, ncol(data.frame(y))), 
         predref = rep(NA, ncol(data.frame(pred))), deltap = NA, 
         deltam = NA, mref = rep(NA, ncol(data.frame(m))), cova = NULL, 
         cova.ref = list(), mcov = NULL, mcov.ref = list(), mclist = NULL, 
         complete = FALSE)

Arguments

pred

The vector or matrix of the exposure (predictor) variable(s).

m

The data frame of all potential mediators.

y

The vector/matrix of the outcome(s).

refy

The reference groups of y when the corresponding outcome is binary or categorical.

predref

The reference groups of pred when the corresponding predictor is binary or categorical.

deltap

A vector whose length equals the number of exposures. Each entry is the increment in pred used when calculating the rate of change with respect to pred. If not set, the increment is 1 for a categorical predictor and one tenth of the standard deviation of the predictor if it is continuous.

deltam

A vector whose length equals the number of mediators. The ith entry is the increment in the ith mediator used when calculating the rate of change with respect to that mediator. If not set, the increment is 1 for categorical mediators and one tenth of the standard deviation of the mediator if it is continuous.

mref

The reference groups of mediators when the corresponding mediator is binary or categorical.

cova

The covariate data for y.

cova.ref

The reference group for the binary or categorical covariates in cova.

mcov

The covariate data for the mediators.

mcov.ref

The reference group for the binary or categorical covariates in mcov.

mclist

If mclist is NULL but mcov is not, mcov is applied to all mediators. If both mcov and mclist are not NULL, the first item of mclist lists all mediators that use a specific subset of mcov, and the following items give the mcov columns for those mediators in order, with NA if no mcov is to be used. For example, with mclist=list(c(1,2,4),l1=1,l2=NA,l4=c(1,3)), mediator 1, m[,1], uses mcov[,1], mediator 2 uses no covariates, mediator 4 uses mcov[,c(1,3)], and all other mediators use all of mcov. Variable names can also be used in place of column numbers in mclist.

complete

Set complete=TRUE to use only complete cases in the analysis.

Details

The function organizes the input data into formats readable by the BART package for fitting BART models. It also recognizes the type of the response variable(s), so that the appropriate functions and methods are used for the mediation effect inferences.
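
To illustrate what replacing a categorical variable with a dummy design matrix relative to a reference group looks like, here is a minimal base R sketch. It uses relevel and model.matrix on toy data and is not the package's own code; the level names are illustrative only.

```r
# Toy categorical variable; the level names are illustrative only.
race <- factor(c("OTHER", "CAUCASIAN", "AFRICAN", "OTHER", "AFRICAN"))

# Put the chosen reference group first, then drop the intercept column so
# that only the non-reference levels keep a dummy column.
race_ref <- relevel(race, ref = "OTHER")
dummies <- model.matrix(~ race_ref)[, -1, drop = FALSE]

print(colnames(dummies))
print(dummies)
```

A variable with k levels yields k-1 dummy columns, and observations in the reference group are coded as zero in all of them.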

Value

Returns the cleaned data set, organized by variable type, which is ready for the Bayesian mediation analysis.

N

The total number of observations.

y_type

The format of the response variable(s): 1 for continuous, 2 binary, 3 categorical, and 4 time-to-event. It is the same length as the number of outcomes.

y

The original y with observations of missing data removed, if complete=T.

y1

The outcome variables, where binary or categorical variables are replaced with a dummy design matrix.

cova

The covariates for y, where binary or categorical variables are replaced with a dummy design matrix.

npred

The number of predictors/exposures, where a categorical exposure of k levels has k-1 dummy predictors.

nm

The number of original mediators, ncol(m).

mcov

Reformatted mcov.

mind

If mcov is not NULL, mind is a matrix with one row per mediator and ncol(mcov) columns; cell (i,j) indicates whether the jth column of mcov should be used for mediator i in m1.

pred1

The original pred with observations of missing data removed, if complete=T.

pred2

pred1 with all categorical or binary variables converted to dummy variables.

binpred1

The column numbers of binary predictors in pred1.

binpred2

The column numbers of binary predictors in pred2.

catpred1

The column numbers of categorical predictors in pred1.

catpred2

The column numbers of categorical predictors in pred2.

contpred1

The column numbers of continuous predictors in pred1.

contpred2

The column numbers of continuous predictors in pred2.

m1

The original m with observations of missing data removed, if complete=T.

m2

m1 with all categorical or binary variables converted to dummy variables.

m3.1

m2 with deltam[i]/2 subtracted from each continuous variable, where i indexes the mediator.

m3.2

m2 with deltam[i]/2 added to each continuous variable, where i indexes the mediator.

p1

The number of continuous mediators.

p2

The number of binary mediators.

p3

The number of categorical mediators.

binm1

The column number of binary mediators in m1.

binm2

The column number of binary mediators in m2.

catm1

The column number of categorical mediators in m1.

catm2

A matrix with one row per categorical mediator, in the order of catm1. Each row gives the start (first column) and end (second column) column numbers of that categorical variable's design matrix in m2.

contm1

The column number of continuous mediators in m1.

contm2

The column number of continuous mediators in m2.

deltap

A vector whose length equals the number of exposures. Each entry is the increment in pred used when calculating the rate of change with respect to pred. If not set, the increment is 1 for a categorical predictor and one tenth of the standard deviation of the predictor if it is continuous.

deltam

A vector whose length equals the number of mediators. The ith entry is the increment in the ith mediator used when calculating the rate of change with respect to that mediator. If not set, the increment is 1 for categorical mediators and one tenth of the standard deviation of the mediator if it is continuous.

Note

data_org is run within bma.bart function. Users do not have to run data_org separately.
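
The returned m3.1 and m3.2 are described as m2 with the continuous mediators shifted down and up by deltam[i]/2. The sketch below reproduces that construction on toy data with the documented default step, then uses the two shifted copies in a central finite difference. The finite-difference use and the function toy_fit are assumptions made for illustration; toy_fit merely stands in for a fitted prediction function and is not a BART model.

```r
# Toy dummy-coded mediator matrix; the column names are illustrative.
m2 <- cbind(tvhours = c(1, 3, 5, 7), snack = c(0, 1, 1, 0))
contm2 <- 1                              # column of m2 that is continuous

# Documented default step: one tenth of the standard deviation.
deltam <- sd(m2[, contm2]) / 10

# Shift the continuous column down and up by half a step; the binary column is unchanged.
m3.1 <- m2
m3.2 <- m2
m3.1[, contm2] <- m2[, contm2] - deltam / 2
m3.2[, contm2] <- m2[, contm2] + deltam / 2

# ASSUMPTION: a stand-in for a fitted prediction function, not a BART model.
toy_fit <- function(m) 0.5 * m[, "tvhours"] + m[, "snack"]

# Central finite-difference rate of change of the prediction per unit of tvhours.
rate <- (toy_fit(m3.2) - toy_fit(m3.1)) / deltam
print(rate)
```

Because toy_fit is linear in tvhours with slope 0.5, the estimated rate equals 0.5 for every observation, up to floating-point error.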

Author(s)

Qingzhao Yu and Bin Li

References

Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.

Examples

data("weight_behavior")
#binary predictor
try0= data_org(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:14)], 
               y=weight_behavior[,15], refy = 0, predref = "F")
#add covariate for mediators
try1= data_org(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:13)], 
               mcov=weight_behavior[,14], mclist=append(list(var=1:10),rep(NA,10)), 
               #"sweat" is used as a covariate for "exercises" only
               y=weight_behavior[,15], refy = 0, predref = "F")  #,complete=T
#multiple predictors
try2= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,15], refy = 0, predref = "OTHER")
try3= data_org(pred=weight_behavior[,c(1,4)], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,15], refy = 0, predref = "OTHER")
#continuous y
try4= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,1], refy = 0, predref = "OTHER")
#categorical y
try5= data_org(pred=weight_behavior[,1], m=weight_behavior[,c(2:3,5:14)], 
               y=weight_behavior[,4], refy = "", predref = "OTHER")
#add covariates for y and for mediators
try6= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(5:12)], 
               cova=weight_behavior[,2:3],mcov=weight_behavior[,13:14], 
               mclist=c(list(var=1:7),rep(NA,6),list(1)),
               y=weight_behavior[,1], refy = 0, predref = "OTHER")
#time-to-event outcome
data(cgd1)       #a dataset in the survival package
x=cgd1[,c(4:5,7:12)]
pred=cgd1[,6]
status<-ifelse(is.na(cgd1$etime1),0,1)
y=Surv(cgd1$futime,status)          
#for continuous predictor
try7<-data_org(pred=pred,m=x,y=y) 

Print the summary results for bma.bart object.

Description

Print and plot the inference results.

Usage

## S3 method for class 'summary.bma.bart'
print(x, ..., digit = x$digit, method = x$method, RE = x$RE)

Arguments

x

the summary.bma.bart object from the summary function.

...

other arguments passed to the print function.

digit

the number of decimal digits to keep.

method

method=2 to show the results from the partial differences, method=3 to show the results from the G-computation, and method=4 for G-computation with non-parametric method (binary exposures only).

RE

If true, print the relative effects.

Value

No return value, called for side effects.

Author(s)

Qingzhao Yu and Bin Li

References

Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.

See Also

"bma.bart" for examples.


Summary of a bma.bart object

Description

The bma.bart object is returned by the bma.bart function. The summary function calculates the estimates, standard deviations, and credible sets of the mediation effects and relative effects.

Usage

## S3 method for class 'bma.bart'
summary(object, ..., plot = TRUE, RE = TRUE, 
                 quant = c(0.025, 0.25, 0.5, 0.75, 0.975), 
                 digit = 4, method = 3, trim = 0.05)

Arguments

object

a bma.bart object created by bma.bart.

...

other arguments passed to the print function.

plot

default is TRUE; if true, draw a barplot of the mediation effects with credible sets.

RE

default is TRUE; if true, show the inferences on relative mediation effects.

quant

show the quantiles defined by quant of the posterior distributions of mediation effects.

digit

the number of decimal digits to keep.

method

method=2 to show the results from the partial differences, method=3 to show the results from the G-computation, and method=4 for G-computation with non-parametric method (binary exposures only).

trim

the trimming proportion used to calculate the trimmed average mediation effects. By default, trim=0.05.

Details

Show the posterior distribution of the estimated mediation effects.
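
As a rough illustration of the kind of summary described here, the sketch below computes the mean, a trimmed mean, the standard deviation, and the default quantiles from a vector of posterior draws. The draws are simulated rather than taken from a fitted model, and base R's mean(x, trim = ) trims that fraction from each tail; whether the package trims in the same way is an assumption.

```r
set.seed(1)
# Simulated stand-in for posterior draws of one indirect effect.
ie_draws <- rnorm(1000, mean = 0.3, sd = 0.1)

quant <- c(0.025, 0.25, 0.5, 0.75, 0.975)   # default quantiles in the Usage above
summary_row <- c(
  mean         = mean(ie_draws),
  trimmed_mean = mean(ie_draws, trim = 0.05),  # drops 5% of draws from each tail
  sd           = sd(ie_draws),
  quantile(ie_draws, probs = quant)
)
print(round(summary_row, digits = 4))
```

The 2.5% and 97.5% quantiles together give a 95% credible interval for the effect.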

Value

resultX

the mediation effect estimates using method X.

resultX.re

the relative effect estimates using method X.
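
The source does not define the relative effects explicitly. A common definition, assumed here, is the ratio of a direct or indirect effect to the total effect, computed draw by draw. The sketch below uses simulated draws and is not the package's implementation.

```r
set.seed(2)
# Simulated stand-ins for posterior draws of the total and indirect effects.
te_draws <- rnorm(1000, mean = 1.0, sd = 0.1)
ie_draws <- rnorm(1000, mean = 0.4, sd = 0.05)
de_draws <- te_draws - ie_draws          # direct effect as the remainder

# ASSUMPTION: relative effect = effect / total effect, per posterior draw.
re_ie <- ie_draws / te_draws
re_de <- de_draws / te_draws

print(round(c(re_ie = mean(re_ie), re_de = mean(re_de)), 4))
```

Ratios of this kind can become unstable when the total effect is close to zero, in which case a trimmed average may be less sensitive to extreme draws than the plain mean.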

Author(s)

Bin Li and Qingzhao Yu

References

Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.

See Also

"bma.bart" for examples.


Weight_Behavior Data Set

Description

This data set was obtained from the Louisiana State University Health Sciences Center, New Orleans, by Dr. Richard Scribner. He explored the relationship between BMI and children's behavior through a survey of children, teachers, and parents in Grenada in 2014. The data set includes 691 observations and 15 variables.

Usage

data(weight_behavior)

Format

The data set contains the following variables:

bmi - body mass index, calculated by weight(kg)/height(cm)^2, numeric

age - children's age in years at the time of survey, numeric

sex - sex of the children, factor

race - race of the children, factor

numpeople - number of people in family, numeric

car - the number of cars in family, numeric

gotosch - the method used to go to school, factor

snack - whether the child eats snacks during the day, binary

tvhours - number of hours watching TV per week, numeric

cmpthours - number of hours using computer per week, numeric

cellhours - number of hours playing with cell phones per week, numeric

sports - whether the child is on a sports team, 1: yes; 2: no

exercises - number of hours of exercises per week, numeric

sweat - number of hours of sweating activities per week, numeric

overweigh - whether the child is overweight, binary

Examples

data(weight_behavior)
names(weight_behavior)
