{whatifbandit} allows researchers to re-simulate their
randomized experiments under adaptive experimental designs. Replicating
experiments can be costly and time-consuming, so this simulation tool
lets researchers explore what their results could have looked like
under an adaptive experimental design. These re-analyses can inform the
planning for future studies, allowing a researcher to justify an
adaptive design for a future project by understanding how such designs
help meet research and implementation goals.
Adaptive experimental designs randomly assign treatments, but do so while taking into account the relative performance of each treatment. Usually this means that better-performing treatments are assigned more participants at each assignment period, allowing both identification of and assignment to the best treatment arm(s). These designs can shine in a variety of applied settings.
This vignette introduces the key functions of the
{whatifbandit} package and how to use them effectively. All
that’s required is data from a previous randomized experiment containing
information about which treatment each observation was originally
assigned to, and the original outcome. For a more detailed view of the
exact adaptive procedures and implementation, please refer to the
function documentation, and if you encounter any bugs please report them
on the package’s GitHub
repository.
{whatifbandit} provides 2 main functions for conducting
simulations, single_mab_simulation() and
multiple_mab_simulation(). As their names suggest,
single_mab_simulation() runs only 1 simulation, while
multiple_mab_simulation() runs a user-specified number of
simulations. Internally both functions are similar, and share the same
arguments except for the few additional ones
multiple_mab_simulation() needs for repeated trials.
Adaptive experiments are a form of multi-arm bandit (MAB) problem,
where each treatment arm has an unknown probability of success, and a
balance must be achieved between exploration and exploitation. In an
applied setting, we want a design that can adequately explore the space of
possible treatments, and find the best treatment to exploit. If the
decision algorithm is quick to exploit a single treatment, then there is
a high chance that the true best treatment is missed, but if the
decision algorithm explores too much, then we miss out on the rewards
from exploiting the current treatment we believe is the best.
Additionally, these unknown probabilities can change over time in what is
called a non-stationary bandit problem. Solutions to these bandit
problems, in the form of decision algorithms, provide the basis for how
{whatifbandit} simulates an adaptive trial.
{whatifbandit} supports 2 assignment algorithms,
Thompson sampling and upper confidence bound assignment (“UCB1”).
Thompson sampling calculates the posterior probability of each treatment
being the best treatment arm, and then uses those probabilities to
assign the treatment arms in the next wave. In the Bernoulli case
implemented in {whatifbandit}, the posterior probabilities
can be calculated from a beta distribution. On the other hand, UCB1
instead calculates an upper confidence bound on the expected reward of
each arm, and selects the maximum value (Kuleshov
and Precup 2014; Slivkins 2024; Offer-Westort, Coppock, and Green
2021).
In a multi-arm experiment with \(T\) time periods, let the set of potential treatment arms be denoted \(\mathcal{K}\). At each time period \(t \in \{1, 2, ..., T\}\), a treatment arm \(W_t \in \mathcal{K}\) needs to be selected.
In Thompson sampling, this involves finding the arm \(k \in \mathcal{K}\) with the highest probability of being the best treatment arm. When outcomes are binary, the probability of a reward from any arm \(k \in \mathcal{K}\) after wave \(t-1\) of the adaptive procedure, denoted \(\theta^k_t\), follows a beta distribution determined by the number of observed successes and failures the treatment has received up to that point. These cumulative successes and failures for each arm are denoted \(\text{success}_{kt}\) and \(\text{failures}_{kt}\), respectively. Rather than choosing the arm \(k\) with the highest probability of being the best as the one and only \(W_t\), Thompson sampling uses these probabilities to assign new treatments, so the probability that each arm \(k\) is the best becomes the probability that it will be assigned from the set of \(\mathcal{K}\) arms at time \(t\). For a given treatment arm \(w \in \mathcal{K}\), this probability is represented below:
\[ P(W_t = w) = P\left(\theta_t^w > \theta_t^k, \ \forall k \in \mathcal{K} \setminus \{w\} \mid \ \theta^k_t \sim \text{Beta}(1+\text{success}_{kt}, 1 + \text{failures}_{kt}) \ \forall k \in \mathcal{K}\right) \]
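To make this concrete, here is a small illustration (not the package's internal code) that approximates these assignment probabilities with Monte Carlo draws from the Beta posteriors; the success and failure counts below are made up for the example.
# Hypothetical cumulative successes and failures for three arms
successes <- c(A = 30, B = 45, C = 38)
failures  <- c(A = 70, B = 55, C = 62)
ndraws <- 10000
# Draw from each arm's Beta(1 + successes, 1 + failures) posterior
draws <- sapply(seq_along(successes), function(k) {
  rbeta(ndraws, 1 + successes[k], 1 + failures[k])
})
# Share of draws in which each arm is best = its assignment probability
best <- max.col(draws)
setNames(tabulate(best, ncol(draws)) / ndraws, names(successes))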
For UCB1, only 1 treatment arm is selected at any time \(t\). \(W_t\), the selected treatment at time \(t\), is the treatment arm \(k \in \mathcal{K}\) that maximizes the sum of the estimated expected reward and the uncertainty bound at time \(t\), where \(n_{kt}\) is the number of times arm \(k\) has been selected up to time \(t\).
\[ W_t = \arg\max_k \left(\widehat{\mathbb{E}[\text{Reward}_{kt}]} + \sqrt{\frac{2 \ln t}{n_{kt}}} \right) \]
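As a rough illustration (again, not the package's internal code), the UCB1 rule can be computed directly from each arm's observed success rate and assignment count; the numbers here are made up.
mean_reward <- c(A = 0.30, B = 0.45, C = 0.38)  # hypothetical observed success rates
n_pulls     <- c(A = 100,  B = 100,  C = 100)   # hypothetical times each arm was assigned
t_now <- sum(n_pulls)                           # stand-in for t: total assignments so far
ucb <- mean_reward + sqrt(2 * log(t_now) / n_pulls)  # expected reward + uncertainty bonus
names(which.max(ucb))                           # the arm UCB1 would select next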
Statistical inference under adaptive designs requires careful
consideration. The sample mean is biased for the true reward
probability, because observations are no longer independently
identically distributed (i.i.d.). Thus, we utilize an augmented inverse
probability weighted estimator (AIPW), with a constant allocation of the
variance across periods formulated by Hadad et
al. (2021), whose formulas we implement in the package. These
estimators are unbiased and asymptotically normal under adaptive trials,
so a normal distribution can be used for parametric inference.
Additionally, {whatifbandit} allows for multiple
simulations to easily estimate the variance of the procedure and to build
an empirical distribution of the unbiased AIPW estimates, instead of
relying solely on the normal approximation.
The estimator combines, for each observation, the outcome weighted by the inverse probability of being assigned that treatment with a conditional expectation/regression estimate based on the previous periods of assignment, and then aggregates these scores using adaptive weights. In the package, the conditional expectation is estimated via a grouped mean, and the adaptive weights use the constant allocation rate described in Hadad et al. (2021). The allocation rate is a function describing the proportion of the remaining variance allocated to the next observation, and it determines the adaptive weights, which are needed to ensure the variance of the estimator converges properly. Additional details about adaptive weights and allocation functions, along with the other specifications they tested, can be found in Hadad et al. (2021). The formulas below are adapted from Hadad et al. (2021) and showcase the AIPW calculations.
The first equation below gives the individual AIPW estimate for treatment \(w\) at period \(t\), where \(\mathbb{I}\) is an indicator function, \(e_t(w)\) is the probability of a unit in period \(t\) being assigned treatment \(w\), and \(\hat{m}_t(w)\) is the conditional expectation/regression estimate based on the previous periods of assignment. The regression estimate at period \(t\) is the average of the outcomes under treatment \(w\) through period \(t-1\), where \(n_{t-1}(w)\) is the number of times treatment \(w\) was assigned by period \(t-1\).
The second equation below gives the final AIPW estimate for treatment \(w\): the sum of each individual estimate \(\widehat{\Gamma}^{AIPW}_t\) weighted by the adaptive weight \(h_t(w)\) for treatment \(w\) at time \(t\), all divided by the sum of the weights. These adaptive weights are calculated in the third equation below by dividing the current probability of being assigned treatment \(w\), \(e_t(w)\), by the total number of periods, \(T\), and taking the square root.
\[ \widehat{\Gamma}^{AIPW}_t = \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} Y_t + \left(1 - \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} \right)\hat{m}_t(w) \tag{Individual} \] \[ \widehat{Q}^h_T(w) = \frac{\sum_{t=1}^Th_t(w)\widehat{\Gamma}^{AIPW}_t}{\sum_{t=1}^Th_t(w)} \tag{Average} \] \[ h_t(w) = \sqrt{\frac{e_t(w)}{T}} \tag{Adaptive Weight} \] \[ \hat{m}_t(w) = \frac{1}{n_{t-1}(w)}\sum_{i=1}^{n_{t-1}(w)} y_i \tag{Grouped Mean} \]
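The sketch below translates these formulas into R for a single arm \(w\), using hypothetical per-period vectors; the package computes all of these pieces internally.
# Minimal sketch of the AIPW estimate for one arm w (one observation per period).
# y: outcomes, assigned_w: indicator that arm w was assigned,
# e_w: assignment probability for arm w at each period,
# m_w: grouped-mean regression estimate from the previous periods.
aipw_estimate <- function(y, assigned_w, e_w, m_w) {
  T_periods <- length(y)
  gamma <- (assigned_w / e_w) * y + (1 - assigned_w / e_w) * m_w  # individual scores
  h <- sqrt(e_w / T_periods)                                      # adaptive weights
  sum(h * gamma) / sum(h)                                         # weighted average
}
# Made-up values for five periods (0.5 is an arbitrary starting value for m_w):
aipw_estimate(
  y = c(1, 0, 1, 1, 0),
  assigned_w = c(1, 0, 1, 1, 1),
  e_w = c(0.33, 0.40, 0.55, 0.60, 0.70),
  m_w = c(0.5, 1, 1, 1, 1)
)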
To explore the functionality of {whatifbandit}, we will
use the included tanf dataset. These data detail the
results of a randomized experiment, which tested the impact of different
letters on re-certification for the Temporary Assistance for Needy
Families (TANF) program in Washington D.C. (Moore
et al. 2022). The original field experiment tested 3 different
conditions over the course of 5 months.
data(tanf)
glimpse(tanf)
#> Rows: 3,517
#> Columns: 21
#> $ ic_case_id                       <chr> "42c43695d7d06af7f66fd072c3ee2593", "…
#> $ service_center                   <chr> "Anacostia", "Anacostia", "Congress_H…
#> $ appt_date                        <date> 2017-06-26, 2017-06-26, 2017-06-26, …
#> $ condition                        <fct> no_letter, no_letter, no_letter, no_l…
#> $ recert_month                     <fct> July, July, July, July, July, July, J…
#> $ letter_sent_date                 <date> 2017-06-19, 2017-06-19, 2017-06-19, …
#> $ recert_id                        <dbl> 151516, 153505, 151887, 155288, 15033…
#> $ return_to_sender                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pdc_status                       <chr> "Closed", "Closed", "Pending closure"…
#> $ renewal_date                     <date> 2017-07-31, 2017-07-31, 2017-07-31, …
#> $ notice_date.x                    <date> 2017-05-29, 2017-05-29, 2017-05-29, …
#> $ days_betwn_notice_and_recert_due <dbl> 63, 63, 63, 63, 63, 63, 63, 63, 63, 6…
#> $ cert_period_start                <date> 2016-11-01, 2017-03-07, 2016-11-01, …
#> $ cert_period_end                  <date> 2017-07-31, 2017-07-31, 2017-07-31, …
#> $ recert_status                    <chr> "Notice Sent", "Notice Sent", "Notice…
#> $ denial_reason                    <chr> "No response from client", "No respon…
#> $ recert_month_year                <chr> "July 2017", "July 2017", "July 2017"…
#> $ notice_date.y                    <date> 2017-05-29, 2017-05-29, 2017-05-29, …
#> $ recert_status_dcas               <chr> "Denied due to failure of action", "D…
#> $ date_of_recert                   <date> NA, NA, NA, 2017-06-27, NA, NA, NA, …
#> $ success                          <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0…
single_mab_simulation() runs an adaptive-design
simulation, where each of its arguments tunes the settings of the
simulation. A simulation requires a data frame containing a unique ID for
every row, that row’s original treatment, and its original outcome.
These column names are passed to the function as strings.
sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Batch",
  period_length = 1000,
  algorithm = "UCB1",
  whole_experiment = TRUE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  )
)
The function contains several options that control the specification of the simulation. The setup above performs UCB1 sampling without treatment blocking, with batch sizes of 1000 people, where the whole experiment is used for imputing outcomes. This differs significantly from the original TANF field experiment.
The original TANF experiment was blocked by D.C. Community and Human
Services service centers (and by official appointment date), ensuring
each center had a comparable number of participants in each group. To
replicate this, we set blocking = TRUE and add the
service_center column to block_cols (Moore et al. 2022).
The original experiment took place over 5 distinct months. To mirror
that, we set assignment_method = "Date",
time_unit = "Month", and period_length = 1,
which creates 1-month-long periods. With a date assignment, the name of
date_col needs to be provided in
data_cols.
UCB1 sampling also chooses only 1 treatment per period, but because
we want time-based periods, and each period is a large portion of the
dataset, we use Thompson sampling instead. This ensures that we can see
the treatment assignments evolve more over time. We set
algorithm = "Thompson" for this.
The time component of the experiment can be taken into account
further. There is no guarantee that, by the time next month's letters
are sent out, all the people who would recertify have done so.
Additionally, when whole_experiment = TRUE, the full set of
original results is used for outcome imputation, while setting it to
FALSE only uses the data up to the assignment period, which more closely
reflects the information available had the original experiment been a
MAB. To reflect these conditions, we set
perfect_assignment = FALSE and
whole_experiment = FALSE, along with including
success_date_col and assignment_date_col in
data_cols.
With this configuration, if a success occurs after the date treatments are assigned, it is masked and treated as a failure. This simulates the researcher’s limited information as those late outcomes would not have been observed yet.
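As a rough sketch of the masking idea (the package handles this internally), a success dated after a period's assignment date would be treated as a failure when the bandit updates; the cutoff date below is hypothetical.
library(dplyr)
cutoff <- as.Date("2017-08-01")  # hypothetical assignment date for one simulated period
tanf_masked <- tanf |>
  mutate(
    observed_success = if_else(
      success == 1 & !is.na(date_of_recert) & date_of_recert > cutoff,
      0,        # success occurred after assignment, so it is not yet visible
      success   # otherwise keep the original outcome
    )
  )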
sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Date",
  time_unit = "Month",
  period_length = 1,
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = TRUE, block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    date_col = "appt_date",
    success_col = "success",
    condition_col = "condition",
    month_col = "recert_month",
    success_date_col = "date_of_recert",
    assignment_date_col = "letter_sent_date"
  )
)
You don’t always have to replicate the setting perfectly; reanalyzing
the experiment as an adaptive trial also involves testing how the
results change across different configurations, such as the size and
length of periods, or the use of perfect_assignment or
whole_experiment. The only guidelines are to use UCB1
only when the period size is a relatively small fraction of the
dataset, and to set perfect_assignment = FALSE only when you
have dates that line up with the periods you have set. If this is not the
case, then all the successes will simply be masked.
For example, we can experiment with individual assignment by
setting assignment_method = "Individual". This does not
follow the original experiment, and we don't have data in that form, so
perfect_assignment = TRUE should be set for the best
results. Additionally, since the date still represents the order of
observations, we sort the data before running the simulation to ensure
it's in the observed order.
It may also be a strong assumption that the true probabilities of
success for each treatment are stationary over time. To account for
this, we can set prior_periods = 500 so that only the last
500 periods (in the "Individual" case, the last 500 observations) are
used when performing adaptive assignment. This ensures that
the oldest periods, which might not be representative of the true
current probabilities, are not considered when assigning new
treatments.
Additionally, because the goal of this experiment is also to estimate a treatment effect for the letters, we might not want to leave 100% of the assignment up to the bandit algorithms. For this we have two methods: control augmentation or a hybrid design.
Control augmentation ensures that the designated control group meets a
specified minimum assignment probability. For example, if we set
control_augment = 0.25, the algorithm allocates at least 25% of each
period to the control group; if the bandit would only assign the
control group 15% of a period, the probabilities are adjusted up
accordingly.
The hybrid design sets aside a percentage of the dataset to be
assigned uniformly randomly. If we set
random_assign_prop = 0.2, then the algorithm allocates 20%
of each period to be assigned to treatments randomly, while 80% are
assigned via the Thompson sampling or UCB1 algorithm.
We do not recommend using both control augmentation and a hybrid
design. If there are 3 conditions, like in the TANF experiment, and we
set control_augment = 0.1 and
random_assign_prop = 0.21, then the minimum assignment
probability for the control group is not 10% but 14.9% (0.1 *
0.79 + 0.21/3 = 0.149). Below, we use
control_augment = 0.2, and identify the control group using
the control_condition argument. If we used
random_assign_prop instead, we would not need to specify
this.
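Before running it, here is a quick check of the 14.9% figure above (illustrative arithmetic only):
control_augment    <- 0.10
random_assign_prop <- 0.21
n_arms <- 3
# Control's floor within the bandit-assigned 79%, plus its share of the
# uniformly randomized 21%:
(1 - random_assign_prop) * control_augment + random_assign_prop / n_arms
#> [1] 0.149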
tanf <- arrange(tanf, appt_date)
conditions <- setNames(levels(tanf$condition), c("Control", "T1", "T2"))
sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Individual",
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = 500,
  blocking = TRUE, block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  ),
  control_augment = 0.2,
  control_condition = "no_letter"
)
single_mab_simulation() returns a list with the following
components:
- final_data is the transformed input containing the treatment assignments and outcomes from the simulation, along with other new columns such as the probability of being assigned each treatment at each period, and the regression adjustment \(\hat{m}(W)\) for the AIPW estimates.
- bandits contains either the Thompson sampling probabilities or UCB1 values from each period in the trial. The chart should be interpreted as the bandit calculation after that period occurred.
- assignment_probs contains the probability of assignment from each period in the trial. The chart should be interpreted as the probability for that specific period.
- estimates contains the AIPW and sample estimates with variances for each treatment arm.
- settings contains the configuration for the simulation.
This object is also a custom mab class that comes with
several generic methods. The print() method displays the
settings of the trial, and the summary() method aggregates
the results.
class(sim)
#> [1] "mab"  "list"
sim
#> Summary for MAB Procedure: 
#>  ----------------------------------------------------- 
#> Bandit Algorithm:      thompson 
#> Control Augmentation:  0 
#> Bandit Assignment:     1 
#> Randomized Assignment: 0 
#> Perfect Assignment:    TRUE 
#> Whole Experiment:      FALSE 
#> Blocking Variables:    service_center 
#> Assignment Method:     date 
#> Period Length:         1 month
#> Total Periods:         5 periods
#> Prior Periods:         all periods
#> Number of Treatments:  3 
#> -----------------------------------------------------
The summary method presents the final Thompson/UCB1 calculation after the trial concluded, the AIPW estimate for the probability of success, its standard error, a 95% two-tailed confidence interval based on a normal distribution, and the number of observations assigned to each treatment group over the course of the trial.
sim_summary <- summary(sim)
print(sim_summary, width = Inf)
#> # A tibble: 3 × 9
#>   Treatment_Arm Probability_Of_Best_Arm estimated_probability_of_success      SE
#>   <chr>                           <dbl>                            <dbl>   <dbl>
#> 1 no_letter                     0.00804                            0.381 0.00832
#> 2 open_appt                     0.746                              0.441 0.00970
#> 3 specific_appt                 0.246                              0.429 0.00777
#>   lower_bound upper_bound num_assigned level periods
#>         <dbl>       <dbl>        <int> <dbl>   <dbl>
#> 1       0.365       0.398          423  0.95       5
#> 2       0.422       0.460         1568  0.95       5
#> 3       0.414       0.444         1526  0.95       5
The width of the confidence intervals can be changed in the
level argument of the summary method, but can also be
calculated by hand using the standard error. Below, we specify 80%
confidence intervals or an \(\alpha =
0.2\).
# Inside Summary Call
summary(sim, level = 0.8) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound, level) |>
  print(width = Inf)
#> # A tibble: 3 × 5
#>   estimated_probability_of_success      SE lower_bound upper_bound level
#>                              <dbl>   <dbl>       <dbl>       <dbl> <dbl>
#> 1                            0.381 0.00832       0.371       0.392   0.8
#> 2                            0.441 0.00970       0.429       0.454   0.8
#> 3                            0.429 0.00777       0.419       0.439   0.8
# By hand
quantile <- qnorm(0.2 / 2, lower.tail = FALSE)
sim_summary |>
  mutate(
    lower_bound = estimated_probability_of_success - SE * quantile,
    upper_bound = estimated_probability_of_success + SE * quantile
  ) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound)
#> # A tibble: 3 × 4
#>   estimated_probability_of_success      SE lower_bound upper_bound
#>                              <dbl>   <dbl>       <dbl>       <dbl>
#> 1                            0.381 0.00832       0.371       0.392
#> 2                            0.441 0.00970       0.429       0.454
#> 3                            0.429 0.00777       0.419       0.439
Plot methods display the results. These methods use
{ggplot2} to create visualizations. The mab
class supports 3 plot types, "arm", "assign"
and "estimate", each providing different information about
the results. These plots can be built with + like any
{ggplot2} object, and arguments to the main
geom* can be specified through the ...
argument. The geoms used are geom_line() and
geom_errorbarh().
type = "arm" provides a view of each arm over time, its
relative probability of being the best, or the UCB1 value.
type = "assign" showcases the cumulative proportion of
the data, assigned to each arm over time
type = "estimate" shows error bars on the AIPW using the
estimated variances. Just like the summary(), the
confidence level can be specified.
plot(sim, type = "estimate", level = 0.9, height = 0.4) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))
All of the data used to create these plots and summary statistics is present in the original output, so you can easily create your own more detailed versions. Additionally, the information provided should also let you calculate other adaptive estimators, if you do not want to use the AIPW from Hadad et al. (2021) or want to specify another one of their proposed adaptive variance allocation methods.
multiple_mab_simulation() is a wrapper function for
executing multiple trials under the same settings. It is used to
estimate the variance in the outcomes that occurs from the simulation
process. Depending on the number of trials specified and the size of
your dataset, running multiple simulations can be memory intensive and
have a long run time, so we recommend running a single simulation
first to estimate how long multiple simulations will take.
multiple_mab_simulation() has 3 new arguments but is
otherwise the same as single_mab_simulation(). These
arguments are:
- times: The number of times to run the simulation.
- seeds: A numeric vector of random seeds, where length(seeds) = times. Required to make results reproducible; we encourage setting them beforehand using sample.int().
- keep_data: A logical value indicating whether or not to keep the final_data object from each simulation. Default is FALSE, to reduce the memory load of execution.
Below, we re-simulate the TANF experiment as 100 MAB trials:
set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)
time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE, perfect_assignment = TRUE,
    prior_periods = "All",
    blocking = TRUE, block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = TRUE, times = 100, seeds = seeds
  )
)
Here we set keep_data = TRUE to showcase the memory
impact of saving the data from each iteration. The full object size with
all the data is 134 MiB. However, when we remove the data
from the object, the size becomes just 963 KiB, making it
142 times smaller than the original.
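If you want to gauge the footprint of your own results object, utils::object.size() gives a rough measure:
# Approximate in-memory size of the returned object
format(object.size(multiple_sims), units = "Mb")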
Executing those 100 simulations took 20.58 seconds. To speed up this
process, multiple_mab_simulation() supports parallel
processing via the furrr
and future packages. To run
the function in parallel, call future::plan() before
running it and specify the number of cores you want to use.
set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)
plan("multisession", workers = 4)
parallel_time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE, perfect_assignment = TRUE,
    prior_periods = "All",
    blocking = TRUE, block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = FALSE, times = 100, seeds = seeds
  )
)
plan("sequential")Using 4 cores, it only takes 8.71 seconds to run 100 simulations.
Your core usage is determined by your system requirements. To see how
many cores are available on your system call
future::availableCores().
Using {future} comes with some new complexities.
Generally, Windows users should use plan("multisession"),
while Linux and MacOS users should use plan("multisession")
or plan("multicore"). If you are using a High Performance
Computing cluster, use the future.batchtools backend with
the proper plan for your HPC scheduler. For details on how
futures work, and specific future backends, please consult
the future
documentation.
multiple_mab_simulation() returns mostly the same
elements as single_mab_simulation(), except for
final_data. When keep_data = TRUE, the
final_data element is a nested data frame of all the
final_data from each trial, and when
keep_data = FALSE, the function returns only objects
bandits, assignment_probs,
estimates, assignment_quantities and
settings. assignment_quantities is a new
object, which has the same structure as bandits and
assignment_probs but contains the number of units assigned
to each treatment for each trial.
The object is also a custom multiple.mab class
with generic methods similar to those of the mab class,
including print() and summary().
class(multiple_sims)
#> [1] "multiple.mab" "list"
print(multiple_sims)
#> Summary for MAB Procedure: 
#>  ----------------------------------------------------- 
#> Bandit Algorithm:      thompson 
#> Control Augmentation:  0 
#> Bandit Assignment:     1 
#> Randomized Assignment: 0 
#> Perfect Assignment:    TRUE 
#> Whole Experiment:      FALSE 
#> Blocking Variables:    service_center 
#> Assignment Method:     date 
#> Period Length:         1 month
#> Total Periods:         5 periods
#> Prior Periods:         all periods
#> Number of Treatments:  3 
#> Trials Conducted:      100 trials
#> Keep Final Data:       FALSE 
#> -----------------------------------------------------
For multiple.mab objects, summary() returns the average of
the AIPW estimates and 2 estimates of their standard error.
One estimate is the average of the standard errors across each trial,
with the variances averaged first \(\left(\sqrt{\frac{1}{\text{n}}\sum_1^n\mathbb{V}[\text{Estimate}_i]}\right)\).
The other estimate is the sample standard deviation of the AIPW
estimates across the trials (SE_empirical). 95% confidence intervals
are provided using both the empirical cumulative distribution function
(eCDF), and the normal CDF using the average standard errors.
Additionally, the number of times each treatment arm was selected as the
best is displayed, along with the mean and standard deviation of the
number of units assigned to each treatment across the trials. The best
arm for a given simulation is selected as the treatment arm with the
highest Thompson sampling probability or UCB1 value at the end of the
trial.
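As a sketch of how these quantities relate, the code below recomputes them for a single arm from hypothetical vectors of per-trial AIPW estimates (est) and standard errors (se); in practice these can be pulled from the estimates element.
set.seed(1)
est <- rnorm(100, mean = 0.44, sd = 0.006)  # stand-in per-trial AIPW estimates
se  <- rep(0.012, 100)                      # stand-in per-trial standard errors

avg_est      <- mean(est)
se_avg       <- sqrt(mean(se^2))            # variances averaged, then square rooted
se_empirical <- sd(est)                     # spread of the estimates across trials

# 95% intervals: normal CDF with the averaged SE, and the empirical quantiles
c(avg_est - qnorm(0.975) * se_avg, avg_est + qnorm(0.975) * se_avg)
quantile(est, probs = c(0.025, 0.975))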
summary(multiple_sims) |> print(width = Inf)
#> # A tibble: 3 × 12
#>   Treatment_Arm average_probability_of_success SE_avg SE_empirical lower_normal
#>   <chr>                                  <dbl>  <dbl>        <dbl>        <dbl>
#> 1 no_letter                              0.383 0.0124      0.00913        0.359
#> 2 open_appt                              0.444 0.0120      0.00555        0.420
#> 3 specific_appt                          0.421 0.0107      0.00774        0.400
#>   upper_normal lower_empirical upper_empirical times_best level
#>          <dbl>           <dbl>           <dbl>      <dbl> <dbl>
#> 1        0.401           0.367           0.402          0  0.95
#> 2        0.455           0.430           0.452         98  0.95
#> 3        0.436           0.405           0.434          2  0.95
#>   average_num_assigned sd_num_assigned
#>                  <dbl>           <dbl>
#> 1                 471.            59.6
#> 2                1890.           238. 
#> 3                1156.           224.
Just like with mab objects, the confidence levels can be
changed in the summary() call.
summary(multiple_sims, level = 0.75) |> print(width = Inf)
#> # A tibble: 3 × 12
#>   Treatment_Arm average_probability_of_success SE_avg SE_empirical lower_normal
#>   <chr>                                  <dbl>  <dbl>        <dbl>        <dbl>
#> 1 no_letter                              0.383 0.0124      0.00913        0.369
#> 2 open_appt                              0.444 0.0120      0.00555        0.430
#> 3 specific_appt                          0.421 0.0107      0.00774        0.409
#>   upper_normal lower_empirical upper_empirical times_best level
#>          <dbl>           <dbl>           <dbl>      <dbl> <dbl>
#> 1        0.394           0.374           0.395          0  0.75
#> 2        0.450           0.439           0.449         98  0.75
#> 3        0.430           0.413           0.429          2  0.75
#>   average_num_assigned sd_num_assigned
#>                  <dbl>           <dbl>
#> 1                 471.            59.6
#> 2                1890.           238. 
#> 3                1156.           224.
multiple.mab objects also have their own plot method
with 3 distinct plots: "summary", "hist", and
"estimate". Just like with mab these plots are
created using {ggplot2}, so can be added to with
+ and arguments in ... can be passed to the
geom* functions inside. The geoms used are
geom_bar(), geom_histogram(), and
geom_errorbarh().
type = "summary" creates a bar graph showing how many
times each treatment arm was selected as the best:
type = "hist" creates histograms showcasing the
distribution for the AIPW for each treatment arm, over the repeated
trials or the distribution of the number of units assigned to each
treatment arm. For this function because it also takes advtanage of
facet_grid(), additional arguments need to be specified in
a list labelled geom for items to go to
geom_histogram() and facet for items to go to
facet_grid().
plot(multiple_sims, type = "hist", quantity = "assignment", facet = list(switch = "x"))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
type = "estimate" is the same as for mab, but now the
spread can be shown using either the empirical or the normal CDF.
plot(multiple_sims, type = "estimate",
  cdf = "empirical", level = 0.99, height = 0.4
) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))
All the data used to construct these plots is available in the output, so users can perform custom analyses or make custom plots.
Both single_mab_simulation() and
multiple_mab_simulation() have support for large data
objects through {data.table}. When working with large
datasets, data.tables are faster with common operations and more memory
efficient than data.frames or tibbles, so using them can result in
faster simulations.
To turn a data.frame into a data.table simply call
data.table::setDT() on your data.frame. Below, solely in
order to gauge execution times, we rbind() the
tanf dataset to itself several times to create a large
object.
# Prepare the dataset:
tanf_large <- tanf
setDT(tanf_large)
for (i in 1:9) {
  tanf_large <- rbindlist(list(tanf_large, tanf_large))
}
setorder(tanf_large, appt_date)
# Set id to be the row number for uniqueness:
tanf_large[, id := .I]
Now tanf_large has 1,800,704 rows to simulate over instead
of 3517. For this example, the batch size will be 3000 people, resulting
in 601 periods.
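A quick check of those counts:
nrow(tanf_large)                   # 3517 * 2^9 = 1,800,704 rows
ceiling(nrow(tanf_large) / 3000)   # 601 periods of 3000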
When using large datasets in conjunction with Thompson sampling, the
direct calculation can fail. In the event of a failure, an attempt is
made to estimate the Thompson sampling probabilities by simulating
draws from the posterior distribution. The number of these draws is set
by the ndraws argument, which defaults to 5000 and which you can
change based on how accurate or fast you want the calculation to be.
This simulation still has the potential to fail, and if it does, an
error is thrown. If this occurs, we recommend changing
prior_periods from “All” to a lower number, which limits the
amount of data used for Thompson sampling in each period, or using UCB1
as the decision algorithm instead.
set.seed(523432453)
dataframe_time <- system.time(single_mab_simulation(
  data = as.data.frame(tanf_large),
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "Thomspon",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))
datatable_time <- system.time(single_mab_simulation(
  data = tanf_large,
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "UCB1",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))
The data.table version completes in 83.11 seconds while the data.frame version takes 275.99 seconds, making the data.table version 232% faster in this case.
The data.table version may not be faster in all cases. Performance depends on the size of the data, the number of observations in each period, and the total number of periods. For simulations where each period contains fewer than 500 observations or the total size is less than 1,000,000 rows, the data.table version will likely be slower.