Introduction to {whatifbandit}

{whatifbandit} allows researchers to re-simulate their randomized experiments under adaptive experimental designs. Replicating experiments can be costly and time-consuming, so this simulation tool allows researchers to explore what their results could have looked like under an adaptive experimental design. These re-analyses can inform the planning for future studies, allowing a researcher to justify an adaptive design for a future project by understanding how such designs help meet research and implementation goals.

Adaptive experimental designs randomly assign treatments, but do so while taking into account the relative performance of each treatment. Usually this means that better-performing treatments are assigned more participants at each assignment period, allowing both identification of and assignment to the best treatment arm(s). These designs can shine in situations where quickly identifying the best arm and steering participants toward it are both priorities.

This vignette introduces the key functions of the {whatifbandit} package and how to use them effectively. All that’s required is data from a previous randomized experiment containing information about which treatment each observation was originally assigned to, and the original outcome. For a more detailed view of the exact adaptive procedures and implementation, please refer to the function documentation, and if you encounter any bugs please report them on the package’s GitHub repository.

Core Functionality

{whatifbandit} provides two main functions for conducting simulations, single_mab_simulation() and multiple_mab_simulation(). As their names suggest, single_mab_simulation() runs a single simulation, while multiple_mab_simulation() runs a user-specified number of simulations. Internally the two functions are similar and share the same arguments, except for the few additional ones multiple_mab_simulation() needs for repeated trials.

Multi-Arm-Bandits: Adaptive Assignment

Adaptive experiments are a form of multi-arm bandit (MAB) problem, where each treatment arm has an unknown probability of success, and a balance must be struck between exploration and exploitation. In an applied setting, we want a design that can adequately explore the space of possible treatments and find the best treatment to exploit. If the decision algorithm is quick to exploit a single treatment, there is a high chance that the true best treatment is missed, but if it explores too much, we miss out on the rewards from exploiting the treatment we currently believe is best. Additionally, these unknown probabilities can change over time, in what is called a non-stationary bandit problem. Solutions to these bandit problems, in the form of decision algorithms, provide the basis for how {whatifbandit} simulates an adaptive trial.

{whatifbandit} supports two assignment algorithms: Thompson sampling and upper confidence bound assignment ("UCB1"). Thompson sampling calculates the posterior probability that each treatment is the best arm, and then uses those probabilities to assign the treatment arms in the next wave. In the Bernoulli case implemented in {whatifbandit}, these posterior probabilities can be calculated from beta distributions. UCB1, on the other hand, calculates an upper confidence bound on the expected reward of each arm and selects the arm with the maximum value (Kuleshov and Precup 2014; Slivkins 2024; Offer-Westort, Coppock, and Green 2021).

In a multi-arm experiment with \(T\) time periods, let the set of potential treatment arms be denoted \(\mathcal{K}\). At each time period \(t \in \{1, 2, ..., T\}\), a treatment arm \(W_t \in \mathcal{K}\) needs to be selected.

In Thompson sampling, this involves finding the arm \(k \in \mathcal{K}\) with the highest probability of being the best treatment arm. When outcomes are binary, the probability of a reward from any arm \(k \in \mathcal{K}\) after wave \(t-1\) of the adaptive procedure, denoted \(\theta^k_t\), follows a beta distribution determined by the number of observed successes and failures the treatment has received up to that point. These cumulative successes and failures for each arm are denoted \(\text{success}_{kt}\) and \(\text{failures}_{kt}\), respectively. Instead of choosing the arm \(k\) with the highest probability of being the best as the one and only \(W_t\), Thompson sampling uses these probabilities to assign new treatments: the probability that each arm \(k\) is the best becomes the probability that it will be assigned from the set of \(\mathcal{K}\) arms at time \(t\). For a given treatment arm \(w \in \mathcal{K}\), this probability is represented below:

\[ P(W_t = w) = P\left(\theta_t^w > \theta_t^k, \ \forall k \in \mathcal{K} \setminus \{w\} \mid \ \theta^k_t \sim \text{Beta}(1+\text{success}_{kt}, 1 + \text{failures}_{kt}) \ \forall k \in \mathcal{K}\right) \]
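
As an illustration of this calculation (this is not the package's internal code, and the success and failure counts below are made up), the assignment probabilities can be approximated by drawing repeatedly from each arm's beta posterior and recording how often each arm produces the largest draw:

# Illustrative only: approximate each arm's Thompson assignment probability
# by Monte Carlo, using hypothetical cumulative successes and failures.
set.seed(1)
successes <- c(no_letter = 30, open_appt = 45, specific_appt = 40)
failures  <- c(no_letter = 70, open_appt = 55, specific_appt = 60)

ndraws <- 5000
draws  <- mapply(function(s, f) rbeta(ndraws, 1 + s, 1 + f), successes, failures)

# The share of draws in which an arm has the largest posterior draw is its
# probability of being assigned in the next wave.
prop.table(table(factor(colnames(draws)[max.col(draws)], levels = names(successes))))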

For UCB1, only one treatment arm is selected at any time \(t\). \(W_t\), the selected treatment at time \(t\), is the treatment arm \(k \in \mathcal{K}\) that maximizes the sum of the estimated expected reward and the uncertainty bound at time \(t\), where \(n_{kt}\) is the number of times arm \(k\) has been selected up to time \(t\):

\[ W_t = \arg\max_k \left(\widehat{\mathbb{E}[\text{Reward}_{kt}]} + \sqrt{\frac{2 \ln t}{n_{kt}}} \right) \]
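
For comparison, here is a minimal sketch of the UCB1 rule with made-up counts: each arm's index is its observed success rate plus the bound \(\sqrt{2 \ln t / n_{kt}}\), and the single arm with the largest index is selected.

# Illustrative only: UCB1 indices with hypothetical counts.
t_now     <- 200                                           # units assigned so far
n_k       <- c(no_letter = 60, open_appt = 80, specific_appt = 60)
successes <- c(no_letter = 20, open_appt = 38, specific_appt = 26)

ucb_index <- successes / n_k + sqrt(2 * log(t_now) / n_k)  # estimated reward + bound
names(which.max(ucb_index))                                # the arm selected at time t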

Adaptive Inference

Statistical inference under adaptive designs requires careful consideration. The sample mean is biased for the true reward probability because observations are no longer independent and identically distributed (i.i.d.). We therefore use the augmented inverse probability weighted (AIPW) estimator with a constant allocation of the variance across periods formulated by Hadad et al. (2021), whose formulas we implement in the package. These estimators are unbiased and asymptotically normal under adaptive trials, so a normal distribution can be used for parametric inference. Additionally, {whatifbandit} allows multiple simulations to easily estimate the variance of the procedure and build an empirical distribution of the unbiased AIPW estimates, instead of relying solely on the normal approximation.

The estimator is calculated by weighting each outcome by the probability of being assigned its treatment, combining it with a conditional expectation/regression estimate based on the previous periods of assignment, and then aggregating the results using adaptive weights. In the package, the conditional expectation is estimated via a grouped mean, and the adaptive weight uses the constant allocation rate described in Hadad et al. (2021). The allocation rate is a function describing the proportion of the remaining variance allocated to the next observation; it determines the adaptive weights, which are necessary to ensure the variance of the estimator converges properly. Additional details about adaptive weights and allocation functions, along with the other specifications they tested, can be found in Hadad et al. (2021). The formulas below are adapted from Hadad et al. (2021) and showcase the AIPW calculations.

The first equation below gives the individual AIPW estimate for treatment \(w\) at period \(t\), where \(\mathbb{I}\) is an indicator function, \(e_t(w)\) is the probability of the unit in period \(t\) being assigned treatment \(w\), and \(\hat{m}_t(w)\) is the conditional expectation/regression estimate based on the previous periods of assignment. The regression estimate at period \(t\) is the average of the outcomes under treatment \(w\) from periods \(1\) through \(t-1\), where \(n_{t-1}(w)\) is the number of times treatment \(w\) was assigned through period \(t-1\).

The second equation below gives the final AIPW estimate for treatment \(w\): the weighted sum of the individual estimates \(\widehat{\Gamma}^{AIPW}_t\) and the adaptive weights \(h_t(w)\) for treatment \(w\) at time \(t\), divided by the sum of the weights. These adaptive weights, shown in the third equation, are calculated by dividing the current probability of being assigned treatment \(w\), \(e_t(w)\), by the total number of periods, \(T\), and taking the square root.

\[ \widehat{\Gamma}^{AIPW}_t = \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} Y_t + \left(1 - \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} \right)\hat{m}_t(w) \tag{Individual} \]
\[ \widehat{Q}^h_T(w) = \frac{\sum_{t=1}^T h_t(w)\,\widehat{\Gamma}^{AIPW}_t}{\sum_{t=1}^T h_t(w)} \tag{Average} \]
\[ h_t(w) = \sqrt{\frac{e_t(w)}{T}} \tag{Adaptive Weight} \]
\[ \hat{m}_t(w) = \frac{1}{n_{t-1}(w)}\sum_{i=1}^{n_{t-1}(w)} y_i \tag{Grouped Mean} \]
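
To make the notation concrete, here is a hedged sketch of the calculation for a single arm, assuming one observation per period as in the formulas above. The data frame and its columns (w for the assigned arm, y for the outcome, e_w for the assignment probability of the arm of interest in that period, and m_w for the grouped-mean estimate \(\hat{m}_t(w)\)) are hypothetical; the package computes these quantities internally.

library(dplyr)

# Hedged sketch: final AIPW estimate for one arm from hypothetical per-period data.
aipw_for_arm <- function(arm_df, arm, T_periods) {
  arm_df |>
    mutate(
      gamma = (w == arm) / e_w * y + (1 - (w == arm) / e_w) * m_w, # individual AIPW score
      h     = sqrt(e_w / T_periods)                                # adaptive weight h_t(w)
    ) |>
    summarise(Q_hat = sum(h * gamma) / sum(h)) |>                  # weighted average
    pull(Q_hat)
}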

Data

To explore the functionality of {whatifbandit}, we will use the included tanf dataset. These data detail the results of a randomized experiment that tested the impact of different letters on re-certification for the Temporary Assistance for Needy Families (TANF) program in Washington, D.C. (Moore et al. 2022). The original field experiment tested three different conditions over the course of five months.

data(tanf)
glimpse(tanf)
#> Rows: 3,517
#> Columns: 21
#> $ ic_case_id                       <chr> "42c43695d7d06af7f66fd072c3ee2593", "…
#> $ service_center                   <chr> "Anacostia", "Anacostia", "Congress_H…
#> $ appt_date                        <date> 2017-06-26, 2017-06-26, 2017-06-26, …
#> $ condition                        <fct> no_letter, no_letter, no_letter, no_l…
#> $ recert_month                     <fct> July, July, July, July, July, July, J…
#> $ letter_sent_date                 <date> 2017-06-19, 2017-06-19, 2017-06-19, …
#> $ recert_id                        <dbl> 151516, 153505, 151887, 155288, 15033…
#> $ return_to_sender                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pdc_status                       <chr> "Closed", "Closed", "Pending closure"…
#> $ renewal_date                     <date> 2017-07-31, 2017-07-31, 2017-07-31, …
#> $ notice_date.x                    <date> 2017-05-29, 2017-05-29, 2017-05-29, …
#> $ days_betwn_notice_and_recert_due <dbl> 63, 63, 63, 63, 63, 63, 63, 63, 63, 6…
#> $ cert_period_start                <date> 2016-11-01, 2017-03-07, 2016-11-01, …
#> $ cert_period_end                  <date> 2017-07-31, 2017-07-31, 2017-07-31, …
#> $ recert_status                    <chr> "Notice Sent", "Notice Sent", "Notice…
#> $ denial_reason                    <chr> "No response from client", "No respon…
#> $ recert_month_year                <chr> "July 2017", "July 2017", "July 2017"…
#> $ notice_date.y                    <date> 2017-05-29, 2017-05-29, 2017-05-29, …
#> $ recert_status_dcas               <chr> "Denied due to failure of action", "D…
#> $ date_of_recert                   <date> NA, NA, NA, 2017-06-27, NA, NA, NA, …
#> $ success                          <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0…

A Single MAB Simulation

single_mab_simulation() runs an adaptive-design simulation, where each of its arguments tunes the settings of the simulation. A simulation requires a data frame containing a unique ID for every row, that row's original treatment, and its original outcome. These column names are passed to the function as strings.

sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Batch",
  period_length = 1000,
  algorithm = "UCB1",
  whole_experiment = TRUE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  )
)

The function contains several options which control the specification of the simulation. The setup above performs UCB1 sampling without treatment blocking, using batch sizes of 1000 people, with the whole experiment used for imputing outcomes. This differs significantly from the original TANF field experiment.

The original TANF experiment was blocked by D.C. Community and Human Services service centers (and by official appointment date), ensuring each center had a comparable number of participants in each group. To replicate this, we set blocking = TRUE and add the service_center column to block_cols (Moore et al. 2022).

The original experiment took place over 5 distinct months. To mirror that, we set assignment_method = "Date", time_unit = "Month", and period_length = 1, which creates 1-month-long periods. With a date assignment, the name of date_col needs to be provided in data_cols.

UCB1 sampling also chooses only one treatment per period; because we want time-based periods and each period is a large portion of the dataset, we use Thompson sampling instead. This lets us see the treatment assignments evolve over time. We set algorithm = "Thompson" for this.

The time component of the experiment can be taken into account further. There is no guarantee that, by the time next month's letters are sent out, all the people who would recertify had done so. Additionally, when whole_experiment = TRUE, the full set of original results is used for outcome imputation, while setting it to FALSE only uses the data up to the assignment period, which more closely reflects the information that would have been available had the original experiment been a MAB. To reflect these conditions, we set perfect_assignment = FALSE and whole_experiment = FALSE, along with including success_date_col and assignment_date_col in data_cols.

With this configuration, if a success occurs after the date treatments are assigned, it is masked and treated as a failure. This simulates the researcher's limited information, as those late outcomes would not yet have been observed.
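
Conceptually, this masking behaves like the sketch below. This is not the package's internal code: the column names come from the tanf data, and letter_sent_date is used here as a stand-in for the simulated assignment date of each period.

library(dplyr)

# Conceptual sketch: a success only counts if it was observed on or before the
# (stand-in) assignment date; later successes are masked and treated as failures.
tanf_masked <- tanf |>
  mutate(
    observed_success = if_else(
      success == 1 & !is.na(date_of_recert) & date_of_recert <= letter_sent_date,
      1, 0
    )
  )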

sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Date",
  time_unit = "Month",
  period_length = 1,
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = FALSE,
  prior_periods = "All",
  blocking = TRUE, block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    date_col = "appt_date",
    success_col = "success",
    condition_col = "condition",
    month_col = "recert_month",
    success_date_col = "date_of_recert",
    assignment_date_col = "letter_sent_date"
  )
)

You don't always have to replicate the original setting perfectly; reanalyzing the experiment as an adaptive trial also involves testing how the results change across different configurations, such as the size and length of periods, or the use of perfect_assignment and whole_experiment. The main guidelines are to use UCB1 only when the period size is a relatively small portion of the dataset, and to set perfect_assignment = FALSE only when you have dates that line up with the periods you have set. If this is not the case, then all the successes will simply be masked.

For example, we can experiment with individual assignment by setting assignment_method = "Individual". This does not follow the original experiment, and we don't have data in that form, so perfect_assignment = TRUE should be set for the best results. Additionally, since the dates still represent the order of the observations, we sort the data before running the simulation to ensure it's in the observed order.

It may also be a strong assumption that the true probabilities of success for each treatment are stationary over time. To account for this, we can set prior_periods = 500 so that only the last 500 periods (in the "Individual" case, the last 500 observations) are used when performing adaptive assignment. This ensures that the oldest periods, which might not be representative of the current true probabilities, are not considered when assigning new treatments.

Additionally, because the goal of this experiment is also to estimate a treatment effect for the letters, we might not want to leave 100% of the assignment up to the bandit algorithm. For this we have two methods: control augmentation or a hybrid design.

Control augmentation ensures that the designated control group meets a specified minimum assignment probability. For example, if we set control_augment = 0.25, the algorithm allocates at least 25% of each period to the control group; if in some period the bandit would assign the control group only 15%, the probabilities are adjusted accordingly.

The hybrid design sets aside a percentage of the dataset to be assigned uniformly at random. If we set random_assign_prop = 0.2, then 20% of each period is assigned to treatments at random, while the remaining 80% is assigned via the Thompson sampling or UCB1 algorithm.

We do not recommend using both control augmentation and a hybrid design. If there are 3 conditions, like in the TANF experiment, and we set control_augment = 0.1 and random_assign_prop = 0.21, then the minimum assignment probability for the control group is not 10% but 14.9% (0.1 * 0.79 + 0.21/3 = 0.149).
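
That arithmetic can be checked with a small helper function (hypothetical, not part of the package):

# Effective minimum assignment probability for the control arm when control
# augmentation and a hybrid design are combined (hypothetical helper).
min_control_prob <- function(control_augment, random_assign_prop, n_arms) {
  control_augment * (1 - random_assign_prop) + random_assign_prop / n_arms
}

min_control_prob(control_augment = 0.1, random_assign_prop = 0.21, n_arms = 3)
#> [1] 0.149

Below, we use control_augment = 0.2 and identify the control group using the control_condition argument. If we use random_assign_prop instead, we do not need to specify this.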

tanf <- arrange(tanf, appt_date)
conditions <- setNames(levels(tanf$condition), c("Control", "T1", "T2"))

sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Individual",
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = 500,
  blocking = TRUE, block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  ),
  control_augment = 0.2,
  control_condition = "no_letter"
)

Analysis

single_mab_simulation() returns a list with 4 components: the final simulated data (final_data), the bandit calculations (bandits), the assignment probabilities (assignment_probs), and the AIPW estimates (estimates).

This object is also given a custom mab class that comes with several generic methods. The print() method displays the settings of the trial, and the summary() method aggregates the results.

class(sim)
#> [1] "mab"  "list"
sim
#> Summary for MAB Procedure: 
#>  ----------------------------------------------------- 
#> Bandit Algorithm:      thompson 
#> Control Augmentation:  0 
#> Bandit Assignment:     1 
#> Randomized Assignment: 0 
#> Perfect Assignment:    FALSE 
#> Whole Experiment:      FALSE 
#> Blocking Variables:    service_center 
#> Assignment Method:     date 
#> Period Length:         1 month
#> Total Periods:         5 periods
#> Prior Periods:         all periods
#> Number of Treatments:  3 
#> -----------------------------------------------------

The summary method presents the final Thompson/UCB1 calculation after the trial concluded, the AIPW estimate of the probability of success, its standard error, a 95% two-tailed confidence interval based on a normal distribution, and the number of observations assigned to each treatment group over the course of the trial.

sim_summary <- summary(sim)
print(sim_summary, width = Inf)
#> # A tibble: 3 × 9
#>   Treatment_Arm Probability_Of_Best_Arm estimated_probability_of_success      SE
#>   <chr>                           <dbl>                            <dbl>   <dbl>
#> 1 no_letter                     0.00804                            0.381 0.00832
#> 2 open_appt                     0.746                              0.441 0.00970
#> 3 specific_appt                 0.246                              0.429 0.00777
#>   lower_bound upper_bound num_assigned level periods
#>         <dbl>       <dbl>        <int> <dbl>   <dbl>
#> 1       0.365       0.398          423  0.95       5
#> 2       0.422       0.460         1568  0.95       5
#> 3       0.414       0.444         1526  0.95       5

The width of the confidence intervals can be changed through the level argument of the summary method, or the intervals can be calculated by hand using the standard errors. Below, we specify 80% confidence intervals, i.e. \(\alpha = 0.2\).

# Inside Summary Call
summary(sim, level = 0.8) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound, level) |>
  print(width = Inf)
#> # A tibble: 3 × 5
#>   estimated_probability_of_success      SE lower_bound upper_bound level
#>                              <dbl>   <dbl>       <dbl>       <dbl> <dbl>
#> 1                            0.381 0.00832       0.371       0.392   0.8
#> 2                            0.441 0.00970       0.429       0.454   0.8
#> 3                            0.429 0.00777       0.419       0.439   0.8

# By hand
quantile <- qnorm(0.2 / 2, lower.tail = FALSE)
sim_summary |>
  mutate(
    lower_bound = estimated_probability_of_success - SE * quantile,
    upper_bound = estimated_probability_of_success + SE * quantile
  ) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound)
#> # A tibble: 3 × 4
#>   estimated_probability_of_success      SE lower_bound upper_bound
#>                              <dbl>   <dbl>       <dbl>       <dbl>
#> 1                            0.381 0.00832       0.371       0.392
#> 2                            0.441 0.00970       0.429       0.454
#> 3                            0.429 0.00777       0.419       0.439

Plotting

Plot methods display the results and use {ggplot2} to create the visualizations. The mab class supports 3 plot types, "arm", "assign" and "estimate", each providing different information about the results. These plots can be built on with + like any {ggplot2} object, and arguments to the main geom* can be specified through the ... argument. The geoms used are geom_line() and geom_errorbarh().

type = "arm" provides a view of each arm over time: its relative probability of being the best arm or its UCB1 value.

plot(sim, type = "arm")

type = "assign" showcases the cumulative proportion of the data assigned to each arm over time.

plot(sim, type = "assign")

type = "estimate" shows error bars on the AIPW estimates using the estimated variances. Just like with summary(), the confidence level can be specified.

plot(sim, type = "estimate", level = 0.9, height = 0.4) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))

All of the data used to create these plots and summary statistics is present in the original output, so you can easily create your own more detailed versions. Additionally, the information provided should also let you calculate other adaptive estimators, if you do not want to use the AIPW from Hadad et al. (2021) or want to specify another one of their proposed adaptive variance allocation methods.

Multiple Simulations

multiple_mab_simulation() is a wrapper function for executing multiple trials under the same settings. It is used to estimate the variance in the outcomes that arises from the simulation process. Depending on the number of trials specified and the size of your dataset, running multiple simulations can be memory intensive and have a long run time, so we recommend running a single simulation first to estimate how long multiple simulations will take.

multiple_mab_simulation() has 3 new arguments but is otherwise the same as single_mab_simulation(). These arguments are times, the number of simulations to run; seeds, a vector of random seeds, one for each trial; and keep_data, which controls whether the final data from every trial is kept in the output.

Below, we re-simulate the TANF experiment as 100 MAB trials:

set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)

time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE, perfect_assignment = FALSE,
    prior_periods = "All",
    blocking = TRUE, block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = TRUE, times = 100, seeds = seeds
  )
)

Here we've kept keep_data = TRUE to showcase the memory impact of saving the data from each iteration. The full object with all the data is 134 megabytes (MiB). When we remove the data from the object, the size becomes just 963 kilobytes (KiB), roughly 142 times smaller than the original.
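
You can gauge this for your own results with base R's object.size():

# Check the in-memory size of a simulation object (utils::object.size).
format(object.size(multiple_sims), units = "auto")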

Executing those 100 simulations took 20.58 seconds. To speed up this process, multiple_mab_simulation() supports parallel processing via the {furrr} and {future} packages. To run the function in parallel, call future::plan() before running it and specify the number of cores you want to use.

set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)
plan("multisession", workers = 4)
parallel_time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE, perfect_assignment = FALSE,
    prior_periods = "All",
    blocking = TRUE, block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = FALSE, times = 100, seeds = seeds
  )
)
plan("sequential")

Using 4 cores, it takes only 8.71 seconds to run 100 simulations. The number of cores you can use depends on your system. To see how many cores are available on your system, call future::availableCores().

Using {future} comes with some new complexities. Generally, Windows users should use plan("multisession"), while Linux and macOS users can use either plan("multisession") or plan("multicore"). If you are using a high-performance computing cluster, use the future.batchtools backend with the proper plan for your HPC scheduler. For details on how futures work, and on specific future backends, please consult the future documentation.
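
As a minimal example of a setup that works on any operating system (the cap of 4 workers is arbitrary; adjust it for your machine):

library(future)

# multisession works on Windows, Linux, and macOS; never request more workers
# than the machine actually has.
workers <- min(4, availableCores())
plan(multisession, workers = workers)

# ... run multiple_mab_simulation() here ...

plan(sequential)  # return to sequential evaluation when finished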

Analyzing Repeated Simulations

multiple_mab_simulation() returns mostly the same elements as single_mab_simulation(), except for final_data. When keep_data = TRUE, the final_data element is a nested data frame of the final_data from each trial; when keep_data = FALSE, the function returns only the objects bandits, assignment_probs, estimates, assignment_quantities, and settings. assignment_quantities is a new object with the same structure as bandits and assignment_probs, but it contains the number of units assigned to each treatment in each trial.

The object is also given a custom multiple.mab class with generic methods similar to those of the mab class, including print() and summary().

class(multiple_sims)
#> [1] "multiple.mab" "list"
print(multiple_sims)
#> Summary for MAB Procedure: 
#>  ----------------------------------------------------- 
#> Bandit Algorithm:      thompson 
#> Control Augmentation:  0 
#> Bandit Assignment:     1 
#> Randomized Assignment: 0 
#> Perfect Assignment:    FALSE 
#> Whole Experiment:      FALSE 
#> Blocking Variables:    service_center 
#> Assignment Method:     date 
#> Period Length:         1 month
#> Total Periods:         5 periods
#> Prior Periods:         all periods
#> Number of Treatments:  3 
#> Trials Conducted:      100 trials
#> Keep Final Data:       FALSE 
#> -----------------------------------------------------

For multiple.mab objects, summary() returns the average of the AIPW estimates and 2 estimates of their standard error. One estimate (SE_avg) is the average of the standard errors across trials, with the variances averaged first \(\left(\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{V}[\text{Estimate}_i]}\right)\). The other estimate (SE_empirical) is the sample standard deviation of the AIPW estimates across the trials. 95% confidence intervals are provided using both the empirical cumulative distribution function (eCDF) and the normal CDF with the averaged standard errors. Additionally, the number of times each treatment arm was selected as the best is displayed, along with the mean and standard deviation of the number of units assigned to each treatment across the trials. The best arm for a given simulation is the treatment arm with the highest Thompson sampling probability or UCB1 value at the end of the trial.
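
For intuition, here is how those two standard errors could be computed by hand for a single arm. The vectors below are hypothetical placeholders; the real per-trial values are stored in the estimates element of the output.

# Hypothetical per-trial results for one arm across 100 trials:
# aipw_estimates = AIPW point estimates, aipw_ses = their standard errors.
set.seed(1)
aipw_estimates <- rnorm(100, mean = 0.44, sd = 0.006)
aipw_ses       <- runif(100, min = 0.010, max = 0.014)

SE_avg       <- sqrt(mean(aipw_ses^2))  # average the variances, then take the square root
SE_empirical <- sd(aipw_estimates)      # spread of the point estimates across trials

# Empirical 95% interval from the eCDF of the estimates:
quantile(aipw_estimates, probs = c(0.025, 0.975))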

summary(multiple_sims) |> print(width = Inf)
#> # A tibble: 3 × 12
#>   Treatment_Arm average_probability_of_success SE_avg SE_empirical lower_normal
#>   <chr>                                  <dbl>  <dbl>        <dbl>        <dbl>
#> 1 no_letter                              0.383 0.0124      0.00913        0.359
#> 2 open_appt                              0.444 0.0120      0.00555        0.420
#> 3 specific_appt                          0.421 0.0107      0.00774        0.400
#>   upper_normal lower_empirical upper_empirical times_best level
#>          <dbl>           <dbl>           <dbl>      <dbl> <dbl>
#> 1        0.401           0.367           0.402          0  0.95
#> 2        0.455           0.430           0.452         98  0.95
#> 3        0.436           0.405           0.434          2  0.95
#>   average_num_assigned sd_num_assigned
#>                  <dbl>           <dbl>
#> 1                 471.            59.6
#> 2                1890.           238. 
#> 3                1156.           224.

Just like with mab objects, the confidence levels can be changed in the summary() call.

summary(multiple_sims, level = 0.75) |> print(width = Inf)
#> # A tibble: 3 × 12
#>   Treatment_Arm average_probability_of_success SE_avg SE_empirical lower_normal
#>   <chr>                                  <dbl>  <dbl>        <dbl>        <dbl>
#> 1 no_letter                              0.383 0.0124      0.00913        0.369
#> 2 open_appt                              0.444 0.0120      0.00555        0.430
#> 3 specific_appt                          0.421 0.0107      0.00774        0.409
#>   upper_normal lower_empirical upper_empirical times_best level
#>          <dbl>           <dbl>           <dbl>      <dbl> <dbl>
#> 1        0.394           0.374           0.395          0  0.75
#> 2        0.450           0.439           0.449         98  0.75
#> 3        0.430           0.413           0.429          2  0.75
#>   average_num_assigned sd_num_assigned
#>                  <dbl>           <dbl>
#> 1                 471.            59.6
#> 2                1890.           238. 
#> 3                1156.           224.

Plotting Repeated Simulations

multiple.mab objects also have their own plot method with 3 distinct plot types: "summary", "hist", and "estimate". Just like with mab objects, these plots are created using {ggplot2}, so they can be added to with + and arguments in ... can be passed to the geom* functions inside. The geoms used are geom_bar(), geom_histogram(), and geom_errorbarh().

type = "summary" creates a bar graph showing how many times each treatment arm was selected as the best:

plot(multiple_sims, type = "summary")

type = "hist" creates histograms showing either the distribution of the AIPW estimates for each treatment arm over the repeated trials, or the distribution of the number of units assigned to each treatment arm. Because this plot type also takes advantage of facet_grid(), additional arguments need to be specified in lists: a list labelled geom for arguments to geom_histogram() and a list labelled facet for arguments to facet_grid().

plot(multiple_sims, type = "hist", quantity = "estimate", geom = list(bins = 50))

plot(multiple_sims, type = "hist", quantity = "assignment", facet = list(switch = "x"))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

type = "estimate" is the same as for mab objects, but now the spread can be shown using either the empirical or the normal CDF.

plot(multiple_sims, type = "estimate",
  cdf = "empirical", level = 0.99, height = 0.4
) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))

All the data used to construct these plots is available in the output, so users can perform custom analyses or make custom plots.

Additional Topics

Large Datasets

Both single_mab_simulation() and multiple_mab_simulation() have support for large data objects through {data.table}. When working with large datasets, data.tables are faster with common operations and more memory efficient than data.frames or tibbles, so using them can result in faster simulations.

To turn a data.frame into a data.table, simply call data.table::setDT() on your data.frame. Below, solely in order to gauge execution times, we bind the tanf dataset to itself several times to create a large object.

# Prepare the dataset:
tanf_large <- tanf
setDT(tanf_large)
for (i in 1:9) {
  tanf_large <- rbindlist(list(tanf_large, tanf_large))
}

setorder(tanf_large, appt_date)

# Set id to be the row number for uniqueness:
tanf_large[, id := .I] 

Now tanf_large has 1,800,704 rows to simulate over instead of 3,517. For this example, the batch size will be 3000 people, resulting in 601 periods.

When using large datasets in conjunction with Thompson sampling, the direct calculation can fail. If it does, the package attempts to estimate the Thompson sampling probabilities by simulating draws from the posterior distributions. The number of draws is set by the ndraws argument, which defaults to 5000 and can be changed based on how accurate or fast you want the calculation to be. This simulation can still fail, and if it does, an error is thrown. If this occurs, we recommend changing prior_periods from "All" to a lower number, which limits the amount of data used for Thompson sampling in each period, or using UCB1 as the decision algorithm instead.

set.seed(523432453)
dataframe_time <- system.time(single_mab_simulation(
  data = as.data.frame(tanf_large),
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))

datatable_time <- system.time(single_mab_simulation(
  data = tanf_large,
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "Thompson",
  whole_experiment = FALSE, perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))

The data.table version completes in 83.11 seconds while the data.frame version takes 275.99 seconds, making the data.table version 232% faster in this case.

The data.table version may not be faster in all cases. Performance depends on the size of the data, the number of observations in each period, and the total number of periods. For simulations where each period contains fewer than 500 observations or the total size is less than 1,000,000 rows, the data.table version will likely be slower.

References

Hadad, Vitor, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. 2021. “Confidence Intervals for Policy Evaluation in Adaptive Experiments.” Proceedings of the National Academy of Sciences of the United States of America 118 (15): e2014602118. https://doi.org/10.1073/pnas.2014602118.
Kuleshov, Volodymyr, and Doina Precup. 2014. “Algorithms for Multi-Armed Bandit Problems.” arXiv. https://doi.org/10.48550/arXiv.1402.6028.
Moore, Ryan T., Katherine N. Gan, Karissa Minnich, and David Yokum. 2022. “Anchor Management: A Field Experiment to Encourage Families to Meet Critical Programme Deadlines.” Journal of Public Policy 42 (4): 615–36. https://doi.org/10.1017/S0143814X21000131.
Offer-Westort, Molly, Alexander Coppock, and Donald P. Green. 2021. “Adaptive Experimental Design: Prospects and Applications in Political Science.” American Journal of Political Science 65 (4): 826–44. https://doi.org/10.1111/ajps.12597.
Slivkins, Aleksandrs. 2024. “Introduction to Multi-Armed Bandits.” arXiv. https://doi.org/10.48550/arXiv.1904.07272.
