The SimTOST R package, which is specifically designed for sample size estimation in bioequivalence studies, bases hypothesis testing on the Two One-Sided Tests (TOST) procedure (Sozu et al. 2015). In TOST, the equivalence test is framed as a comparison between the null hypothesis of ‘new product is worse by a clinically relevant quantity’ and the alternative hypothesis of ‘difference between products is too small to be clinically relevant’. This vignette focuses on a parallel design with 3 arms/treatments and 3 primary endpoints.
Similar to the example presented in Bioequivalence Tests for Parallel Trial Designs: 2 Arms, 3 Endpoints, where equivalence across multiple endpoints was assessed, this vignette extends the framework to trials involving two reference products. This scenario arises when regulators from different regions require comparisons with their locally sourced reference products.
In many studies, the aim is to evaluate equivalence across multiple primary variables. For example, the European Medicines Agency (EMA) recommends demonstrating equivalence for both the Area Under the Curve (AUC) and the maximum concentration (Cmax) when assessing pharmacokinetic endpoints. In this scenario, we additionally evaluate equivalence across three treatment arms, including a test biosimilar and two reference products, each sourced from a different regulatory region.
This vignette demonstrates advanced techniques for calculating sample size in such trials, in which multiple endpoints and multiple reference products are considered. As an illustrative example, we use data from the phase 1 trial NCT01922336, which assessed pharmacokinetic (PK) endpoints following a single dose of SB2, its EU-sourced reference product (EU-INF), and its US-sourced reference product (US-INF) (Shin et al. 2015).
| PK measure | SB2 | EU-INF | US-INF |
|---|---|---|---|
| AUCinf (\(\mu\)g*h/mL) | 38,703 \(\pm\) 11,114 | 39,360 \(\pm\) 12,332 | 39,270 \(\pm\) 10,064 |
| AUClast (\(\mu\)g*h/mL) | 36,862 \(\pm\) 9,133 | 37,022 \(\pm\) 9,398 | 37,368 \(\pm\) 8,332 |
| Cmax (\(\mu\)g/mL) | 127.0 \(\pm\) 16.9 | 126.2 \(\pm\) 17.9 | 129.2 \(\pm\) 18.8 |
As in that vignette, the following inputs are required: the arm means (mu_list), the standard deviations (sigma_list), the endpoint names (yname_list), the arm names (arm_names), the comparator arms (list_comparator), and the endpoints evaluated for each comparator (list_y_comparator). In this example, we simultaneously compare the trial drug (SB2) to each reference product: the EU-sourced EU-INF and the US-sourced US-INF.
We define the required list objects:
# Mean values for each endpoint and arm
mu_list <- list(
SB2 = c(AUCinf = 38703, AUClast = 36862, Cmax = 127.0),
EUINF = c(AUCinf = 39360, AUClast = 37022, Cmax = 126.2),
USINF = c(AUCinf = 39270, AUClast = 37368, Cmax = 129.2)
)
# Standard deviation values for each endpoint and arm
sigma_list <- list(
SB2 = c(AUCinf = 11114, AUClast = 9133, Cmax = 16.9),
EUINF = c(AUCinf = 12332, AUClast = 9398, Cmax = 17.9),
USINF = c(AUCinf = 10064, AUClast = 8332, Cmax = 18.8)
)
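Before specifying the comparisons, it can be useful to inspect the ratios of means implied by these inputs. The quick check below (not a required input, shown only for orientation) confirms that all assumed true ratios lie well inside the 80.00% to 125.00% equivalence range used later:

# Assumed true ratios of means for the EMA (SB2 vs. EU-INF) and FDA (SB2 vs. US-INF) comparisons
round(mu_list$SB2 / mu_list$EUINF, 3)
#> AUCinf AUClast  Cmax 
#>  0.983   0.996 1.006 
round(mu_list$SB2 / mu_list$USINF, 3)
#> AUCinf AUClast  Cmax 
#>  0.986   0.986 0.983 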
In this analysis, we adopt the Ratio of Means (ROM) approach to evaluate bioequivalence across independent co-primary endpoints.
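As a brief sketch (with \(\mu_T\) and \(\mu_R\) denoting the test and reference arm means, and \(\theta_L = 0.80\), \(\theta_U = 1.25\) the equivalence margins used below), the ROM version of TOST tests, for each endpoint,

\[
H_0:\ \frac{\mu_T}{\mu_R} \le \theta_L \ \text{ or } \ \frac{\mu_T}{\mu_R} \ge \theta_U
\qquad \text{versus} \qquad
H_1:\ \theta_L < \frac{\mu_T}{\mu_R} < \theta_U .
\]

Equivalence is concluded for an endpoint only if both one-sided null hypotheses are rejected at level \(\alpha = 0.05\), which corresponds to the 90% confidence interval criterion described below.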
In this example, we aim to demonstrate equivalence between SB2 and EU-INF for AUCinf and Cmax (the EMA comparison) and between SB2 and US-INF for AUClast and Cmax (the FDA comparison). The comparisons are specified as follows:
# Arms to be compared
list_comparator <- list(
EMA = c("SB2", "EUINF"),
FDA = c("SB2", "USINF")
)
# Endpoints to be compared
list_y_comparator <- list(
EMA = c("AUCinf", "Cmax"),
FDA = c("AUClast", "Cmax")
)
For all endpoints, bioequivalence is established if the 90% confidence intervals for the ratios of the geometric means fall within the equivalence range of 80.00% to 125.00%.
Below we define the equivalence boundaries for each comparison:
list_lequi.tol <- list(
"EMA" = c(AUCinf = 0.8, Cmax = 0.8),
"FDA" = c(AUClast = 0.8, Cmax = 0.8)
)
list_uequi.tol <- list(
"EMA" = c(AUCinf = 1.25, Cmax = 1.25),
"FDA" = c(AUClast = 1.25, Cmax = 1.25)
)
Here, the list_comparator parameter specifies the arms being compared, while list_lequi.tol and list_uequi.tol define the lower and upper equivalence boundaries for the two endpoints in each comparison.
To calculate the required sample size for testing equivalence under the specified conditions, we use the sampleSize() function. This function computes the total sample size needed to achieve the target power while ensuring equivalence criteria are met.
library(SimTOST)
(N_ss <- sampleSize(power = 0.9, # Target power
alpha = 0.05, # Type I error rate
mu_list = mu_list, # Means for each endpoint and arm
sigma_list = sigma_list, # Standard deviations
list_comparator = list_comparator, # Comparator arms
list_y_comparator = list_y_comparator, # Endpoints to compare
list_lequi.tol = list_lequi.tol, # Lower equivalence boundaries
list_uequi.tol = list_uequi.tol, # Upper equivalence boundaries
dtype = "parallel", # Trial design type
ctype = "ROM", # Test type: Ratio of Means
vareq = TRUE, # Assume equal variances
lognorm = TRUE, # Log-normal distribution assumption
nsim = 1000, # Number of stochastic simulations
seed = 1234)) # Random seed for reproducibility
#> Sample Size Calculation Results
#> -------------------------------------------------------------
#> Study Design: parallel trial targeting 90% power with a 5% type-I error.
#>
#> Comparisons:
#> SB2 vs. EUINF
#> - Endpoints Tested: AUCinf, Cmax
#> (multiple co-primary endpoints, m = 2 )
#> SB2 vs. USINF
#> - Endpoints Tested: AUClast, Cmax
#> (multiple co-primary endpoints, m = 2 )
#> -------------------------------------------------------------
#> Parameter Value
#> Total Sample Size 117
#> Achieved Power 90
#> Power Confidence Interval 87.9 - 91.8
#> -------------------------------------------------------------
A total sample size of 117 patients (or 39 per arm) is required to demonstrate equivalence.
In this example, we aim to establish equivalence for at least two out of three primary endpoints while accounting for multiplicity using the Bonferroni adjustment. The following assumptions are made:

- Equal variances across treatment arms (vareq = TRUE).
- A Ratio of Means test (ctype = "ROM").
- A parallel-group design (dtype = "parallel").
- Log-normally distributed endpoints (lognorm = TRUE).
- Uncorrelated endpoints (rho = 0).
- Bonferroni adjustment for multiple comparisons (adjust = "bon").
- At least two of the three endpoints must demonstrate equivalence (k = 2).

The comparisons and equivalence boundaries are defined as follows:
# Endpoints to be compared for each comparator
list_y_comparator <- list(
EMA = c("AUCinf", "AUClast", "Cmax"),
FDA = c("AUCinf", "AUClast", "Cmax")
)
# Define lower equivalence boundaries for each comparator
list_lequi.tol <- list(
EMA = c(AUCinf = 0.8, AUClast = 0.8, Cmax = 0.8),
FDA = c(AUCinf = 0.8, AUClast = 0.8, Cmax = 0.8)
)
# Define upper equivalence boundaries for each comparator
list_uequi.tol <- list(
EMA = c(AUCinf = 1.25, AUClast = 1.25, Cmax = 1.25),
FDA = c(AUCinf = 1.25, AUClast = 1.25, Cmax = 1.25)
)
Finally, we calculate the required sample size:
(N_mp <- sampleSize(power = 0.9, # Target power
alpha = 0.05, # Type I error rate
mu_list = mu_list, # Means for each endpoint and arm
sigma_list = sigma_list, # Standard deviations
list_comparator = list_comparator, # Comparator arms
list_y_comparator = list_y_comparator, # Endpoints to compare
list_lequi.tol = list_lequi.tol, # Lower equivalence boundaries
list_uequi.tol = list_uequi.tol, # Upper equivalence boundaries
k = 2, # Number of endpoints required to demonstrate equivalence
adjust = "bon", # Bonferroni adjustment for multiple comparisons
dtype = "parallel", # Trial design type (parallel group)
ctype = "ROM", # Test type: Ratio of Means
vareq = TRUE, # Assume equal variances across arms
lognorm = TRUE, # Log-normal distribution assumption
nsim = 1000, # Number of stochastic simulations
seed = 1234)) # Random seed for reproducibility
#> Sample Size Calculation Results
#> -------------------------------------------------------------
#> Study Design: parallel trial targeting 90% power with a 5% type-I error.
#>
#> Comparisons:
#> SB2 vs. EUINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> SB2 vs. USINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> -------------------------------------------------------------
#> Parameter Value
#> Total Sample Size 93
#> Achieved Power 90
#> Power Confidence Interval 87.9 - 91.8
#> -------------------------------------------------------------
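The adjusted significance level reported above is consistent with the Bonferroni rule applied to the three endpoints tested within each comparison; a minimal check:

# Bonferroni-adjusted significance level with m = 3 endpoints per comparison
alpha <- 0.05
m <- 3
round(alpha / m, 4)
#> [1] 0.0167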
In this example, we build upon the previous scenario but introduce unequal allocation rates across treatment arms. Specifically, we require that the number of patients in the new treatment arm is double the number in each of the reference arms.
This can be achieved by specifying the treatment allocation rate parameter (TAR). Rates are provided as a vector, for example TAR = c(2, 1, 1). This ensures that for every two patients assigned to the new treatment arm, one patient is assigned to each reference arm, as illustrated in the short sketch below.
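As a quick illustration (using 96, the total reported in the output further below), the per-arm sizes are simply proportional to the TAR weights:

# Per-arm sample sizes implied by a 2:1:1 allocation for a given total (illustration only)
tar <- c(SB2 = 2, EUINF = 1, USINF = 1)
n_total <- 96
n_total * tar / sum(tar)
#> SB2 EUINF USINF 
#>  48    24    24 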
(N_mp2 <- sampleSize(power = 0.9, # Target power
alpha = 0.05, # Type I error rate
mu_list = mu_list, # Means for each endpoint and arm
sigma_list = sigma_list, # Standard deviations
list_comparator = list_comparator, # Comparator arms
list_y_comparator = list_y_comparator, # Endpoints to compare
list_lequi.tol = list_lequi.tol, # Lower equivalence boundaries
list_uequi.tol = list_uequi.tol, # Upper equivalence boundaries
k = 2, # Number of endpoints required to demonstrate equivalence
adjust = "bon", # Bonferroni adjustment for multiple comparisons
TAR = c("SB2" = 2, "EUINF" = 1, "USINF" = 1), # Treatment allocation rates
dtype = "parallel", # Trial design type (parallel group)
ctype = "ROM", # Test type: Ratio of Means
vareq = TRUE, # Assume equal variances across arms
lognorm = TRUE, # Log-normal distribution assumption
nsim = 1000, # Number of stochastic simulations
seed = 1234)) # Random seed for reproducibility
#> Sample Size Calculation Results
#> -------------------------------------------------------------
#> Study Design: parallel trial targeting 90% power with a 5% type-I error.
#>
#> Comparisons:
#> SB2 vs. EUINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> SB2 vs. USINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> -------------------------------------------------------------
#> Parameter Value
#> Total Sample Size 96
#> Achieved Power 90
#> Power Confidence Interval 87.9 - 91.8
#> -------------------------------------------------------------
Results indicate that 48 patients are required for the active treatment arm (SB2) and 24 patients for each reference arm. The total sample size is therefore 96, which is larger than in the trial with an equal allocation ratio, where the total sample size was 93.
In the examples above, the sample size calculations assumed that all enrolled patients complete the trial. However, in practice, a certain percentage of participants drop out, which can impact the required sample size. To account for this, we consider a 20% dropout rate across all treatment arms.
(N_mp3 <- sampleSize(power = 0.9, # Target power
alpha = 0.05, # Type I error rate
mu_list = mu_list, # Means for each endpoint and arm
sigma_list = sigma_list, # Standard deviations
list_comparator = list_comparator, # Comparator arms
list_y_comparator = list_y_comparator, # Endpoints to compare
list_lequi.tol = list_lequi.tol, # Lower equivalence boundaries
list_uequi.tol = list_uequi.tol, # Upper equivalence boundaries
k = 2, # Number of endpoints required to demonstrate equivalence
adjust = "bon", # Bonferroni adjustment for multiple comparisons
TAR = c("SB2" = 2, "EUINF" = 1, "USINF" = 1), # Treatment allocation rates
dropout = c("SB2" = 0.20, "EUINF" = 0.20, "USINF" = 0.20), # Expected dropout rates
dtype = "parallel", # Trial design type (parallel group)
ctype = "ROM", # Test type: Ratio of Means
vareq = TRUE, # Assume equal variances across arms
lognorm = TRUE, # Log-normal distribution assumption
nsim = 1000, # Number of stochastic simulations
seed = 1234)) # Random seed for reproducibility
#> Sample Size Calculation Results
#> -------------------------------------------------------------
#> Study Design: parallel trial targeting 90% power with a 5% type-I error.
#>
#> Comparisons:
#> SB2 vs. EUINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> SB2 vs. USINF
#> - Endpoints Tested: AUCinf, AUClast, Cmax
#> (multiple primary endpoints, k = 2 )
#> - Multiplicity Correction: Bonferroni
#> Adjusted Significance Levels: alpha = 0.0167
#>
#> -------------------------------------------------------------
#> Parameter Value
#> Total Sample Size 120
#> Achieved Power 90
#> Power Confidence Interval 87.9 - 91.8
#> -------------------------------------------------------------
Considering a 20% dropout rate, the total sample size required is 120.
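This total is consistent with inflating the per-arm sizes from the previous example by a factor of 1/(1 - dropout); a minimal check:

# Inflate the per-arm sizes from the previous example for a 20% dropout rate
n_enrolled <- c(SB2 = 48, EUINF = 24, USINF = 24) / (1 - 0.20)
n_enrolled
#> SB2 EUINF USINF 
#>  60    30    30 
sum(n_enrolled)
#> [1] 120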