README

The goal of splinetrials is to analyze longitudinal clinical data applying natural cubic splines to a continuous time variable.

Installation

install.packages("splinetrials")

# install.packages("pak")
pak::pak("NikKrieger/splinetrials")

Setup

library(survival)
library(mmrm)
library(splinetrials)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

Overview

The NCS model is a longitudinal mixed model of repeated measures (MMRM) with random effects marginalized out, additive baseline covariates, and time parameterized using natural cubic splines.

Demo

Data

The package functions accept data sets with one row per patient per scheduled visit (including baseline).

pbcseq_mod <- 
  survival::pbcseq |> 
  mutate(
    years = day / 365.25,
    visit_index =
      bin_timepoints( # splinetrials helper function yielding an ordered factor
        observed = years,
        scheduled = c(0, 0.5, 1:round(max(years))),
        labels = c("Baseline", paste(c(0.5, 1:round(max(years))), "yr"))
      ),
    sch_visit_years = c(0, 0.5, 1:round(max(years)))[as.numeric(visit_index)],
    sch_visit_label = as.character(visit_index),
    trt = factor(trt, labels = c("control", "treatment"))
  ) |> 
  arrange(id, visit_index, abs(years - sch_visit_years)) |>
  distinct(id, visit_index, .keep_all = TRUE) |> 
  select(id, visit_index:sch_visit_label, trt, everything()) |> 
  as_tibble()

pbcseq_mod
#> # A tibble: 1,906 × 23
#>       id visit_index sch_visit_years sch_visit_label trt     futime status   age
#>    <int> <ord>                 <dbl> <chr>           <fct>    <int>  <int> <dbl>
#>  1     1 Baseline                0   Baseline        treatm…    400      2  58.8
#>  2     1 0.5 yr                  0.5 0.5 yr          treatm…    400      2  58.8
#>  3     2 Baseline                0   Baseline        treatm…   5169      0  56.4
#>  4     2 0.5 yr                  0.5 0.5 yr          treatm…   5169      0  56.4
#>  5     2 1 yr                    1   1 yr            treatm…   5169      0  56.4
#>  6     2 2 yr                    2   2 yr            treatm…   5169      0  56.4
#>  7     2 5 yr                    5   5 yr            treatm…   5169      0  56.4
#>  8     2 6 yr                    6   6 yr            treatm…   5169      0  56.4
#>  9     2 7 yr                    7   7 yr            treatm…   5169      0  56.4
#> 10     2 8 yr                    8   8 yr            treatm…   5169      0  56.4
#> # ℹ 1,896 more rows
#> # ℹ 15 more variables: sex <fct>, day <int>, ascites <int>, hepato <int>,
#> #   spiders <int>, edema <dbl>, bili <dbl>, chol <int>, albumin <dbl>,
#> #   alk.phos <int>, ast <dbl>, platelet <int>, protime <dbl>, stage <int>,
#> #   years <dbl>

Notice how the helper function splinetrials::bin_timepoints() creates the visit index, categorizing follow-up visits based on when they were scheduled (see ?survival::pbcseq).

Basic NCS analysis

Once we have these variables created, we are ready for a full NCS analysis using ncs_analysis(). This produces a table of summary statistics, including LS means and confidence intervals:

pbc_spline_analysis <-
  ncs_analysis(
    data = pbcseq_mod,
    response = alk.phos,
    subject = id,
    arm = trt,
    control_group = "control",
    time_observed_continuous = years,
    time_observed_index = visit_index,
    time_scheduled_continuous = sch_visit_years,
    time_scheduled_label = sch_visit_label,
    covariates = ~ age + sex + bili + albumin + ast + protime
  )
#> → The covariance structure `"us"` did not successfully converge.
#> Warning in mmrm::mmrm(method = "Satterthwaite", formula = alk.phos ~
#> spline_fn(years)[, : Divergence with optimizer L-BFGS-B due to problems:
#> L-BFGS-B needs finite values of 'fn'
#> mmrm() registered as emmeans extension
pbc_spline_analysis
#> # A tibble: 32 × 32
#>    arm     time         n   est    sd    se lower upper response_est response_se
#>    <fct>   <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>       <dbl>
#>  1 control Baseline   154 1943. 2102. 169.  1611. 2275.        1424.        77.0
#>  2 control 0.5 yr     131 1574. 1134.  99.1 1379. 1768.        1377.        71.9
#>  3 control 1 yr       130 1543.  934.  81.9 1382. 1703.        1330.        68.6
#>  4 control 2 yr       111 1481. 1076. 102.  1281. 1681.        1243.        67.2
#>  5 control 3 yr        83 1252.  662.  72.7 1110. 1395.        1169.        69.1
#>  6 control 4 yr        75 1182.  716.  82.7 1020. 1344.        1107.        71.0
#>  7 control 5 yr        61 1250.  902. 115.  1023. 1476.        1058.        71.1
#>  8 control 6 yr        56 1073.  589.  78.7  919. 1228.        1019.        69.4
#>  9 control 7 yr        47 1022.  601.  87.6  851. 1194.         989.        66.6
#> 10 control 8 yr        37 1021.  554.  91.1  842. 1199.         968.        64.4
#> # ℹ 22 more rows
#> # ℹ 22 more variables: response_df <dbl>, response_lower <dbl>,
#> #   response_upper <dbl>, change_est <dbl>, change_se <dbl>, change_df <dbl>,
#> #   change_lower <dbl>, change_upper <dbl>, change_test_statistic <dbl>,
#> #   change_p_value <dbl>, diff_est <dbl>, diff_se <dbl>, diff_df <dbl>,
#> #   diff_lower <dbl>, diff_upper <dbl>, diff_test_statistic <dbl>,
#> #   diff_p_value <dbl>, percent_slowing_est <dbl>, …

Note that the above warnings are expected, as splinetrials and mmrm::mmrm() iterate through different covariance structures and optimizers until a converging model is achieved.

We can plot the resulting model against the data by feeding the resulting data set to ncs_plot_means():

p1 <- ncs_plot_means(pbc_spline_analysis)
p1

Methods

The parameterization has additive effects and pairwise interactions for study arm and time point (i.e., spline basis functions). The model is constrained so that all study arms have equal means at baseline: notice that the same line passes through both the control group’s and the treatment group’s modeled mean baseline estimates:

p1 +
  geom_hline(
    aes(yintercept = response_est),
    data = filter(pbc_spline_analysis, time == "Baseline"),
    linetype = 3
  )

NCS subgroup analysis

Run a NCS analysis with subgroups using ncs_analysis_subgroup(). The data set should include a categorical variable to indicate subgroup membership. In this case, the sex variable serves as the subgroup.

subgroup_analysis_results <-
  ncs_analysis_subgroup(
    data = pbcseq_mod,
    response = alk.phos,
    subject = id,
    arm = trt,
    control_group = "control",
    subgroup = sex,
    subgroup_comparator = "f",
    time_observed_continuous = years,
    time_observed_index = visit_index,
    time_scheduled_continuous = sch_visit_years,
    time_scheduled_label = sch_visit_label,
    covariates = ~ age + bili + albumin + ast + protime,
    cov_structs = "ar1h",
    return_models = TRUE
  )
#> mmrm() registered as car::Anova extension
#> Warning in mmrm::mmrm(formula = alk.phos ~ spline_fn(years)[, 1] +
#> spline_fn(years)[, : Divergence with optimizer L-BFGS-B due to problems:
#> L-BFGS-B needs finite values of 'fn'
#> Warning in mmrm::mmrm(formula = alk.phos ~ spline_fn(years)[, 1] +
#> spline_fn(years)[, : Divergence with optimizer L-BFGS-B due to problems:
#> L-BFGS-B needs finite values of 'fn'

This function returns a list of multiple analyses, which are better looked at one at a time:

Between subgroups and within subgroups

The between and within tables contain the same data except for the treatment effect columns. Note also that they are sorted differently.

between

The between table calculates treatment effects between the subgroups within each study arm:

subgroup_analysis_results$between
#> # A tibble: 62 × 30
#>    arm       time     subgroup     n   est    sd    se lower upper response_est
#>    <fct>     <chr>    <fct>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
#>  1 control   Baseline f          139 1966. 2066. 175.  1622. 2309.        1429.
#>  2 control   Baseline m           15 1734. 2478. 640.   480. 2989.        1640.
#>  3 treatment Baseline f          137 1950. 2151. 184.  1590. 2310.        1429.
#>  4 treatment Baseline m           21 2486. 2385. 521.  1466. 3506.        1640.
#>  5 control   0.5 yr   f          119 1619. 1162. 106.  1410. 1827.        1377.
#>  6 control   0.5 yr   m           12 1134.  714. 206.   730. 1538.        1572.
#>  7 treatment 0.5 yr   f          108 1348.  818.  78.7 1194. 1502.        1385.
#>  8 treatment 0.5 yr   m           17 1646.  821. 199.  1256. 2037.        1592.
#>  9 control   1 yr     f          118 1575.  944.  86.9 1405. 1745.        1325.
#> 10 control   1 yr     m           12 1229.  796. 230.   778. 1679.        1506.
#> # ℹ 52 more rows
#> # ℹ 20 more variables: response_se <dbl>, response_df <dbl>,
#> #   response_lower <dbl>, response_upper <dbl>, change_est <dbl>,
#> #   change_se <dbl>, change_df <dbl>, change_lower <dbl>, change_upper <dbl>,
#> #   change_test_statistic <dbl>, change_p_value <dbl>, diff_subgroup_est <dbl>,
#> #   diff_subgroup_se <dbl>, diff_subgroup_df <dbl>, diff_subgroup_lower <dbl>,
#> #   diff_subgroup_upper <dbl>, diff_subgroup_test_statistic <dbl>, …

within

The within table calculates treatment effects between each study arm within each subgroup:

subgroup_analysis_results$within
#> # A tibble: 62 × 33
#>    arm     time     subgroup     n   est    sd    se lower upper response_est
#>    <fct>   <chr>    <fct>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
#>  1 control Baseline f          139 1966. 2066. 175.  1622. 2309.        1429.
#>  2 control 0.5 yr   f          119 1619. 1162. 106.  1410. 1827.        1377.
#>  3 control 1 yr     f          118 1575.  944.  86.9 1405. 1745.        1325.
#>  4 control 2 yr     f          100 1527. 1109. 111.  1310. 1744.        1230.
#>  5 control 3 yr     f           74 1271.  685.  79.6 1115. 1427.        1151.
#>  6 control 4 yr     f           68 1193.  743.  90.2 1016. 1369.        1088.
#>  7 control 5 yr     f           54 1297.  949. 129.  1044. 1550.        1040.
#>  8 control 6 yr     f           49 1093.  619.  88.5  920. 1266.        1005.
#>  9 control 7 yr     f           40 1049.  641. 101.   850. 1247.         981.
#> 10 control 8 yr     f           31 1088.  575. 103.   885. 1290.         969.
#> # ℹ 52 more rows
#> # ℹ 23 more variables: response_se <dbl>, response_df <dbl>,
#> #   response_lower <dbl>, response_upper <dbl>, change_est <dbl>,
#> #   change_se <dbl>, change_df <dbl>, change_lower <dbl>, change_upper <dbl>,
#> #   change_test_statistic <dbl>, change_p_value <dbl>, diff_arm_est <dbl>,
#> #   diff_arm_se <dbl>, diff_arm_df <dbl>, diff_arm_lower <dbl>,
#> #   diff_arm_upper <dbl>, diff_arm_test_statistic <dbl>, …

Plotting the means

We can supply either of the above result entries (i.e., between or within) to ncs_plot_means_subgroup(). This function creates a grid of plots that again depict the actual and modeled mean responses at each timepoint. Each column of plots contains a study arm and each row of plots contains a subgroup.

We again include horizontal lines to demonstrate that the study arms all have the same modeled mean baseline estimate:

ncs_plot_means_subgroup(subgroup_analysis_results$between) +
  geom_hline(
    aes(yintercept = response_est),
    data = filter(subgroup_analysis_results$between, time == "Baseline"),
    linetype = 3,
    lwd = 0.5
  )

Type-III ANOVA

subgroup_analysis_results$type3
#> # A tibble: 14 × 6
#>    effect             chisquare_test_stati…¹    df p_value correlation optimizer
#>    <chr>                               <dbl> <int>   <dbl> <chr>       <chr>    
#>  1 spline_fn(years)[…              30.2          1 3.84e-8 heterogene… mmrm+tmb 
#>  2 spline_fn(years)[…              13.5          1 2.43e-4 heterogene… mmrm+tmb 
#>  3 sex                              1.26         1 2.62e-1 heterogene… mmrm+tmb 
#>  4 age                             23.5          1 1.24e-6 heterogene… mmrm+tmb 
#>  5 bili                             6.08         1 1.37e-2 heterogene… mmrm+tmb 
#>  6 albumin                          7.16         1 7.45e-3 heterogene… mmrm+tmb 
#>  7 ast                              2.95         1 8.57e-2 heterogene… mmrm+tmb 
#>  8 protime                          1.76         1 1.85e-1 heterogene… mmrm+tmb 
#>  9 spline_fn(years)[…               1.20         1 2.74e-1 heterogene… mmrm+tmb 
#> 10 spline_fn(years)[…               2.70         1 1.01e-1 heterogene… mmrm+tmb 
#> 11 spline_fn(years)[…               0.000212     1 9.88e-1 heterogene… mmrm+tmb 
#> 12 spline_fn(years)[…               1.29         1 2.56e-1 heterogene… mmrm+tmb 
#> 13 spline_fn(years)[…               0.0588       1 8.08e-1 heterogene… mmrm+tmb 
#> 14 spline_fn(years)[…               0.0164       1 8.98e-1 heterogene… mmrm+tmb 
#> # ℹ abbreviated name: ¹chisquare_test_statistic

Subgroup interaction test

The optional interaction result contains an ANOVA comparing the original model fit to a reduced version:

subgroup_analysis_results$interaction
#>           model      aic      bic    loglik -2*log(l) test_statistic df
#> 1 reduced model 29201.78 29314.07 -14570.89  29141.78             NA NA
#> 2    full model 29205.64 29325.42 -14570.82  29141.64      0.1351363  2
#>    p_value                          correlation optimizer
#> 1       NA heterogeneous autoregressive order 1  mmrm+tmb
#> 2 0.934664 heterogeneous autoregressive order 1  mmrm+tmb

return_models = TRUE

Lastly, the user can optionally obtain the original analysis model object as well as those used for the subgroup interaction test, which are returned as the analysis_model, full, and reduced entries. These are mmrm objects created using mmrm::mmrm(). Here is analysis_model, for example:

subgroup_analysis_results$analysis_model
#> mmrm fit
#> 
#> Formula:     alk.phos ~ spline_fn(years)[, 1] + spline_fn(years)[, 2] + sex +  
#>     age + bili + albumin + ast + protime + spline_fn(years)[,  
#>     1]:sex + spline_fn(years)[, 2]:sex + spline_fn(years)[, 1]:trt +  
#>     spline_fn(years)[, 2]:trt + spline_fn(years)[, 1]:sex:trt +  
#>     spline_fn(years)[, 2]:sex:trt
#> Data:        dplyr::mutate(pbcseq_mod, sex = factor(sex, c("f", "m"))) (used 
#> 1860 observations from 312 subjects with maximum 16 timepoints)
#> Covariance:  heterogeneous auto-regressive order one (17 variance parameters)
#> Inference:   REML
#> Deviance:    29002.11
#> 
#> Coefficients: 
#>                             (Intercept)                   spline_fn(years)[, 1] 
#>                             2721.322358                             -815.000265 
#>                   spline_fn(years)[, 2]                                    sexm 
#>                             -217.027228                              210.533944 
#>                                     age                                    bili 
#>                              -18.112882                               22.507712 
#>                                 albumin                                     ast 
#>                              -95.578177                                1.296681 
#>                                 protime              spline_fn(years)[, 1]:sexm 
#>                              -28.763036                             -458.009692 
#>              spline_fn(years)[, 2]:sexm      spline_fn(years)[, 1]:trttreatment 
#>                             -478.555152                              -53.251556 
#>      spline_fn(years)[, 2]:trttreatment spline_fn(years)[, 1]:sexm:trttreatment 
#>                             -299.228629                              113.096456 
#> spline_fn(years)[, 2]:sexm:trttreatment 
#>                              -78.148527 
#> 
#> Model Inference Optimization:
#> Converged with code 0 and message: convergence: rel_reduction_of_f <= factr*epsmch

Appendix: details of the analysis results tables

Basic NCS analysis

The basic NCS analysis results table pbc_spline_analysis has one row per study arm per timepoint of interest and the following columns:

Subgroup analysis

between and within tables

The between-subgroup and within-subgroup tables have one row per study arm per timepoint per subgroup and the following columns:

type3 column descriptions

The Type-III ANOVA table has one row per effect in the main analysis model and the following columns:

interaction column descriptions

The subgroup interaction test table will have two rows: the first will represent the reduced model and the second will depict the full model. Likelihood ratio test results will only be in the latter row. The table will contain the following columns:

splinetrials

Installation

Setup

Overview

Demo

Data

Basic NCS analysis

Methods

NCS subgroup analysis

Between subgroups and within subgroups

`between`

`within`

Plotting the means

Type-III ANOVA

Subgroup interaction test

`return_models = TRUE`

Appendix: details of the analysis results tables

Basic NCS analysis

Subgroup analysis

`between` and `within` tables

`type3` column descriptions

`interaction` column descriptions

splinetrials

Installation

Setup

Overview

Demo

Data

Basic NCS analysis

Methods

NCS subgroup analysis

Between subgroups and within subgroups

between

within

Plotting the means

Type-III ANOVA

Subgroup interaction test

return_models = TRUE

Appendix: details of the analysis results tables

Basic NCS analysis

Subgroup analysis

between and within tables

type3 column descriptions

interaction column descriptions

`between`

`within`

`return_models = TRUE`

`between` and `within` tables

`type3` column descriptions

`interaction` column descriptions