Reproducible ILD workflows with tidyILD provenance

tidyILD records provenance at each step: data preparation, centering, lags, alignment, weighting, and model fitting. This vignette shows how to inspect history, generate methods text, build a report, export provenance, and compare two pipelines.

1. Prepare data

library(tidyILD)
set.seed(1)
d <- ild_simulate(n_id = 8, n_obs_per = 10, irregular = TRUE, seed = 42)
x <- ild_prepare(d, id = "id", time = "time", gap_threshold = 7200)

2. Center and lag

x <- ild_center(x, y)
x <- ild_lag(x, y, n = 1, mode = "gap_aware", max_gap = 7200)
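
What these two calls compute can be sketched in base R on a toy data frame. This is an illustration under stated assumptions, not tidyILD's implementation: it assumes y_bp is the person mean of y, y_wp is the deviation from that mean, and that a gap-aware lag is set to missing when the time since the previous observation exceeds max_gap. The toy values are made up.

```r
# Toy data: two persons, times in seconds (hypothetical values).
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(0, 3600, 18000, 0, 3600, 7200),
  y    = c(2, 4, 6, 1, 1, 4)
)

# Between-person component: each person's mean of y.
toy$y_bp <- ave(toy$y, toy$id, FUN = mean)
# Within-person component: deviation from the person's own mean.
toy$y_wp <- toy$y - toy$y_bp

# Gap-aware lag (assumed semantics): take the previous value within person,
# but return NA when the gap to the previous observation exceeds max_gap.
max_gap   <- 7200
shift_one <- function(v) c(NA, head(v, -1))
prev_y    <- ave(toy$y, toy$id, FUN = shift_one)
prev_time <- ave(toy$time, toy$id, FUN = shift_one)
gap       <- toy$time - prev_time
toy$y_lag1 <- ifelse(!is.na(gap) & gap <= max_gap, prev_y, NA)

toy
```

In this sketch the third row of person 1 gets a missing lag because its gap (14400) exceeds max_gap, while an index-based lag would have used the previous row regardless of spacing.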

3. Fit model

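fit <- ild_lme(y ~ y_bp + y_wp + (1 | id), data = x, ar1 = FALSE, warn_no_ar1 = FALSE)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0265615 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?

The convergence warnings are expected for this toy model: y_bp and y_wp are the between- and within-person components of y itself, so y is an exact linear function of the predictors and the residual variance is essentially zero (both slopes are estimated as 1 in the model table in section 7). The fit is used here only to demonstrate provenance tracking; in a real analysis the outcome and the centered predictor would be different variables.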

4. Run diagnostics

diag <- ild_diagnostics(fit, type = c("residual_acf", "qq"))

5. Inspect ild_history()

ild_history() prints a human-readable log of all preprocessing and analysis steps recorded on the object.

ild_history(x)
#> ILD provenance (tidyILD 0.4.1)
#>   1. ild_prepare @ 2026-06-12T14:45:00
#>       args: id, time, gap_threshold, duplicate_handling, input_format, wide_cols, wide_names_pattern, wide_time_parser, wide_time_format, wide_keep_cols
#>       outputs: n_id, n_obs, spacing_class, source_was_tsibble, source_was_wide, wide_n_measures, wide_n_time, tsibble_interval_declared
#>   2. ild_center @ 2026-06-12T14:45:00
#>       args: vars, type, naming
#>       outputs: created
#>   3. ild_lag @ 2026-06-12T14:45:00
#>       args: vars, n, mode, max_gap, window, resolution
#>       outputs: created

For a model fit, provenance includes the source data steps plus the analysis step:

ild_history(fit)
#> ILD analysis provenance (tidyILD 0.4.1)
#>   [Source data steps: 3]
#>   1. ild_lme @ 2026-06-12T14:45:01
#>       args: formula, ar1, correlation_class, method, backend
#>       outputs: n_obs, n_id, fit_engine, backend_version

6. Generate ild_methods()

ild_methods() turns provenance into a single methods-style paragraph suitable for a manuscript.

ild_methods(fit)
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons)."

If you reported fixed effects with cluster-robust standard errors (e.g. via tidy_ild_model(fit, se = "robust", robust_type = "CR2")), pass the robust type as robust_se so that the methods text mentions it:

ild_methods(fit, robust_se = "CR2")
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons). Fixed effects were reported with cluster-robust standard errors (CR2)."
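
Because the methods paragraph comes back as a single long string, it can help to wrap it before pasting it into a manuscript or saving it next to the analysis. The sketch below uses only base R; methods_text is a shortened stand-in for the value returned by ild_methods(fit), and the width of 72 is an arbitrary choice.

```r
# Shortened stand-in for the string returned by ild_methods(fit).
methods_text <- paste(
  "Data were prepared using ild_prepare() with participant ID id,",
  "time variable time, and a gap threshold of 7200 units."
)

# Wrap the paragraph so each line is shorter than 72 characters.
wrapped <- strwrap(methods_text, width = 72)

# Save the wrapped paragraph to a text file and print it back.
out <- tempfile(fileext = ".txt")
writeLines(wrapped, out)
cat(readLines(out), sep = "\n")
```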

7. Run ild_report()

ild_report() assembles a standardized list: meta (n_obs, n_id, engine), the methods text, the fixed-effects table, a diagnostics summary, the provenance, and the path of the exported provenance file, if one was requested.

r <- ild_report(fit)
names(r)
#> [1] "meta"                    "methods"                
#> [3] "model_table"             "diagnostics_summary"    
#> [5] "provenance"              "provenance_export_path" 
#> [7] "methods_with_guardrails"
r$meta
#> $n_obs
#> [1] 80
#> 
#> $n_id
#> [1] 8
#> 
#> $engine
#> [1] "lmer"
r$methods
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons)."
r$model_table
#> # A tibble: 3 × 18
#>   term   component effect_level estimate std_error  conf_low conf_high statistic
#>   <chr>  <chr>     <chr>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Inte… fixed     population   5.29e-17  5.16e-17 -4.82e-17  1.54e-16   1.03e 0
#> 2 y_bp   fixed     between      1   e+ 0  5.66e-17  1   e+ 0  1   e+ 0   1.77e16
#> 3 y_wp   fixed     within       1.00e+ 0  8.24e-17  1   e+ 0  1.00e+ 0   1.21e16
#> # ℹ 10 more variables: p_value <dbl>, interval_type <chr>, engine <chr>,
#> #   model_class <chr>, rhat <dbl>, ess_bulk <dbl>, ess_tail <dbl>, pd <dbl>,
#> #   rope_low <dbl>, rope_high <dbl>

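The return schema is stable: meta, methods, model_table, diagnostics_summary, provenance, and provenance_export_path are always present. The names(r) output above also includes a seventh element, methods_with_guardrails.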
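
A stable schema is easy to guard in downstream scripts. The following is a minimal sketch: check_report_schema() is a hypothetical helper, not part of tidyILD, and mock_report is a stand-in list carrying the element names printed by names(r) above.

```r
# Hypothetical helper: stop early if a report lacks a documented element.
check_report_schema <- function(report) {
  expected <- c("meta", "methods", "model_table", "diagnostics_summary",
                "provenance", "provenance_export_path")
  absent <- setdiff(expected, names(report))
  if (length(absent) > 0) {
    stop("Report is missing: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in object with the same element names as names(r) above.
mock_report <- list(
  meta = list(n_obs = 80, n_id = 8, engine = "lmer"),
  methods = "placeholder methods text",
  model_table = data.frame(),
  diagnostics_summary = list(),
  provenance = list(),
  provenance_export_path = NA_character_,
  methods_with_guardrails = "placeholder methods text"
)

check_report_schema(mock_report)
```

In a real script the same call would be check_report_schema(r), placed before any code that indexes into the report.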

8. Export provenance

Export the full provenance (both the data steps and the analysis steps) to JSON or YAML for reproducibility supplements or archiving.

tmp <- tempfile(fileext = ".json")
ild_export_provenance(fit, tmp, format = "json")
readLines(tmp, n = 20)
#>  [1] "{"                                              
#>  [2] "  \"version\": \"0.4.1\","                      
#>  [3] "  \"schema_version\": \"1\","                   
#>  [4] "  \"object_type\": \"ild_model\","              
#>  [5] "  \"source_data_provenance\": {"                
#>  [6] "    \"version\": \"0.4.1\","                    
#>  [7] "    \"schema_version\": \"1\","                 
#>  [8] "    \"object_type\": \"ild_data\","             
#>  [9] "    \"steps\": ["                               
#> [10] "      {"                                        
#> [11] "        \"step_id\": \"1\","                    
#> [12] "        \"step\": \"ild_prepare\","             
#> [13] "        \"timestamp\": \"2026-06-12T14:45:00\","
#> [14] "        \"args\": {"                            
#> [15] "          \"id\": \"id\","                      
#> [16] "          \"time\": \"time\","                  
#> [17] "          \"gap_threshold\": 7200,"             
#> [18] "          \"duplicate_handling\": \"first\","   
#> [19] "          \"input_format\": \"long\","          
#> [20] "          \"wide_cols\": null,"
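
An exported JSON file can be read back as a nested list, for example to pull one setting out of an archived analysis. The sketch below assumes the jsonlite package is installed and uses an abbreviated stand-in file that contains only fields visible in the output above; a real export has more fields and more steps.

```r
library(jsonlite)

# Abbreviated stand-in for an exported file; the args list is cut short.
json_text <- '{
  "version": "0.4.1",
  "schema_version": "1",
  "object_type": "ild_model",
  "source_data_provenance": {
    "object_type": "ild_data",
    "steps": [
      {
        "step_id": "1",
        "step": "ild_prepare",
        "args": { "id": "id", "time": "time", "gap_threshold": 7200 }
      }
    ]
  }
}'
path <- tempfile(fileext = ".json")
writeLines(json_text, path)

# simplifyVector = FALSE keeps the nested list structure intact.
prov  <- fromJSON(path, simplifyVector = FALSE)
steps <- prov$source_data_provenance$steps

# Name of each recorded data step, and the gap threshold used in the first.
step_names <- vapply(steps, function(s) s$step, character(1))
step_names
steps[[1]]$args$gap_threshold
```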

ild_report() can also write the provenance file in the same call when given export_provenance_path:

tmp2 <- tempfile(fileext = ".yaml")
r2 <- ild_report(fit, export_provenance_path = tmp2)
r2$provenance_export_path
#> [1] "/var/folders/3s/mxnfwt994sz7kmhmwt__s12m0000gn/T//RtmpLrRoDb/file2d03ad4a772.yaml"

9. Compare two pipelines

Use ild_compare_pipelines() to see how two objects differ (e.g. different gap thresholds, lag modes, or model formulas).

x2 <- ild_prepare(d, id = "id", time = "time", gap_threshold = 3600)
x2 <- ild_center(x2, y)
x2 <- ild_lag(x2, y, n = 1, mode = "index")
fit2 <- ild_lme(y ~ y_bp + y_wp + (1 | id), data = x2, ar1 = FALSE, warn_no_ar1 = FALSE)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0265615 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?

cmp <- ild_compare_pipelines(fit, fit2)
cmp
#> Pipeline comparison
#>   Different gap_threshold (ild_prepare): 7200 vs 3600
#>   Different mode (ild_lag): gap_aware vs index
#>   Different max_gap (ild_lag): 7200 vs Inf

This makes it easy to document sensitivity analyses or to check what changed between two analyses.
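
The idea behind such a comparison can be sketched as a diff of two named argument lists. diff_args() below is a hypothetical helper written for illustration, not the logic of ild_compare_pipelines(); the two lists mirror the ild_lag() settings reported in the comparison above.

```r
# Hypothetical helper: list the arguments whose values differ between
# two named lists of step arguments.
diff_args <- function(a, b) {
  keys <- union(names(a), names(b))
  same <- vapply(keys, function(k) identical(a[[k]], b[[k]]), logical(1))
  changed <- keys[!same]
  # Render a value as text; an argument absent from one list shows as NA.
  show <- function(v) {
    if (is.null(v)) NA_character_ else paste(format(v), collapse = ", ")
  }
  data.frame(
    arg    = changed,
    first  = vapply(changed, function(k) show(a[[k]]), character(1),
                    USE.NAMES = FALSE),
    second = vapply(changed, function(k) show(b[[k]]), character(1),
                    USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

lag_a <- list(n = 1, mode = "gap_aware", max_gap = 7200)
lag_b <- list(n = 1, mode = "index", max_gap = Inf)
res <- diff_args(lag_a, lag_b)
res
```

Arguments that match, such as n here, are dropped, so the result lists only what changed between the two pipelines.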