---
title: "Estimating ITS parameters from pre-intervention data"
author: "PITS package"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Estimating ITS parameters from pre-intervention data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 4.5
)
library(PITS)
```

## Overview

Before running a power simulation, you need three **nuisance parameters**
estimated from your pre-intervention time-series data:

| Parameter | Symbol | Meaning |
|-----------|--------|---------|
| Baseline  | $\beta_0$ | Mean outcome at the start of the pre-intervention period |
| Residual SD | $\sigma$ | Outcome variability from one time point to the next |
| Autocorrelation | $\rho$ | AR(1) correlation between consecutive observations |

A fourth quantity, `trend_pre`, checks whether the outcome was stable
before the intervention. It should be close to zero.

These parameters are **not** the intervention effect — that is your clinical
hypothesis (`level_change`), which must be set separately based on
clinical judgement or published evidence.

---

## Using individual estimation functions

PITS provides individual functions for each parameter, useful when you want
fine-grained control or wish to estimate only one quantity.

```{r individual-functions}
data("example_cfr_data")
outcome <- example_cfr_data$outcome

# Baseline: intercept of a linear trend fit at t = 1
b <- estimate_baseline(outcome)
cat("Baseline:", round(b, 3), "\n")

# Sigma: residual SD after detrending
s <- estimate_sigma(outcome)
cat("Sigma:   ", round(s, 3), "\n")

# Rho: lag-1 autocorrelation of residuals
r <- estimate_rho(outcome)
cat("Rho:     ", round(r, 3), "\n")

# Pre-trend: slope per time unit (should be near zero)
t_pre <- estimate_trend(outcome)
cat("Trend:   ", round(t_pre, 4), "per month\n")
```

### When to use individual functions

- You have estimates from the literature for some parameters and want to
  estimate only the remainder
- You want to compare estimates across different subsets of the pre-period
- You are building a custom simulation pipeline

---

## Using the all-in-one wrapper

For most use cases, `estimate_its_params()` is the most convenient entry
point. It accepts either a data frame or a plain numeric vector.

```{r all-in-one}
# From a data frame with 'time' and 'outcome' columns
params <- estimate_its_params(example_cfr_data)
```

```{r all-in-one-vector}
# Or from a plain numeric vector (time index auto-generated)
params2 <- estimate_its_params(example_cfr_data$outcome, verbose = FALSE)
identical(params$sigma, params2$sigma)
```

The returned list plugs directly into `calculate_power()`:

```{r params-to-power, eval = FALSE}
result <- calculate_power(
  n_pre        = params$n_pre,
  n_post       = 30,
  baseline     = params$baseline,
  level_change = -3,          # <-- your clinical hypothesis
  sigma        = params$sigma,
  rho          = params$rho
)
```

---

## Custom column names

If your data uses different column names, pass them explicitly:

```{r custom-cols}
# Rename for illustration
alt_data <- example_cfr_data
names(alt_data) <- c("month", "cfr_pct")

params_alt <- estimate_its_params(
  alt_data,
  outcome_col = "cfr_pct",
  time_col    = "month",
  verbose     = FALSE
)
cat("Sigma:", round(params_alt$sigma, 3), "\n")
```

---

## Diagnostic plots

Always inspect the pre-intervention data before trusting parameter estimates.
`diagnose_params()` produces a 2×2 panel covering the four key checks:

```{r diag-full, fig.width = 8, fig.height = 6,
    fig.cap = "Diagnostic panel: (1) observed series with fitted trend — checks for outliers and non-linearity; (2) residuals over time — checks for heteroscedasticity; (3) Q-Q plot — checks for normality of residuals; (4) residual ACF — quantifies autocorrelation structure."}
out <- diagnose_params(example_cfr_data)
```

What to look for:

1. **Observed + trend**: a roughly flat or very gently sloped trend line.
   A steep pre-trend (> 5% of baseline per period) may indicate confounding.
2. **Residuals over time**: random scatter around zero, no funnelling or
   systematic patterns.
3. **Q-Q plot**: points close to the diagonal. Heavy tails are acceptable
   for monthly rates; severe departures suggest the Gaussian model may not
   hold and counts-based modelling (Poisson/Negative Binomial) should be
   considered.
4. **Residual ACF**: a single large bar at lag 1 followed by rapid decay
   is consistent with AR(1). Multiple large bars suggest higher-order
   autocorrelation; consider a longer pre-period or seasonal adjustment.

---

## What if I have no pre-intervention data?

If historical data are not available, use `simulate_predata()` to generate
a synthetic pre-intervention series and explore how different parameter
assumptions affect power. Then use the literature or expert elicitation to
justify your choices.

```{r simulate-predata}
# Simulate pre-intervention data with assumed parameters
pre_synthetic <- simulate_predata(
  n        = 24,
  baseline = 15,
  sigma    = 2.0,
  rho      = 0.40,
  seed     = 42
)
head(pre_synthetic)
```

```{r plot-synthetic, fig.cap = "Synthetic pre-intervention series (baseline = 15, sigma = 2, rho = 0.40)."}
plot(pre_synthetic$time, pre_synthetic$outcome,
     type = "o", pch = 16, col = "steelblue",
     xlab = "Time", ylab = "Outcome",
     main = "Synthetic pre-intervention data",
     las = 1, bty = "l")
abline(h = 15, lty = 2, col = "grey50")
grid(NULL, NULL, lwd = 0.6, col = rgb(0, 0, 0, 0.1))
```

You can then recover the estimated parameters from this synthetic series
to verify your simulation is self-consistent:

```{r verify-synthetic}
recovered <- estimate_its_params(pre_synthetic, verbose = FALSE)
cat(sprintf("Target  → baseline: 15.00  sigma: 2.000  rho: 0.400\n"))
cat(sprintf("Recovered → baseline: %.2f  sigma: %.3f  rho: %.3f\n",
            recovered$baseline, recovered$sigma, recovered$rho))
```

Small discrepancies are expected from a single 24-point realisation.
Longer pre-periods (n = 48–60) yield more reliable estimates.

---

## Handling non-standard situations

### Date-indexed time column

`estimate_its_params()` accepts date columns and converts them to a
sequential integer index automatically:

```{r date-example, eval = FALSE}
# If your data has a 'date' column of class Date:
params <- estimate_its_params(
  my_data,
  outcome_col = "cfr",
  time_col    = "date"     # Date objects are handled automatically
)
```

### Missing values

Missing values in the outcome are removed with a warning before fitting:

```{r missing-values}
data_with_na <- example_cfr_data
data_with_na$outcome[c(5, 12)] <- NA

params_na <- suppressWarnings(
  estimate_its_params(data_with_na, verbose = FALSE)
)
cat("n_pre after removing NAs:", params_na$n_pre, "\n")
```

### Very short pre-periods

Fewer than 12 observations produces a warning. Estimates become
increasingly unreliable as the pre-period shrinks, and are essentially
meaningless below n = 8.

```{r short-preperiod, warning = TRUE}
params_short <- estimate_its_params(
  example_cfr_data$outcome[1:9],
  verbose = FALSE
)
```

If you genuinely have fewer than 12 pre-intervention observations,
consider whether the ITS design is appropriate, or whether a
controlled ITS (with a comparison series) would be more robust.

---

## Typical parameter ranges for monthly hospital data

As a reference when pre-intervention data are unavailable:

| Parameter | Typical range | Notes |
|-----------|--------------|-------|
| `sigma` | 10–20% of baseline | Higher for small hospitals or rare outcomes |
| `rho` (monthly) | 0.30–0.50 | Use 0.40 as a conservative default |
| `rho` (weekly) | 0.50–0.80 | Consider aggregating to monthly |
| `rho` (daily) | 0.70–0.90 | Strong case for monthly aggregation |

---

## References

- Lopez Bernal J, et al. (2017). *Int J Epidemiol* 46:348–355.
- Wagner AK, et al. (2002). *J Clin Pharm Ther* 27:299–309.
- Zhang F, et al. (2011). *J Clin Epidemiol* 64:1252–1261.