Clustering Individualized Survival Curves with unsurv

Imad EL BADISY

2026-03-12

The unsurv package clusters full survival trajectories rather than baseline covariates. This vignette walks through a minimal workflow: simulate curves, fit the model, choose the number of clusters automatically, predict new samples, and assess stability.

Simulate individualized survival curves

We create three prognosis groups with different exponential hazards and add noise so the curves are not perfectly smooth.

library(unsurv)

set.seed(2026)
n <- 150
Q <- 60
times <- seq(0, 5, length.out = Q)

group <- sample(1:3, n, replace = TRUE, prob = c(0.35, 0.4, 0.25))
haz   <- c(0.18, 0.45, 0.8)[group]

S <- sapply(times, function(t) exp(-haz * t))
S <- S + matrix(rnorm(n * Q, 0, 0.02), nrow = n)
S[S < 0] <- 0
S[S > 1] <- 1
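
Before fitting, it can help to confirm the shape of the simulated matrix and to see how much the noise disturbs monotonicity. The sketch below repeats the simulation so that it runs on its own and uses only base R. The running-minimum repair at the end is one simple way to restore monotone curves; it is an illustrative assumption and not necessarily what enforce_monotone does inside unsurv.

```r
# Re-create the simulated curves so this block is self-contained.
set.seed(2026)
n <- 150
Q <- 60
times <- seq(0, 5, length.out = Q)
group <- sample(1:3, n, replace = TRUE, prob = c(0.35, 0.4, 0.25))
haz   <- c(0.18, 0.45, 0.8)[group]
S <- sapply(times, function(t) exp(-haz * t))
S <- S + matrix(rnorm(n * Q, 0, 0.02), nrow = n)
S[S < 0] <- 0
S[S > 1] <- 1

dim(S)    # rows are subjects, columns are time points
range(S)  # all values lie in [0, 1] after clamping

# Noise breaks monotonicity: count curves with at least one upward step.
n_nonmonotone <- sum(apply(S, 1, function(s) any(diff(s) > 0)))
n_nonmonotone

# Illustrative repair (assumption): running minimum along each curve.
# apply() over rows returns a Q x n matrix, so transpose back to n x Q.
S_mono <- t(apply(S, 1, cummin))
```

After this repair every row of S_mono is non-increasing, which is what a survival curve should look like.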

Fit clustering with automatic K selection

Leaving K = NULL lets unsurv choose the number of clusters itself: it compares the candidate values 2:K_max by their mean silhouette width and keeps the best-scoring K.

fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2",
              enforce_monotone = TRUE, smooth_median_width = 5,
              standardize_cols = TRUE, eps_jitter = 0.0005)
fit
#> unsurv (PAM) fit
#>   K:3
#>   distance:L2 silhouette_mean:0.810
#>   n:150 Q:60
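
To make the selection rule concrete, here is a minimal sketch of silhouette-based choice of K using cluster::pam() on Euclidean (L2) distances between curves. It illustrates the general technique only. It is not unsurv's internal code and it skips the preprocessing options shown above. The names X, K_grid, avg_sil and K_best are made up for this example.

```r
library(cluster)  # pam() and its silhouette summaries

# Small self-contained data set: three hazard groups of 30 curves each.
set.seed(1)
times <- seq(0, 5, length.out = 40)
haz   <- rep(c(0.18, 0.45, 0.8), each = 30)
X <- sapply(times, function(t) exp(-haz * t)) +
  matrix(rnorm(90 * 40, 0, 0.02), nrow = 90)

# L2 distance between curves on the common time grid.
D <- dist(X)

# Fit PAM for each candidate K and record the mean silhouette width.
K_grid  <- 2:6
avg_sil <- sapply(K_grid, function(k) pam(D, k = k)$silinfo$avg.width)

# Keep the K with the largest mean silhouette width.
K_best <- K_grid[which.max(avg_sil)]
data.frame(K = K_grid, silhouette_mean = round(avg_sil, 3))
K_best
```

The mean silhouette width lies between -1 and 1. Values close to 1 mean that curves sit much nearer to their own cluster than to the next closest one.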

Key slots:

Visualize medoids

plot(fit)
Cluster medoid survival curves.

For a ggplot2 version:

library(ggplot2)
autoplot(fit)
Medoid curves via ggplot2 autoplot.

Predict cluster membership for new curves

New curves must use the same time grid as the fit. Preprocessing (clamping, monotonicity, smoothing, standardization) is reused automatically.

new_curves <- S[1:5, ]
predict(fit, new_curves)
#> [1] 1 1 2 2 1
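
For a medoid-based clustering, a natural prediction rule is to assign each new curve to the closest medoid. The sketch below shows that rule in base R under L2 distance. It assumes nearest-medoid assignment, which this vignette does not state explicitly, and it omits the preprocessing that predict() reuses. The helper assign_nearest_medoid() and the medoid curves are hypothetical.

```r
# Hypothetical helper: index of the nearest medoid (squared L2) for each curve.
assign_nearest_medoid <- function(new_curves, medoids) {
  apply(new_curves, 1, function(curve) {
    # Subtract the curve from every medoid row, then sum squared differences.
    d2 <- rowSums(sweep(medoids, 2, curve)^2)
    which.min(d2)
  })
}

times <- seq(0, 5, length.out = 60)

# Illustrative medoids: noise-free exponential curves for three hazards.
medoids <- rbind(exp(-0.18 * times),
                 exp(-0.45 * times),
                 exp(-0.80 * times))

# Two new curves on the same time grid as the medoids.
new_curves <- rbind(exp(-0.20 * times),
                    exp(-0.75 * times))

labels <- assign_nearest_medoid(new_curves, medoids)
labels
#> [1] 1 3
```

The rule only makes sense when new curves and medoids share the same time grid, which is why the grid requirement above matters.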

Stability assessment

Resampling gives a sense of how stable the clustering is to perturbations.

stab <- unsurv_stability(
  S, times, fit,
  B = 20, frac = 0.7,
  mode = "subsample",
  jitter_sd = 0.01,
  weight_perturb = 0.05,
  return_distribution = TRUE
)
stab$mean
#> [1] 1

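Here stab$mean is the mean adjusted Rand index (ARI) between the resampled clusterings and the original fit. Values close to 1 indicate that the clusters are reproduced almost exactly, and lower values indicate less stable clusters.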
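
The idea behind a subsampling stability check can be sketched with cluster::pam() and a hand-written ARI. This is a simplified illustration and not the implementation of unsurv_stability(): it covers plain subsampling only, leaves out jittering and weight perturbation, and holds K fixed at 3. The function adjusted_rand() and the other names are hypothetical.

```r
library(cluster)  # pam()

# Adjusted Rand index between two label vectors, from their contingency table.
# Undefined (0/0) when both partitions are trivial; fine for this sketch.
adjusted_rand <- function(a, b) {
  tab   <- table(a, b)
  comb2 <- function(x) x * (x - 1) / 2   # number of pairs, choose(x, 2)
  sum_ij <- sum(comb2(tab))
  sum_a  <- sum(comb2(rowSums(tab)))
  sum_b  <- sum(comb2(colSums(tab)))
  expected  <- sum_a * sum_b / comb2(sum(tab))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

set.seed(42)
times <- seq(0, 5, length.out = 40)
haz   <- rep(c(0.18, 0.45, 0.8), each = 30)
X <- sapply(times, function(t) exp(-haz * t)) +
  matrix(rnorm(90 * 40, 0, 0.02), nrow = 90)

# Reference partition fitted on all curves.
ref <- pam(dist(X), k = 3)$clustering

B    <- 20
frac <- 0.7
ari <- replicate(B, {
  idx <- sample(nrow(X), size = floor(frac * nrow(X)))  # draw a subsample
  sub <- pam(dist(X[idx, ]), k = 3)$clustering          # recluster it
  adjusted_rand(ref[idx], sub)                          # compare on shared curves
})
mean(ari)
```

Because ARI ignores how the clusters are numbered, a relabeled but otherwise identical partition still scores 1.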

Tips and troubleshooting

Reproducibility

Set seed inside unsurv() to make PAM initialization and silhouette-based selection deterministic. Your figures may still differ slightly from the ones in this vignette because the simulated curves include random noise.
