The unsurv package clusters full survival trajectories rather than baseline covariates. This vignette walks through a minimal workflow: simulate curves, fit the model, choose the number of clusters automatically, predict new samples, and assess stability.
We create three prognosis groups with different exponential hazards and add noise so the curves are not perfectly smooth.
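The simulation code itself is not reproduced in this text. The base R sketch below shows one way to build such data. It assumes a 60-point grid on [0, 5], three groups of 50 curves with hypothetical hazard rates 0.1, 0.4 and 1.0, and Gaussian noise with standard deviation 0.02. None of these values come from the package. `S` is a matrix with one row per curve and one column per time point, which matches the `n:150 Q:60` shape reported by the fit.

```r
# Sketch only: hazard rates, group sizes, grid, and noise level are assumptions.
set.seed(1)
times   <- seq(0, 5, length.out = 60)                      # Q = 60 grid points (assumed)
hazards <- c(good = 0.1, intermediate = 0.4, poor = 1.0)   # hypothetical rates
group   <- rep(names(hazards), each = 50)                  # 3 x 50 = 150 curves

# Noise-free exponential survival S(t) = exp(-lambda * t), one row per curve
S <- t(vapply(group, function(g) exp(-hazards[[g]] * times),
              numeric(length(times))))

# Gaussian noise makes the curves non-smooth; clamp back into [0, 1]
S <- S + matrix(rnorm(length(S), sd = 0.02), nrow = nrow(S))
S <- pmin(pmax(S, 0), 1)
dim(S)
#> [1] 150  60
```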
Leaving `K = NULL` lets unsurv pick the number of clusters by maximizing the mean silhouette width over `2:K_max`.
```r
library(unsurv)

# K = NULL: choose K in 2:K_max by mean silhouette width
fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2",
              enforce_monotone = TRUE,   # force non-increasing curves
              smooth_median_width = 5,   # running-median window (odd width)
              standardize_cols = TRUE,   # standardize each time-point column
              eps_jitter = 0.0005)
fit
#> unsurv (PAM) fit
#> K:3
#> distance:L2 silhouette_mean:0.810
#> n:150 Q:60
```

Key slots:

- `K`: chosen cluster count
- `clusters`: cluster assignment for each curve
- `medoids`: representative survival curves
- `silhouette_mean`: average silhouette width
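The selection rule can be sketched with `cluster::pam()`, which ships with R. The helper `choose_k_silhouette()` is hypothetical, not unsurv's internal code. It assumes that `distance = "L2"` corresponds to the Euclidean distance between discretized curves. On a uniform grid the true L2 norm differs only by the constant factor sqrt(Δt), and that factor changes neither the PAM solution nor silhouette widths. The sketch also skips unsurv's preprocessing, and the simulation settings are assumptions.

```r
library(cluster)  # ships with R; provides pam() and silhouette widths

# Simulated curves: 3 groups x 50, 60 grid points (assumed settings)
set.seed(42)
times <- seq(0, 5, length.out = 60)
rates <- rep(c(0.1, 0.4, 1.0), each = 50)
S <- t(sapply(rates, function(r) exp(-r * times))) +
  matrix(rnorm(150 * 60, sd = 0.02), nrow = 150)

# Hypothetical helper: fit PAM for each K in 2:K_max on the Euclidean
# distance between discretized curves and keep the K with the largest
# mean silhouette width.
choose_k_silhouette <- function(S, K_max = 6) {
  d <- dist(S, method = "euclidean")
  ks <- 2:K_max
  widths <- vapply(ks, function(k) pam(d, k = k)$silinfo$avg.width, numeric(1))
  names(widths) <- ks
  list(K = ks[which.max(widths)], silhouette_mean = max(widths), widths = widths)
}

sel <- choose_k_silhouette(S, K_max = 6)
sel$K
sel$widths
```

With three well-separated groups, the mean silhouette should peak at K = 3. Splitting a tight group pushes the silhouette of its members toward 0, and merging two groups inflates their within-cluster distances.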
New curves must use the same time grid as the fit. Preprocessing (clamping, monotonicity, smoothing, standardization) is reused automatically.
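The prediction call is not shown here. The sketch below illustrates the idea under stated assumptions. Two helpers are hypothetical, not unsurv's implementation: `preprocess_curves()` applies clamping, running-median smoothing, a non-increasing constraint and column standardization, and `assign_to_medoids()` assigns each new curve to its nearest medoid in L2 distance. The step order, and nearest-medoid assignment as the prediction rule, are assumptions. The key point is that new curves reuse the training grid and the training centering and scaling statistics.

```r
library(cluster)  # ships with R; used only for pam()

# Hypothetical preprocessing; step order and defaults are assumptions.
preprocess_curves <- function(S, width = 5, center = NULL, scale = NULL) {
  S <- pmin(pmax(S, 0), 1)                                   # clamp to [0, 1]
  S <- t(apply(S, 1, runmed, k = width, endrule = "keep"))   # running median (odd width)
  S <- t(apply(S, 1, cummin))                                # non-increasing curves
  if (is.null(center)) center <- colMeans(S)                 # learned on training data only
  if (is.null(scale)) scale <- pmax(apply(S, 2, sd), 1e-8)   # guard against zero sd
  X <- sweep(sweep(S, 2, center, "-"), 2, scale, "/")
  list(X = X, center = center, scale = scale)
}

# Hypothetical prediction rule: nearest medoid under squared L2 distance
assign_to_medoids <- function(X_new, medoids) {
  d2 <- outer(rowSums(X_new^2), rowSums(medoids^2), "+") - 2 * X_new %*% t(medoids)
  apply(d2, 1, which.min)
}

set.seed(7)
times <- seq(0, 5, length.out = 60)
sim <- function(rates) {
  t(sapply(rates, function(r) exp(-r * times))) +
    matrix(rnorm(length(rates) * length(times), sd = 0.02), nrow = length(rates))
}

train   <- preprocess_curves(sim(rep(c(0.1, 0.4, 1.0), each = 50)))
pam_fit <- pam(train$X, k = 3)

# New curves: same grid, training center/scale reused
new  <- preprocess_curves(sim(rep(1.0, 5)), center = train$center, scale = train$scale)
pred <- assign_to_medoids(new$X, pam_fit$medoids)
pred
```

All five new curves come from the highest-hazard group, so they should land in the same cluster as the last training curve, which also belongs to that group.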
Resampling gives a sense of how stable the clustering is to perturbations.
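```r
stab <- unsurv_stability(
  S, times, fit,
  B = 20, frac = 0.7,
  mode = "subsample",
  jitter_sd = 0.01,
  weight_perturb = 0.05,
  return_distribution = TRUE
)
stab$mean
#> [1] 1
```

`stab$mean` is the mean adjusted Rand index (ARI) across resamples. Higher values indicate more reproducible clusters, and an ARI of 1 means perfect agreement.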
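To make the stability score concrete, the sketch below computes the standard (Hubert and Arabie) adjusted Rand index in base R and applies it to a simple subsampling scheme. Each resample refits PAM on 70% of the curves and compares the new labels with the reference labels on those same curves. The helpers `adjusted_rand_index()` and `stability_ari()` are hypothetical. The sketch does not model `jitter_sd` or `weight_perturb`, and unsurv's exact resampling procedure may differ.

```r
library(cluster)  # ships with R; pam() is used for the refits

# Standard (Hubert and Arabie) adjusted Rand index between two labelings
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(x) sum(x * (x - 1) / 2)        # number of within-group pairs
  n_pairs <- sum(tab) * (sum(tab) - 1) / 2
  index <- pairs(tab)
  row_pairs <- pairs(rowSums(tab))
  col_pairs <- pairs(colSums(tab))
  expected <- row_pairs * col_pairs / n_pairs
  max_index <- (row_pairs + col_pairs) / 2
  (index - expected) / (max_index - expected)
}

# Hypothetical subsampling check: refit PAM on a fraction of the curves and
# compare with the reference labels of the same curves.
stability_ari <- function(S, ref, k, B = 20, frac = 0.7) {
  vapply(seq_len(B), function(b) {
    idx <- sample(nrow(S), size = floor(frac * nrow(S)))
    adjusted_rand_index(ref[idx], pam(S[idx, ], k = k)$clustering)
  }, numeric(1))
}

set.seed(3)
times <- seq(0, 5, length.out = 60)
rates <- rep(c(0.1, 0.4, 1.0), each = 50)
S <- t(sapply(rates, function(r) exp(-r * times))) +
  matrix(rnorm(150 * 60, sd = 0.02), nrow = 150)
ref <- pam(S, k = 3)$clustering

aris <- stability_ari(S, ref, k = 3, B = 20, frac = 0.7)
mean(aris)
```

ARI is invariant to label permutation, so `adjusted_rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))` is 1 even though the label names are swapped.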
- Check that `times` is strictly increasing and that its length matches the number of columns in `S`.
- For noisy or non-monotone curves, set `enforce_monotone = TRUE` or increase `smooth_median_width` to an odd integer ≥ 3.
- Use `distance = "L1"` when you want robustness to large deviations at a few time points.
- `weights` lets you emphasize clinically important time windows. Weights are normalized internally.
- Set `seed` inside `unsurv()` for deterministic PAM initialization and silhouette selection. Vignette figures may differ slightly because of the noise added to the simulated curves.
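The L1 versus L2 trade-off and the effect of normalized weights can be illustrated with a small distance function. `curve_distance()` is hypothetical. It assumes one weight per time point, normalized to sum to 1, that enters a weighted L2 or L1 distance. unsurv's internal definition may differ.

```r
# Hypothetical weighted curve distance; weights are normalized to sum to 1.
curve_distance <- function(x, y, weights = NULL, distance = c("L2", "L1")) {
  distance <- match.arg(distance)
  if (is.null(weights)) weights <- rep(1, length(x))
  w <- weights / sum(weights)          # normalization: only relative weights matter
  diff <- abs(x - y)
  if (distance == "L2") sqrt(sum(w * diff^2)) else sum(w * diff)
}

times   <- seq(0, 5, length.out = 60)
ref     <- exp(-0.4 * times)
shifted <- ref - 0.05                                  # moderate deviation everywhere
spiked  <- ref
spiked[20:21] <- spiked[20:21] - 0.5                   # large deviation at two points

# Hypothetical emphasis on the window 1 <= t <= 3
w <- ifelse(times >= 1 & times <= 3, 3, 1)

sapply(c("L2", "L1"), function(m) c(
  shifted = curve_distance(ref, shifted, distance = m),
  spiked  = curve_distance(ref, spiked,  distance = m)
))
```

Under L2 the spiked curve is farther from `ref` than the shifted one (about 0.091 versus 0.05). Under L1 the order flips (about 0.017 versus 0.05), because the two large deviations are no longer squared. Multiplying `w` by any positive constant leaves every distance unchanged, which is what normalizing the weights guarantees.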