Help for package traj

Title:

Feature-Based Clustering of Longitudinal Trajectories

Version:

3.0.0

Description:

Identifies clusters of individual longitudinal trajectories. In the spirit of Leffondre et al. (2004), the procedure involves identifying each trajectory with a point in the space of measures. In this context, a measure is a quantity meant to capture a certain characteristic feature of the trajectory. The points in the space of measures are then clustered using a version of spectral clustering.

License:

MIT + file LICENSE

URL:

https://CRAN.R-project.org/package=traj

Encoding:

UTF-8

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

Imports:

stats, cluster, clusterCrit, fclust, igraph, e1071

Depends:

R (≥ 2.10)

LazyData:

true

NeedsCompilation:

Packaged:

2026-01-28 21:32:16 UTC; Moi

Author:

Marie-Pierre Sylvestre [aut], Laurence Boulanger [aut, cre], Gillis Delmas Tchouangue Dinkou [ctb], Dan Vatnik [ctb]

Maintainer:

Laurence Boulanger <laurence.boulanger@umontreal.ca>

Repository:

CRAN

Date/Publication:

2026-01-28 21:50:02 UTC

traj: Feature-Based Clustering of Longitudinal Trajectories

Description

Author(s)

Maintainer: Laurence Boulanger laurence.boulanger@umontreal.ca

Authors:

Marie-Pierre Sylvestre marie-pierre.sylvestre@umontreal.ca

Other contributors:

Gillis Delmas Tchouangue Dinkou [contributor]
Dan Vatnik [contributor]

Plot `trajClusters` object

Description

Plots the curves corresponding to (or closest to) the centroids of the clusters and plots a random sample of trajectories from each group.

Usage

## S3 method for class 'trajClusters'
plot(x, sample.size = 5, ask = TRUE, which.plots = NULL, ...)

scatterplots(x, ask = TRUE, ...)

CVIplot(x, ...)

Arguments

x

object of class trajClusters as returned by the function trajClusters().

sample.size

the number of trajectories to be randomly sampled from each cluster. Defaults to 5.

ask

logical. If TRUE, the user is asked before each plot. Defaults to TRUE.

which.plots

either NULL or a vector of integers. If NULL, every available plot is displayed. If a vector is supplied, only the corresponding plots will be displayed.

...

other parameters to be passed through to plotting functions.

Examples

## Not run: 
data("trajdata")
trajdata.noGrp <- trajdata[, -which(colnames(trajdata) == "Group")] #remove the Group column

m = trajMeasures(trajdata.noGrp, ID = TRUE)
c3 = trajClusters(m, nclusters = 3)

plot(c3)

#The pointwise mean trajectories correspond to the third and fourth displayed plots.

c4 = trajClusters(m, nclusters = 4)

plot(c4, which.plots = 3:4)


## End(Not run)

Classify the Longitudinal Data Based on the Measures.

Description

Classifies the trajectories by applying a nonparametric clustering algorithm to the measures computed by trajMeasures().

Usage

trajClusters(
  Measures,
  select = NULL,
  fuzzy = FALSE,
  nclusters = NULL,
  nstart = 50
)

## S3 method for class 'trajClusters'
print(x, ...)

## S3 method for class 'trajClusters'
summary(object, ...)

Arguments

Measures

object of class trajMeasures as returned by the function trajMeasures().

select

an optional vector of positive integers corresponding to the measures to use in the clustering. Defaults to NULL, which uses all the measures contained in Measures.

fuzzy

logical. If FALSE, each trajectory is assigned to a unique group. If TRUE, each trajectory is assigned a "degree of membership" to each group. Defaults to FALSE.

nclusters

The desired number of clusters. If NULL, clustering is carried out for every number of clusters between 2 and (up to) 8 and the "best" number of clusters is used, as judged by the combination of three internal cluster validity indices. See section 'Value' for more details. Defaults to NULL.

nstart

The number of random starts. Defaults to 50.

x

object of class trajClusters.

...

further arguments passed to or from other methods.

object

object of class trajClusters.

Details

The spectral clustering algorithm presented in Meila (2005) is implemented. The similarity matrix S is built from a binary K-nearest-neighbors similarity function: S = (W + W^T)/2, where W_{ij} = 1 if data point j is among the K nearest points to data point i and W_{ij} = 0 otherwise.
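
The construction of S described above can be sketched in base R. This is an illustration only, not the package's internal code: the toy points, the value of K, the use of Euclidean distance and the exclusion of each point from its own neighbor list are all assumptions.

```r
# Toy data: 6 points in the plane (hypothetical values, for illustration only).
X <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(5, 5), c(5, 6), c(6, 5))
K <- 2  # number of nearest neighbours (assumed value)

# Pairwise Euclidean distances.
D <- as.matrix(dist(X))
n <- nrow(D)

# W[i, j] = 1 if point j is among the K nearest points to point i
# (point i itself is excluded, which is an assumption of this sketch).
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nearest <- setdiff(order(D[i, ]), i)[seq_len(K)]
  W[i, nearest] <- 1
}

# Symmetrise: S = (W + W^T) / 2, so every entry is 0, 0.5 or 1.
S <- (W + t(W)) / 2
```

An entry of 0.5 means that only one of the two points counts the other among its K nearest neighbors; an entry of 1 means the relation is mutual.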

Value

An object of class trajClusters; a list containing the result of the clustering, as well as a curated form of the arguments. If nclusters is set to NULL, clustering is carried out for each number k of clusters between 2 and (up to) 8 and a plot is produced representing the value of three internal cluster validity indices (C-index, Calinski-Harabasz, Wemmert-Gancarski) as a function of k. As in the 'KmL' package of Genolini et al., these validity indices are presented on a scale from 0 to 1, with 1 corresponding to the highest validity score and 0 corresponding to the lowest. From this, a "best" value of k is determined using a ranked voting system.
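
The rescaling and voting idea can be illustrated with a small base R sketch. The index values below are made up, and the Borda-style rank sum is only one possible reading of "ranked voting"; the rule actually used by the package is not described here and may differ.

```r
# Hypothetical index values for k = 2, ..., 5 (made-up numbers).
k <- 2:5
cvi <- data.frame(
  C.index           = c(0.30, 0.12, 0.15, 0.20),  # lower is better
  Calinski.Harabasz = c(110, 180, 150, 120),      # higher is better
  Wemmert.Gancarski = c(0.55, 0.70, 0.72, 0.60)   # higher is better
)

# Min-max rescaling to [0, 1]; indices where lower is better are flipped
# so that 1 always corresponds to the highest validity.
rescale <- function(v, higher.is.better = TRUE) {
  s <- (v - min(v)) / (max(v) - min(v))
  if (higher.is.better) s else 1 - s
}
scaled <- data.frame(
  C.index           = rescale(cvi$C.index, higher.is.better = FALSE),
  Calinski.Harabasz = rescale(cvi$Calinski.Harabasz),
  Wemmert.Gancarski = rescale(cvi$Wemmert.Gancarski)
)

# Borda-style ranked vote (an assumption of this sketch): each index ranks
# the candidate values of k, and the k with the largest rank sum wins.
votes <- rowSums(sapply(scaled, rank))
best.k <- k[which.max(votes)]
```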

References

Genolini, C. et al., kml: K-Means for Longitudinal Data, https://CRAN.R-project.org/package=kml

Meila, M., Spectral Clustering. Handbook of Cluster Analysis, Chapter 7, Chapman and Hall/CRC, 2005.

Examples

## Not run: 
data("trajdata")
trajdata.noGrp <- trajdata[, -which(colnames(trajdata) == "Group")] # remove the Group column

m = trajMeasures(trajdata.noGrp, ID = TRUE, measures = 1:19)

s2.3 <- trajClusters(m, nclusters = 3)
plot(s2.3)

s2.4 <- trajClusters(m, nclusters = 4)
plot(s2.4)

s2.5 <- trajClusters(m, nclusters = 5)
plot(s2.5)

# extract the partition from the 4-cluster solution
groups <- s2.4$partition

## End(Not run)

Compute Measures for Identifying Patterns of Change in Longitudinal Data

Description

trajMeasures computes up to 20 measures for each longitudinal trajectory. See Details for the list of measures.

Usage

trajMeasures(
  Data,
  Time = NULL,
  ID = FALSE,
  measures = c(1:10, 12:20),
  midpoint = NULL,
  cap.outliers = FALSE
)

## S3 method for class 'trajMeasures'
print(x, ...)

## S3 method for class 'trajMeasures'
summary(object, ...)

Arguments

Data

a matrix or data frame in which each row contains the longitudinal data (trajectories).

Time

either NULL, a vector, or a matrix/data frame of the same dimension as Data. If a vector, matrix or data frame is supplied, its entries are taken to be the times at which the corresponding cells in Data were measured. When set to NULL (the default), the times are assumed equidistant.

ID

logical. Set to TRUE if the first column of Data and of Time corresponds to an ID variable identifying the trajectories. Defaults to FALSE.

measures

a vector containing the numerical identifiers of the measures to compute. The default, c(1:10, 12:20), excludes measure 11, which requires specifying a midpoint.

midpoint

specifies which column of Time to use as the midpoint in measure 11. Can be NULL, an integer, or a vector of integers whose length equals the number of rows in Time. The default is NULL, in which case the midpoint is the time closest to the median of the Time vector specific to each trajectory.

cap.outliers

logical. If TRUE, extreme values of the measures will be capped. Defaults to FALSE.

x

object of class trajMeasures.

...

further arguments passed to or from other methods.

object

object of class trajMeasures.

Details

Each trajectory must have a minimum of 3 observations, otherwise it is omitted from the analysis. The 20 measures and their numerical identifiers are listed below. Please refer to the vignette for the specific formulas used to compute them.

1. Maximum
2. Minimum
3. Range
4. Mean value
5. Standard deviation
6. Slope of the affine approximation
7. Intercept of the affine approximation
8. Proportion of variance explained by the affine approximation
9. Rate of intersection with the best affine approximation
10. Net variation per unit of time
11. Late variation to early variation contrast
12. Total variation per unit of time
13. Spikiness
14. Maximum of the first derivative
15. Minimum of the first derivative
16. Standard deviation of the first derivative
17. First derivative's net variation per unit of time
18. Maximum of the second derivative
19. Minimum of the second derivative
20. Standard deviation of the second derivative
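
As an illustration, the first eight measures in the list can be approximated in base R for a single trajectory. This is a sketch under assumptions, not the package's code: the trajectory values are made up, the affine approximation is taken to be an ordinary least-squares line, and the vignette's exact formulas (for example the normalisation of the standard deviation) may differ.

```r
# One hypothetical trajectory observed at equidistant times.
times <- 1:6
y <- c(2.0, 2.5, 3.1, 3.0, 4.2, 4.8)

# Affine approximation, assumed here to be the least-squares line.
fit <- lm(y ~ times)

measures <- c(
  maximum   = max(y),
  minimum   = min(y),
  range     = max(y) - min(y),
  mean      = mean(y),
  sd        = sd(y),  # sample SD; the vignette may normalise differently
  slope     = unname(coef(fit)["times"]),
  intercept = unname(coef(fit)["(Intercept)"]),
  r.squared = summary(fit)$r.squared  # proportion of variance explained
)
```

Applying such a function to every row of a data matrix yields one point per trajectory in the space of measures.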

If 'cap.outliers' is set to TRUE, Nishiyama's improved Chebyshev bound for continuous distributions is used to determine extreme values for each measure, corresponding to a 0.3% probability threshold. Extreme values beyond the threshold are then capped at the 0.3% probability threshold (see vignette for more details).

Value

An object of class trajMeasures; a list containing the values of the measures, a table of the outliers which have been capped, as well as a curated form of the function's arguments.

References

Leffondre K, Abrahamowicz M, Regeasse A, Hawker GA, Badley EM, McCusker J, Belzile E. Statistical measures were proposed for identifying longitudinal patterns of change in quantitative health indicators. J Clin Epidemiol. 2004 Oct;57(10):1049-62. doi: 10.1016/j.jclinepi.2004.02.012. PMID: 15528056.

Nishiyama T, Improved Chebyshev inequality: new probability bounds with known supremum of PDF, arXiv:1808.10770v2 stat.ME https://doi.org/10.48550/arXiv.1808.10770

Examples

## Not run: 
data("trajdata")
trajdata.noGrp <- trajdata[, -which(colnames(trajdata) == "Group")] #remove the Group column

# Measure 11 is the measure that uses the midpoint argument.
m1 = trajMeasures(trajdata.noGrp, ID = TRUE, measures = 11, midpoint = NULL)
m2 = trajMeasures(trajdata.noGrp, ID = TRUE, measures = 11, midpoint = 3)

identical(m1$measures, m2$measures)

## End(Not run)

Select a Subset of the Measures Using a Similarity Index on the Set of Clusterings

Description

This function examines the effect of reducing the number of measures on which the trajectories are clustered. Specifically, starting from a clustering C in the form of an object of class trajClusters and a choice of a similarity index to compare clusterings, this function finds the subset of measures which results in the clustering most similar to C.

Usage

trajReduce(Measures, Clusters, index = "ARI", keep = 3)

Arguments

Measures

object of class trajMeasures as returned by trajMeasures.

Clusters

object of class trajClusters as returned by trajClusters.

index

The similarity index. Either "ARI" for the Adjusted Rand Index of Hubert and Arabie (1985), "nVId" for the normalized variation of information distance (e.g. Meila (2007)) or "nSJd" for the normalized split/join distance of van Dongen (2000).

keep

The number of measures to keep. Defaults to 3.

Details

The adjusted Rand index equals 1 for identical clusterings and is close to 0 (possibly negative) for clusterings that agree no more than expected by chance. The normalized variation of information distance (nVId) and the normalized split/join distance (nSJd) are distances and have the opposite interpretation, with 0 indicating identical clusterings and 1 indicating maximally different clusterings. Therefore, to facilitate comparison, we plot 1 - nVId (resp. 1 - nSJd) instead of nVId (resp. nSJd).
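
For reference, the Adjusted Rand Index of Hubert and Arabie (1985) can be computed from the contingency table of two partitions. The base R sketch below is illustrative only; the function name adjusted_rand_index and the example partitions are hypothetical and are not part of the package.

```r
# Adjusted Rand Index of two partitions given as label vectors.
# (Hypothetical helper, not exported by traj.)
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                        # contingency table
  n <- sum(tab)
  sum.cells <- sum(choose(tab, 2))          # pairs grouped together in both
  sum.rows  <- sum(choose(rowSums(tab), 2)) # pairs grouped together in a
  sum.cols  <- sum(choose(colSums(tab), 2)) # pairs grouped together in b
  expected  <- sum.rows * sum.cols / choose(n, 2)
  maximum   <- (sum.rows + sum.cols) / 2
  (sum.cells - expected) / (maximum - expected)
}

p1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
p2 <- c(2, 2, 2, 3, 3, 3, 1, 1, 1)  # same partition as p1, relabelled
p3 <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)  # cuts across every group of p1

ari.same <- adjusted_rand_index(p1, p2)  # identical clusterings
ari.diff <- adjusted_rand_index(p1, p3)  # less agreement than chance
```

The index does not depend on how the clusters are labelled, which is why p1 and p2 are treated as the same clustering. The sketch does not handle the degenerate case where both partitions consist of a single cluster.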

References

Hubert L, Arabie P. Comparing partitions. Journal of Classification 2:193-218, 1985.

Meila M. Comparing clusterings – an information based distance. Journal of Multivariate Analysis, 98, pp 873-895, 2007.

van Dongen S. Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.

Examples

## Not run: 
data("trajdata")
trajdata.noGrp <- trajdata[, -which(colnames(trajdata) == "Group")] #remove the Group column

m = trajMeasures(trajdata.noGrp, ID = TRUE)
c3 = trajClusters(m, nclusters = 3)
trajReduce(m, c3)

## End(Not run)

trajdata

Description

An artificially created data set with 130 trajectories split into four groups, labelled A, B, C, D according to the data generating process.

Usage

trajdata

Format

This data frame has 130 rows and the following 8 columns:

ID: An identification variable that runs from 1 to 130.
Group: A character variable that's either "A", "B", "C" or "D" depending on which of the four data generating processes the trajectory is coming from.
X1: The observation of the trajectory at time t = 1.
X2: The observation of the trajectory at time t = 2.
X3: The observation of the trajectory at time t = 3.
X4: The observation of the trajectory at time t = 4.
X5: The observation of the trajectory at time t = 5.
X6: The observation of the trajectory at time t = 6.
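
The examples in this manual drop the Group column before computing measures. The sketch below shows that step on a made-up data frame with the same column layout; the object toy and its values are hypothetical and are not the real trajdata.

```r
# Made-up data frame with the same columns as trajdata
# (values are hypothetical, for illustration only).
toy <- data.frame(
  ID    = 1:4,
  Group = c("A", "B", "C", "D"),
  X1 = c(1.0, 2.0, 3.0, 4.0),
  X2 = c(1.1, 2.3, 2.9, 4.4),
  X3 = c(1.3, 2.1, 2.7, 4.9),
  X4 = c(1.2, 2.6, 2.8, 5.1),
  X5 = c(1.5, 2.4, 2.5, 5.6),
  X6 = c(1.4, 2.8, 2.6, 6.0)
)

# Drop the Group column by name, keeping ID and the six observations.
toy.noGrp <- toy[, colnames(toy) != "Group"]
```

Selecting columns with a logical condition keeps the data frame intact even if no column is named "Group", whereas negative indexing with -which(...) would then return a data frame with no columns.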
