Tools for cleaning high-frequency real-time location tracking data.
trackclean was developed to process data from playground
movement research, but applies to any study collecting high-frequency
positional data from people moving within a defined space — classrooms,
sports facilities, rehabilitation settings, and similar
environments.
# Install from CRAN
install.packages("trackclean")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("tomasbil/trackclean")The package includes a small example dataset that can be used to trial the full pipeline without any real data. It simulates 10 children tracked during a school recess on a 40m × 60m playground using a UWB positioning system.
library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")The example dataset includes: - 10 participants with raw tag IDs
1–10, mapped to child IDs 5001–5010 - ~13.5 minutes of data
(11:45:00–11:58:30), with observations both inside and outside the
analysis window - Sub-second timestamps causing multiple readings per
second — handled by standardize_to_seconds() - Randomly
dropped seconds creating gaps — handled by
interpolate_gaps() - One tag replacement: participant 5003
starts on raw tag ID 3, which is swapped to raw tag ID 11 at 11:51:00 —
handled by fix_tag_replacement()
Analysis parameters for this dataset:
| Parameter | Value |
|---|---|
analyze_start |
"2025-03-18 11:47:00" |
analyze_end |
"2025-03-18 11:57:00" |
bell_start |
"2025-03-18 11:53:00" |
bell_end |
"2025-03-18 11:58:00" |
| Tag replacement | raw_id 3 → raw_id 11 at "2025-03-18 11:51:00" |
Raw tracking data
(raw_tracking_data.csv):
| ID | At | X | Y |
|---|---|---|---|
| 1 | 2025-03-18 11:45:00.00 | 5.000 | 10.000 |
| 1 | 2025-03-18 11:45:01.00 | 5.383 | 10.239 |
| 1 | 2025-03-18 11:45:01.47 | 5.341 | 10.261 |
| … |
ID: raw tag ID as assigned by the tracking systemAt: timestamp (POSIXct-readable, sub-second precision
supported)X, Y: position in metersID mapping (id_mapping.csv):
| raw_id | child_id |
|---|---|
| 1 | 5001 |
| 3 | 5003 |
| 11 | 5003 |
| … |
raw_id: tag ID as it appears in the raw datachild_id: standardized participant ID to use in
analysischild_id)If a participant’s tag was replaced during data collection, run this before the main pipeline:
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)This will: - Keep observations from tag 3 before 11:51 - Rename tag 11 observations from 11:51 onwards to tag 3 - Remove tag 3 observations from 11:51 onwards (duplicate/invalid) - Remove tag 11 observations before 11:51 (not yet attached)
Create a CSV file with two columns mapping raw device IDs to your participant IDs:
raw_id,child_id
1,5001
2,5002
3,5003
Or use the bundled example file:
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
# Fix tag replacement first (if applicable)
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00",
output_file = "cleaned_data.csv"
)For more control, run each step separately:
# Step 1: Map IDs
data <- map_ids(raw_data, id_mapping)
# Step 2: Mark time periods
data <- mark_time_periods(
data,
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00"
)
# Step 3: Standardize to seconds
data <- standardize_to_seconds(data)
# Step 4: Interpolate gaps
data <- interpolate_gaps(
data,
max_gap_small = 10,
max_position_change = 0.3
)The package uses a two-phase approach to handle missing data:
Phase 1: Interpolates small gaps (≤10 seconds by default) - Uses linear interpolation between known points - Appropriate for brief signal losses
Phase 2: Interpolates larger gaps conditionally - Only when position change between endpoints is minimal (≤30cm by default) - Indicates the participant remained stationary during the gap - Prevents false movement estimates for longer signal dropouts
All functions provide: - Progress messages and summaries - Data integrity checks - Row count validation - Clear flagging of imputed vs. original data
| Function | Purpose |
|---|---|
clean_playground_data() |
Complete pipeline in one call |
fix_tag_replacement() |
Fix tag replacements (run before pipeline) |
map_ids() |
Map raw device IDs to participant IDs |
mark_time_periods() |
Create Analyze and Bell columns |
standardize_to_seconds() |
Aggregate to one-second intervals |
interpolate_gaps() |
Two-phase gap interpolation |
The cleaned dataset includes these flags:
id_code: Standardized participant IDAnalyze: 1 if within analysis period, 0 otherwiseBell: 1 if within bell period, 0 otherwise (if
specified)n_entries: Original number of signals in that
secondstandardized: 1 if multiple signals were averaged, 0
otherwiseimputed: 1 if row added via phase 1 interpolationimputed_large: 1 if row added via phase 2
interpolationcleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = "id_mapping.csv",
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
max_gap_small = 5, # Phase 1: ≤5 seconds
max_gap_large = 30, # Phase 2: ≤30 seconds max
max_position_change = 0.5 # Phase 2: ≤50cm movement
)Tomas Bilevicius
CC BY 4.0 — you are free to use, share, and adapt this package for any purpose, including commercially, as long as you give appropriate credit to the author.