| Title: | Tools for Cleaning High-Frequency Real-Time Location Tracking Data |
| Version: | 0.1.0 |
| Description: | Provides data cleaning and preprocessing tools for high-frequency positional data from real-time location tracking systems (UWB, RFID, and similar technologies), with functions for ID mapping, time period marking, data standardization, and two-phase conditional gap interpolation. See Bilevicius (2026) <doi:10.5281/zenodo.20783488>. |
| License: | CC BY 4.0 |
| URL: | https://github.com/tomasbil/trackclean |
| BugReports: | https://github.com/tomasbil/trackclean/issues |
| Encoding: | UTF-8 |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| Imports: | dplyr, lubridate, magrittr, readr, rlang, tibble |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-26 08:09:33 UTC; tomas |
| Author: | Tomas Bilevicius [aut, cre] |
| Maintainer: | Tomas Bilevicius <Tomas.Bilevicius@vub.be> |
| Repository: | CRAN |
| Date/Publication: | 2026-07-01 09:40:02 UTC |
trackclean: Tools for Cleaning High-Frequency Real-Time Location Tracking Data
Description
A toolkit for cleaning high-frequency positional data from real-time location tracking systems (UWB, RFID, and similar technologies). Originally developed for playground movement research, but applicable to any study collecting high-frequency positional data from people moving within a defined space. Provides functions for ID mapping, time period marking, data standardization, and two-phase gap interpolation.
Main Functions
Complete Pipeline:
-
clean_playground_data: Master function running the complete pipeline
Data Preparation:
-
fix_tag_replacement: Fix tag replacements before cleaning
Individual Pipeline Steps:
-
map_ids: Map raw tracking IDs to standardized participant IDs -
mark_time_periods: Mark analysis and event time periods -
standardize_to_seconds: Standardize data to one-second intervals -
interpolate_gaps: Two-phase gap interpolation
Typical Workflow
Prepare an ID mapping CSV file with columns: raw_id, child_id
Load your raw tracking data
(Optional) Fix any tag replacements with
fix_tag_replacement()Run
clean_playground_data()with appropriate parametersAnalyze the cleaned data
Example
library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv",
package = "trackclean"))
# Fix tag replacement (if needed)
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)
# Complete pipeline
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00"
)
Author(s)
Maintainer: Tomas Bilevicius Tomas.Bilevicius@vub.be
See Also
Useful links:
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Clean tracking data (complete pipeline)
Description
Master function that runs the complete data cleaning pipeline:
Map raw IDs to participant IDs
Mark analysis and bell time periods
Standardize to fixed time intervals
Interpolate gaps (two-phase)
Optionally export to CSV
Usage
clean_playground_data(
data,
id_mapping,
exclude_ids = NULL,
analyze_start,
analyze_end,
bell_start = NULL,
bell_end = NULL,
unit = "second",
time_step = 1,
max_gap_small = 10,
max_gap_large = NULL,
max_position_change = 0.3,
output_file = NULL,
verbose = TRUE,
time_col = "At",
x_col = "X",
y_col = "Y",
raw_id_col = "ID",
id_col = "id_code",
analyze_col = "Analyze",
bell_col = "Bell"
)
Arguments
data |
Raw tracking data frame |
id_mapping |
Path to ID mapping CSV file or mapping data frame |
exclude_ids |
Vector of raw IDs to exclude from analysis |
analyze_start |
Start time for analysis period (character or POSIXct) |
analyze_end |
End time for analysis period (character or POSIXct) |
bell_start |
Start time for bell period (optional) |
bell_end |
End time for bell period (optional) |
unit |
Time interval for standardization, passed to
|
time_step |
Expected time step in seconds between consecutive observations
after standardization (default: 1). Must match the numeric value of |
max_gap_small |
Maximum gap for phase 1 interpolation in seconds (default: 10) |
max_gap_large |
Maximum gap for phase 2 interpolation in seconds (default: NULL) |
max_position_change |
Maximum position change for phase 2 in meters (default: 0.3) |
output_file |
Path to save cleaned data as CSV (optional) |
verbose |
Print progress messages (default: TRUE) |
time_col |
Name of the timestamp column (default: |
x_col |
Name of the x-coordinate column (default: |
y_col |
Name of the y-coordinate column (default: |
raw_id_col |
Name of the raw device ID column in the input data (default: |
id_col |
Name of the output column for standardized participant IDs (default: |
analyze_col |
Name of the analysis period flag column (default: |
bell_col |
Name of the bell period flag column (default: |
Value
Cleaned data frame
Examples
# Complete pipeline using bundled example data
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv",
package = "trackclean"))
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00"
)
Fix tag replacements in raw data
Description
Handles cases where a participant's tracking tag was replaced during data collection. Renames observations from the new tag to the original ID and removes invalid observations.
Usage
fix_tag_replacement(
data,
original_id,
replacement_id,
replacement_time,
time_col = "At",
id_col = "ID"
)
Arguments
data |
A data frame with raw tracking data |
original_id |
The participant's original tag ID |
replacement_id |
The new tag ID that replaced the original |
replacement_time |
Time when tag was replaced (POSIXct or character, e.g. |
time_col |
Name of the timestamp column (default: "At") |
id_col |
Name of the ID column (default: "ID") |
Value
Data frame with corrected IDs:
Observations from replacement_id >= replacement_time are renamed to original_id
Observations from original_id >= replacement_time are removed
Observations from replacement_id < replacement_time are removed
Examples
raw_data <- data.frame(
ID = c(159L, 159L, 106L, 106L),
At = as.POSIXct(c("2025-03-18 11:00:00", "2025-03-18 11:30:00",
"2025-03-18 11:00:00", "2025-03-18 11:30:00")),
X = c(1.0, 2.0, 3.0, 4.0),
Y = c(1.0, 2.0, 3.0, 4.0)
)
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 159,
replacement_id = 106,
replacement_time = "2025-03-18 11:20:00"
)
Interpolate gaps in location tracking data (two-phase approach)
Description
Phase 1: Interpolates small gaps (gap <= max_gap_small seconds) Phase 2: Interpolates larger gaps if position change is <= max_position_change meters
Usage
interpolate_gaps(
data,
time_col = "At",
x_col = "X",
y_col = "Y",
id_col = "id_code",
analyze_col = "Analyze",
time_step = 1,
max_gap_small = 10,
max_gap_large = NULL,
max_position_change = 0.3,
verbose = TRUE
)
Arguments
data |
A data frame with location tracking data |
time_col |
Name of the timestamp column (default: "At") |
x_col |
Name of x-coordinate column (default: "X") |
y_col |
Name of y-coordinate column (default: "Y") |
id_col |
Name of ID column (default: "id_code") |
analyze_col |
Name of column indicating rows to analyze (default: "Analyze") |
time_step |
Expected time step in seconds between consecutive observations
after standardization (default: 1). Set this to match the |
max_gap_small |
Maximum gap size for phase 1 in seconds (default: 10) |
max_gap_large |
Maximum gap size for phase 2 in seconds (default: NULL for no limit) |
max_position_change |
Maximum position change in meters for phase 2 (default: 0.3) |
verbose |
Print progress messages (default: TRUE) |
Details
Uses linear interpolation: X_t = X_start + (k/gap) * (X_end - X_start)
Value
Data frame with interpolated coordinates and flags:
imputed: 1 if row was added in phase 1 (small gaps)
imputed_large: 1 if row was added in phase 2 (large gaps)
n_entries: 0 for imputed rows
standardized: 0 for imputed rows
Examples
standardized_data <- data.frame(
id_code = c(1L, 1L),
At = as.POSIXct(c("2025-03-18 11:00:00", "2025-03-18 11:00:05")),
X = c(1.0, 1.0),
Y = c(2.0, 2.0),
Analyze = c(1L, 1L),
n_entries = c(1L, 1L),
standardized = c(0L, 0L)
)
data_clean <- interpolate_gaps(standardized_data)
Map raw tracking IDs to standardized child IDs
Description
Map raw tracking IDs to standardized child IDs
Usage
map_ids(
data,
mapping,
exclude_ids = NULL,
raw_id_col = "ID",
id_col = "id_code",
analyze_col = "Analyze"
)
Arguments
data |
A data frame with raw tracking data |
mapping |
Either:
|
exclude_ids |
Vector of raw IDs to exclude from analysis (sets Analyze = 0) |
raw_id_col |
Name of the raw ID column in data (default: "ID") |
id_col |
Name of the output column for standardized IDs (default: "id_code") |
analyze_col |
Name of the Analyze column in data (default: "Analyze") |
Value
Data frame with added id_code column and updated Analyze column
Examples
raw_data <- data.frame(
ID = c(1L, 2L, 3L),
At = as.POSIXct("2025-03-18 11:45:00"),
X = c(1.0, 2.0, 3.0),
Y = c(1.0, 2.0, 3.0)
)
id_map <- data.frame(raw_id = c(1L, 2L, 3L), child_id = c(5001L, 5002L, 5003L))
data_mapped <- map_ids(raw_data, id_map)
Mark time periods for analysis and bell time in raw data
Description
Creates binary columns indicating whether each timestamp falls within specified time periods (e.g., recess period, bell ringing period)
Usage
mark_time_periods(
data,
time_col = "At",
analyze_start,
analyze_end,
bell_start = NULL,
bell_end = NULL,
analyze_col = "Analyze",
bell_col = "Bell"
)
Arguments
data |
A data frame with timestamp data |
time_col |
Name of the timestamp column (default: "At") |
analyze_start |
Start time for analysis period (POSIXct or character, e.g. |
analyze_end |
End time for analysis period (POSIXct or character) |
bell_start |
Start time for bell period (POSIXct or character, optional) |
bell_end |
End time for bell period (POSIXct or character, optional) |
analyze_col |
Name of column to create for analysis period (default: "Analyze") |
bell_col |
Name of column to create for bell period (default: "Bell") |
Value
Data frame with added binary columns (1 = within period, 0 = outside period)
Examples
raw_data <- data.frame(
At = as.POSIXct(c("2025-03-18 11:00:00", "2025-03-18 12:00:00",
"2025-03-18 13:00:00")),
ID = 1:3
)
raw_data <- mark_time_periods(
raw_data,
analyze_start = "2025-03-18 10:30:00",
analyze_end = "2025-03-18 12:30:00",
bell_start = "2025-03-18 11:30:00",
bell_end = "2025-03-18 12:30:00"
)
Standardize location data to fixed time intervals
Description
Rounds timestamps to the nearest interval boundary and averages X/Y coordinates if multiple signals fall within the same interval.
Usage
standardize_to_seconds(
data,
time_col = "At",
x_col = "X",
y_col = "Y",
id_col = "id_code",
unit = "second",
verbose = TRUE
)
Arguments
data |
A data frame with location tracking data |
time_col |
Name of the timestamp column (default: "At") |
x_col |
Name of x-coordinate column (default: "X") |
y_col |
Name of y-coordinate column (default: "Y") |
id_col |
Name of ID column (default: "id_code") |
unit |
Time interval to standardize to, passed to
|
verbose |
Print summary statistics (default: TRUE) |
Value
Data frame with one row per (id, interval) with:
X, Y: Averaged coordinates
n_entries: Number of original signals in that interval
standardized: 1 if multiple signals were aggregated, 0 if single
Examples
raw_data <- data.frame(
id_code = c(1L, 1L, 1L),
At = as.POSIXct(c("2025-03-18 11:00:00.0", "2025-03-18 11:00:00.5",
"2025-03-18 11:00:01.0")),
X = c(1.0, 1.2, 2.0),
Y = c(2.0, 2.1, 3.0)
)
standardized_data <- standardize_to_seconds(raw_data)