trackclean

Tools for cleaning high-frequency real-time location tracking data.

trackclean was developed to process data from playground movement research, but applies to any study collecting high-frequency positional data from people moving within a defined space — classrooms, sports facilities, rehabilitation settings, and similar environments.

Installation

# Install from CRAN
install.packages("trackclean")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("tomasbil/trackclean")

Example Data

The package includes a small example dataset for trying out the full pipeline without any real data. It simulates 10 children tracked during a school recess on a 40 m × 60 m playground using a UWB positioning system.

library(trackclean)
library(readr)

raw_data   <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")

The example dataset includes:

- 10 participants with raw tag IDs 1–10, mapped to child IDs 5001–5010
- ~13.5 minutes of data (11:45:00–11:58:30), with observations both inside and outside the analysis window
- Sub-second timestamps causing multiple readings per second (handled by standardize_to_seconds())
- Randomly dropped seconds creating gaps (handled by interpolate_gaps())
- One tag replacement: participant 5003 starts on raw tag ID 3, which is swapped to raw tag ID 11 at 11:51:00 (handled by fix_tag_replacement())

Analysis parameters for this dataset:

Parameter	Value
`analyze_start`	`"2025-03-18 11:47:00"`
`analyze_end`	`"2025-03-18 11:57:00"`
`bell_start`	`"2025-03-18 11:53:00"`
`bell_end`	`"2025-03-18 11:58:00"`
Tag replacement	raw_id 3 → raw_id 11 at `"2025-03-18 11:51:00"`
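To illustrate what these parameters mean, the sketch below flags timestamps that fall inside each window. It is not the package's implementation: the helper in_period() is hypothetical, the UTC time zone is an assumption, and treating both window boundaries as inclusive is also an assumption.

```r
# Five example timestamps: 11:45:00, 11:47:00, 11:53:00, 11:57:00, 11:58:00
at <- as.POSIXct("2025-03-18 11:45:00", tz = "UTC") + c(0, 120, 480, 720, 780)

# Hypothetical helper: 1 if x lies within [start, end], 0 otherwise
# (inclusive boundaries are an assumption)
in_period <- function(x, start, end) {
  as.integer(x >= as.POSIXct(start, tz = "UTC") &
             x <= as.POSIXct(end, tz = "UTC"))
}

marked <- data.frame(
  At      = at,
  Analyze = in_period(at, "2025-03-18 11:47:00", "2025-03-18 11:57:00"),
  Bell    = in_period(at, "2025-03-18 11:53:00", "2025-03-18 11:58:00")
)
print(marked)
```

Because the bell period ends after the analysis window, a row can have Bell equal to 1 while Analyze is 0, as the last timestamp here does.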

Expected input format

Raw tracking data (raw_tracking_data.csv):

ID	At	X	Y
1	2025-03-18 11:45:00.00	5.000	10.000
1	2025-03-18 11:45:01.00	5.383	10.239
1	2025-03-18 11:45:01.47	5.341	10.261
…

ID: raw tag ID as assigned by the tracking system
At: timestamp (POSIXct-readable, sub-second precision supported)
X, Y: position in meters

ID mapping (id_mapping.csv):

raw_id	child_id
1	5001
3	5003
11	5003
…

raw_id: tag ID as it appears in the raw data
child_id: standardized participant ID to use in analysis
A participant with a replaced tag appears twice (one row per tag, same child_id)
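The mapping can be thought of as a lookup joined onto the raw data. The base R sketch below shows that idea on toy rows. It is only an illustration under the documented column names, not how map_ids() is implemented, and the coordinates and UTC time zone are made up for the example.

```r
# Toy raw data using the documented columns ID, At, X, Y
raw <- data.frame(
  ID = c(1, 3, 11),
  At = as.POSIXct(c("2025-03-18 11:45:00", "2025-03-18 11:45:00",
                    "2025-03-18 11:52:00"), tz = "UTC"),
  X  = c(5.0, 12.0, 12.4),
  Y  = c(10.0, 20.0, 20.1)
)

# Mapping in the documented format; tags 3 and 11 share child_id 5003
mapping <- data.frame(raw_id = c(1, 3, 11), child_id = c(5001, 5003, 5003))

# Left join on the tag ID, so a tag missing from the mapping would show up as NA
mapped <- merge(raw, mapping, by.x = "ID", by.y = "raw_id", all.x = TRUE)
print(mapped)
```

Keeping unmatched rows (all.x = TRUE) is a choice made for this sketch: it makes unmapped tags visible instead of silently dropping them.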

Quick Start

Optional: Fix Tag Replacements

If a participant’s tag was replaced during data collection, run this before the main pipeline:

raw_data <- fix_tag_replacement(
  data = raw_data,
  original_id = 3,
  replacement_id = 11,
  replacement_time = "2025-03-18 11:51:00"
)

This will:

- Keep observations from tag 3 before 11:51
- Rename tag 11 observations from 11:51 onwards to tag 3
- Remove tag 3 observations from 11:51 onwards (duplicate/invalid)
- Remove tag 11 observations before 11:51 (not yet attached)
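The four rules above can be sketched in base R as follows. The function fix_tag_sketch() is a hypothetical stand-in written for illustration, not the package's fix_tag_replacement(). It assumes the documented ID and At columns and a UTC time zone.

```r
# Hypothetical re-implementation of the four rules, for illustration only
fix_tag_sketch <- function(data, original_id, replacement_id, replacement_time) {
  t0 <- as.POSIXct(replacement_time, tz = "UTC")
  after <- data$At >= t0
  # Drop the original tag from the swap time onwards,
  # and the replacement tag before the swap time
  drop <- (data$ID == original_id & after) |
          (data$ID == replacement_id & !after)
  out <- data[!drop, , drop = FALSE]
  # Any remaining replacement-tag rows are from the swap time onwards: rename them
  out$ID[out$ID == replacement_id] <- original_id
  out
}

toy <- data.frame(
  ID = c(3, 3, 11, 11, 1),
  At = as.POSIXct(c("2025-03-18 11:50:59", "2025-03-18 11:51:00",
                    "2025-03-18 11:50:59", "2025-03-18 11:51:00",
                    "2025-03-18 11:51:00"), tz = "UTC"),
  X  = c(1, 2, 3, 4, 5),
  Y  = c(1, 2, 3, 4, 5)
)

fixed <- fix_tag_sketch(toy, 3, 11, "2025-03-18 11:51:00")
print(fixed)
```

In the toy data, only the tag 3 row before the swap, the tag 11 row at the swap time (renamed to 3), and the unrelated tag 1 row survive.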

1. Prepare Your ID Mapping

Create a CSV file with two columns mapping raw device IDs to your participant IDs:

raw_id,child_id
1,5001
2,5002
3,5003

Or use the bundled example file:

id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")

2. Run the Complete Pipeline

library(trackclean)
library(readr)

raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))

# Fix tag replacement first (if applicable)
raw_data <- fix_tag_replacement(
  data = raw_data,
  original_id = 3,
  replacement_id = 11,
  replacement_time = "2025-03-18 11:51:00"
)

cleaned_data <- clean_playground_data(
  data = raw_data,
  id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  bell_start    = "2025-03-18 11:53:00",
  bell_end      = "2025-03-18 11:58:00",
  output_file   = "cleaned_data.csv"
)

3. Use Individual Functions

For more control, run each step separately:

# Step 1: Map IDs
data <- map_ids(raw_data, id_mapping)

# Step 2: Mark time periods
data <- mark_time_periods(
  data,
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  bell_start    = "2025-03-18 11:53:00",
  bell_end      = "2025-03-18 11:58:00"
)

# Step 3: Standardize to seconds
data <- standardize_to_seconds(data)

# Step 4: Interpolate gaps
data <- interpolate_gaps(
  data,
  max_gap_small = 10,
  max_position_change = 0.3
)
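Step 3 collapses multiple readings within the same second into one row. The base R sketch below shows one way to do that, using the three example rows from the input format section. The function standardize_sketch() is hypothetical, and truncating timestamps to the whole second (rather than rounding) is an assumption. The package may bucket readings differently.

```r
# Hypothetical sketch: average positions per participant per whole second
standardize_sketch <- function(data) {
  # Whole-second bucket for each reading (assumption: truncate, not round)
  sec <- floor(as.numeric(data$At))
  key <- list(ID = data$ID, sec = sec)
  pos <- aggregate(data[c("X", "Y")], by = key, FUN = mean)
  cnt <- aggregate(data.frame(n_entries = data$X), by = key, FUN = length)
  out <- merge(pos, cnt, by = c("ID", "sec"))
  out$At <- as.POSIXct(out$sec, origin = "1970-01-01", tz = "UTC")
  # Flag seconds where more than one reading was averaged
  out$standardized <- as.integer(out$n_entries > 1)
  out <- out[order(out$ID, out$At),
             c("ID", "At", "X", "Y", "n_entries", "standardized")]
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  ID = 1,
  At = as.POSIXct("2025-03-18 11:45:00", tz = "UTC") + c(0, 1, 1.47),
  X  = c(5.000, 5.383, 5.341),
  Y  = c(10.000, 10.239, 10.261)
)

std <- standardize_sketch(toy)
print(std)
```

The two readings at 11:45:01 are averaged into a single row with n_entries equal to 2 and standardized equal to 1, while the lone reading at 11:45:00 passes through unchanged.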

Key Features

Two-Phase Gap Interpolation

The package uses a two-phase approach to handle missing data:

Phase 1: Interpolates small gaps (≤10 seconds by default)

- Uses linear interpolation between known points
- Appropriate for brief signal losses

Phase 2: Interpolates larger gaps conditionally

- Only when the position change between the gap's endpoints is minimal (≤30 cm by default)
- A small change suggests the participant remained stationary during the gap
- Prevents false movement estimates for longer signal dropouts
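The two phases can be sketched in base R for a single participant as follows. This is an illustrative reading of the description above, not the code behind interpolate_gaps(). The function interpolate_sketch() and the integer-second column t are hypothetical, gap length is assumed to mean the number of missing seconds, and the value 30 for max_gap_large is borrowed from the thresholds example rather than a known default.

```r
# Hypothetical sketch for one participant; d has columns t (whole seconds), X, Y
interpolate_sketch <- function(d, max_gap_small = 10, max_gap_large = 30,
                               max_position_change = 0.3) {
  d <- d[order(d$t), c("t", "X", "Y")]
  d$imputed <- 0L
  d$imputed_large <- 0L
  filled <- list(d)
  for (i in seq_len(max(nrow(d) - 1, 0))) {
    missing <- d$t[i + 1] - d$t[i] - 1        # seconds with no reading
    if (missing < 1) next
    moved <- sqrt((d$X[i + 1] - d$X[i])^2 + (d$Y[i + 1] - d$Y[i])^2)
    phase1 <- missing <= max_gap_small
    phase2 <- !phase1 && missing <= max_gap_large &&
      moved <= max_position_change
    if (!phase1 && !phase2) next              # leave the gap unfilled
    w <- seq_len(missing) / (missing + 1)     # fraction of the way across the gap
    filled[[length(filled) + 1]] <- data.frame(
      t = d$t[i] + seq_len(missing),
      X = d$X[i] + w * (d$X[i + 1] - d$X[i]),
      Y = d$Y[i] + w * (d$Y[i + 1] - d$Y[i]),
      imputed = as.integer(phase1),
      imputed_large = as.integer(phase2)
    )
  }
  out <- do.call(rbind, filled)
  out <- out[order(out$t), ]
  rownames(out) <- NULL
  out
}

toy <- data.frame(t = c(0, 1, 4, 20, 40),
                  X = c(0, 1, 4, 4.1, 10),
                  Y = c(0, 0, 0, 0, 0))
res <- interpolate_sketch(toy)
print(table(phase1 = res$imputed, phase2 = res$imputed_large))
```

In the toy data, the 2-second gap is filled in phase 1, the 15-second gap is filled in phase 2 because the endpoints are only 0.1 m apart, and the 19-second gap is left open because the participant moved 5.9 m across it.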

Quality Assurance

All functions provide:

- Progress messages and summaries
- Data integrity checks
- Row count validation
- Clear flagging of imputed vs. original data

Function Reference

Function	Purpose
`clean_playground_data()`	Complete pipeline in one call
`fix_tag_replacement()`	Fix tag replacements (run before pipeline)
`map_ids()`	Map raw device IDs to participant IDs
`mark_time_periods()`	Create Analyze and Bell columns
`standardize_to_seconds()`	Aggregate to one-second intervals
`interpolate_gaps()`	Two-phase gap interpolation

Output Columns

The cleaned dataset includes these columns:

id_code: Standardized participant ID
Analyze: 1 if within analysis period, 0 otherwise
Bell: 1 if within bell period, 0 otherwise (if specified)
n_entries: Original number of signals in that second
standardized: 1 if multiple signals were averaged, 0 otherwise
imputed: 1 if row added via phase 1 interpolation
imputed_large: 1 if row added via phase 2 interpolation
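These columns make it straightforward to restrict an analysis to the window of interest and to check how much of it was interpolated. The sketch below uses a small made-up data frame with the documented column names. It assumes that observed rows carry 0 (not NA) in imputed and imputed_large, which is not stated above.

```r
# Made-up rows using the documented output column names
cleaned <- data.frame(
  id_code       = c(5001, 5001, 5001, 5001),
  Analyze       = c(0, 1, 1, 1),
  imputed       = c(0, 0, 1, 0),
  imputed_large = c(0, 0, 0, 1)
)

# Keep only rows inside the analysis window
in_window <- cleaned[cleaned$Analyze == 1, ]

# Share of analysis-window rows that were interpolated in either phase
imputed_share <- mean(in_window$imputed == 1 | in_window$imputed_large == 1)

# Rows that were actually observed (assumes 0, not NA, marks observed rows)
observed_only <- in_window[in_window$imputed == 0 &
                           in_window$imputed_large == 0, ]
print(imputed_share)
```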

Parameters

Customizable Thresholds

cleaned_data <- clean_playground_data(
  data = raw_data,
  id_mapping = "id_mapping.csv",
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  max_gap_small = 5,             # Phase 1: ≤5 seconds
  max_gap_large = 30,            # Phase 2: ≤30 seconds max
  max_position_change = 0.5      # Phase 2: ≤50cm movement
)

Author

Tomas Bilevicius

License

CC BY 4.0 — you are free to use, share, and adapt this package for any purpose, including commercially, as long as you give appropriate credit to the author.
