trackclean

Tools for cleaning high-frequency real-time location tracking data.

trackclean was developed to process data from playground movement research, but applies to any study collecting high-frequency positional data from people moving within a defined space — classrooms, sports facilities, rehabilitation settings, and similar environments.

Installation

# Install from CRAN
install.packages("trackclean")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("tomasbil/trackclean")

Example Data

The package includes a small example dataset for trying out the full pipeline without any real data. It simulates 10 children tracked during a school recess on a 40 m × 60 m playground using an ultra-wideband (UWB) positioning system.

library(trackclean)
library(readr)

raw_data   <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")

The example dataset includes:
- 10 participants with raw tag IDs 1–10, mapped to child IDs 5001–5010
- ~13.5 minutes of data (11:45:00–11:58:30), with observations both inside and outside the analysis window
- Sub-second timestamps causing multiple readings per second, handled by standardize_to_seconds()
- Randomly dropped seconds creating gaps, handled by interpolate_gaps()
- One tag replacement: participant 5003 starts on raw tag ID 3, which is swapped to raw tag ID 11 at 11:51:00, handled by fix_tag_replacement()

Analysis parameters for this dataset:

Parameter Value
analyze_start "2025-03-18 11:47:00"
analyze_end "2025-03-18 11:57:00"
bell_start "2025-03-18 11:53:00"
bell_end "2025-03-18 11:58:00"
Tag replacement raw_id 3 → raw_id 11 at "2025-03-18 11:51:00"

Expected input format

Raw tracking data (raw_tracking_data.csv):

ID At X Y
1 2025-03-18 11:45:00.00 5.000 10.000
1 2025-03-18 11:45:01.00 5.383 10.239
1 2025-03-18 11:45:01.47 5.341 10.261

ID mapping (id_mapping.csv):

raw_id child_id
1 5001
3 5003
11 5003

Quick Start

Optional: Fix Tag Replacements

If a participant’s tag was replaced during data collection, run this before the main pipeline:

raw_data <- fix_tag_replacement(
  data = raw_data,
  original_id = 3,
  replacement_id = 11,
  replacement_time = "2025-03-18 11:51:00"
)

This will:
- Keep observations from tag 3 before 11:51
- Rename tag 11 observations from 11:51 onwards to tag 3
- Remove tag 3 observations from 11:51 onwards (duplicate/invalid)
- Remove tag 11 observations before 11:51 (not yet attached)
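The four rules above can be illustrated on a toy data frame in base R. This is only a sketch of the described behaviour, not the package's implementation: `sketch_tag_swap()` is a hypothetical helper, and the `ID` and `At` column names are taken from the expected input format.

```r
# Toy data: tag 3 is swapped for tag 11 at 11:51:00.
toy <- data.frame(
  ID = c(3, 3, 11, 11, 1),
  At = as.POSIXct(
    c("2025-03-18 11:50:59", "2025-03-18 11:51:01",
      "2025-03-18 11:50:58", "2025-03-18 11:51:02",
      "2025-03-18 11:51:00"),
    tz = "UTC"
  ),
  X = c(5.0, 5.1, 0.0, 5.2, 9.0),
  Y = c(10.0, 10.1, 0.0, 10.2, 4.0)
)

swap_time <- as.POSIXct("2025-03-18 11:51:00", tz = "UTC")

# Hypothetical helper (not part of trackclean) mirroring the four rules.
sketch_tag_swap <- function(data, original_id, replacement_id, swap_time) {
  is_orig <- data$ID == original_id
  is_repl <- data$ID == replacement_id
  after   <- data$At >= swap_time

  # Drop original-tag rows from the swap onwards,
  # and replacement-tag rows from before the swap.
  keep <- !(is_orig & after) & !(is_repl & !after)
  out <- data[keep, , drop = FALSE]

  # Any replacement-tag rows left are at or after the swap: relabel them.
  out$ID[out$ID == replacement_id] <- original_id
  out[order(out$ID, out$At), , drop = FALSE]
}

fixed <- sketch_tag_swap(toy, original_id = 3, replacement_id = 11,
                         swap_time = swap_time)
print(fixed)
```

After the swap, the toy data contains only tags 1 and 3, with the tag 3 track continuing seamlessly across 11:51:00.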

1. Prepare Your ID Mapping

Create a CSV file with two columns mapping raw device IDs to your participant IDs:

raw_id,child_id
1,5001
2,5002
3,5003

Or use the bundled example file:

id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")
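As a rough illustration of what the mapping file encodes, the lookup can be reproduced in base R with `match()`. This is a sketch under assumptions, not how `map_ids()` is implemented; in particular, the README does not say how unmapped raw IDs are treated, so the sketch only reports them. Raw ID 99 is an invented value used to show the unmapped case.

```r
# Toy raw data and mapping; raw ID 99 is deliberately absent from the mapping.
raw <- data.frame(
  ID = c(1, 3, 11, 99),
  X  = c(5.0, 6.0, 6.1, 1.0),
  Y  = c(10.0, 11.0, 11.1, 2.0)
)
mapping <- data.frame(
  raw_id   = c(1, 3, 11),
  child_id = c(5001, 5003, 5003)
)

# match() returns the row of `mapping` for each raw ID, or NA if there is none.
raw$child_id <- mapping$child_id[match(raw$ID, mapping$raw_id)]

# Raw IDs with no participant assigned; worth checking before cleaning.
unmapped <- unique(raw$ID[is.na(raw$child_id)])
print(raw)
print(unmapped)
```

Note that two raw IDs (3 and 11) can point to the same participant, which is how a replaced tag is represented in the mapping.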

2. Run the Complete Pipeline

library(trackclean)
library(readr)

raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))

# Fix tag replacement first (if applicable)
raw_data <- fix_tag_replacement(
  data = raw_data,
  original_id = 3,
  replacement_id = 11,
  replacement_time = "2025-03-18 11:51:00"
)

cleaned_data <- clean_playground_data(
  data = raw_data,
  id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  bell_start    = "2025-03-18 11:53:00",
  bell_end      = "2025-03-18 11:58:00",
  output_file   = "cleaned_data.csv"
)

3. Use Individual Functions

For more control, run each step separately:

# Step 1: Map IDs
data <- map_ids(raw_data, id_mapping)

# Step 2: Mark time periods
data <- mark_time_periods(
  data,
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  bell_start    = "2025-03-18 11:53:00",
  bell_end      = "2025-03-18 11:58:00"
)

# Step 3: Standardize to seconds
data <- standardize_to_seconds(data)

# Step 4: Interpolate gaps
data <- interpolate_gaps(
  data,
  max_gap_small = 10,
  max_position_change = 0.3
)
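To make Step 3 more concrete, here is a minimal base R sketch of collapsing sub-second readings to one row per participant per second, using the three rows from the expected input format. It assumes the readings within a second are averaged; the README does not state which aggregation standardize_to_seconds() actually uses, and the `Second` column is a name introduced for illustration.

```r
toy <- data.frame(
  ID = c(1, 1, 1),
  At = as.POSIXct(
    c("2025-03-18 11:45:00.00", "2025-03-18 11:45:01.00",
      "2025-03-18 11:45:01.47"),
    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"
  ),
  X = c(5.000, 5.383, 5.341),
  Y = c(10.000, 10.239, 10.261)
)

# Formatting with %S drops the fractional part, giving a whole-second key.
toy$Second <- format(toy$At, "%Y-%m-%d %H:%M:%S")

# Assumption: average all readings that fall within the same second.
per_second <- aggregate(cbind(X, Y) ~ ID + Second, data = toy, FUN = mean)
print(per_second)
```

The two readings at 11:45:01 and 11:45:01.47 end up as a single row for second 11:45:01.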

Key Features

Two-Phase Gap Interpolation

The package uses a two-phase approach to handle missing data:

Phase 1: Interpolates small gaps (≤10 seconds by default)
- Uses linear interpolation between known points
- Appropriate for brief signal losses

Phase 2: Interpolates larger gaps conditionally
- Only when position change between endpoints is minimal (≤30cm by default)
- Indicates the participant remained stationary during the gap
- Prevents false movement estimates for longer signal dropouts
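The decision rule behind the two phases can be sketched for a single participant in base R. This is an illustration under stated assumptions rather than the package's code: `sketch_two_phase()` is a hypothetical helper, a gap is measured as the number of missing seconds between two observed readings, Phase 2 also uses linear interpolation, and the upper limit of 30 seconds is borrowed from the `max_gap_large` value in the customization example, not a confirmed default.

```r
# Hypothetical helper, not part of trackclean.
# t: observed times in whole seconds (strictly increasing); x, y: metres.
sketch_two_phase <- function(t, x, y,
                             max_gap_small = 10,
                             max_gap_large = 30,
                             max_position_change = 0.3) {
  pieces <- list()
  for (i in seq_len(length(t) - 1)) {
    n_missing <- t[i + 1] - t[i] - 1  # seconds with no reading
    moved <- sqrt((x[i + 1] - x[i])^2 + (y[i + 1] - y[i])^2)

    phase1 <- n_missing > 0 && n_missing <= max_gap_small
    phase2 <- n_missing > max_gap_small && n_missing <= max_gap_large &&
      moved <= max_position_change

    if (phase1 || phase2) {
      new_t <- seq(t[i] + 1, t[i + 1] - 1)
      pieces[[length(pieces) + 1]] <- data.frame(
        t = new_t,
        x = approx(t[i:(i + 1)], x[i:(i + 1)], xout = new_t)$y,
        y = approx(t[i:(i + 1)], y[i:(i + 1)], xout = new_t)$y,
        imputed = TRUE
      )
    }
  }
  observed <- data.frame(t = t, x = x, y = y, imputed = FALSE)
  out <- do.call(rbind, c(list(observed), pieces))
  out[order(out$t), , drop = FALSE]
}

# Gaps: 1 -> 4 (short), 4 -> 20 (long, nearly stationary),
# 20 -> 40 (long, with real movement).
filled <- sketch_two_phase(
  t = c(0, 1, 4, 20, 40),
  x = c(0, 1, 4, 4.1, 9),
  y = c(0, 0, 0, 0, 0)
)
print(filled)
```

In this toy track the short gap and the long stationary gap are filled, while the long gap with about 4.9 m of movement between its endpoints is left empty.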

Quality Assurance

All functions provide:
- Progress messages and summaries
- Data integrity checks
- Row count validation
- Clear flagging of imputed vs. original data

Function Reference

Function Purpose
clean_playground_data() Complete pipeline in one call
fix_tag_replacement() Fix tag replacements (run before pipeline)
map_ids() Map raw device IDs to participant IDs
mark_time_periods() Create Analyze and Bell columns
standardize_to_seconds() Aggregate to one-second intervals
interpolate_gaps() Two-phase gap interpolation

Output Columns

The cleaned dataset includes these flags:

Parameters

Customizable Thresholds

cleaned_data <- clean_playground_data(
  data = raw_data,
  id_mapping = "id_mapping.csv",
  analyze_start = "2025-03-18 11:47:00",
  analyze_end   = "2025-03-18 11:57:00",
  max_gap_small = 5,             # Phase 1: ≤5 seconds
  max_gap_large = 30,            # Phase 2: ≤30 seconds max
  max_position_change = 0.5      # Phase 2: ≤50cm movement
)

Author

Tomas Bilevicius

License

CC BY 4.0 — you are free to use, share, and adapt this package for any purpose, including commercially, as long as you give appropriate credit to the author.
