Help for package weatherjoin

Type:

Package

Title:

Join Gridded Weather Data to Event Tables

Version:

0.2.2

URL:

https://github.com/hauae/weatherjoin

BugReports:

https://github.com/hauae/weatherjoin/issues

Description:

High-level tools to attach gridded weather data from the NASA POWER Project to event-based datasets. The package plans efficient spatio-temporal API calls via the 'nasapower' R package, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. This package is not affiliated with or endorsed by NASA.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.3

Imports:

data.table, jsonlite

Suggests:

nasapower, digest, fst, anytime, testthat (≥ 3.0.0), knitr, rmarkdown, withr

Depends:

R (≥ 4.1.0)

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-02-02 18:39:40 UTC; 00758120

Author:

Przemek Dolowy [aut, cre] (affiliation: Harper Adams University)

Maintainer:

Przemek Dolowy <pdolowy@harper-adams.ac.uk>

Repository:

CRAN

Date/Publication:

2026-02-04 10:20:03 UTC

weatherjoin: Join Gridded Weather Data to Event Tables

Description

Author(s)

Maintainer: Przemek Dolowy <pdolowy@harper-adams.ac.uk> (Harper Adams University)

Join weather back to events (supports rolling join for hourly)

Description

Join weather back to events (supports rolling join for hourly)

Usage

.attach_weather(
  x,
  weather,
  params,
  tz = "UTC",
  roll = c("nearest", "last", "none"),
  roll_max_hours = NULL,
  coord_digits = 5
)
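The manual does not spell out how roll and roll_max_hours choose a weather row. The base-R sketch below illustrates one plausible reading, assuming "nearest" takes the closest weather timestamp, "last" takes the latest one at or before the event, "none" requires an exact match, and any rolled match farther away than roll_max_hours is dropped (NA). roll_match() is a hypothetical helper, not part of weatherjoin; since data.table is in Imports, the package most likely uses a data.table rolling join instead.

```r
# Hypothetical helper (not part of weatherjoin): returns, for each event time,
# the index of the matching weather row, or NA when no match is allowed.
# Assumes weather_t contains no missing values.
roll_match <- function(event_t, weather_t, roll = c("nearest", "last", "none"),
                       roll_max_hours = 1) {
  roll <- match.arg(roll)
  ord <- order(weather_t)
  wt <- as.numeric(weather_t)[ord]   # sorted weather times, in seconds
  et <- as.numeric(event_t)
  max_s <- roll_max_hours * 3600
  idx <- rep(NA_integer_, length(et))
  for (i in seq_along(et)) {
    if (is.na(et[i])) next           # invalid event time -> NA weather
    if (roll == "none") {
      j <- match(et[i], wt)          # exact match only
    } else {
      k <- findInterval(et[i], wt)   # last weather time <= event (0 if none)
      cand <- if (roll == "last") k else c(k, k + 1L)
      cand <- cand[cand >= 1L & cand <= length(wt)]
      if (length(cand) == 0L) next
      j <- cand[which.min(abs(wt[cand] - et[i]))]
      if (abs(wt[j] - et[i]) > max_s) j <- NA_integer_
    }
    idx[i] <- ord[j]                 # map back to the original weather row
  }
  idx
}

w_t <- as.POSIXct("2024-06-01 00:00", tz = "UTC") + 3600 * 0:3  # 00:00 to 03:00
ev  <- as.POSIXct(c("2024-06-01 01:20", "2024-06-01 01:40", "2024-06-01 09:00"),
                  tz = "UTC")
roll_match(ev, w_t, "nearest", roll_max_hours = 1)  # 2 3 NA
roll_match(ev, w_t, "last", roll_max_hours = 1)     # 2 2 NA
```

The 09:00 event gets NA under both rules because the closest weather hour (03:00) is six hours away, beyond roll_max_hours.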

Build standard time keys used by weatherjoin

Description

Build standard time keys used by weatherjoin

Usage

.build_time(DT, time, tz = "UTC", time_api_resolved = c("daily", "hourly"))

Arguments

DT

data.table with input data.

time

User time= specification (single column or multiple columns).

tz

Timezone used for parsing/constructing timestamps (default UTC).

time_api_resolved

"daily" or "hourly" (already resolved from user setting/guess).

Value

DT with timestamp_utc (POSIXct) and t_utc (numeric seconds) columns added.
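A minimal base-R sketch of the .build_time() contract, assuming daily input is anchored at dummy_hour (the weatherjoin.dummy_hour option, default 12) and that t_utc counts seconds since the Unix epoch. build_time_keys() is hypothetical and handles only Date and POSIXct columns; the real helper also accepts character and numeric input.

```r
# Hypothetical sketch of the .build_time() contract (not the package code).
# Handles only Date and POSIXct columns; character/numeric parsing is omitted.
build_time_keys <- function(df, time_col, tz = "UTC",
                            time_api_resolved = c("daily", "hourly"),
                            dummy_hour = 12L) {
  time_api_resolved <- match.arg(time_api_resolved)
  raw <- df[[time_col]]
  if (inherits(raw, "Date")) {
    if (time_api_resolved == "hourly")
      stop("hourly joins need time-of-day information; got a Date column")
    # Midnight of each date in tz, shifted to the fixed dummy hour.
    ts <- as.POSIXct(format(raw), tz = tz) + dummy_hour * 3600
  } else if (inherits(raw, "POSIXct")) {
    ts <- raw
    if (time_api_resolved == "daily") {
      # Downsample to the calendar date in tz, then re-anchor at dummy_hour.
      ts <- as.POSIXct(format(ts, "%Y-%m-%d", tz = tz), tz = tz) + dummy_hour * 3600
    }
  } else {
    stop("this sketch only handles Date and POSIXct input")
  }
  attr(ts, "tzone") <- "UTC"      # same instants, displayed in UTC
  df$timestamp_utc <- ts
  df$t_utc <- as.numeric(ts)      # seconds since 1970-01-01 00:00:00 UTC
  df
}

ev <- data.frame(date = as.Date(c("2024-06-01", "2024-06-02")))
out <- build_time_keys(ev, "date")
out$timestamp_utc  # 2024-06-01 12:00 UTC and 2024-06-02 12:00 UTC
```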

Check cache coverage for planned calls

Description

Internal helper. Determines which planned provider calls are satisfied by existing cache entries and which must be fetched.

Usage

.cache_check(
  calls,
  time_api,
  params,
  site_elevation_col = "site_elevation",
  settings,
  cache_dir,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  cache_max_age_days = 30,
  refresh = c("if_missing", "if_stale", "always"),
  match_mode = c("cover", "exact"),
  param_match = c("superset", "exact")
)

Plan provider calls: for each loc_id, split by time sparsity

Description

Plan provider calls: for each loc_id, split by time sparsity

Usage

.call_plan(
  x,
  time_col = "timestamp_utc",
  loc_id_col = "loc_id",
  rep_lat_col = "rep_lat",
  rep_lon_col = "rep_lon",
  tz = "UTC"
)

Placeholder elevation lookup

Description

Placeholder elevation lookup

Usage

.elev_lookup(lon, lat, method = c("constant"), constant = 100, ...)
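The placeholder returns a constant, and join_weather() accepts an elev_fun with the shape function(lon, lat, ...). The sketch below shows both, assuming the function must be vectorized (one elevation in meters per input point). Both functions and the toy lookup table are hypothetical; the coordinates and elevations are made up for illustration. A function of this shape would be passed through the elev_fun argument of join_weather().

```r
# Hypothetical mirror of the documented placeholder: one constant per point.
elev_lookup_constant <- function(lon, lat, constant = 100, ...) {
  stopifnot(length(lon) == length(lat))
  rep(as.numeric(constant), length(lon))
}

# Hypothetical user-supplied elev_fun: nearest entry in a small toy table.
# The coordinates and elevations below are made up for illustration.
toy_sites <- data.frame(lon = c(0, 1), lat = c(50, 51), elev_m = c(70, 140))
my_elev_fun <- function(lon, lat, ...) {
  vapply(seq_along(lon), function(i) {
    # Crude squared-degree distance; adequate for a toy lookup only.
    d2 <- (toy_sites$lon - lon[i])^2 + (toy_sites$lat - lat[i])^2
    toy_sites$elev_m[which.min(d2)]
  }, numeric(1))
}

elev_lookup_constant(c(0.1, 0.9), c(50.1, 50.9))  # 100 100
my_elev_fun(c(0.1, 0.9), c(50.1, 50.9))           # 70 140
```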

Fetch NASA POWER for planned calls

Description

Fetch NASA POWER for planned calls

Usage

.fetch_power(
  calls_to_fetch,
  time_api,
  params,
  community = "ag",
  time_standard = "UTC",
  settings = list(),
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  dummy_hour = 12L,
  verbose = FALSE,
  ...
)

Multi-column time input path: map time columns to roles

Description

Multi-column time input path: map time columns to roles

Usage

.map_time_columns(time_cols, names_x)

Arguments

time_cols

Character vector of column names supplied by the user via time=.

names_x

Names of the input table.

Value

A list with mode ("ymd" or "ydoy") and role names: year, month, day, hour (optional), doy (optional).
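The manual gives the returned roles but not how columns are recognized. A minimal sketch, assuming case-insensitive matching against alias lists: the POWER-style names YEAR, MO, DY and HR follow the time argument example of join_weather(), while the other aliases are assumptions. map_time_roles() is hypothetical; a day-of-year column takes priority over month and day here.

```r
# Hypothetical sketch of role mapping; the alias lists are assumptions.
map_time_roles <- function(time_cols, names_x) {
  missing_cols <- setdiff(time_cols, names_x)
  if (length(missing_cols))
    stop("time columns not found: ", paste(missing_cols, collapse = ", "))
  key <- toupper(time_cols)
  aliases <- list(
    year  = c("YEAR", "YR"),
    month = c("MONTH", "MO", "MM"),
    day   = c("DAY", "DY", "DD"),
    hour  = c("HOUR", "HR", "HH"),
    doy   = c("DOY", "YDAY")
  )
  roles <- lapply(aliases, function(a) {
    hit <- time_cols[key %in% a]
    if (length(hit) > 1L)
      stop("ambiguous time columns: ", paste(hit, collapse = ", "))
    if (length(hit) == 1L) hit else NULL
  })
  if (is.null(roles$year)) stop("no year column found in time=")
  if (!is.null(roles$doy)) {
    mode <- "ydoy"
  } else if (!is.null(roles$month) && !is.null(roles$day)) {
    mode <- "ymd"
  } else {
    stop("time= needs month and day columns, or a day-of-year column")
  }
  c(list(mode = mode), roles)
}

r <- map_time_roles(c("YEAR", "MO", "DY", "HR"), c("site", "YEAR", "MO", "DY", "HR"))
r$mode  # "ymd"
r$hour  # "HR"
```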

Normalize POWER output time columns to timestamp_utc (UTC)

Description

Normalize POWER output time columns to timestamp_utc (UTC)

Usage

.normalize_power_time(
  w,
  time_api = c("hourly", "daily"),
  tz = "UTC",
  dummy_hour = 12L
)

Resolve time_api based on user choice and input resolution

Description

Resolve time_api based on user choice and input resolution

Usage

.resolve_time_api(
  dt,
  time_api = c("guess", "hourly", "daily"),
  input_res = c("hourly", "daily"),
  tz = "UTC",
  dummy_hour = 12L
)
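The rules documented for the time_api argument of join_weather() ("guess" follows the input, "daily" with time-of-day input downsamples, "hourly" without time-of-day information errors) can be sketched as below. The test for time-of-day information (any non-midnight POSIXct value in tz) is an assumption, and resolve_time_api() is a hypothetical stand-in for the internal helper.

```r
# Hypothetical sketch of the documented rules; the "time-of-day" test is an assumption.
resolve_time_api <- function(ts, time_api = c("guess", "hourly", "daily"), tz = "UTC") {
  time_api <- match.arg(time_api)
  # Treat input as hourly if any timestamp has a non-midnight time of day in tz.
  has_tod <- inherits(ts, "POSIXct") &&
    any(format(ts[!is.na(ts)], "%H:%M:%S", tz = tz) != "00:00:00")
  input_res <- if (has_tod) "hourly" else "daily"
  if (time_api == "guess") return(input_res)
  if (time_api == "hourly" && input_res == "daily")
    stop('time_api = "hourly" needs time-of-day information in the input')
  time_api   # "daily" with hourly input: the caller downsamples to dates
}

resolve_time_api(as.Date("2024-06-01"))                               # "daily"
resolve_time_api(as.POSIXct("2024-06-01 13:00", tz = "UTC"))          # "hourly"
resolve_time_api(as.POSIXct("2024-06-01 13:00", tz = "UTC"), "daily") # "daily"
```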

Spatial planning: map points to representative locations

Description

Spatial planning: map points to representative locations

Usage

.spatial_plan(
  x,
  spatial_mode = c("cluster", "exact", "by_group"),
  lat_col = "lat",
  lon_col = "lon",
  group_col = NULL,
  rep_method = c("median", "centroid"),
  cluster_radius_m = 250,
  keep_diag = TRUE,
  check_range = TRUE,
  coord_digits = 5L
)
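With spatial_mode = "cluster", points within cluster_radius_m share one representative location, and rep_method = "median" suggests a coordinate-wise median. The sketch below assumes greedy ("leader") clustering on haversine distance; the package's actual clustering algorithm is not documented here, so cluster_points() and haversine_m() are hypothetical illustrations only. Range checks and missing coordinates are not handled.

```r
# Great-circle distance in metres (haversine formula, spherical Earth).
haversine_m <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371008.8 * asin(pmin(1, sqrt(a)))   # mean Earth radius in metres
}

# Hypothetical greedy ("leader") clustering: each point joins the first
# cluster whose seed point lies within radius_m, otherwise it seeds a new one.
cluster_points <- function(lat, lon, radius_m = 250) {
  loc_id <- integer(length(lat))
  seed_lat <- numeric(0)
  seed_lon <- numeric(0)
  for (i in seq_along(lat)) {
    d <- if (length(seed_lat)) haversine_m(lat[i], lon[i], seed_lat, seed_lon) else numeric(0)
    hit <- which(d <= radius_m)
    if (length(hit) > 0L) {
      loc_id[i] <- hit[1]
    } else {
      seed_lat <- c(seed_lat, lat[i])
      seed_lon <- c(seed_lon, lon[i])
      loc_id[i] <- length(seed_lat)
    }
  }
  # Representative location: coordinate-wise median of each cluster's members.
  rep_lat <- tapply(lat, loc_id, median)
  rep_lon <- tapply(lon, loc_id, median)
  data.frame(loc_id = loc_id,
             rep_lat = as.numeric(rep_lat[as.character(loc_id)]),
             rep_lon = as.numeric(rep_lon[as.character(loc_id)]))
}

# Two points about 100 m apart and one about 11 km away.
cluster_points(lat = c(52.0, 52.0009, 52.1), lon = c(0, 0, 0))
```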

Split sparse time points into segments using a gap penalty (hours)

Description

Split sparse time points into segments using a gap penalty (hours)

Usage

.split_time_ranges(times_utc)

Single-column time input path: validate and normalize a time column

Description

Single-column time input path: validate and normalize a time column

Usage

.validate_single_time(
  raw,
  tz = "UTC",
  dummy_hour = 12L,
  time_api_resolved = c("daily", "hourly"),
  time_col = "<time>",
  max_examples = 5L
)

Multi-column time input path: validate time components and build Date safely

Description

Multi-column time input path: validate time components and build Date safely

Usage

.validate_time_components(
  y,
  m = NULL,
  d = NULL,
  doy = NULL,
  h = NULL,
  mode = c("ymd", "ydoy"),
  time_api_resolved = c("daily", "hourly"),
  time_cols = character(),
  max_examples = 5L
)

Arguments

y, m, d

Integer-ish vectors (for mode="ymd").

doy

Integer-ish vector (for mode="ydoy").

h

Optional integer-ish vector.

mode

"ymd" or "ydoy"

time_api_resolved

"hourly" or "daily" (for hourly requirement checks)

time_cols

Character vector of user-specified columns for error context.

max_examples

How many bad examples to show in error messages.

Value

A list with date (Date) and hour (integer, possibly NA if missing and not allowed).
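"Build Date safely" can be sketched in base R by constructing the date and then confirming it round-trips, so impossible combinations such as 2023-02-29 or day-of-year 366 in a non-leap year are rejected rather than silently rolled over. validate_components() is hypothetical: it raises an error listing up to max_examples bad rows and omits the hourly-requirement check driven by time_api_resolved.

```r
# Hypothetical sketch of component validation (not the package code).
validate_components <- function(y, m = NULL, d = NULL, doy = NULL, h = NULL,
                                mode = c("ymd", "ydoy"), max_examples = 5L) {
  mode <- match.arg(mode)
  y <- as.integer(y)
  if (mode == "ymd") {
    s <- sprintf("%04d-%02d-%02d", y, as.integer(m), as.integer(d))
    date <- as.Date(s, format = "%Y-%m-%d")
    # Round-trip check rejects impossible dates such as 2023-02-29.
    ok <- !is.na(date) & format(date, "%Y-%m-%d") == s
  } else {
    date <- as.Date(sprintf("%04d-01-01", y), format = "%Y-%m-%d") +
      (as.integer(doy) - 1L)
    # A day-of-year past the end of the year (or below 1) changes the year.
    ok <- !is.na(date) & format(date, "%Y") == sprintf("%04d", y)
  }
  hour <- if (is.null(h)) rep(NA_integer_, length(date)) else as.integer(h)
  ok <- ok & (is.na(hour) | (hour >= 0L & hour <= 23L))
  bad <- which(!ok)
  if (length(bad)) {
    stop(sprintf("%d invalid time value(s); first bad rows: %s",
                 length(bad), paste(head(bad, max_examples), collapse = ", ")))
  }
  list(date = date, hour = hour)
}

validate_components(2024, 2, 29, h = 13)$date          # leap day is valid
validate_components(2024, doy = 60, mode = "ydoy")$date  # also 2024-02-29
```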

Internal: load required packages (used for interactive sourcing too)

Description

Internal: load required packages (used for interactive sourcing too)

Usage

.wj_load(pkgs = c("data.table"), attach = FALSE, quiet = TRUE)

Get weatherjoin option with default

Description

Get weatherjoin option with default

Usage

.wj_opt(name, default)
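A one-function sketch, assuming the helper wraps base R's getOption(x, default) and that name is given without the "weatherjoin." prefix; the manual does not say which form .wj_opt() expects. wj_opt_sketch() is hypothetical.

```r
# Hypothetical sketch: read "weatherjoin.<name>", falling back to default.
wj_opt_sketch <- function(name, default) {
  getOption(paste0("weatherjoin.", name), default)
}

old <- options(weatherjoin.max_parts = NULL)   # make sure the option is unset
wj_opt_sketch("max_parts", 50L)                # 50 (fallback)
options(weatherjoin.max_parts = 25L)
wj_opt_sketch("max_parts", 50L)                # 25 (user setting)
options(old)                                   # restore the previous state
```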

Join gridded weather data to an event table

Description

Attach gridded weather variables from NASA POWER to rows of an event table. The function:

standardizes/validates time input (single timestamp column or multiple time columns),
plans efficient provider calls by clustering locations (default) and splitting sparse time ranges,
caches downloaded weather segments locally and reuses them,
joins weather back to events using exact or rolling joins.

Usage

join_weather(
  x,
  params,
  time,
  lat_col = "lat",
  lon_col = "lon",
  time_api = c("guess", "hourly", "daily"),
  tz = "UTC",
  roll = c("nearest", "last", "none"),
  roll_max_hours = NULL,
  spatial_mode = c("cluster", "exact", "by_group"),
  group_col = NULL,
  cluster_radius_m = 250,
  site_elevation = c("constant", "auto"),
  elev_constant = 100,
  elev_fun = NULL,
  community = "ag",
  cache_scope = c("user", "project"),
  cache_dir = NULL,
  verbose = FALSE,
  ...
)

Arguments

x

A data.frame/data.table with event rows.

params

Character vector of NASA POWER parameter codes (e.g. "T2M").

time

A single column name containing time (POSIXct/Date/character/numeric) OR a character vector of column names used to assemble a timestamp (e.g. c("YEAR","MO","DY","HR")).

lat_col, lon_col

Column names for latitude and longitude (decimal degrees).

time_api

One of "guess", "hourly", "daily". If "daily" is chosen while the input contains time-of-day information, timestamps are reduced to dates and re-anchored at a fixed hour (the weatherjoin.dummy_hour option, default 12). If "hourly" is chosen but the input has no time-of-day information, an error is raised.

tz

Time zone used to interpret/construct input timestamps (default "UTC"). Weather is requested from NASA POWER in UTC.

roll

Join behaviour when matching timestamps: "nearest" (default, recommended), "last", or "none" (exact). Rolling is applied when joining hourly weather to event times.

roll_max_hours

Maximum allowed time distance (hours) for a rolling match. If NULL, a safe default is used: 1 hour for hourly joins and 24 hours for daily joins.

spatial_mode

How to reduce many points to representative locations before calling POWER: "cluster" (default), "exact", or "by_group". Clustering keeps the number of provider calls from growing with every distinct coordinate and matches POWER's coarse spatial resolution.

group_col

Grouping column used when spatial_mode="by_group".

cluster_radius_m

Clustering radius in meters when spatial_mode="cluster".

site_elevation

Elevation strategy for POWER calls: "constant" or "auto". Elevation is resolved for representative locations and becomes part of the cache identity.

elev_constant

Constant elevation (meters) used when site_elevation="constant" and as a fallback for "auto".

elev_fun

Optional function with signature function(lon, lat, ...) that returns elevation (meters) for representative points.

community

Passed to nasapower::get_power() (e.g. "ag").

cache_scope

Where to store cache by default: "user" or "project".

cache_dir

Optional explicit cache directory. If NULL, determined by cache_scope.

verbose

If TRUE, print progress messages.

...

Passed through to nasapower::get_power().

Value

A data.table with weather columns appended. Rows with missing/invalid inputs keep their original values and receive NA weather.
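The manual has no Examples section. A minimal call is sketched below, assuming an event table with lat/lon columns and a POSIXct time column; the sites and times are arbitrary illustration values. The call queries NASA POWER, so it is guarded to run only interactively with weatherjoin and the suggested nasapower package installed.

```r
# Illustrative event table; sites and times are arbitrary example values.
events <- data.frame(
  id   = 1:3,
  lat  = c(52.78, 52.78, 51.50),
  lon  = c(-2.43, -2.43, -0.12),
  when = as.POSIXct(c("2024-06-01 09:30", "2024-06-01 15:10", "2024-06-03 12:00"),
                    tz = "UTC")
)

# Queries NASA POWER, so it needs network access plus the suggested
# 'nasapower' package; guarded so it is skipped in non-interactive sessions.
if (interactive() &&
    requireNamespace("weatherjoin", quietly = TRUE) &&
    requireNamespace("nasapower", quietly = TRUE)) {
  out <- weatherjoin::join_weather(
    events,
    params = c("T2M", "RH2M"),   # POWER parameter codes
    time = "when",
    time_api = "hourly",
    roll = "nearest",
    roll_max_hours = 1,
    verbose = TRUE
  )
  print(out)
}
```

For multi-column time input, time would instead be a character vector such as c("YEAR", "MO", "DY", "HR").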

weatherjoin options

Description

Most users will not need to change package options. Advanced configuration can be controlled via options().

Details

Cache policy

weatherjoin.cache_max_age_days Cache entries older than this (days) are considered stale (default 60).
weatherjoin.cache_refresh When to refetch: one of "if_missing", "if_stale", "always" (default "if_missing").
weatherjoin.cache_match_mode Cache matching mode: "cover" (cached window covers requested) or "exact" (default "cover").
weatherjoin.cache_param_match Parameter matching for cache reuse: "superset" or "exact" (default "superset").
weatherjoin.cache_pkg Internal namespace used when cache_scope="user" (default "weatherjoin").
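These options suggest a reuse rule that can be sketched as follows, assuming a cache entry is reusable when it has the same representative location and elevation (elevation is part of the cache identity), its time window covers (or, with "exact", equals) the request, its parameters are a superset of (or identical to) the requested ones, and, under "if_stale", it is younger than cache_max_age_days. cache_entry_ok() and all entry field names are hypothetical.

```r
# Hypothetical reuse test for one cache entry against one planned request.
# Field names (lat, lon, elev, start, end, params, created) are assumptions.
cache_entry_ok <- function(entry, req,
                           match_mode = c("cover", "exact"),
                           param_match = c("superset", "exact"),
                           refresh = c("if_missing", "if_stale", "always"),
                           max_age_days = 60, now = Sys.time()) {
  match_mode <- match.arg(match_mode)
  param_match <- match.arg(param_match)
  refresh <- match.arg(refresh)
  if (refresh == "always") return(FALSE)          # always refetch
  # Same representative location and elevation.
  same_site <- entry$lat == req$lat && entry$lon == req$lon &&
    entry$elev == req$elev
  time_ok <- if (match_mode == "cover") {
    entry$start <= req$start && entry$end >= req$end
  } else {
    entry$start == req$start && entry$end == req$end
  }
  param_ok <- if (param_match == "superset") {
    all(req$params %in% entry$params)
  } else {
    setequal(req$params, entry$params)
  }
  # "if_missing" ignores age; "if_stale" also requires a recent entry.
  fresh <- refresh == "if_missing" ||
    as.numeric(difftime(now, entry$created, units = "days")) <= max_age_days
  same_site && time_ok && param_ok && fresh
}

now <- as.POSIXct("2024-07-01", tz = "UTC")
entry <- list(lat = 52, lon = -2, elev = 100,
              start = as.POSIXct("2024-06-01", tz = "UTC"),
              end = as.POSIXct("2024-06-30", tz = "UTC"),
              params = c("T2M", "RH2M"),
              created = as.POSIXct("2024-06-30", tz = "UTC"))
req <- list(lat = 52, lon = -2, elev = 100,
            start = as.POSIXct("2024-06-10", tz = "UTC"),
            end = as.POSIXct("2024-06-12", tz = "UTC"),
            params = "T2M")
cache_entry_ok(entry, req, now = now)                        # TRUE
cache_entry_ok(entry, req, match_mode = "exact", now = now)  # FALSE
```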

Time splitting and call planning

These options control how sparse time series are split into separate provider calls. They are primarily performance controls: poorly chosen values do not change the meaning of the returned weather values, only how much data is downloaded and cached.

weatherjoin.split_penalty_hours Gap threshold (hours). Larger values yield fewer, wider time windows (default 72).
weatherjoin.pad_hours Padding (hours) added to both ends of each planned time window (default 0).
weatherjoin.max_parts Maximum number of planned time windows per representative location (default 50).
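A minimal sketch of gap-based splitting under these options. It assumes that a new window starts whenever consecutive requested times are more than split_penalty_hours apart and that each window is padded by pad_hours. It also assumes that, if more than max_parts windows result, neighbouring windows are merged across the smallest gaps; the package's actual strategy for max_parts is not documented. split_time_ranges() is a hypothetical stand-in for the internal .split_time_ranges() and returns window bounds in seconds since the epoch.

```r
# Hypothetical stand-in for .split_time_ranges(); defaults read the documented options.
split_time_ranges <- function(times_utc,
                              penalty_hours = getOption("weatherjoin.split_penalty_hours", 72),
                              pad_hours = getOption("weatherjoin.pad_hours", 0),
                              max_parts = getOption("weatherjoin.max_parts", 50)) {
  tt <- sort(unique(as.numeric(times_utc)))   # seconds, ascending, NAs dropped
  if (length(tt) == 0L) return(data.frame(start = numeric(0), end = numeric(0)))
  # A new window starts wherever the gap to the previous time exceeds the threshold.
  seg <- cumsum(c(TRUE, diff(tt) > penalty_hours * 3600))
  # Assumed cap: while there are too many windows, merge across the smallest gap.
  while (max(seg) > max_parts) {
    starts <- as.vector(tapply(tt, seg, min))
    ends <- as.vector(tapply(tt, seg, max))
    k <- which.min(starts[-1] - ends[-length(ends)])  # gap between windows k and k+1
    seg[seg == k + 1L] <- k
    seg <- match(seg, unique(seg))                     # renumber as 1, 2, ...
  }
  starts <- as.vector(tapply(tt, seg, min))
  ends <- as.vector(tapply(tt, seg, max))
  data.frame(start = starts - pad_hours * 3600, end = ends + pad_hours * 3600)
}

base <- as.POSIXct("2024-06-01", tz = "UTC")
times <- base + 3600 * c(0, 1, 2, 200, 201)     # two bursts about 8 days apart
nrow(split_time_ranges(times))                  # 2 windows with the default 72 h threshold
nrow(split_time_ranges(times, max_parts = 1))   # 1 merged window
```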

Time construction

weatherjoin.dummy_hour Hour used when constructing daily timestamps (default 12).

Diagnostics

weatherjoin.keep_rep_cols If TRUE, keep representative-location diagnostics (rep_lon/rep_lat, distance, elevation) in outputs (default FALSE).

Use withr for temporary changes:

withr::local_options(list(
  weatherjoin.split_penalty_hours = 168,
  weatherjoin.max_parts = 25
))

Clear cached weather data

Description

Deletes cached files and (optionally) removes rows from the cache index.

Usage

wj_cache_clear(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  filter = NULL,
  keep_index = FALSE,
  dry_run = FALSE,
  verbose = TRUE
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where to store cache by default: "user" or "project".

pkg

Package name used for "user" cache scope.

filter

Optional expression evaluated within the cache index to select entries to remove.

keep_index

If TRUE, leaves index rows (useful for debugging); default FALSE.

dry_run

If TRUE, prints what would be deleted but does not delete.

verbose

If TRUE, prints progress.

Value

Invisibly returns the rows selected for deletion.
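filter is described as an expression evaluated within the cache index, which suggests R's usual non-standard evaluation. The sketch below shows that pattern in base R with substitute() and eval(); select_rows() and the index columns time_api and n_params are hypothetical. For the real function, inspect the actual index columns with wj_cache_list() first and preview a deletion with dry_run = TRUE.

```r
# Hypothetical sketch of the filter pattern: capture the unevaluated
# expression, then evaluate it with the index columns in scope.
select_rows <- function(index, filter = NULL) {
  f <- substitute(filter)
  if (is.null(f)) return(index)                  # no filter: select everything
  keep <- eval(f, envir = index, enclos = parent.frame())
  if (!is.logical(keep) || length(keep) != nrow(index))
    stop("filter must give one TRUE/FALSE per index row")
  index[!is.na(keep) & keep, , drop = FALSE]
}

# Made-up index with hypothetical columns, for illustration only.
idx <- data.frame(time_api = c("hourly", "daily", "hourly"),
                  n_params = c(2, 1, 3))
select_rows(idx, time_api == "hourly" & n_params > 2)   # row 3 only
```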

List cached weather segments

Description

Returns the cache index (one row per cached segment).

Usage

wj_cache_list(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin"
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where to store cache by default: "user" or "project".

pkg

Package name used for "user" cache scope.

Value

A data.table index of cached segments.

Upgrade cache index schema

Description

Ensures the cache index contains required columns and correct types.

Usage

wj_cache_upgrade_index(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  verbose = TRUE
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where to store cache by default: "user" or "project".

pkg

Package name used for "user" cache scope.

verbose

If TRUE, prints progress.

Value

The upgraded cache index.
