| Type: | Package |
| Title: | Join Gridded Weather Data to Event Tables |
| Version: | 0.2.0 |
| URL: | https://github.com/hauae/weatherjoin |
| BugReports: | https://github.com/hauae/weatherjoin/issues |
| Description: | High-level tools to attach gridded weather data from the NASA POWER Project to event-based datasets. The package plans efficient spatio-temporal API calls via the 'nasapower' R package, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. This package is not affiliated with or endorsed by NASA. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | data.table, jsonlite |
| Suggests: | nasapower, digest, fst, anytime, testthat (≥ 3.0.0), knitr, rmarkdown, withr |
| Depends: | R (≥ 4.1.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-01-25 13:52:17 UTC; 00758120 |
| Author: | Przemek Dolowy [aut, cre] (affiliation: Harper Adams University) |
| Maintainer: | Przemek Dolowy <pdolowy@harper-adams.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-29 18:50:01 UTC |
weatherjoin: Join Gridded Weather Data to Event Tables
Description
High-level tools to attach gridded weather data from the NASA POWER project to event-based datasets. The package plans efficient spatio-temporal API calls, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. NASA POWER data are retrieved via the 'nasapower' R package. This package is not affiliated with or endorsed by NASA.
Author(s)
Maintainer: Przemek Dolowy pdolowy@harper-adams.ac.uk (Harper Adams University)
See Also
Useful links: https://github.com/hauae/weatherjoin; report bugs at https://github.com/hauae/weatherjoin/issues
Join weather back to events (supports rolling join for hourly)
Description
Join weather back to events (supports rolling join for hourly)
Usage
.attach_weather(
x,
weather,
params,
tz = "UTC",
roll = c("nearest", "last", "none"),
roll_max_hours = NULL,
coord_digits = 5
)
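To illustrate what a rolling join with a distance cap might look like, the sketch below uses data.table's roll = "nearest" and then blanks any match that lies further away than a limit. The table layout (loc_id, t_utc, T2M) and the helper column w_t_utc are assumptions made for this example; this is not the package's internal code.

```r
library(data.table)

# Hypothetical event and weather tables keyed by location and epoch seconds.
events <- data.table(
  loc_id = c(1L, 1L),
  t_utc  = as.numeric(as.POSIXct(c("2024-06-01 10:20:00", "2024-06-01 15:00:00"), tz = "UTC"))
)
weather <- data.table(
  loc_id = 1L,
  t_utc  = as.numeric(as.POSIXct(c("2024-06-01 10:00:00", "2024-06-01 11:00:00"), tz = "UTC")),
  T2M    = c(14.2, 15.1)
)

# Keep a copy of the weather timestamp, because the join column in the
# result carries the event time rather than the matched weather time.
weather[, w_t_utc := t_utc]

# One output row per event, matched to the nearest weather record in time.
joined <- weather[events, on = .(loc_id, t_utc), roll = "nearest"]

# Apply a maximum distance (assumed here: 1 hour) after the join.
roll_max_hours <- 1
too_far <- joined[, abs(w_t_utc - t_utc) > roll_max_hours * 3600]
joined[too_far, T2M := NA_real_]
```

The cap is applied as a separate step because roll = "nearest" on its own always returns the closest record, however far away it is.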
Build standard time keys used by weatherjoin
Description
Build standard time keys used by weatherjoin
Usage
.build_time(DT, time, tz = "UTC", time_api_resolved = c("daily", "hourly"))
Arguments
DT |
data.table with input data. |
time |
User |
tz |
Timezone used for parsing/constructing timestamps (default UTC). |
time_api_resolved |
"daily" or "hourly" (already resolved from user setting/guess). |
Value
DT with timestamp_utc (POSIXct) and t_utc (numeric seconds) columns added.
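A minimal sketch of how the two key columns could be derived for daily input, assuming dates are anchored at a fixed "dummy" hour (12 is used here as an example value). The variable names mirror the documented output columns, but the construction itself is an illustration in base R, not the package's implementation.

```r
# Daily input: dates without a time of day.
dates <- as.Date(c("2024-06-01", "2024-06-02"))
dummy_hour <- 12L  # assumed anchor hour for daily timestamps

# Build POSIXct timestamps in UTC at the dummy hour.
timestamp_utc <- as.POSIXct(
  paste(format(dates), sprintf("%02d:00:00", dummy_hour)),
  tz = "UTC"
)

# Numeric seconds since the epoch, convenient as a join key.
t_utc <- as.numeric(timestamp_utc)
```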
Check cache coverage for planned calls
Description
Internal helper. Determines which planned provider calls are satisfied by existing cache entries and which must be fetched.
Usage
.cache_check(
calls,
time_api,
params,
site_elevation_col = "site_elevation",
settings,
cache_dir,
cache_scope = c("user", "project"),
pkg = "weatherjoin",
cache_max_age_days = 30,
refresh = c("if_missing", "if_stale", "always"),
match_mode = c("cover", "exact"),
param_match = c("superset", "exact")
)
Plan provider calls: for each loc_id, split by time sparsity
Description
Plan provider calls: for each loc_id, split by time sparsity
Usage
.call_plan(
x,
time_col = "timestamp_utc",
loc_id_col = "loc_id",
rep_lat_col = "rep_lat",
rep_lon_col = "rep_lon",
tz = "UTC"
)
Placeholder elevation lookup
Description
Placeholder elevation lookup
Usage
.elev_lookup(lon, lat, method = c("constant"), constant = 100, ...)
Fetch NASA POWER for planned calls
Description
Fetch NASA POWER for planned calls
Usage
.fetch_power(
calls_to_fetch,
time_api,
params,
community = "ag",
time_standard = "UTC",
settings = list(),
cache_dir = NULL,
cache_scope = c("user", "project"),
pkg = "weatherjoin",
dummy_hour = 12L,
verbose = FALSE,
...
)
Multi-column time input path: map time columns to roles
Description
Multi-column time input path: map time columns to roles
Usage
.map_time_columns(time_cols, names_x)
Arguments
time_cols |
Character vector of column names supplied by the user via |
names_x |
Names of the input table. |
Value
A list with mode ("ymd" or "ydoy") and role names: year, month, day, hour (optional), doy (optional).
Normalize POWER output time columns to timestamp_utc (UTC)
Description
Normalize POWER output time columns to timestamp_utc (UTC)
Usage
.normalize_power_time(
w,
time_api = c("hourly", "daily"),
tz = "UTC",
dummy_hour = 12L
)
Resolve time_api based on user choice and input resolution
Description
Resolve time_api based on user choice and input resolution
Usage
.resolve_time_api(
dt,
time_api = c("guess", "hourly", "daily"),
input_res = c("hourly", "daily"),
tz = "UTC",
dummy_hour = 12L
)
Spatial planning: map points to representative locations
Description
Spatial planning: map points to representative locations
Usage
.spatial_plan(
x,
spatial_mode = c("cluster", "exact", "by_group"),
lat_col = "lat",
lon_col = "lon",
group_col = NULL,
rep_method = c("median", "centroid"),
cluster_radius_m = 250,
keep_diag = TRUE,
check_range = TRUE,
coord_digits = 5L
)
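The manual does not describe the clustering algorithm, so the following is only one plausible reading of spatial_mode = "cluster": a greedy pass that assigns each point to the first seed within cluster_radius_m (haversine distance) and then uses the per-cluster median as the representative location. The functions haversine_m and greedy_cluster are hypothetical helpers written for this sketch.

```r
# Great-circle distance in meters between points given in decimal degrees.
haversine_m <- function(lat1, lon1, lat2, lon2) {
  r <- 6371000
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Greedy clustering: a point joins the first seed within radius_m,
# otherwise it becomes a new seed.
greedy_cluster <- function(lat, lon, radius_m = 250) {
  loc_id <- integer(length(lat))
  seed_lat <- numeric(0)
  seed_lon <- numeric(0)
  for (i in seq_along(lat)) {
    d <- haversine_m(lat[i], lon[i], seed_lat, seed_lon)
    hit <- which(d <= radius_m)
    if (length(hit) > 0) {
      loc_id[i] <- hit[1]
    } else {
      seed_lat <- c(seed_lat, lat[i])
      seed_lon <- c(seed_lon, lon[i])
      loc_id[i] <- length(seed_lat)
    }
  }
  data.frame(
    lat = lat, lon = lon, loc_id = loc_id,
    rep_lat = ave(lat, loc_id, FUN = median),
    rep_lon = ave(lon, loc_id, FUN = median)
  )
}

# Two points a few meters apart and one far away.
plan <- greedy_cluster(
  lat = c(52.7700, 52.7701, 52.9000),
  lon = c(-2.4200, -2.4201, -2.0000)
)
```

Reducing nearby points to one representative location means a single provider call can serve all of them, which is the stated purpose of spatial planning.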
Split sparse time points into segments using a gap penalty (hours)
Description
Split sparse time points into segments using a gap penalty (hours)
Usage
.split_time_ranges(times_utc)
Single-column time input path: validate and normalize a time column
Description
Single-column time input path: validate and normalize a time column
Usage
.validate_single_time(
raw,
tz = "UTC",
dummy_hour = 12L,
time_api_resolved = c("daily", "hourly"),
time_col = "<time>",
max_examples = 5L
)
Multi-column time input path: validate time components and build Date safely
Description
Multi-column time input path: validate time components and build Date safely
Usage
.validate_time_components(
y,
m = NULL,
d = NULL,
doy = NULL,
h = NULL,
mode = c("ymd", "ydoy"),
time_api_resolved = c("daily", "hourly"),
time_cols = character(),
max_examples = 5L
)
Arguments
y, m, d |
Integer-ish vectors (for mode="ymd"). |
doy |
Integer-ish vector (for mode="ydoy"). |
h |
Optional integer-ish vector. |
mode |
"ymd" or "ydoy" |
time_api_resolved |
"hourly" or "daily" (for hourly requirement checks) |
time_cols |
Character vector of user-specified columns for error context. |
max_examples |
How many bad examples to show in error messages. |
Value
A list with date (Date) and hour (integer, possibly NA if missing and not allowed).
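As an illustration of what "building a Date safely" can mean, the sketch below turns year/month/day or year/day-of-year components into a Date and yields NA for impossible combinations instead of raising an error. build_date is a hypothetical helper for this example; the package's own validation and error reporting are not shown in this manual.

```r
build_date <- function(y, m = NULL, d = NULL, doy = NULL,
                       mode = c("ymd", "ydoy")) {
  mode <- match.arg(mode)
  y <- as.integer(y)
  if (mode == "ymd") {
    # Parsing with an explicit format returns NA for invalid dates.
    txt <- sprintf("%04d-%02d-%02d", y, as.integer(m), as.integer(d))
    as.Date(txt, format = "%Y-%m-%d")
  } else {
    doy <- as.integer(doy)
    date <- as.Date(sprintf("%04d-01-01", y), format = "%Y-%m-%d") + (doy - 1L)
    # Reject day-of-year values that fall outside the given year.
    bad <- is.na(date) | doy < 1L | as.integer(format(date, "%Y")) != y
    date[bad] <- NA
    date
  }
}

# 29 February exists in 2024 but not in 2023.
ymd_dates <- build_date(c(2024, 2023), m = c(2, 2), d = c(29, 29))

# Day 366 exists only in leap years.
ydoy_dates <- build_date(c(2024, 2023), doy = c(366, 366), mode = "ydoy")
```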
Internal: load required packages (used for interactive sourcing too)
Description
Internal: load required packages (used for interactive sourcing too)
Usage
.wj_load(pkgs = c("data.table"), attach = FALSE, quiet = TRUE)
Get weatherjoin option with default
Description
Get weatherjoin option with default
Usage
.wj_opt(name, default)
Join gridded weather data to an event table
Description
Attach gridded weather variables from NASA POWER to rows of an event table. The function:
- standardizes/validates time input (single timestamp column or multiple time columns),
- plans efficient provider calls by clustering locations (default) and splitting sparse time ranges,
- caches downloaded weather segments locally and reuses them,
- joins weather back to events using exact or rolling joins.
Usage
join_weather(
x,
params,
time,
lat_col = "lat",
lon_col = "lon",
time_api = c("guess", "hourly", "daily"),
tz = "UTC",
roll = c("nearest", "last", "none"),
roll_max_hours = NULL,
spatial_mode = c("cluster", "exact", "by_group"),
group_col = NULL,
cluster_radius_m = 250,
site_elevation = c("constant", "auto"),
elev_constant = 100,
elev_fun = NULL,
community = "ag",
cache_scope = c("user", "project"),
cache_dir = NULL,
verbose = FALSE,
...
)
Arguments
x |
A data.frame/data.table with event rows. |
params |
Character vector of NASA POWER parameter codes (e.g. |
time |
A single column name containing time (POSIXct/Date/character/numeric) OR
a character vector of column names used to assemble a timestamp (e.g. |
lat_col, lon_col |
Column names for latitude and longitude (decimal degrees). |
time_api |
One of |
tz |
Time zone used to interpret/construct input timestamps (default |
roll |
Join behaviour when matching timestamps: |
roll_max_hours |
Maximum allowed time distance (hours) for a rolling match. If NULL, a safe default is used: 1 hour for hourly joins and 24 hours for daily joins. |
spatial_mode |
How to reduce many points to representative locations before calling POWER:
|
group_col |
Grouping column used when |
cluster_radius_m |
Clustering radius in meters when |
site_elevation |
Elevation strategy for POWER calls: |
elev_constant |
Constant elevation (meters) used when |
elev_fun |
Optional function |
community |
Passed to |
cache_scope |
Where to store cache by default: |
cache_dir |
Optional explicit cache directory. If NULL, determined by |
verbose |
If TRUE, print progress messages. |
... |
Passed through to |
Value
A data.table with weather columns appended. Rows with missing/invalid inputs keep their original values and receive NA weather.
See Also
wj_cache_list, wj_cache_clear, weatherjoin_options
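A hedged usage sketch based on the signature above. The event table, its column names (id, timestamp) and the parameter codes T2M and RH2M are example choices. The call is wrapped in a guard because it needs the 'weatherjoin' and 'nasapower' packages and network access.

```r
# Example event table: three events with coordinates and UTC timestamps.
events <- data.frame(
  id  = 1:3,
  lat = c(52.7790, 52.7791, 52.7000),
  lon = c(-2.4270, -2.4271, -2.5000),
  timestamp = as.POSIXct(
    c("2024-06-01 09:30:00", "2024-06-01 14:10:00", "2024-06-03 08:00:00"),
    tz = "UTC"
  )
)

# Only run the download interactively and when the packages are installed.
if (interactive() &&
    requireNamespace("weatherjoin", quietly = TRUE) &&
    requireNamespace("nasapower", quietly = TRUE)) {
  out <- weatherjoin::join_weather(
    events,
    params = c("T2M", "RH2M"),
    time = "timestamp",
    time_api = "hourly",
    roll = "nearest",
    roll_max_hours = 1
  )
  print(out)
}
```

With the default spatial_mode = "cluster" and cluster_radius_m = 250, the first two events are close enough that they would be expected to share one representative location.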
weatherjoin options
Description
Most users will not need to change package options. Advanced configuration can be
controlled via options().
Details
Cache policy
- weatherjoin.cache_max_age_days: Cache entries older than this (days) are considered stale (default 60).
- weatherjoin.cache_refresh: When to refetch, one of "if_missing", "if_stale", "always" (default "if_missing").
- weatherjoin.cache_match_mode: Cache matching mode, either "cover" (cached window covers requested) or "exact" (default "cover").
- weatherjoin.cache_param_match: Parameter matching for cache reuse, either "superset" or "exact" (default "superset").
- weatherjoin.cache_pkg: Internal namespace used when cache_scope = "user" (default "weatherjoin").
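The "cover" and "superset" modes can be read as simple predicates on a cache entry. The sketch below shows one such reading, assuming an entry and a request are each described by a start, an end and a parameter vector. cache_entry_matches and the list layout are hypothetical and do not reflect the real cache index schema.

```r
cache_entry_matches <- function(entry, request,
                                match_mode = c("cover", "exact"),
                                param_match = c("superset", "exact")) {
  match_mode <- match.arg(match_mode)
  param_match <- match.arg(param_match)

  # "cover": the cached window contains the requested window.
  time_ok <- if (match_mode == "cover") {
    entry$start <= request$start && entry$end >= request$end
  } else {
    entry$start == request$start && entry$end == request$end
  }

  # "superset": the cached parameters include all requested ones.
  params_ok <- if (param_match == "superset") {
    all(request$params %in% entry$params)
  } else {
    setequal(request$params, entry$params)
  }

  time_ok && params_ok
}

entry <- list(
  start = as.Date("2024-01-01"), end = as.Date("2024-03-31"),
  params = c("T2M", "RH2M")
)
request <- list(
  start = as.Date("2024-02-01"), end = as.Date("2024-02-10"),
  params = "T2M"
)

hit_cover <- cache_entry_matches(entry, request)
hit_exact <- cache_entry_matches(entry, request, match_mode = "exact")
```

Under these assumptions the request is served from cache in the default modes but would trigger a fetch when exact window matching is required.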
Time splitting and call planning
These options control how sparse time series are split into separate provider calls. They are primarily performance controls; incorrect values will not change the meaning of returned weather values, only how much data is downloaded and cached.
- weatherjoin.split_penalty_hours: Gap threshold (hours). Larger values yield fewer, wider time windows (default 72).
- weatherjoin.pad_hours: Padding (hours) added to both ends of each planned time window (default 0).
- weatherjoin.max_parts: Maximum number of planned time windows per representative location (default 50).
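To make the effect of the gap threshold and padding concrete, here is a minimal sketch that starts a new time window whenever two consecutive timestamps are further apart than the threshold. It assumes a plain threshold rule and ignores the cap on the number of windows. split_by_gap is a hypothetical helper, and the package's actual planner may weigh gaps differently.

```r
split_by_gap <- function(times_utc, penalty_hours = 72, pad_hours = 0) {
  t <- sort(unique(as.numeric(times_utc)))
  stopifnot(length(t) > 0)

  # A new segment begins wherever the gap exceeds the threshold.
  seg <- cumsum(c(TRUE, diff(t) > penalty_hours * 3600))
  parts <- split(t, seg)

  starts <- unname(vapply(parts, min, numeric(1))) - pad_hours * 3600
  ends   <- unname(vapply(parts, max, numeric(1))) + pad_hours * 3600

  data.frame(
    part  = seq_along(starts),
    start = as.POSIXct(starts, origin = "1970-01-01", tz = "UTC"),
    end   = as.POSIXct(ends, origin = "1970-01-01", tz = "UTC")
  )
}

times <- as.POSIXct(
  c("2024-01-01 00:00:00", "2024-01-02 00:00:00", "2024-01-30 00:00:00"),
  tz = "UTC"
)

# Default threshold: the 28-day gap splits the series into two windows.
plan_default <- split_by_gap(times)

# A very large threshold keeps everything in one wide window.
plan_wide <- split_by_gap(times, penalty_hours = 24 * 40)
```

This matches the description above: raising the threshold trades more downloaded data for fewer provider calls, without changing the weather values that are eventually joined.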
Time construction
- weatherjoin.dummy_hour: Hour used when constructing daily timestamps (default 12).
Diagnostics
- weatherjoin.keep_rep_cols: If TRUE, keep representative-location diagnostics (rep_lon/rep_lat, distance, elevation) in outputs (default FALSE).
Use withr for temporary changes:
withr::local_options(list(
  weatherjoin.split_penalty_hours = 168,
  weatherjoin.max_parts = 25
))
Clear cached weather data
Description
Deletes cached files and (optionally) removes rows from the cache index.
Usage
wj_cache_clear(
cache_dir = NULL,
cache_scope = c("user", "project"),
pkg = "weatherjoin",
filter = NULL,
keep_index = FALSE,
dry_run = FALSE,
verbose = TRUE
)
Arguments
cache_dir |
Optional explicit cache directory. |
cache_scope |
Where to store cache by default: |
pkg |
Package name used for |
filter |
Optional expression evaluated within the cache index to select entries to remove. |
keep_index |
If |
dry_run |
If |
verbose |
If |
Value
Invisibly returns the rows selected for deletion.
List cached weather segments
Description
Returns the cache index (one row per cached segment).
Usage
wj_cache_list(
cache_dir = NULL,
cache_scope = c("user", "project"),
pkg = "weatherjoin"
)
Arguments
cache_dir |
Optional explicit cache directory. |
cache_scope |
Where to store cache by default: |
pkg |
Package name used for |
Value
A data.table index of cached segments.
Upgrade cache index schema
Description
Ensures the cache index contains required columns and correct types.
Usage
wj_cache_upgrade_index(
cache_dir = NULL,
cache_scope = c("user", "project"),
pkg = "weatherjoin",
verbose = TRUE
)
Arguments
cache_dir |
Optional explicit cache directory. |
cache_scope |
Where to store cache by default: |
pkg |
Package name used for |
verbose |
If |
Value
The upgraded cache index.