| Title: | Energy Burden Analysis Using Net Energy Return Methodology |
| Version: | 0.6.1 |
| Description: | Calculate and analyze household energy burden using the Net Energy Return aggregation methodology. Functions support weighted statistical calculations across geographic and demographic cohorts, with utilities for formatting results into publication-ready tables. Methods are based on Scheier & Kittner (2022) <doi:10.1038/s41467-021-27673-y>. |
| License: | AGPL (≥ 3) |
| Encoding: | UTF-8 |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | dplyr, httr, rappdirs, readr, rlang, scales, spatstat.univar, stats, stringr, tibble, tidyr |
| Suggests: | covr, DBI, httptest2, jsonlite, kableExtra, knitr, mockery, rmarkdown, RSQLite, rticles, testthat (≥ 3.0.0), tinytex, withr |
| VignetteBuilder: | knitr |
| LazyData: | true |
| URL: | https://github.com/ericscheier/emburden, https://ericscheier.info/emburden/ |
| BugReports: | https://github.com/ericscheier/emburden/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-12-20 03:28:16 UTC; runner |
| Author: | Eric Scheier [aut, cre, cph] |
| Maintainer: | Eric Scheier <eric@scheier.org> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-07 08:00:02 UTC |
emburden: Energy Burden Analysis Using Net Energy Return Methodology
Description
Calculate and analyze household energy burden using the Net Energy Return aggregation methodology. Functions support weighted statistical calculations across geographic and demographic cohorts, with utilities for formatting results into publication-ready tables. Methods are based on Scheier & Kittner (2022) doi:10.1038/s41467-021-27673-y.
Author(s)
Maintainer: Eric Scheier eric@scheier.org [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/ericscheier/emburden/issues
Aggregate cohort data by census tract and income bracket
Description
Aggregate cohort data by census tract and income bracket
Usage
aggregate_cohort_data(data, dataset, vintage, verbose = FALSE)
Backup Production Database
Description
Creates a timestamped backup of the production database
Usage
backup_db()
Value
Path to backup file, or NULL if no database exists
Cache and Database Management Utilities
Description
Cache and Database Management Utilities
Calculate Weighted Metrics for Energy Burden Analysis
Description
Calculates weighted statistical metrics (mean, median, quantiles) for a specified energy metric, with optional grouping by geographic or demographic categories. This is the primary function for aggregating household-level energy burden data using proper weighting by household counts.
Usage
calculate_weighted_metrics(
graph_data,
group_columns,
metric_name,
metric_cutoff_level,
upper_quantile_view = 1,
lower_quantile_view = 0
)
Arguments
graph_data |
A data frame containing household energy burden data with columns for the metric of interest, household counts, and optional grouping variables |
group_columns |
Character vector of column names to group by, or NULL for no grouping (calculates overall statistics) |
metric_name |
Character string specifying the column name of the metric to analyze (e.g., "ner" for Net Energy Return) |
metric_cutoff_level |
Numeric value defining the poverty threshold for the metric (e.g., 15.67 for Nh corresponding to 6% energy burden) |
upper_quantile_view |
Numeric between 0 and 1 specifying the upper quantile to calculate (default: 1.0 for maximum) |
lower_quantile_view |
Numeric between 0 and 1 specifying the lower quantile to calculate (default: 0.0 for minimum) |
Details
This function requires the spatstat package for weighted quantile
calculations. It automatically handles missing values and ensures that
statistics are only calculated when sufficient data points exist (n >= 3).
The function adds an "All" category row that aggregates across all groups, in addition to the individual group statistics.
Value
A data frame with one row per group (or one row if ungrouped) containing:
household_count |
Total number of households in the group |
households_below_cutoff |
Number of households below poverty threshold |
pct_in_group_below_cutoff |
Proportion of group below threshold |
metric_mean |
Weighted mean of the metric |
metric_median |
Weighted median of the metric |
metric_upper |
Upper quantile value |
metric_lower |
Lower quantile value |
metric_max |
Maximum value in group |
metric_min |
Minimum value in group |
Examples
# Calculate metrics for NC cooperatives using Nh
library(dplyr)
# Sample data
data <- data.frame(
cooperative = rep(c("Coop A", "Coop B"), each = 3),
ner = c(20, 15, 25, 18, 22, 12),
households = c(1000, 500, 750, 900, 600, 400)
)
# Calculate weighted metrics by cooperative
results <- calculate_weighted_metrics(
graph_data = data,
group_columns = "cooperative",
metric_name = "ner",
metric_cutoff_level = 15.67,
upper_quantile_view = 0.95,
lower_quantile_view = 0.05
)
Check Available Data Sources
Description
Check which data sources are available locally (database, CSV files, or will require download from OpenEI).
Usage
check_data_sources(verbose = TRUE)
Arguments
verbose |
Logical, print detailed status (default TRUE) |
Value
A list with status of each data source
Examples
# Check what data is available
check_data_sources()
Clear all emburden cache and database
Description
Nuclear option: clears ALL cached data and database. Use with caution - will require re-downloading all data.
Usage
clear_all_cache(confirm = FALSE, verbose = TRUE)
Arguments
confirm |
Logical, must be TRUE to proceed (safety check) |
verbose |
Logical, print progress messages |
Value
Invisibly returns list with: cache_cleared (logical), db_cleared (logical)
Examples
# Clear everything (requires confirm = TRUE)
clear_all_cache(confirm = TRUE)
Clear cache for a specific dataset
Description
Removes cached CSV files and database entries for a specific dataset/vintage. Useful when you know a specific dataset is corrupted.
Usage
clear_dataset_cache(
dataset = c("ami", "fpl"),
vintage = c("2018", "2022"),
verbose = TRUE
)
Arguments
dataset |
Character, "ami" or "fpl" |
vintage |
Character, "2018" or "2022" |
verbose |
Logical, print progress messages |
Value
Invisibly returns number of items cleared
Examples
# Clear corrupted AMI 2018 cache
clear_dataset_cache("ami", "2018")
# Clear FPL 2022 cache
clear_dataset_cache("fpl", "2022", verbose = TRUE)
Clear Test Database and Cache
Description
Safe function to clear test database and cache for testing. NEVER touches production database.
Usage
clear_test_environment()
Colorize Text for Knitted Documents
Description
Wraps text in color formatting appropriate for the output format (LaTeX or HTML). This function is intended for use within R Markdown/knitr documents.
Usage
colorize(x, color)
Arguments
x |
Character string to colorize |
color |
Character string specifying the color name (e.g., "red", "blue") |
Details
This function detects the knitr output format and applies appropriate color
formatting. For LaTeX output, it uses \\textcolor{}. For HTML output, it
uses <span style='color: ...'>.
Value
Character string wrapped in LaTeX or HTML color commands, or unchanged if output format is neither
Examples
# In an R Markdown document:
colorize("Important text", "red")
Compare Energy Burden Between Years
Description
Compare household energy burden metrics across different data vintages, using proper Net Energy Return (Nh) aggregation methodology.
Usage
compare_energy_burden(
dataset = c("ami", "fpl"),
states = NULL,
group_by = "income_bracket",
counties = NULL,
vintage_1 = "2022",
vintage_2 = "2018",
format = TRUE,
strict_matching = TRUE
)
Arguments
dataset |
Character, either "ami" or "fpl" for cohort data type |
states |
Character vector of state abbreviations to filter by (optional) |
group_by |
Character or character vector. Use keywords "income_bracket" (default), "state", or "none" for standard groupings. Or provide custom column name(s) for dynamic grouping (e.g., "geoid" for tract-level, c("state_abbr", "income_bracket") for multi-level grouping). Custom columns must exist in the loaded data. |
counties |
Character vector of county names or FIPS codes to filter by (optional).
Requires |
vintage_1 |
Character, first vintage year: "2018" or "2022" (default "2022") |
vintage_2 |
Character, second vintage year: "2018" or "2022" (default "2018") |
format |
Logical, if TRUE returns formatted percentages (default TRUE) |
strict_matching |
Logical, if TRUE (default) only compares income brackets that exist in both vintages and warns about mismatched brackets. If FALSE, compares all brackets (may result in NA values for brackets unique to one vintage). |
Value
A data.frame with energy burden comparison showing:
neb_YYYY: Net Energy Burden for each vintage (where YYYY is the year)
change_pp: Absolute change in percentage points
change_pct: Relative percent change
Examples
# Single state comparison (fast, good for learning)
nc_comparison <- compare_energy_burden("ami", "NC", "income_bracket")
# Overall comparison (no grouping)
compare_energy_burden("ami", "NC", "none")
if (interactive()) {
# Multi-state regional comparison (requires census data download)
southeast <- compare_energy_burden(
dataset = "fpl",
states = c("NC", "SC", "GA", "FL"),
group_by = "state"
)
# Nationwide comparison by income bracket (all 51 states)
us_comparison <- compare_energy_burden(
dataset = "ami",
group_by = "income_bracket"
)
# Compare specific counties within a state (requires census data)
compare_energy_burden("fpl", "NC", counties = c("Orange", "Durham", "Wake"))
# Custom grouping by tract-level geoid (requires census data)
compare_energy_burden("ami", "NC", group_by = "geoid")
}
Check if Database Exists
Description
Check if Database Exists
Usage
db_exists(test = FALSE)
Arguments
test |
Logical, check test database instead of production |
Value
Logical, TRUE if database exists
Calculate Disposable Energy-Adjusted Resources (DEAR)
Description
Calculates DEAR as the ratio of net income after energy spending to gross income. DEAR = (G - S) / G.
Usage
dear_func(g, s, se = NULL)
Arguments
g |
Numeric vector of gross income values |
s |
Numeric vector of energy spending values |
se |
Optional numeric vector of effective energy spending (defaults to s) |
Value
Numeric vector of DEAR values (ratio of disposable income to gross income)
Examples
# Calculate DEAR
dear_func(50000, 3000)
Delete Database (PROTECTED)
Description
Deletes a database with safety checks. Production database requires explicit confirmation.
Usage
delete_db(test = TRUE, confirm = FALSE)
Arguments
test |
Logical, delete test database (default TRUE) |
confirm |
Logical, must be TRUE to delete production database |
Value
Logical, TRUE if deleted successfully
Detect potentially corrupted database data
Description
Checks if loaded data appears corrupted (too small, missing states, missing columns). Does NOT automatically delete - only warns and provides recommendations.
Usage
detect_database_corruption(
data,
dataset,
vintage,
states = NULL,
verbose = TRUE
)
Arguments
data |
Data frame to check |
dataset |
Character, "ami" or "fpl" |
vintage |
Character, "2018" or "2022" |
states |
Character vector of expected states (NULL = all US states) |
verbose |
Logical, print warnings |
Value
List with: is_corrupted (logical), issues (character vector), recommendation (character)
Download and merge data from multiple states
Description
Download and merge data from multiple states
Usage
download_and_merge_states(dataset, vintage, states, verbose = TRUE)
Arguments
dataset |
Character, "ami" or "fpl" |
vintage |
Character, "2018" or "2022" |
states |
Character vector of state abbreviations |
verbose |
Logical, print progress messages |
Value
Combined tibble with data from all states
Download census tract data from OpenEI
Description
Download census tract data from OpenEI
Usage
download_census_tract_data(verbose = FALSE)
Download Dataset from Zenodo
Description
Downloads a pre-processed dataset from the emburden Zenodo repository. Includes progress bars, checksum verification, and automatic retry logic.
Usage
download_from_zenodo(dataset, vintage, verbose = FALSE)
Arguments
dataset |
Character, either "ami" or "fpl" |
vintage |
Character, data vintage: "2018" or "2022" |
verbose |
Logical, print progress messages (default TRUE) |
Value
Tibble with downloaded data, or NULL if download fails
Download LEAD data from OpenEI
Description
Download LEAD data from OpenEI
Usage
download_lead_data(dataset, vintage, states = NULL, verbose = FALSE)
Download Census Tract Data from Zenodo
Description
Downloads pre-processed census tract data from Zenodo.
Usage
download_tracts_from_zenodo(verbose = FALSE)
Arguments
verbose |
Logical, print progress messages (default TRUE) |
Value
Tibble with census tract data, or NULL if download fails
Calculate Energy Burden
Description
Calculates the energy burden as the ratio of energy spending to gross income. Energy burden is defined as E_b = S/G, where S is energy spending and G is gross income.
Usage
energy_burden_func(g, s, se = NULL)
Arguments
g |
Numeric vector of gross income values |
s |
Numeric vector of energy spending values |
se |
Optional numeric vector of effective energy spending (defaults to s) |
Value
Numeric vector of energy burden values (ratio of spending to income)
Examples
# Calculate energy burden for households
gross_income <- c(50000, 75000, 100000)
energy_spending <- c(3000, 3500, 4000)
energy_burden_func(gross_income, energy_spending)
Calculate Energy Return on Investment (EROI)
Description
Calculates the Energy Return on Investment as the ratio of gross income to effective energy spending. EROI = G/Se.
Usage
eroi_func(g, s, se = NULL)
Arguments
g |
Numeric vector of gross income values |
s |
Numeric vector of energy spending values |
se |
Optional numeric vector of effective energy spending (defaults to s) |
Value
Numeric vector of EROI values
Examples
# Calculate EROI for households
eroi_func(50000, 3000)
Find emburden_db.sqlite database
Description
Find emburden_db.sqlite database
Usage
find_emburden_db()
Get all state abbreviations
Description
Get all state abbreviations
Usage
get_all_states()
Value
Character vector of all 51 state abbreviations (50 states + DC)
Get the emburden cache directory
Description
Get the emburden cache directory
Get cache directory for downloaded files
Usage
get_cache_dir()
get_cache_dir()
Convert county identifiers to FIPS codes
Description
Supports both 3-digit county FIPS codes and 5-digit state+county FIPS codes. County names can be matched from the orange_county_sample or nc_sample datasets.
Usage
get_county_fips(counties, states)
Arguments
counties |
Character vector of county identifiers (FIPS codes or names) |
states |
Character vector of state abbreviations for context |
Value
Character vector of 3-digit county FIPS codes
Get the emburden database directory
Description
Get the emburden database directory
Usage
get_database_dir()
Get the full path to the emburden database file
Description
Get the full path to the emburden database file
Usage
get_database_path()
Get Dataset Information
Description
Returns metadata about available LEAD datasets.
Usage
get_dataset_info()
Value
Data frame with dataset information
Examples
get_dataset_info()
Get Database Connection
Description
Get Database Connection
Usage
get_db_connection(test = FALSE)
Arguments
test |
Logical, connect to test database instead of production |
Value
DBI connection object
Get Database Path
Description
Returns the path to the database, with protection against accidental deletion. For tests, use a separate test database.
Usage
get_db_path(test = FALSE)
Arguments
test |
Logical, whether to use test database (default FALSE) |
Value
Path to database file
Get Available Income Brackets for a Dataset and Vintage
Description
Returns the expected income brackets for a given dataset and vintage year. Useful for understanding what brackets are available before running analyses.
Usage
get_income_brackets(dataset, vintage)
Arguments
dataset |
Character, either "ami" or "fpl" |
vintage |
Integer, the year of the data vintage (e.g., 2018, 2022) |
Value
Character vector of income bracket names
Examples
# Get AMI brackets for 2022
get_income_brackets("ami", 2022)
# Get FPL brackets for 2018
get_income_brackets("fpl", 2018)
Get state FIPS codes from abbreviations
Description
Get state FIPS codes from abbreviations
Usage
get_state_fips(state_abbrs)
Get Zenodo Record Information
Description
Returns the Zenodo DOI and file information for emburden datasets.
Usage
get_zenodo_config()
Value
List with Zenodo record information
Harmonize Income Brackets Across Vintages
Description
Harmonizes income bracket categories when comparing data across different vintage years. This is necessary because some datasets (particularly AMI) have different bracket definitions across years.
Usage
harmonize_income_brackets(
data,
dataset,
vintage,
strict_matching = TRUE,
comparison_vintages = NULL
)
Arguments
data |
A data frame containing income bracket data |
dataset |
Character, either "ami" or "fpl" |
vintage |
Integer, the year of the data vintage |
strict_matching |
Logical, if TRUE (default) only keeps brackets that exist in both vintages being compared. If FALSE, keeps all brackets. |
comparison_vintages |
Integer vector of length 2, the vintages being compared (e.g., c(2018, 2022)). Required when strict_matching = TRUE. |
Details
Dataset-Specific Bracket Definitions
AMI (Area Median Income)
-
2018: 3 brackets
-
very_low: Very low income (typically <50% AMI) -
low_mod: Low to moderate income (typically 50-80% AMI) -
mid_high: Middle to high income (typically >80% AMI)
-
-
2022: 5 brackets
-
very_low: Very low income (same as 2018) -
low_mod: Low to moderate income (same as 2018) -
mid_high: Middle to high income (narrower than 2018) -
100-150%: 100-150% of AMI (new in 2022) -
150%+: Above 150% of AMI (new in 2022)
-
FPL (Federal Poverty Level)
-
Both 2018 and 2022: 5 brackets
-
0-100%: Below poverty line -
100-150%: 100-150% of FPL -
150-200%: 150-200% of FPL -
200-400%: 200-400% of FPL -
400%+: Above 400% of FPL
-
Value
A list with components:
-
data: The harmonized data frame -
warnings: Character vector of any warnings about bracket mismatches -
dropped_brackets: Character vector of brackets that were dropped
Aggregate LEAD data by poverty status
Description
Consolidates LEAD cohort data by poverty status, aggregating households and computing weighted averages for income and spending.
Usage
lead_to_poverty(data, dataset)
Arguments
data |
A data frame of processed LEAD data (output from raw_to_lead) |
dataset |
Character string indicating income metric: "ami", "fpl", or "smi" |
Value
A data frame aggregated by geoid, poverty status, housing attributes
List Available Columns in Cohort Data
Description
Returns column names and descriptions for LEAD cohort datasets.
Usage
list_cohort_columns(dataset = NULL, vintage = NULL)
Arguments
dataset |
Character, either "ami" or "fpl" (optional, affects available columns) |
vintage |
Character, "2018" or "2022" (optional, affects available columns) |
Value
Data frame with columns: column_name, description, data_type
Examples
list_cohort_columns()
list_cohort_columns("ami", "2022")
List Available Income Brackets
Description
Returns the income brackets available for a given dataset and vintage.
Usage
list_income_brackets(dataset = c("ami", "fpl"), vintage = "2022")
Arguments
dataset |
Character, either "ami" or "fpl" |
vintage |
Character, "2018" or "2022" |
Value
Character vector of income bracket labels
Examples
list_income_brackets("ami", "2022")
list_income_brackets("fpl", "2018")
List Available States
Description
Returns all state abbreviations available in the LEAD dataset.
Usage
list_states()
Value
Character vector of 51 state abbreviations (50 states + DC)
Examples
list_states()
Load Census Tract Data
Description
Load census tract demographics and utility service territory information with automatic fallback to CSV or OpenEI download.
Usage
load_census_tract_data(states = NULL, verbose = TRUE)
Arguments
states |
Character vector of state abbreviations to filter by (optional) |
verbose |
Logical, print status messages (default TRUE) |
Value
A tibble with columns:
geoid: Census tract identifier
state_abbr: State abbreviation
county_name: County name
tract_name: Tract name
utility_name: Electric utility serving this tract
Additional demographic columns
Examples
if (interactive()) {
# Single state (requires census data download)
nc_tracts <- load_census_tract_data(states = "NC")
# Multiple states (regional)
southeast <- load_census_tract_data(states = c("NC", "SC", "GA", "FL"))
# Nationwide (all ~73,000 census tracts)
us_tracts <- load_census_tract_data() # No filter = all states
}
Load DOE LEAD Tool Cohort Data
Description
Load household energy burden cohort data with automatic fallback:
Try local database
Fall back to local CSV files
Auto-download from OpenEI if neither exists
Auto-import downloaded data to database for future use
Usage
load_cohort_data(
dataset = c("ami", "fpl"),
states = NULL,
counties = NULL,
vintage = "2022",
income_brackets = NULL,
verbose = TRUE,
...
)
Arguments
dataset |
Character, either "ami" (Area Median Income) or "fpl" (Federal Poverty Line) |
states |
Character vector of state abbreviations to filter by (optional) |
counties |
Character vector of county names or FIPS codes to filter by (optional).
County names are matched case-insensitively. Requires |
vintage |
Character, data vintage: "2018" or "2022" (default "2022") |
income_brackets |
Character vector of income brackets to filter by (optional) |
verbose |
Logical, print status messages (default TRUE) |
... |
Additional filter expressions passed to dplyr::filter() for dynamic filtering.
Allows filtering by any column in the dataset using tidyverse syntax.
Example: |
Value
A tibble with columns:
geoid: Census tract identifier
income_bracket: Income bracket label
households: Number of households
total_income: Total household income ($)
total_electricity_spend: Total electricity spending ($)
total_gas_spend: Total gas spending ($)
total_other_spend: Total other fuel spending ($)
TEN: Housing tenure category (1=Owned free/clear, 2=Owned with mortgage, 3=Rented, 4=Occupied without rent). Enables analysis of energy burden differences between renters and owners.
TEN-YBL6: Housing tenure crossed with year structure built (6 categories). Allows analysis of how building age and ownership status interact to affect energy burden (e.g., older rental units vs newer owner-occupied homes).
TEN-BLD: Housing tenure crossed with building type (e.g., single-family, multi-unit). Enables analysis of energy burden across different housing structures and ownership patterns.
TEN-HFL: Housing tenure crossed with primary heating fuel type (e.g., gas, electric, oil). Critical for analyzing how heating fuel choice and tenure status jointly influence energy costs and burden.
Examples
# Single state (fast, good for learning)
nc_ami <- load_cohort_data(dataset = "ami", states = "NC")
# Load specific vintage
nc_2018 <- load_cohort_data(dataset = "ami", states = "NC", vintage = "2018")
if (interactive()) {
# Multiple states (regional analysis - requires data download)
southeast <- load_cohort_data(dataset = "fpl", states = c("NC", "SC", "GA", "FL"))
# Nationwide (all 51 states - no filter)
us_data <- load_cohort_data(dataset = "ami", vintage = "2022")
# Filter to specific income brackets
low_income <- load_cohort_data(
dataset = "ami",
states = "NC",
income_brackets = c("0-30% AMI", "30-50% AMI")
)
# Filter to specific counties within a state
triangle <- load_cohort_data(
dataset = "fpl",
states = "NC",
counties = c("Orange", "Durham", "Wake")
)
# Or use county FIPS codes
orange <- load_cohort_data(
dataset = "fpl",
states = "NC",
counties = "37135"
)
# Use dynamic filtering for custom criteria
high_burden <- load_cohort_data(
dataset = "ami",
states = "NC",
households > 100,
total_electricity_spend / total_income > 0.06
)
# Analyze energy burden by housing characteristics
# Compare renters vs owners by heating fuel type
nc_housing <- load_cohort_data(dataset = "ami", states = "NC")
library(dplyr)
# Group by tenure and heating fuel to analyze energy burden patterns
housing_analysis <- nc_housing %>%
filter(!is.na(TEN), !is.na(`TEN-HFL`)) %>%
group_by(TEN, `TEN-HFL`) %>%
summarise(
total_households = sum(households),
avg_energy_burden = weighted.mean(
(total_electricity_spend + total_gas_spend + total_other_spend) / total_income,
w = households,
na.rm = TRUE
),
.groups = "drop"
)
}
North Carolina Complete Energy Burden Sample Data
Description
A comprehensive dataset containing energy burden data for all counties in North Carolina. This dataset includes both Federal Poverty Line (FPL) and Area Median Income (AMI) cohort data for 2018 and 2022 vintages, aggregated to the census tract × income bracket level.
Usage
nc_sample
Format
A named list with 4 data frames:
- fpl_2018
Federal Poverty Line cohort data for 2018 (~10,805 rows)
- fpl_2022
Federal Poverty Line cohort data for 2022 (~13,185 rows)
- ami_2018
Area Median Income cohort data for 2018 (~6,484 rows)
- ami_2022
Area Median Income cohort data for 2022 (~5,091 rows)
Each data frame contains:
- geoid
11-digit census tract identifier (character)
- income_bracket
Income bracket category (character)
- households
Number of households in this cohort (numeric)
- total_income
Total household income in dollars (numeric)
- total_electricity_spend
Total electricity spending in dollars (numeric)
- total_gas_spend
Total gas spending in dollars (numeric)
- total_other_spend
Total other fuel spending in dollars (numeric)
Details
This sample data provides full state coverage for more comprehensive analysis, testing,
and demonstrations. For lightweight quick demos, see orange_county_sample.
North Carolina (all 100 counties):
2018: 2,163 census tracts
2022: 2,642 census tracts (tract boundaries changed)
Income Brackets:
FPL: 0-100%, 100-150%, 150-200%, 200-400%, 400%+
AMI: Varies by vintage (4-6 categories)
Size: 1.3 MB compressed (.rda)
Source
U.S. Department of Energy Low-Income Energy Affordability Data (LEAD) Tool
2018 vintage: https://data.openei.org/submissions/573
2022 vintage: https://data.openei.org/submissions/6219
See Also
-
orange_county_sample- Lightweight sample (94 KB) for quick demos -
load_cohort_data- Load data for any state with county filtering -
compare_energy_burden- Compare energy burden across vintages -
calculate_weighted_metrics- Calculate weighted metrics with grouping
Examples
# Load sample data
data(nc_sample)
# View structure
names(nc_sample)
# Analyze energy burden by county
library(dplyr)
# Extract county FIPS (first 5 digits of geoid)
nc_sample$fpl_2022 %>%
mutate(county_fips = substr(geoid, 1, 5)) %>%
group_by(county_fips, income_bracket) %>%
summarise(
households = sum(households),
avg_energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
sum(total_income),
.groups = "drop"
) %>%
filter(county_fips == "37183") # Wake County
# Compare urban vs rural counties
urban_counties <- c("37119", "37063", "37183") # Mecklenburg, Durham, Wake
rural_counties <- c("37069", "37095", "37131") # Franklin, Hyde, Northampton
nc_sample$fpl_2022 %>%
mutate(
county_fips = substr(geoid, 1, 5),
region = case_when(
county_fips %in% urban_counties ~ "Urban",
county_fips %in% rural_counties ~ "Rural",
TRUE ~ "Other"
)
) %>%
filter(region != "Other") %>%
group_by(region, income_bracket) %>%
summarise(
households = sum(households),
energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
sum(total_income),
.groups = "drop"
)
Calculate Net Energy Burden (NEB)
Description
Calculates Net Energy Burden with proper aggregation methodology via the Net Energy Return (Nh) framework. For individual households, NEB = EB = S/G. When aggregating across households (with weights), automatically uses the Nh method to avoid 1-5% aggregation errors.
Usage
neb_func(g, s, se = NULL, weights = NULL, aggregate = FALSE)
Arguments
g |
Numeric vector of gross income values |
s |
Numeric vector of energy spending values |
se |
Optional numeric vector of effective energy spending (defaults to s) |
weights |
Optional numeric vector of weights for aggregation (e.g., household counts).
When provided, uses Nh method: |
aggregate |
Logical, if TRUE forces aggregation even without weights (uses unweighted mean). Default FALSE for backwards compatibility. |
Details
Individual Level: NEB = EB = S/G (mathematically identical)
Aggregation Modes:
-
No aggregation (default): Returns vector of individual NEB values
neb_func(income, spending) # Returns vector
-
Weighted aggregation: Automatically uses Nh method when weights provided
neb_func(income, spending, weights = households) # Returns single value
-
Unweighted aggregation: Use
aggregate = TRUEfor simple meanneb_func(income, spending, aggregate = TRUE) # Returns single value
Why Nh Method? Avoids 1-5% error from naive averaging:
-
CORRECT:
neb_func(g, s, weights = w)→ Uses Nh internally -
WRONG:
weighted.mean(s/g, w)→ Introduces bias
The Nh method: 1 / (1 + weighted.mean(nh, weights)) where nh = (g-s)/se
uses arithmetic mean instead of harmonic mean, providing computational
simplicity and numerical stability.
Value
If
weights = NULLandaggregate = FALSE: Numeric vector of individual NEB values (S/G)If
weightsprovided oraggregate = TRUE: Single aggregated NEB value via Nh method
See Also
ner_func() for the Net Energy Return (Nh) calculation
energy_burden_func() for simple EB without aggregation support
Examples
# Individual household - returns vector
neb_func(50000, 3000) # 0.06
neb_func(c(30000, 50000), c(3000, 3500)) # c(0.10, 0.07)
# Aggregation with weights - returns single value (CORRECT method)
incomes <- c(30000, 50000, 75000)
spending <- c(3000, 3500, 4000)
households <- c(100, 150, 200)
neb_func(incomes, spending, weights = households)
# Unweighted aggregation
neb_func(incomes, spending, aggregate = TRUE)
# Comparison: naive mean (WRONG) vs Nh method (CORRECT)
neb_naive <- weighted.mean(spending/incomes, households) # Biased
neb_correct <- neb_func(incomes, spending, weights = households) # Correct
abs(neb_naive - neb_correct) / neb_correct # ~1-5% error
Calculate Net Energy Return (Nh)
Description
Calculates the Net Energy Return using the formula Nh = (G - S) / Se, where G is gross income, S is energy spending, and Se is effective energy spending. This metric is the preferred aggregation variable as it properly accounts for harmonic mean behavior when aggregating across households.
Usage
ner_func(g, s, se = NULL)
Arguments
g |
Numeric vector of gross income values |
s |
Numeric vector of energy spending values |
se |
Optional numeric vector of effective energy spending (defaults to s) |
Details
The Net Energy Return is mathematically related to energy burden by: E_b = 1 / (Nh + 1), or equivalently: Nh = (1/E_b) - 1
Why use Nh for aggregation?
For individual household data, the Nh method enables simple arithmetic weighted mean aggregation:
-
Via Nh:
neb = 1 / (1 + weighted.mean(nh, weights))(arithmetic mean) -
Direct EB:
neb = 1 / weighted.mean(1/eb, weights)(harmonic mean)
Computational advantages of the arithmetic mean approach:
-
Simpler to compute - Uses standard
weighted.mean()function -
More numerically stable - Avoids division by very small EB values (e.g., 0.01)
-
More interpretable - "Average net return per dollar spent on energy"
-
Prevents errors - Makes it obvious you can't use arithmetic mean on EB directly
For cohort data (pre-aggregated totals), direct calculation sum(S)/sum(G)
is mathematically equivalent to the Nh method but simpler.
The 6% energy burden poverty threshold corresponds to Nh \ge 15.67.
Value
Numeric vector of Net Energy Return (Nh) values
Examples
# Calculate Net Energy Return
gross_income <- 50000
energy_spending <- 3000
nh <- ner_func(gross_income, energy_spending)
# Convert back to energy burden
energy_burden <- 1 / (nh + 1)
Orange County NC Energy Burden Sample Data
Description
A sample dataset containing energy burden data for Orange County, North Carolina (FIPS code 37135). This dataset includes both Federal Poverty Line (FPL) and Area Median Income (AMI) cohort data for 2018 and 2022 vintages.
Usage
orange_county_sample
Format
A named list with 4 data frames:
- fpl_2018
Federal Poverty Line cohort data for 2018 (135 rows)
- fpl_2022
Federal Poverty Line cohort data for 2022 (206 rows)
- ami_2018
Area Median Income cohort data for 2018 (259 rows)
- ami_2022
Area Median Income cohort data for 2022 (149 rows)
Each data frame contains:
- geoid
11-digit census tract identifier (character)
- income_bracket
Income bracket category (character)
- households
Number of households in this cohort (numeric)
- total_income
Total household income in dollars (numeric)
- total_electricity_spend
Total electricity spending in dollars (numeric)
- total_gas_spend
Total gas spending in dollars (numeric)
- total_other_spend
Total other fuel spending in dollars (numeric)
Details
This sample data is provided for quick demos, testing, and vignettes without
requiring a large download. For full state or national analysis, use
load_cohort_data() to download complete datasets from OpenEI.
Orange County NC (Chapel Hill, Carrboro, Hillsborough):
2018: 27 census tracts
2022: 42 census tracts (tract boundaries changed)
Income Brackets:
FPL: 0-100%, 100-150%, 150-200%, 200-400%, 400%+
AMI: very_low, low_mod, mid_high (aggregated from 6 AMI categories)
Source
U.S. Department of Energy Low-Income Energy Affordability Data (LEAD) Tool
2018 vintage: https://data.openei.org/submissions/573
2022 vintage: https://data.openei.org/submissions/6219
See Also
-
load_cohort_data- Load full datasets for any state -
compare_energy_burden- Compare energy burden across vintages -
calculate_weighted_metrics- Calculate weighted metrics with grouping
Examples
# Load sample data
data(orange_county_sample)
# View structure
names(orange_county_sample)
# Quick analysis of 2022 FPL data
library(dplyr)
orange_county_sample$fpl_2022 %>%
group_by(income_bracket) %>%
summarise(
households = sum(households),
avg_energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
sum(total_income)
)
Print Comparison Summary
Description
Pretty-print a comparison table from compare_energy_burden()
Usage
## S3 method for class 'energy_burden_comparison'
print(x, ...)
Arguments
x |
Comparison result from compare_energy_burden() |
... |
Additional arguments (not used) |
Value
Returns x invisibly for use in pipe chains.
Process raw LEAD data into analysis-ready format with energy metrics
Description
This is the main processing workflow that:
Converts raw OpenEI data to clean format
Optionally aggregates by poverty status
Adds energy burden and related metrics
Filters out zero-energy records
Usage
process_lead_cohort_data(data, dataset, vintage, aggregate_poverty = FALSE)
Arguments
data |
A data frame of raw LEAD data from OpenEI |
dataset |
Character string indicating dataset type ("ami" or "fpl") |
vintage |
Character string indicating ACS vintage year |
aggregate_poverty |
Logical; if TRUE, aggregate to poverty status level |
Value
A data frame ready for analysis with all energy metrics
Process raw LEAD data into clean format
Description
Converts raw LEAD data downloaded from OpenEI into a standardized clean format suitable for analysis. Handles both 2016 (SH) and 2018+ ACS vintages.
Usage
raw_to_lead(data, vintage)
Arguments
data |
A data frame of raw LEAD data from OpenEI |
vintage |
Character string indicating the ACS vintage year ("2016", "2018", "2022", etc.) |
Value
A data frame with standardized column names:
geoid |
11-digit census tract GEOID as character |
state_abbr |
2-letter state abbreviation (2018+ only) |
housing_tenure |
Housing tenure category |
year_constructed |
Year building was constructed category |
building_type |
Building type category |
min_units |
Minimum number of units in building |
detached |
Whether building is detached (1/0) |
primary_heating_fuel |
Primary heating fuel type |
income_bracket |
Income bracket category (depends on dataset: AMI, FPL, etc.) |
households |
Number of households |
income |
Annual income |
electricity_spend |
Annual electricity spending |
gas_spend |
Annual gas spending |
other_spend |
Annual other fuel spending |
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- dplyr
Standardize cohort column names across vintages
Description
Standardize cohort column names across vintages
Usage
standardize_cohort_columns(data, dataset, vintage)
Format Large Numbers with Thousand Separators
Description
Converts numeric values to formatted strings with thousand separators (commas).
Usage
to_big(x)
Arguments
x |
Numeric vector to format |
Value
Character vector of formatted numbers
Examples
# Format large numbers
to_big(c(1000, 25000, 1000000))
Format Dollar Amounts in Billions
Description
Converts large dollar values to billions format with dollar sign prefix. Values less than 1 billion are shown in millions.
Usage
to_billion_dollar(x, suffix = " billion", override_to_k = TRUE)
Arguments
x |
Numeric vector to format |
suffix |
Character string to append after "billion" (default: " billion") |
override_to_k |
Logical (currently unused, kept for compatibility) |
Value
Character vector of formatted dollar amounts with "billion" or "m" suffix
Examples
# Format in billions
to_billion_dollar(c(5000000, 1000000000, 2500000000))
Format Number as Dollar Amount
Description
Converts numeric values to formatted dollar strings with appropriate decimal places and thousand separators.
Usage
to_dollar(x, latex = FALSE)
Arguments
x |
Numeric vector to format |
latex |
Logical indicating whether to escape dollar sign for LaTeX (default: FALSE) |
Value
Character vector of formatted dollar amounts
Examples
# Format dollar amounts
to_dollar(c(1000, 2500.50, 10000))
# LaTeX-escaped format
to_dollar(c(1000, 2500.50), latex = TRUE)
Format Numbers in Millions
Description
Converts large numeric values to millions format with appropriate suffix. Values less than 1 million are shown in thousands.
Usage
to_million(x, suffix = " million", override_to_k = TRUE)
Arguments
x |
Numeric vector to format |
suffix |
Character string to append after "million" (default: " million") |
override_to_k |
Logical indicating whether to show values < 1M as thousands (default: TRUE) |
Value
Character vector of formatted numbers with "million" or "k" suffix
Examples
# Format in millions
to_million(c(5000, 1000000, 2500000))
Format Number as Percentage
Description
Converts numeric values to formatted percentage strings with no decimal places by default.
Usage
to_percent(x, latex = FALSE)
Arguments
x |
Numeric vector to format (as proportions, not percentages) |
latex |
Logical indicating whether to escape percent sign for LaTeX (default: FALSE) |
Value
Character vector of formatted percentages
Examples
# Format percentages
to_percent(c(0.25, 0.50, 0.123))
# LaTeX-escaped format
to_percent(c(0.25, 0.50), latex = TRUE)
Try to import cohort data to database
Description
Try to import cohort data to database
Usage
try_import_to_database(data, dataset, vintage, verbose = FALSE)
Try to import census tract data to database
Description
Try to import census tract data to database
Usage
try_import_tracts_to_database(data, verbose = FALSE)
Try to load cohort data from CSV
Description
Try to load cohort data from CSV
Usage
try_load_from_csv(dataset, vintage, verbose = FALSE)
Try to load cohort data from database
Description
Try to load cohort data from database
Usage
try_load_from_database(dataset, vintage, verbose = FALSE)
Try to load census tract data from CSV
Description
Try to load census tract data from CSV
Usage
try_load_tracts_from_csv(verbose = FALSE)
Try to load census tract data from database
Description
Try to load census tract data from database
Usage
try_load_tracts_from_database(verbose = FALSE)
Validate data before caching to database
Description
Performs comprehensive validation BEFORE data is saved to database or cache. Prevents corrupted data from being cached in the first place.
Usage
validate_before_caching(
data,
dataset,
vintage,
expected_states = 51,
strict = TRUE
)
Arguments
data |
Data frame to validate |
dataset |
Character, "ami" or "fpl" |
vintage |
Character, "2018" or "2022" |
expected_states |
Integer, expected number of states (51 for nationwide) |
strict |
Logical, if TRUE throws errors; if FALSE returns list with validation results |
Value
If strict=FALSE, returns list with: valid (logical), issues (character vector) If strict=TRUE, throws error on validation failure