Anyone who works with household survey microdata knows this pattern: you download the raw files, open the codebook, and spend days recoding employment status, harmonizing income variables, and building indicators. Months later, a colleague starts the same project and writes the same code from scratch.
In Stata, teams share .do files, but these are tightly
coupled to specific file paths and variable names, and there is no
standard way to discover or validate them.
Recipes are metasurvey’s answer to this problem. A recipe is a portable, documented, and validated collection of transformation steps that can be developed on one survey, packaged, validated against new data, published, and reused by others:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ 1. Develop   │────▶│ 2. Package   │────▶│ 3. Validate  │
│ steps on a   │     │ steps into   │     │ against new  │
│ survey       │     │ a recipe     │     │ data         │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐     ┌──────────────┐            │
│ 5. Discover  │◀────│ 4. Publish   │◀───────────┘
│ & reuse      │     │ to registry  │
│ recipes      │     │ or API       │
└──────────────┘     └──────────────┘
The typical starting point is loading a survey with
load_survey(). You can load example data or point to your
own files:
library(metasurvey)
# Load ECH 2022 with example data (downloads from GitHub)
ech_2022 <- load_survey(
load_survey_example("ech", "ech_2022"),
svy_type = "ech",
svy_edition = "2022",
svy_weight = add_weight(annual = "pesoano")
)
# Or load with existing recipes from the registry (requires API server)
ech_2022 <- load_survey(
load_survey_example("ech", "ech_2022"),
svy_type = "ech",
svy_edition = "2022",
svy_weight = add_weight(annual = "pesoano"),
recipes = get_recipe("ech", "2022")
)
The most common workflow consists of developing transformations interactively on a survey and then converting the recorded steps into a recipe.
library(metasurvey)
library(data.table)
set.seed(42)
n <- 200
# Simulate survey microdata (standing in for load_survey)
dt <- data.table(
id = 1:n,
age = sample(18:80, n, replace = TRUE),
sex = sample(c(1, 2), n, replace = TRUE),
income = round(runif(n, 5000, 80000)),
activity = sample(c(2, 3, 5, 6), n,
replace = TRUE,
prob = c(0.55, 0.05, 0.05, 0.35)
),
weight = round(runif(n, 0.5, 3.0), 4)
)
svy <- Survey$new(
data = dt,
edition = "2023",
type = "ech",
psu = NULL,
engine = "data.table",
weight = add_weight(annual = "weight")
)
# Develop transformations interactively
svy <- step_compute(svy,
income_thousands = income / 1000,
employed = ifelse(activity == 2, 1L, 0L),
comment = "Income scaling and employment indicator"
)
svy <- step_recode(svy, labor_status,
activity == 2 ~ "Employed",
activity %in% 3:5 ~ "Unemployed",
activity %in% 6:8 ~ "Inactive",
.default = "Other",
comment = "ILO labor force classification"
)
svy <- step_recode(svy, age_group,
age < 25 ~ "Youth",
age < 45 ~ "Adult",
age < 65 ~ "Mature",
.default = "Senior",
comment = "Standard age groups"
)
# Convert all steps to a recipe
labor_recipe <- steps_to_recipe(
name = "Labor Force Indicators",
user = "Research Team",
svy = svy,
description = "Standard labor force indicators following ILO definitions",
steps = get_steps(svy),
topic = "labor"
)
labor_recipe
#>
#> ── Recipe: Labor Force Indicators ──
#> Author: Research Team
#> Survey: ech / 2023
#> Version: 1.0.0
#> Topic: labor
#> Description: Standard labor force indicators following ILO definitions
#> Certification: community
#>
#> ── Requires (3 variables) ──
#> income, activity, age
#>
#> ── Pipeline (3 steps) ──
#> 1. [compute] -> income_thousands, employed "Income scaling and employment indicator"
#> 2. [recode] -> labor_status "ILO labor force classification"
#> 3. [recode] -> age_group "Standard age groups"
#>
#> ── Produces (4 variables) ──
#> labor_status [categorical], age_group [categorical], income_thousands [numeric], employed [numeric]
Every recipe can automatically generate its documentation from its
steps. The doc() method returns a list with input
variables, output variables, and the step-by-step pipeline:
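# Generate the documentation (call assumed from the doc() method described above)
doc <- labor_recipe$doc()
# What variables does the recipe need?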
doc$input_variables
#> [1] "income" "activity" "age"
# What variables does it create?
doc$output_variables
#> [1] "income_thousands" "employed" "labor_status" "age_group"
# Step-by-step pipeline
doc$pipeline
#> [[1]]
#> [[1]]$index
#> [1] 1
#>
#> [[1]]$type
#> [1] "compute"
#>
#> [[1]]$outputs
#> [1] "income_thousands" "employed"
#>
#> [[1]]$inputs
#> [1] "income" "activity"
#>
#> [[1]]$inferred_type
#> [1] "numeric"
#>
#> [[1]]$comment
#> [1] "Income scaling and employment indicator"
#>
#>
#> [[2]]
#> [[2]]$index
#> [1] 2
#>
#> [[2]]$type
#> [1] "recode"
#>
#> [[2]]$outputs
#> [1] "labor_status"
#>
#> [[2]]$inputs
#> [1] "activity"
#>
#> [[2]]$inferred_type
#> [1] "categorical"
#>
#> [[2]]$comment
#> [1] "ILO labor force classification"
#>
#>
#> [[3]]
#> [[3]]$index
#> [1] 3
#>
#> [[3]]$type
#> [1] "recode"
#>
#> [[3]]$outputs
#> [1] "age_group"
#>
#> [[3]]$inputs
#> [1] "age"
#>
#> [[3]]$inferred_type
#> [1] "categorical"
#>
#> [[3]]$comment
#> [1] "Standard age groups"
This documentation is generated automatically, with no manual effort required.
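One way input variables could be inferred from a step is by inspecting the expressions it stores. The sketch below uses base R's all.vars() on quoted expressions; the `steps` list and the `step_inputs()` helper are hypothetical illustrations and are not taken from metasurvey's internals.

```r
# Hypothetical stand-in for recorded steps: each output maps to a quoted expression
steps <- list(
  income_thousands = quote(income / 1000),
  employed         = quote(ifelse(activity == 2, 1L, 0L)),
  age_group        = quote(age < 25 ~ "Youth")
)

# Collect the variable names each expression refers to.
# all.vars() skips function names such as ifelse() by default.
step_inputs <- function(exprs) {
  unique(unlist(lapply(exprs, all.vars), use.names = FALSE))
}

inputs  <- step_inputs(steps)  # variables the steps read
outputs <- names(steps)        # variables the steps create
```

Because the dependencies come from the expressions themselves, the documentation cannot drift out of sync with the code that produced it.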
Before applying a recipe to new data, verify that all required
variables exist. The validate() method stops with a clear
error if any dependency is missing.
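The idea behind this kind of check can be sketched in base R. The `check_requirements()` helper below is hypothetical and only illustrates the principle of comparing required variables against the columns of a dataset; it is not metasurvey's validate() implementation.

```r
# Hypothetical helper: stop with a readable error if a required variable is absent
check_requirements <- function(data, required) {
  missing_vars <- setdiff(required, names(data))
  if (length(missing_vars) > 0) {
    stop("Missing required variables: ",
         paste(missing_vars, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

required   <- c("income", "activity", "age")
complete   <- data.frame(income = 1, activity = 2, age = 30)
incomplete <- data.frame(income = 1, age = 30)

# Passes silently when every dependency is present
ok <- check_requirements(complete, required)

# Capture the error message when a dependency is missing
msg <- tryCatch(
  check_requirements(incomplete, required),
  error = function(e) conditionMessage(e)
)
```

Failing before any step runs keeps a partially transformed survey from reaching the analysis stage.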
Attach one or more recipes to a survey and apply them with
bake_recipes():
# Create a fresh survey with same structure (simulating a new edition)
set.seed(99)
dt2 <- data.table(
id = 1:100,
age = sample(18:80, 100, replace = TRUE),
sex = sample(c(1, 2), 100, replace = TRUE),
income = round(runif(100, 5000, 80000)),
activity = sample(c(2, 3, 5, 6), 100,
replace = TRUE,
prob = c(0.55, 0.05, 0.05, 0.35)
),
weight = round(runif(100, 0.5, 3.0), 4)
)
svy2 <- Survey$new(
data = dt2, edition = "2024", type = "ech",
psu = NULL, engine = "data.table",
weight = add_weight(annual = "weight")
)
# Attach and bake
svy2 <- add_recipe(svy2, labor_recipe)
svy2 <- bake_recipes(svy2)
head(get_data(svy2)[, .(id, income_thousands, labor_status, age_group)], 5)
#> id income_thousands labor_status age_group
#> <int> <num> <char> <char>
#> 1: 1 47.346 Unemployed Senior
#> 2: 2 28.114 Employed Mature
#> 3: 3 43.583 Employed Mature
#> 4: 4 13.440 Unemployed Mature
#> 5: 5 55.638 Employed Senior
The same recipe applied to a different edition produces consistent results. This is how metasurvey ensures reproducibility over time.
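Conceptually, a recipe behaves like an ordered list of transformations replayed on whatever data it receives. The base R sketch below shows that idea with plain functions and Reduce(); `recipe_steps` and `apply_steps()` are hypothetical names, and the two small data frames are made-up stand-ins for survey editions.

```r
# Hypothetical recipe: an ordered list of functions, each taking and returning a data.frame
recipe_steps <- list(
  function(d) { d$income_thousands <- d$income / 1000; d },
  function(d) { d$employed <- ifelse(d$activity == 2, 1L, 0L); d }
)

# Apply every step in order, feeding each result into the next step
apply_steps <- function(data, steps) {
  Reduce(function(d, f) f(d), steps, data)
}

# Two made-up "editions" with the same input variables
edition_a <- data.frame(income = c(12000, 45000), activity = c(2, 6))
edition_b <- data.frame(income = c(30000, 8000, 61000), activity = c(3, 2, 2))

out_a <- apply_steps(edition_a, recipe_steps)
out_b <- apply_steps(edition_b, recipe_steps)
```

Both outputs end up with the same derived columns, built by the same definitions, which is the property that makes indicators comparable across editions.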
In practice, you can load a survey and apply published recipes in a single call:
# Load ECH 2023 and apply the labor recipe from the registry (requires API)
ech_2023 <- load_survey(
load_survey_example("ech", "ech_2023"),
svy_type = "ech",
svy_edition = "2023",
svy_weight = add_weight(annual = "pesoano"),
recipes = get_recipe("ech", "2023", topic = "labor_market"),
bake = TRUE
)
Categories help organize recipes by topic:
cats <- default_categories()
vapply(cats, function(c) c$name, character(1))
#> [1] "labor_market" "income" "education" "health" "demographics"
#> [6] "housing"
Add categories to a recipe using add_category():
labor_recipe <- add_category(labor_recipe, "labor_market", "Labor market analysis")
labor_recipe <- add_category(labor_recipe, "income", "Income-related indicators")
labor_recipe
#>
#> ── Recipe: Labor Force Indicators ──
#> Author: Research Team
#> Survey: ech / 2023
#> Version: 1.0.0
#> Topic: labor
#> Description: Standard labor force indicators following ILO definitions
#> Certification: community
#> Categories: labor_market, income
#>
#> ── Requires (3 variables) ──
#> income, activity, age
#>
#> ── Pipeline (3 steps) ──
#> 1. [compute] -> income_thousands, employed "Income scaling and employment indicator"
#> 2. [recode] -> labor_status "ILO labor force classification"
#> 3. [recode] -> age_group "Standard age groups"
#>
#> ── Produces (4 variables) ──
#> labor_status [categorical], age_group [categorical], income_thousands [numeric], employed [numeric]
The certification system offers three levels of trust:
| Level | Meaning |
|---|---|
| community | User contribution (default), no review |
| reviewed | Peer-reviewed by a recognized team |
| official | Endorsed for official statistics |
Higher certification levels appear first in search results and signal that the recipe has been reviewed.
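A ranking of this kind can be expressed with an ordered factor. The sketch below is a base R illustration only: the recipe names are invented, and the sorting logic is an assumption about how such an ordering might work, not a description of the registry's search code.

```r
# Hypothetical search results; names are made up for illustration
results <- data.frame(
  name          = c("Recipe A", "Recipe B", "Recipe C"),
  certification = c("community", "official", "reviewed"),
  stringsAsFactors = FALSE
)

# Encode the trust order: community < reviewed < official
levels_order <- c("community", "reviewed", "official")
results$certification <- factor(results$certification,
                                levels = levels_order, ordered = TRUE)

# Sort so that the highest certification level comes first
ranked <- results[order(results$certification, decreasing = TRUE), ]
```

Encoding the levels as an ordered factor keeps the ranking rule in one place, so adding a level later only requires extending `levels_order`.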
The real power of recipes lies in sharing them. Every recipe you create can be published to the metasurvey registry, where other researchers can discover, reuse, and build upon your work.
The recommended workflow is to publish recipes to the public API. Anyone can browse recipes without an account; publishing requires registration:
# One-time: register and authenticate
api_register("Your Name", "you@example.com", "password")
api_login("you@example.com", "password")
# Publish your recipe (your profile is attached automatically)
api_publish_recipe(labor_recipe)
When authenticated, api_publish_recipe() attaches your
user profile to the recipe. Other users see who published it, along with
institutional affiliation and certification level. This builds
accountability and trust in shared recipes.
No authentication is needed to browse and download recipes.
The Shiny app provides a visual interface for browsing recipes and workflows.
The explorer shows recipe cards with certification badges, download counts, and pipeline previews. Clicking a recipe opens a detail view with the full pipeline, an R code snippet, and links to related workflows.
Institutions that work with confidential or restricted-access surveys may need a private registry. metasurvey supports this via a self-hosted backend with MongoDB:
# Point to your institution's private API
configure_api("https://your-institution.example.com/api")
# From here, the workflow is identical
api_login("analyst@institution.edu", "password")
api_publish_recipe(labor_recipe)
api_list_recipes(survey_type = "ech")
See the API and Database vignette for instructions on deploying the Plumber API with MongoDB for your own organization.
- Use a descriptive name (e.g., "ECH Labor Force Indicators").
- Run validate() on sample data to ensure all dependencies exist.
- Use set_version() when updating recipes.
- Use workflow() to compute weighted estimates from processed data.