Title: Wrapper for Statistics Portugal API
Version: 0.3.0
Description: An R6-based client to facilitate interaction with the Statistics Portugal (Instituto Nacional de Estatistica - INE) API (https://www.ine.pt/xportal/xmain?xpid=INE&xpgid=ine_api&INST=322751522&xlang=en).
License: MIT + file LICENSE
URL: https://c-matos.github.io/ineptr2/, https://github.com/c-matos/ineptr2
BugReports: https://github.com/c-matos/ineptr2/issues
Imports: httr2, jsonlite, R6, rlang, xml2
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0), withr
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
Encoding: UTF-8
RoxygenNote: 7.3.3
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2026-04-27 10:13:12 UTC; carlmatos
Author: Carlos Matos [aut, cre, cph] (ORCID iD)
Maintainer: Carlos Matos <carlosmdmatos@gmail.com>
Repository: CRAN
Date/Publication: 2026-04-28 20:10:02 UTC

ineptr2: Wrapper for Statistics Portugal API

Description

An R6-based client to facilitate interaction with the Statistics Portugal (Instituto Nacional de Estatistica - INE) API (https://www.ine.pt/xportal/xmain?xpid=INE&xpgid=ine_api&INST=322751522&xlang=en).

Author(s)

Maintainer: Carlos Matos carlosmdmatos@gmail.com (ORCID) [copyright holder]

See Also

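Useful links:

https://c-matos.github.io/ineptr2/

https://github.com/c-matos/ineptr2

Report bugs at https://github.com/c-matos/ineptr2/issues
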
INE API Client

Description

An R6 class providing access to the Statistics Portugal (INE) API. It holds configuration state (language, caching preferences) and provides methods for retrieving data, metadata, and the indicator catalog.

See INEClient-fields for configurable fields (language, caching, timeouts, etc.).

Data

get_data(indicator, row_limit, ...)

Retrieve tidy data for an indicator, with automatic chunking and optional caching.

download_data(indicator, row_limit, ...)

Download data to the file cache without loading into memory.

load_raw_data(indicator)

Load previously downloaded raw JSON data from the file cache.

preview_chunks(indicator, row_limit, ...)

Preview how many API chunks a download would require.

Metadata

get_metadata(indicator)

Get cleaned metadata for an indicator.

info(indicator)

Print a summary of an indicator's key properties.

get_dim_info(indicator)

Get dimension descriptions.

get_dim_values(indicator, dims)

Get possible values for all dimensions, or only those selected with dims.

is_valid(indicator)

Check if an indicator exists.

is_updated(indicator, last_updated, metadata)

Check if an indicator has been updated since last download.

Catalog

get_catalog()

Download and parse the full indicator catalog (~10 min).

download_catalog()

Download the catalog to the file cache.

Cache

list_cached()

List indicators present in the file cache.

clear_cache(indicator)

Clear cached files.

Active bindings

lang

Language code ("PT" or "EN").

use_cache

Whether caching is enabled.

cache_dir

Cache directory path, or NULL for default.

row_limit

Default maximum output rows per API request.

max_retries

Maximum retry attempts for chunk downloads.

progress_interval

Print progress every N chunks during downloads.

timeout

Timeout in seconds for API requests.

Methods

Public methods


Method new()

Create a new INE API client.

Usage
INEClient$new(
  lang = "PT",
  use_cache = FALSE,
  cache_dir = NULL,
  row_limit = 1000000L,
  max_retries = 3L,
  progress_interval = 10L,
  timeout = 300
)
Arguments
lang

Language code: "PT" (default) or "EN".

use_cache

Logical. Whether to cache API responses. Default FALSE.

cache_dir

Character or NULL. Cache directory path. If NULL (default), uses tools::R_user_dir("ineptr2", "cache").

row_limit

Integer. Default maximum output rows per API request. Default 1000000L.

max_retries

Integer. Maximum retry attempts for failed chunk downloads. Default 3L.

progress_interval

Integer. Print a progress message every N chunks during downloads. Default 10L.

timeout

Numeric. Timeout in seconds for API requests (metadata and data endpoints). Default 300 (5 minutes). The catalog endpoint uses a separate, longer timeout.

Returns

A new INEClient object.


Method get_data()

Retrieve tidy data for an indicator.

Usage
INEClient$get_data(indicator, row_limit = NULL, ...)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

row_limit

Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field. See Details.

...

Dimension filters. Each argument should be named dimN (where N is the dimension number) with a character vector of values. Omitted dimensions include all values.

Details
Row limit and chunking

The INE API limits each request to 1 000 000 output rows, counted as the product of unique values across all dimensions. When the estimated row count exceeds row_limit, the request is automatically split into smaller chunks by iterating over one or more dimensions.

If requests are timing out, try lowering row_limit (or increasing the client's timeout field) to produce more, smaller chunks.
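
As a rough illustration of the row-count arithmetic described above, the following base R sketch estimates output rows as the product of per-dimension value counts and works out how many chunks result from splitting on the largest dimensions first. The helper name estimate_chunks() and the largest-first splitting strategy are assumptions made for illustration; the package may choose dimensions differently.

```r
# Hypothetical helper for illustration only; not part of ineptr2.
estimate_chunks <- function(dim_lengths, row_limit = 1000000) {
  # Estimated output rows = product of unique values across all dimensions.
  estimated_rows <- prod(dim_lengths)
  rows_per_chunk <- estimated_rows
  chunks <- 1

  # Assumption: iterate over the largest dimensions first, one value per
  # request, until each chunk fits under the row limit.
  remaining <- sort(dim_lengths, decreasing = TRUE)
  while (rows_per_chunk > row_limit && length(remaining) > 0) {
    chunks <- chunks * remaining[[1]]
    rows_per_chunk <- rows_per_chunk / remaining[[1]]
    remaining <- remaining[-1]
  }

  list(
    estimated_rows = estimated_rows,
    chunks = chunks,
    rows_per_chunk = rows_per_chunk
  )
}

# Made-up dimension sizes: 30 x 350 x 200 values.
res <- estimate_chunks(c(dim1 = 30, dim2 = 350, dim3 = 200))
res$estimated_rows  # 2100000, above the default limit
res$chunks          # 350 requests when splitting on dim2
res$rows_per_chunk  # 6000 rows per request
```

To see the chunk count the client would actually use for a real indicator, call preview_chunks().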

Caching

When use_cache is enabled, processed data is stored as an RDS file. Subsequent calls with the same or narrower dimension filters return the cached result without hitting the API. Changing filters to include values outside the cached set triggers a fresh download.

Returns

A data frame with the indicator data.


Method download_data()

Download data for an indicator to the file cache without loading it into memory. Caching is temporarily enabled for the duration of the call regardless of the client's use_cache setting.

Usage
INEClient$download_data(indicator, row_limit = NULL, ...)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

row_limit

Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field.

...

Dimension filters in the form dimN = value.

Returns

Invisibly, a list with indicator, cache_dir, total_chunks, and complete. If some chunks fail to download, returns invisible(NULL) instead; call download_data() again to resume.
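
Because a partial failure returns NULL and a repeated call resumes the download, the call can be wrapped in a small retry loop. The sketch below assumes only the return contract documented above; the wrapper name download_with_resume() and its max_attempts argument are hypothetical and not part of the package.

```r
# Hypothetical wrapper for illustration only; not part of ineptr2.
# `client` is any object with a download_data(indicator, ...) method that
# follows the documented contract: a list with `complete` on success,
# NULL on partial failure.
download_with_resume <- function(client, indicator, max_attempts = 3L, ...) {
  for (attempt in seq_len(max_attempts)) {
    res <- client$download_data(indicator, ...)
    if (!is.null(res) && isTRUE(res$complete)) {
      return(invisible(res))
    }
    message("Attempt ", attempt, " incomplete; calling download_data() again.")
  }
  # Still incomplete after max_attempts calls.
  invisible(NULL)
}

# Usage (requires network access and the ineptr2 package):
# ine <- INEClient$new(lang = "EN")
# download_with_resume(ine, "0010003")
```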


Method load_raw_data()

Load previously downloaded raw data from the file cache as a list of parsed JSON responses. Use download_data() first to populate the cache.

Usage
INEClient$load_raw_data(indicator)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

Returns

A list with responses (parsed JSON) and urls.


Method get_metadata()

Get cleaned metadata for an indicator.

Usage
INEClient$get_metadata(indicator)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

Returns

API response body as a list.


Method get_catalog()

Get the full INE indicator catalog. This operation is very time-consuming (~10 minutes) as it downloads the entire catalog from the INE API. Consider using download_catalog() to cache the result for subsequent calls.

Usage
INEClient$get_catalog()
Returns

A data frame with one row per indicator.


Method download_catalog()

Download the INE indicator catalog to the file cache without loading it into memory. This operation is time-consuming (~10 minutes) as it downloads the entire catalog from the INE API. Subsequent calls return the cached file immediately. Caching is temporarily enabled for the duration of the call regardless of the client's use_cache setting.

Usage
INEClient$download_catalog()
Returns

Invisibly, the cache file path.


Method info()

Print a summary of an indicator's key properties: code, name, periodicity and time range, last update date, and a per-dimension breakdown of unique values. Labels are displayed in the client's current language.

Usage
INEClient$info(indicator)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

Returns

Invisibly, a list with code, name, periodicity, first_period, last_period, last_updated, and dimensions (a data frame with dim_num, name, and n_values columns).


Method get_dim_info()

Get dimension descriptions for an indicator.

Usage
INEClient$get_dim_info(indicator)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

Returns

A data frame with dim_num, abrv, and versao columns.


Method get_dim_values()

Get possible values for the dimensions of an indicator: all of them by default, or only those selected with dims.

Usage
INEClient$get_dim_values(indicator, dims = NULL)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

dims

Integer vector of dimension numbers to include, or NULL (default) for all dimensions.

Returns

A tidy data frame with dimension values.


Method preview_chunks()

Preview how many API chunks a download would require, without fetching any data. Useful for estimating download time before committing to a large request.

Usage
INEClient$preview_chunks(indicator, row_limit = NULL, ...)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

row_limit

Integer or NULL. Maximum output rows per API request before splitting into multiple calls. If NULL (default), uses the client's row_limit field.

...

Dimension filters in the form dimN = value.

Returns

Invisibly, a list with chunks and estimated_rows.


Method is_valid()

Check if an indicator exists and is callable via the INE API.

Usage
INEClient$is_valid(indicator)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

Returns

TRUE if indicator exists, FALSE otherwise.


Method is_updated()

Check if an indicator has been updated since last download.

Usage
INEClient$is_updated(indicator, last_updated = NULL, metadata = NULL)
Arguments
indicator

INE indicator ID as a 7-digit string. Example: "0010003".

last_updated

A Date object or a character string in "YYYY-MM-DD" format. If provided, takes precedence over cached metadata. If NULL (default), the function looks for cached metadata or the metadata argument.

metadata

A metadata list object as returned by get_metadata(). If provided and last_updated is NULL, the last-update date is extracted from its DataUltimaAtualizacao field.

Returns

TRUE if updated, FALSE if not.


Method list_cached()

List indicators present in the file cache.

Usage
INEClient$list_cached()
Returns

A data frame with one row per cached indicator and columns indicator, has_metadata, has_data, chunks_downloaded, chunks_total, and download_complete. Returns a zero-row data frame if no cache exists.


Method clear_cache()

Clear cached files.

Usage
INEClient$clear_cache(indicator = NULL)
Arguments
indicator

Optional INE indicator ID. If NULL (default), clears all cached files.

Returns

Invisibly returns TRUE if files were removed, FALSE otherwise.


Method print()

Print a summary of the client configuration.

Usage
INEClient$print(...)
Arguments
...

Ignored.


Method clone()

The objects of this class are cloneable with this method.

Usage
INEClient$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

INEClient-fields for field descriptions.

Examples


# -- Setup --
ine <- INEClient$new()                               # defaults: lang = "PT", no caching
ine <- INEClient$new(lang = "EN", use_cache = TRUE)  # English labels, caching enabled
print(ine)

# -- Metadata --
meta <- ine$get_metadata("0010003")
ine$info("0010003")
dims <- ine$get_dim_info("0010003")
vals <- ine$get_dim_values("0010003")

# -- Data --
df <- ine$get_data("0010003")
df <- ine$get_data("0010003", dim1 = "S7A2024", dim2 = c("11", "17"))
ine$preview_chunks("0008273")

# -- Validation --
ine$is_valid("0010003")
ine$is_updated("0010003", last_updated = "2024-01-01")

# -- Cache --
ine$list_cached()
ine$clear_cache()


INEClient configuration fields

Description

Configuration fields for the INEClient class. All fields are implemented as active bindings with validation. Set them with ine$field <- value and read them with ine$field.
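
For readers unfamiliar with R6 active bindings, the following minimal sketch shows how a validated field of this kind can be written. DemoClient is a hypothetical class defined only for illustration; it is not the INEClient source, and the validation rules and error messages in the package may differ.

```r
library(R6)

# Hypothetical class for illustration only; not the INEClient implementation.
DemoClient <- R6Class(
  "DemoClient",
  public = list(
    initialize = function(lang = "PT") {
      # Assignment goes through the active binding, so it is validated too.
      self$lang <- lang
    }
  ),
  active = list(
    lang = function(value) {
      # Called without a value: behave as a getter.
      if (missing(value)) {
        return(private$.lang)
      }
      # Called with a value: validate, then store.
      if (!is.character(value) || length(value) != 1 ||
          !value %in% c("PT", "EN")) {
        stop("`lang` must be \"PT\" or \"EN\".", call. = FALSE)
      }
      private$.lang <- value
    }
  ),
  private = list(.lang = NULL)
)

demo <- DemoClient$new()
demo$lang          # read the field
demo$lang <- "EN"  # validated assignment
```

An invalid assignment such as demo$lang <- "FR" stops with an error and leaves the stored value unchanged.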

Arguments

lang

Character. Language code: "PT" (default) or "EN". Affects API responses, cache file paths, and display labels.

use_cache

Logical. Whether to cache API responses locally. Default FALSE.

cache_dir

Character or NULL. Cache directory path. If NULL (default), uses tools::R_user_dir("ineptr2", "cache").

row_limit

Integer. Maximum output rows per API request before splitting into chunks. Must be between 1 and 1 000 000 (the API ceiling). Default 1000000L.

max_retries

Integer. Maximum retry attempts for failed chunk downloads. Default 3L.

progress_interval

Integer. Print a progress message every N chunks during downloads. Default 10L.

timeout

Numeric. Timeout in seconds for API requests (metadata and data endpoints). Default 300 (5 minutes). The catalog endpoint uses a separate, longer timeout.
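
The default cache location named under cache_dir can be inspected with base R alone. This short sketch only resolves the path; whether and when the package creates the directory is not stated here, so the existence check is included as a precaution.

```r
# Resolve the per-user cache directory that cache_dir = NULL refers to.
default_cache <- tools::R_user_dir("ineptr2", which = "cache")
default_cache

# R_user_dir() only builds the path; it does not create the directory.
dir.exists(default_cache)
```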

See Also

INEClient for methods.

Examples

ine <- INEClient$new()
ine$lang
ine$lang <- "EN"

ine$use_cache <- TRUE
ine$cache_dir <- tempdir()
ine$row_limit <- 500000L


Calculate dimension lengths from raw metadata

Description

Calculate dimension lengths from raw metadata

Usage

calc_dims_length_from_raw(metadata_raw)

Arguments

metadata_raw

Raw metadata list from the INE API.

Value

A data.frame with dim_num and n columns.


Extract dimension values from raw metadata

Description

Extract dimension values from raw metadata

Usage

extract_dim_values(metadata_raw)

Arguments

metadata_raw

Raw metadata list from the INE API.

Value

A data.frame with dimension values.


Filter cached data frame to match current dimension filters

Description

Filter cached data frame to match current dimension filters

Usage

filter_cached_data(data, current_filters, dim_values)

Arguments

data

Cached data.frame

current_filters

Normalized dimension filters from current request

dim_values

Dimension values tibble from metadata (output of extract_dim_values)

Value

Filtered data.frame


Finalize a chunk by validating and renaming from .part to .json

Description

Finalize a chunk by validating and renaming from .part to .json

Usage

finalize_chunk(temp_path, final_path)

Arguments

temp_path

Path to the temporary .part file

final_path

Path to the final .json file

Value

TRUE on success, FALSE on failure
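
The validate-then-rename pattern can be sketched as follows. This is a hypothetical re-implementation, not the package source: it assumes that "validating" means checking that the file parses as JSON, and it uses jsonlite::validate() for that check.

```r
# Hypothetical re-implementation for illustration only.
finalize_chunk_sketch <- function(temp_path, final_path) {
  if (!file.exists(temp_path)) {
    return(FALSE)
  }
  txt <- paste(readLines(temp_path, warn = FALSE), collapse = "\n")

  # Assumption: a chunk is valid when its content is well-formed JSON.
  if (!isTRUE(as.logical(jsonlite::validate(txt)))) {
    return(FALSE)
  }

  # Rename only after validation, so a .json file is always a complete chunk.
  file.rename(temp_path, final_path)
}

# Usage with a temporary file standing in for a downloaded chunk.
part_file <- tempfile(fileext = ".part")
writeLines('{"a": 1}', part_file)
json_file <- sub("\\.part$", ".json", part_file)
ok <- finalize_chunk_sketch(part_file, json_file)
```

Writing to a .part file first means an interrupted download never leaves a truncated .json file behind.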


Gracefully handle HTTP request failures

Description

Validates connectivity, performs the request, and downgrades HTTP errors to messages instead of stopping.

Usage

gracefully_fail(request, path = NULL)

Arguments

request

An httr2 request object.

path

Optional file path to save the response body to disk.

Value

An httr2 response object, or invisible(NULL) on failure.


Check if current dimension filters are a subset of cached filters

Description

Check if current dimension filters are a subset of cached filters

Usage

is_filter_subset(current, cached)

Arguments

current

Named list of current dimension filters (normalized)

cached

Named list of cached dimension filters (normalized)

Value

TRUE if every value in current is available in cached
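
A subset check of this kind can be sketched in base R. The function below is a hypothetical re-implementation, not the package source; it assumes that a dimension missing from a filter list means "all values", in line with the get_data() convention that omitted dimensions include all values.

```r
# Hypothetical re-implementation for illustration only.
is_filter_subset_sketch <- function(current, cached) {
  for (dim_name in names(cached)) {
    # The cache was restricted on this dimension. If the current request
    # omits it (meaning "all values"), the cache cannot satisfy the request.
    if (is.null(current[[dim_name]])) {
      return(FALSE)
    }
    # Every requested value must already be in the cached set.
    if (!all(current[[dim_name]] %in% cached[[dim_name]])) {
      return(FALSE)
    }
  }
  # Dimensions the cache did not restrict hold all values, so any filter
  # on them is a subset.
  TRUE
}

cached <- list(dim1 = c("S7A2022", "S7A2023"), dim2 = c("11", "17"))
is_filter_subset_sketch(list(dim1 = "S7A2022", dim2 = "11"), cached)  # narrower
is_filter_subset_sketch(list(dim1 = "S7A2024", dim2 = "11"), cached)  # new value
```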


Normalize dimension filters for consistent comparison

Description

Normalize dimension filters for consistent comparison

Usage

normalize_dim_filters(filters)

Arguments

filters

Named list from ... (e.g., list(dim1 = "S7A2022", dim2 = c("11","17")))

Value

Named list with lowercase names and sorted character values
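
The normalization described above can be sketched as follows. This is a hypothetical re-implementation, not the package source; removing duplicate values and ordering the dimensions by name are extra assumptions added so that two equivalent requests compare as identical.

```r
# Hypothetical re-implementation for illustration only.
normalize_dim_filters_sketch <- function(filters) {
  if (length(filters) == 0) {
    return(list())
  }
  # Lowercase names so "Dim1" and "dim1" compare equal.
  names(filters) <- tolower(names(filters))
  # Coerce to character and sort; method = "radix" sorts in the C locale,
  # so the result does not depend on the session's collation settings.
  filters <- lapply(filters, function(v) {
    sort(unique(as.character(v)), method = "radix")
  })
  # Assumption: order dimensions by name so argument order does not matter.
  filters[order(names(filters))]
}

a <- normalize_dim_filters_sketch(list(Dim2 = c("17", "11"), DIM1 = "S7A2022"))
b <- normalize_dim_filters_sketch(list(dim1 = "S7A2022", dim2 = c("11", "17")))
identical(a, b)
```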


Process raw INE catalog XML into a tibble

Description

Process raw INE catalog XML into a tibble

Usage

process_ine_catalog(xml_string)

Arguments

xml_string

Character string with the raw catalog XML content.

Value

A data frame with one row per indicator.


Process raw INE API responses into a tidy data frame

Description

Process raw INE API responses into a tidy data frame

Usage

process_ine_data(raw_data)

Arguments

raw_data

List containing parsed JSON responses and urls from fetch_data_raw()

Value

Tidy data.frame with INE data


Convert a list of named lists to a data.frame

Description

Handles NULL values by replacing them with NA, unlike as.data.frame which silently drops NULL elements.

Usage

records_to_df(records)

Arguments

records

A list of named lists with consistent field names.

Value

A data.frame.
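
The NULL-to-NA handling can be sketched in base R. The function below is a hypothetical re-implementation, not the package source; for simplicity it assumes every field holds a single value and coerces all columns to character.

```r
# Hypothetical re-implementation for illustration only.
records_to_df_sketch <- function(records) {
  if (length(records) == 0) {
    return(data.frame())
  }
  # Field names are taken from the first record (assumed consistent).
  fields <- names(records[[1]])
  cols <- lapply(fields, function(field) {
    vapply(records, function(rec) {
      value <- rec[[field]]
      # Replace NULL with NA so every record contributes exactly one value.
      if (is.null(value)) NA_character_ else as.character(value)
    }, character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

recs <- list(
  list(id = "1", value = "10.5"),
  list(id = "2", value = NULL)
)
df <- records_to_df_sketch(recs)
df
```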
