Attributes In-Depth

In HDF5, attributes are small pieces of metadata attached to groups or datasets. They are best used to store descriptive information (units, timestamps, descriptions, or experimental parameters) separately from the main data array.

This vignette covers how to write, read, and manage these attributes using h5lite, as well as important limitations regarding their structure.

library(h5lite)
file <- tempfile(fileext = ".h5")

Writing Attributes

There are two ways to write attributes in h5lite: explicitly (targeting an object) or implicitly (saving R attributes).

1. Explicit Writing

You can write an attribute to any existing group or dataset using the attr argument in h5_write(). This is useful for adding metadata after the data has been saved.

# First, write a dataset
h5_write(1:10, file, "measurements/temperature")

# Now, attach attributes to it
h5_write(I("Celsius"),    file, "measurements/temperature", attr = "units")
h5_write(I("2023-10-27"), file, "measurements/temperature", attr = "date")
h5_write(I(0.1),          file, "measurements/temperature", attr = "precision")

Note: If the attribute already exists, it will be overwritten.
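
The overwrite behavior noted above can be checked with a short sketch. It uses a separate scratch file (the name demo_file is an assumption for illustration) so the vignette's own file is left untouched.

```r
library(h5lite)

# Hypothetical scratch file, independent of the vignette's `file`
demo_file <- tempfile(fileext = ".h5")

# The attribute needs an existing object to attach to
h5_write(1:10, demo_file, "measurements/temperature")

# The first write creates the attribute
h5_write(I("Celsius"), demo_file, "measurements/temperature", attr = "units")

# A second write to the same attribute name replaces the stored value
h5_write(I("Kelvin"), demo_file, "measurements/temperature", attr = "units")

# Read the attribute back to see which value was kept
units_now <- h5_read(demo_file, "measurements/temperature", attr = "units")
print(units_now)
```

Because the write replaces the value silently, it may be worth listing the existing names with h5_attr_names() first when an accidental overwrite would matter.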

2. Implicit Writing (R Attributes)

h5lite automatically preserves custom R attributes attached to your objects. When you write an R object, any attributes (except for standard internal ones like dim, names, or class) are written as HDF5 attributes.

# Create a vector with custom R attributes
data <- rnorm(5)
attr(data, "description") <- I("Randomized control group")
attr(data, "valid")       <- I(TRUE)

# Write the object
h5_write(data, file, "experiment/control")

# Check the file - the attributes are there
h5_attr_names(file, "experiment/control")
#> [1] "description" "valid"

h5_str(file)
#> /
#> ├── measurements/
#> │   └── temperature <uint8 × 10>
#> │       ├── @units <utf8[7] scalar>
#> │       ├── @date <utf8[10] scalar>
#> │       └── @precision <float64 scalar>
#> └── experiment/
#>     └── control <float64 × 5>
#>         ├── @description <utf8[24] scalar>
#>         └── @valid <uint8 scalar>

Reading Attributes

1. Accessing Specific Attributes

If you only need a specific piece of metadata without reading the full dataset, you can use h5_read(..., attr = "name").

# Read just the 'units' attribute
units <- h5_read(file, "measurements/temperature", attr = "units")
print(units)
#> [1] "Celsius"

2. Reading with the Dataset

When you read a dataset, h5lite automatically reads all attached attributes and re-attaches them to the resulting R object.

# Read the full dataset
temps <- h5_read(file, "measurements/temperature")

# The attributes are available in R
attributes(temps)
#> $units
#> [1] "Celsius"
#> 
#> $date
#> [1] "2023-10-27"
#> 
#> $precision
#> [1] 0.1

str(temps)
#>  int [1:10] 1 2 3 4 5 6 7 8 9 10
#>  - attr(*, "units")= chr "Celsius"
#>  - attr(*, "date")= chr "2023-10-27"
#>  - attr(*, "precision")= num 0.1

Managing Attributes

Listing Attributes

Use h5_attr_names() to list the names of all attributes attached to a specific object.

h5_attr_names(file, "measurements/temperature")
#> [1] "units"     "date"      "precision"
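
As a sketch of how listing and reading can be combined, the snippet below collects every attribute of an object into a named list without reading the dataset itself. It relies only on h5_attr_names() and h5_read(..., attr = ) as described in this vignette; the scratch file and the variable names (attr_file, target, meta) are assumptions for illustration.

```r
library(h5lite)

# Hypothetical scratch file, independent of the vignette's `file`
attr_file <- tempfile(fileext = ".h5")
target    <- "measurements/temperature"

h5_write(1:10,            attr_file, target)
h5_write(I("Celsius"),    attr_file, target, attr = "units")
h5_write(I("2023-10-27"), attr_file, target, attr = "date")

# List the attribute names, then read each attribute individually
attr_names <- h5_attr_names(attr_file, target)
meta <- lapply(attr_names, function(nm) h5_read(attr_file, target, attr = nm))
names(meta) <- attr_names

str(meta)
```

This avoids loading a large dataset when only its metadata is of interest.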

Deleting Attributes

You can remove a specific attribute using h5_delete().

# Delete the 'precision' attribute
h5_delete(file, "measurements/temperature", attr = "precision")

# Verify removal
h5_attr_names(file, "measurements/temperature")
#> [1] "units" "date"

Important Limitations

While attributes are powerful for storing metadata, they are fundamentally simpler structures than HDF5 Datasets. HDF5 enforces specific constraints that affect how h5lite can store complex R objects as attributes.

1. No Dimension Scales (Loss of Names)

HDF5 Dimension Scales (the mechanism h5lite uses to store names, dimnames, and row.names) can only be attached to Datasets. They cannot be attached to attributes.

This means if you write a named vector, matrix, or array as an attribute, the names will be lost.

# A vector with names
named_vec <- c(a = 1, b = 2, c = 3)

# Write as a standard Dataset -> Names are preserved
h5_write(named_vec, file, "my_dataset")
h5_names(file, "my_dataset")
#> [1] "a" "b" "c"

# Write as an Attribute -> Names are LOST
h5_write(named_vec, file, "measurements/temperature", attr = "meta_vec")
h5_names(file, "measurements/temperature", attr = "meta_vec")
#> character(0)

Exception: Data Frames

There is one major exception: data.frame objects.

Because HDF5 stores data frames as Compound Types, the column names are baked into the type definition itself, not stored as side-loaded metadata. Therefore, column names are preserved even when writing a data frame as an attribute. However, row.names (which rely on dimension scales) will still be lost.

# A data frame with metadata
df <- data.frame(
  id = 1:3, 
  status = c("ok", "fail", "ok")
)

# Write as attribute
h5_write(df, file, "measurements/temperature", attr = "log")

# Column names survive!
h5_names(file, "measurements/temperature", attr = "log")
#> [1] "id"     "status"

2. No Attributes on Attributes (Nesting)

In HDF5, you cannot attach attributes to other attributes. This hierarchy is strictly one level deep: Groups/Datasets can have attributes, but attributes cannot.

Consequently, you cannot treat an attribute as a “Group” or folder to store other items. If you need a hierarchical structure for your metadata, you should create a Group (e.g., /metadata) and store your metadata as Datasets inside it, rather than attaching them as attributes to another object.
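
A minimal sketch of that alternative follows: hierarchical metadata is stored as small datasets under a /metadata group instead of as attributes. The group layout, the scratch file name, and the values (including the sensor model string) are made up for illustration. The sketch also assumes that h5_write() creates intermediate groups for deeper paths in the same way it does for the two-level paths used earlier.

```r
library(h5lite)

# Hypothetical scratch file, independent of the vignette's `file`
meta_file <- tempfile(fileext = ".h5")

# The main data
h5_write(rnorm(5), meta_file, "experiment/control")

# Hierarchical metadata stored as datasets inside a /metadata group
# (all names and values here are illustrative only)
h5_write(I("Thermo-X"),   meta_file, "metadata/sensor/model")
h5_write(I(0.1),          meta_file, "metadata/sensor/precision")
h5_write(I("2023-10-27"), meta_file, "metadata/run/date")

# Inspect the resulting tree
h5_str(meta_file)

# Each item is an ordinary dataset and is read by its path
precision <- h5_read(meta_file, "metadata/sensor/precision")
```

Since these items are datasets, they can carry their own attributes and names, which attributes themselves cannot.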

Controlling Attribute Types

Attributes in HDF5 are typed just like datasets. h5lite allows you to control the storage type of attributes using the as argument in h5_write() or h5_read().

To target an attribute specifically, prefix the name with @ in the as vector.

Customizing Storage Type

# Write the control data again, but use a fixed-length string for 'description'
h5_write(data, file, "experiment/control", as = c("@description" = "ascii[]"))

# Store an attribute as `uint8` instead of the automatically chosen type
h5_write(I(42), file, "measurements/temperature", attr = "sensor_id", as = "uint8")

Customizing Read Type

You can also coerce attributes when reading them.

# Force the 'valid' attribute to be read as logical, even if stored as integer
meta <- h5_read(file, "experiment/control", attr = "valid", as = "logical")

Special Note: Dimensions

You might notice that standard R attributes like dim are not visible in h5_attr_names().

This is because h5lite handles structural attributes implicitly. The dimensions of the attribute data itself are stored in the HDF5 Dataspace, not as a separate attribute. h5lite automatically restores the dim attribute on the R object when reading, ensuring matrices and arrays retain their shape.
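
The behavior described above can be checked with a short round trip. This is a sketch under the vignette's stated behavior, using a separate scratch file; the names dim_file, grid, and calibration are assumptions for illustration.

```r
library(h5lite)

# Hypothetical scratch file, independent of the vignette's `file`
dim_file <- tempfile(fileext = ".h5")

m <- matrix(1:6, nrow = 2, ncol = 3)

# Write the matrix as a dataset; its shape goes into the dataspace
h5_write(m, dim_file, "grid")

# 'dim' is not expected to appear among the HDF5 attributes
h5_attr_names(dim_file, "grid")

# Reading restores the shape on the R object
m2 <- h5_read(dim_file, "grid")
dim(m2)

# The same applies when a matrix is stored as an attribute
h5_write(m, dim_file, "grid", attr = "calibration")
cal <- h5_read(dim_file, "grid", attr = "calibration")
dim(cal)
```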
