A summarised result

Introduction

A summarised result is a table that contains aggregated summary statistics (a result set with no patient-level data). The summarised result object consists of two objects: a results table and a settings table.

Results table

This table has 13 columns. The following table summarises the requirements of each column in the summarised_result format:

Column name       Column type  Is NA allowed?  Requirements
result_id         integer      No              NA
cdm_name          character    No              NA
group_name        character    No              name1
group_level       character    No              level1
strata_name       character    No              name2
strata_level      character    No              level2
variable_name     character    No              NA
variable_level    character    Yes             NA
estimate_name     character    No              snake_case
estimate_type     character    No              estimateTypeChoices()
estimate_value    character    No              NA
additional_name   character    No              name3
additional_level  character    No              level3
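
The column names, types, and NA rules above can be checked by hand before calling newSummarisedResult(). The following is a minimal base R sketch, not the package's own validation: checkResultColumns() is a hypothetical helper introduced only for illustration, and it ignores the entries in the Requirements column.

```r
# Expected column types, copied from the requirements table above.
expectedTypes <- c(
  result_id = "integer", cdm_name = "character",
  group_name = "character", group_level = "character",
  strata_name = "character", strata_level = "character",
  variable_name = "character", variable_level = "character",
  estimate_name = "character", estimate_type = "character",
  estimate_value = "character",
  additional_name = "character", additional_level = "character"
)

# Hypothetical helper (not part of omopgenerics): returns a character vector
# describing every problem found; an empty vector means the checks passed.
checkResultColumns <- function(x) {
  problems <- character()
  missingCols <- setdiff(names(expectedTypes), names(x))
  if (length(missingCols) > 0) {
    problems <- c(problems, paste("missing column:", missingCols))
  }
  for (col in intersect(names(expectedTypes), names(x))) {
    if (class(x[[col]])[1] != expectedTypes[[col]]) {
      problems <- c(problems, paste("wrong type:", col))
    }
    # variable_level is the only column where NA is allowed
    if (col != "variable_level" && anyNA(x[[col]])) {
      problems <- c(problems, paste("NA not allowed:", col))
    }
  }
  problems
}

x <- data.frame(
  result_id = 1L, cdm_name = "my_cdm",
  group_name = "overall", group_level = "overall",
  strata_name = "overall", strata_level = "overall",
  variable_name = "number subjects", variable_level = NA_character_,
  estimate_name = "count", estimate_type = "integer", estimate_value = "10",
  additional_name = "overall", additional_level = "overall",
  stringsAsFactors = FALSE
)
checkResultColumns(x)
```

An empty result means the table has the expected shape; newSummarisedResult() remains the authoritative check.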

Settings

The settings table provides one row per result_id with the settings used to generate those results. There is no limit on the number of columns or parameters that can be provided per result_id, but at least three values should be provided: result_type, package_name, and package_version.

These columns must be character vectors, but this restriction does not apply to other extra columns.

newSummarisedResult

The newSummarisedResult() function creates <summarised_result> objects. Its inputs are the results table, which must satisfy the conditions specified above, and the settings argument. The settings argument can be NULL or omit some of the required columns; missing columns are populated with default values and a warning is shown. Let’s see a very simple example:

library(omopgenerics)
library(dplyr)

x <- tibble(
  result_id = 1L,
  cdm_name = "my_cdm",
  group_name = "cohort_name",
  group_level = "cohort1",
  strata_name = "sex",
  strata_level = "male",
  variable_name = "Age group",
  variable_level = "10 to 50",
  estimate_name = "count",
  estimate_type = "numeric",
  estimate_value = "5",
  additional_name = "overall",
  additional_level = "overall"
)

result <- newSummarisedResult(x)
result |>
  glimpse()
#> Rows: 1
#> Columns: 13
#> $ result_id        <int> 1
#> $ cdm_name         <chr> "my_cdm"
#> $ group_name       <chr> "cohort_name"
#> $ group_level      <chr> "cohort1"
#> $ strata_name      <chr> "sex"
#> $ strata_level     <chr> "male"
#> $ variable_name    <chr> "Age group"
#> $ variable_level   <chr> "10 to 50"
#> $ estimate_name    <chr> "count"
#> $ estimate_type    <chr> "numeric"
#> $ estimate_value   <chr> "5"
#> $ additional_name  <chr> "overall"
#> $ additional_level <chr> "overall"
settings(result)
#> # A tibble: 1 × 8
#>   result_id result_type package_name package_version group     strata additional
#>       <int> <chr>       <chr>        <chr>           <chr>     <chr>  <chr>     
#> 1         1 ""          ""           ""              cohort_n… sex    ""        
#> # ℹ 1 more variable: min_cell_count <chr>

We can also associate settings with our results. These will typically be used to explain how the result was created.

result <- newSummarisedResult(
  x = x,
  settings = tibble(
    result_id = 1L,
    package_name = "PatientProfiles",
    study = "my_characterisation_study"
  )
)

result |> glimpse()
#> Rows: 1
#> Columns: 13
#> $ result_id        <int> 1
#> $ cdm_name         <chr> "my_cdm"
#> $ group_name       <chr> "cohort_name"
#> $ group_level      <chr> "cohort1"
#> $ strata_name      <chr> "sex"
#> $ strata_level     <chr> "male"
#> $ variable_name    <chr> "Age group"
#> $ variable_level   <chr> "10 to 50"
#> $ estimate_name    <chr> "count"
#> $ estimate_type    <chr> "numeric"
#> $ estimate_value   <chr> "5"
#> $ additional_name  <chr> "overall"
#> $ additional_level <chr> "overall"
settings(result)
#> # A tibble: 1 × 9
#>   result_id result_type package_name    package_version group  strata additional
#>       <int> <chr>       <chr>           <chr>           <chr>  <chr>  <chr>     
#> 1         1 ""          PatientProfiles ""              cohor… sex    ""        
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr>

Combining summarised results

Multiple summarised result objects can be combined using the bind() function. A result ID is assigned to each set of results that share the same settings. If two groups of results have the same settings, they are merged under a single result ID even when they come from different objects.

result1 <- newSummarisedResult(
  x = tibble(
    result_id = 1L,
    cdm_name = "my_cdm",
    group_name = "cohort_name",
    group_level = "cohort1",
    strata_name = "sex",
    strata_level = "male",
    variable_name = "Age group",
    variable_level = "10 to 50",
    estimate_name = "count",
    estimate_type = "numeric",
    estimate_value = "5",
    additional_name = "overall",
    additional_level = "overall"
  ),
  settings = tibble(
    result_id = 1L,
    package_name = "PatientProfiles",
    package_version = "1.0.0",
    study = "my_characterisation_study",
    result_type = "stratified_by_age_group"
  )
)

result2 <- newSummarisedResult(
  x = tibble(
    result_id = 1L,
    cdm_name = "my_cdm",
    group_name = "overall",
    group_level = "overall",
    strata_name = "overall",
    strata_level = "overall",
    variable_name = "overall",
    variable_level = "overall",
    estimate_name = "count",
    estimate_type = "numeric",
    estimate_value = "55",
    additional_name = "overall",
    additional_level = "overall"
  ),
  settings = tibble(
    result_id = 1L,
    package_name = "PatientProfiles",
    package_version = "1.0.0",
    study = "my_characterisation_study",
    result_type = "overall_analysis"
  )
)

Now that we have our results, we can combine them using bind. Because the two sets of results contain the same result ID, this will be automatically updated when the results are combined.

result <- bind(result1, result2)
result |>
  dplyr::glimpse()
#> Rows: 2
#> Columns: 13
#> $ result_id        <int> 1, 2
#> $ cdm_name         <chr> "my_cdm", "my_cdm"
#> $ group_name       <chr> "cohort_name", "overall"
#> $ group_level      <chr> "cohort1", "overall"
#> $ strata_name      <chr> "sex", "overall"
#> $ strata_level     <chr> "male", "overall"
#> $ variable_name    <chr> "Age group", "overall"
#> $ variable_level   <chr> "10 to 50", "overall"
#> $ estimate_name    <chr> "count", "count"
#> $ estimate_type    <chr> "numeric", "numeric"
#> $ estimate_value   <chr> "5", "55"
#> $ additional_name  <chr> "overall", "overall"
#> $ additional_level <chr> "overall", "overall"
settings(result)
#> # A tibble: 2 × 9
#>   result_id result_type     package_name package_version group strata additional
#>       <int> <chr>           <chr>        <chr>           <chr> <chr>  <chr>     
#> 1         1 stratified_by_… PatientProf… 1.0.0           "coh… "sex"  ""        
#> 2         2 overall_analys… PatientProf… 1.0.0           ""    ""     ""        
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr>
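
The renumbering described in this section can be pictured with plain data frames. The sketch below is only an illustration of the idea in base R, under the assumption that result IDs are matched on their settings values; it is not how bind() is implemented, and the `object` and `new_result_id` columns are names made up for the example.

```r
# Settings of two separate result objects; both use result_id = 1 internally.
settings1 <- data.frame(
  result_id = 1L,
  result_type = "stratified_by_age_group",
  package_name = "PatientProfiles",
  stringsAsFactors = FALSE
)
settings2 <- data.frame(
  result_id = c(1L, 2L),
  result_type = c("overall_analysis", "stratified_by_age_group"),
  package_name = "PatientProfiles",
  stringsAsFactors = FALSE
)

# Stack the settings and remember which object each row came from.
combined <- rbind(
  cbind(object = 1L, settings1),
  cbind(object = 2L, settings2)
)

# Rows with identical settings (ignoring the old id) share one new id.
key <- paste(combined$result_type, combined$package_name, sep = " | ")
combined$new_result_id <- match(key, unique(key))
combined
```

Here the two "stratified_by_age_group" rows end up with the same new ID although they started in different objects, while "overall_analysis" receives its own.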

Minimum cell count suppression

We have an entire vignette explaining how the summarised_result object is suppressed: vignette("suppression", "omopgenerics").

Export and import summarised results

The summarised_result object can be exported to and imported from a CSV file with the functions exportSummarisedResult() and importSummarisedResult().

Note that exportSummarisedResult() also suppresses the results.

x <- tempdir()
files <- list.files(x)

exportSummarisedResult(result, path = x, fileName = "result.csv")
setdiff(list.files(x), files)
#> [1] "result.csv"

Note that the settings are included in the CSV file:

#> "result_id","cdm_name","group_name","group_level","strata_name","strata_level","variable_name","variable_level","estimate_name","estimate_type","estimate_value","additional_name","additional_level"
#> "1","my_cdm","cohort_name","cohort1","sex","male","Age group","10 to 50","count","numeric","5","overall","overall"
#> "2","my_cdm","overall","overall","overall","overall","overall","overall","count","numeric","55","overall","overall"
#> "3","my_cdm","overall","overall","log_id","1","Log file created",NA,"date_time","character","2026-06-17 14:08:11","overall","overall"
#> "3","my_cdm","overall","overall","log_id","1","Log file created",NA,"elapsed_time","integer","0","overall","overall"
#> "3","my_cdm","overall","overall","log_id","2","Defining toy data",NA,"date_time","character","2026-06-17 14:08:11","overall","overall"
#> "3","my_cdm","overall","overall","log_id","2","Defining toy data",NA,"elapsed_time","integer","0","overall","overall"
#> "3","my_cdm","overall","overall","log_id","3","Summarise toy data",NA,"date_time","character","2026-06-17 14:08:11","overall","overall"
#> "3","my_cdm","overall","overall","log_id","3","Summarise toy data",NA,"elapsed_time","integer","0","overall","overall"
#> "3","my_cdm","overall","overall","log_id","4","Exporting log file",NA,"date_time","character","2026-06-17 14:08:11","overall","overall"
#> "3","my_cdm","overall","overall","log_id","4","Exporting log file",NA,"elapsed_time","integer","2","overall","overall"
#> "3","my_cdm","overall","overall","log_id","5","Exporting log file",NA,"date_time","character","2026-06-17 14:08:13","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","stratified_by_age_group","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","PatientProfiles","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.0.0","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"group","character","cohort_name","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"strata","character","sex","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall"
#> "1",NA,"overall","overall","overall","overall","settings",NA,"study","character","my_characterisation_study","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","overall_analysis","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","PatientProfiles","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.0.0","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"group","character","","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"strata","character","","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall"
#> "2",NA,"overall","overall","overall","overall","settings",NA,"study","character","my_characterisation_study","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","summarise_log_file","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","omopgenerics","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.4.0","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"group","character","","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"strata","character","log_id","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall"
#> "3",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall"

You can later import the results back with importSummarisedResult():

res <- importSummarisedResult(path = file.path(x, "result.csv"))
class(res)
#> [1] "summarised_result" "omop_result"       "tbl_df"           
#> [4] "tbl"               "data.frame"
res |>
  glimpse()
#> Rows: 11
#> Columns: 13
#> $ result_id        <int> 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3
#> $ cdm_name         <chr> "my_cdm", "my_cdm", "my_cdm", "my_cdm", "my_cdm", "my…
#> $ group_name       <chr> "cohort_name", "overall", "overall", "overall", "over…
#> $ group_level      <chr> "cohort1", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "sex", "overall", "log_id", "log_id", "log_id", "log_…
#> $ strata_level     <chr> "male", "overall", "1", "1", "2", "2", "3", "3", "4",…
#> $ variable_name    <chr> "Age group", "overall", "Log file created", "Log file…
#> $ variable_level   <chr> "10 to 50", "overall", NA, NA, NA, NA, NA, NA, NA, NA…
#> $ estimate_name    <chr> "count", "count", "date_time", "elapsed_time", "date_…
#> $ estimate_type    <chr> "numeric", "numeric", "character", "integer", "charac…
#> $ estimate_value   <chr> "5", "55", "2026-06-17 14:08:11", "0", "2026-06-17 14…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
res |>
  settings()
#> # A tibble: 3 × 9
#>   result_id result_type     package_name package_version group strata additional
#>       <int> <chr>           <chr>        <chr>           <chr> <chr>  <chr>     
#> 1         1 stratified_by_… PatientProf… 1.0.0           "coh… "sex"  ""        
#> 2         2 overall_analys… PatientProf… 1.0.0           ""    ""     ""        
#> 3         3 summarise_log_… omopgenerics 1.4.0           ""    "log_… ""        
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr>

Tidy a <summarised_result>

Tidy method

omopgenerics defines a tidy() method for <summarised_result> objects. This method performs three steps:

1. Split group, strata, and additional pairs into separate columns:

The <summarised_result> object has the following pair columns: group_name-group_level, strata_name-strata_level, and additional_name-additional_level. These pairs use the &&& separator to combine multiple fields. For example, if you want to combine cohort_name and age_group in the group_name-group_level pair: group_name = "cohort_name &&& age_group" and group_level = "my_cohort &&& <40". By default, if no aggregation is produced in the group_name-group_level pair: group_name = "overall" and group_level = "overall".

ORIGINAL FORMAT:

group_name           group_level
cohort_name          acetaminophen
cohort_name &&& sex  acetaminophen &&& Female
sex &&& age_group    Male &&& <40

The tidy format puts each value into its own column. This makes it easier to manipulate, but the output is no longer standardised, as each <summarised_result> object will have a different number and set of column names. Missing values will be filled with the “overall” label.

TIDY FORMAT:

cohort_name    sex      age_group
acetaminophen  overall  overall
acetaminophen  Female   overall
overall        Male     <40
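
To make the splitting step concrete, here is a minimal base R sketch that reproduces the two tables above. It only illustrates the idea and is not the package's implementation; splitPair() is a hypothetical helper, and the " &&& " separator and the "overall" fill value are taken from the description in this section.

```r
# Hypothetical helper: split one name-level pair into a named character vector.
splitPair <- function(name, level, sep = " &&& ") {
  keys <- strsplit(name, sep, fixed = TRUE)[[1]]
  values <- strsplit(level, sep, fixed = TRUE)[[1]]
  stats::setNames(values, keys)
}

groupName <- c("cohort_name", "cohort_name &&& sex", "sex &&& age_group")
groupLevel <- c("acetaminophen", "acetaminophen &&& Female", "Male &&& <40")

# One named vector per row of the original table.
pairs <- unname(Map(splitPair, groupName, groupLevel))

# New column names, in order of first appearance.
cols <- unique(unlist(lapply(pairs, names), use.names = FALSE))

# Build one column per name; rows without that name get "overall".
tidyGroup <- as.data.frame(
  lapply(stats::setNames(cols, cols), function(col) {
    vapply(
      pairs,
      function(p) if (col %in% names(p)) p[[col]] else "overall",
      character(1)
    )
  }),
  stringsAsFactors = FALSE
)
tidyGroup
```

The number and names of the resulting columns depend entirely on the data, which is why the tidy output is no longer standardised.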

2. Add settings of the <summarised_result> object as columns:

Each <summarised_result> object has a settings attribute that relates the ‘result_id’ column to each different set of settings. The columns ‘result_type’, ‘package_name’ and ‘package_version’ are always present in settings, but we may also have extra parameters depending on how the object was created. In the <summarised_result> format, we need to use the settings() function to see those variables:

ORIGINAL FORMAT:

settings:

result_id my_setting package_name
1 TRUE omopgenerics
2 FALSE omopgenerics

<summarised_result>:

result_id  cdm_name  ...  additional_name
1          omop      ...  overall
...        ...       ...  ...
2          omop      ...  overall
...        ...       ...  ...

In the tidy format, we add the settings as columns, so their values are repeated multiple times (there is only one row per result_id in settings, whereas there can be multiple rows in the <summarised_result> object). The column ‘result_id’ is removed because it no longer provides information. Again, we lose standardisation (multiple different settings), but we gain flexibility:

TIDY FORMAT:

cdm_name  ...  additional_name  my_setting  package_name
omop      ...  overall          TRUE        omopgenerics
...       ...  ...              ...         ...
omop      ...  overall          FALSE       omopgenerics
...       ...  ...              ...         ...
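
Conceptually, this step is a join of the settings onto the results by result_id. The base R sketch below shows that idea on toy data frames; it is an illustration under that assumption rather than the package's code, and the values in `results` are invented for the example.

```r
# Toy settings table: one row per result_id.
settingsTable <- data.frame(
  result_id = c(1L, 2L),
  my_setting = c(TRUE, FALSE),
  package_name = "omopgenerics",
  stringsAsFactors = FALSE
)

# Toy results table: several rows may share a result_id.
results <- data.frame(
  result_id = c(1L, 1L, 2L),
  cdm_name = "omop",
  estimate_name = c("count", "mean", "count"),
  estimate_value = c("100", "50.3", "80"),
  stringsAsFactors = FALSE
)

# Join the settings onto every result row, then drop the id,
# which carries no information once the settings are columns.
tidySettings <- merge(results, settingsTable, by = "result_id")
tidySettings$result_id <- NULL
tidySettings
```

Each settings value is repeated once per result row that shares its result_id.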

3. Pivot estimates as columns:

In the <summarised_result> format estimates are displayed in 3 columns:

  • ‘estimate_name’ indicates the name of the estimate.
  • ‘estimate_type’ indicates the type of the estimate (as all of them will be cast to character). Possible values are: numeric, integer, date, character, proportion, percentage, logical.
  • ‘estimate_value’ value of the estimate as <character>.

ORIGINAL FORMAT:

variable_name       estimate_name  estimate_type  estimate_value
number individuals  count          integer        100
age                 mean           numeric        50.3
age                 sd             numeric        20.7

In the tidy format, we pivot the estimates, creating a new column for each ‘estimate_name’ value. The columns will be cast to ‘estimate_type’. If there are multiple estimate_type values for the same estimate_name, they will not be cast and will be displayed as character values (a warning will be thrown). Missing data are populated with NAs.

TIDY FORMAT:

variable_name       count  mean  sd
number individuals  100    NA    NA
age                 NA     50.3  20.7
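
The pivot and the type cast can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it handles only the integer and numeric types, leaves everything else as character, and does not emit the warning mentioned above.

```r
x <- data.frame(
  variable_name = c("number individuals", "age", "age"),
  estimate_name = c("count", "mean", "sd"),
  estimate_type = c("integer", "numeric", "numeric"),
  estimate_value = c("100", "50.3", "20.7"),
  stringsAsFactors = FALSE
)

# One output row per variable_name, one new column per estimate_name.
variables <- unique(x$variable_name)
tidyEstimates <- data.frame(variable_name = variables, stringsAsFactors = FALSE)

for (nm in unique(x$estimate_name)) {
  rows <- x[x$estimate_name == nm, ]
  # Place each value on the row of its variable; unmatched rows stay NA.
  value <- rows$estimate_value[match(variables, rows$variable_name)]
  type <- unique(rows$estimate_type)
  # Cast only when the estimate has a single type; otherwise keep character.
  if (length(type) == 1 && type == "integer") {
    value <- as.integer(value)
  } else if (length(type) == 1 && type == "numeric") {
    value <- as.numeric(value)
  }
  tidyEstimates[[nm]] <- value
}
tidyEstimates
```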

Example

Let’s see a simple example with some toy data:

result |>
  tidy()
#> # A tibble: 2 × 7
#>   cdm_name cohort_name sex     variable_name variable_level count study         
#>   <chr>    <chr>       <chr>   <chr>         <chr>          <dbl> <chr>         
#> 1 my_cdm   cohort1     male    Age group     10 to 50           5 my_characteri…
#> 2 my_cdm   overall     overall overall       overall           55 my_characteri…

Split

The split functions are also provided independently: splitGroup(), splitStrata(), and splitAdditional() each split the corresponding name-level pair.

There is also splitAll(), which splits any x_name-x_level pair found in the data.

splitAll(result)
#> # A tibble: 2 × 9
#>   result_id cdm_name cohort_name sex     variable_name variable_level
#>       <int> <chr>    <chr>       <chr>   <chr>         <chr>         
#> 1         1 my_cdm   cohort1     male    Age group     10 to 50      
#> 2         2 my_cdm   overall     overall overall       overall       
#> # ℹ 3 more variables: estimate_name <chr>, estimate_type <chr>,
#> #   estimate_value <chr>

Pivot estimates

pivotEstimates() can be used to pivot the variables that we are interested in.

The argument pivotEstimatesBy specifies which variables we want to use to pivot by. There are four options:

Note that variable_level can contain NA values; these are ignored when the new column names are built.

pivotEstimates(
  result,
  pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")
)
#> # A tibble: 2 × 10
#>   result_id cdm_name group_name  group_level strata_name strata_level
#>       <int> <chr>    <chr>       <chr>       <chr>       <chr>       
#> 1         1 my_cdm   cohort_name cohort1     sex         male        
#> 2         2 my_cdm   overall     overall     overall     overall     
#> # ℹ 4 more variables: additional_name <chr>, additional_level <chr>,
#> #   `Age group_10 to 50_count` <dbl>, overall_overall_count <dbl>

Add settings

addSettings() is used to add the settings that we want as new columns to our <summarised_result> object.

The settingsColumn argument is used to choose which settings we want to add.

addSettings(
  result,
  settingsColumn = "result_type"
)
#> # A tibble: 2 × 14
#>   result_id cdm_name group_name  group_level strata_name strata_level
#>       <int> <chr>    <chr>       <chr>       <chr>       <chr>       
#> 1         1 my_cdm   cohort_name cohort1     sex         male        
#> 2         2 my_cdm   overall     overall     overall     overall     
#> # ℹ 8 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>, result_type <chr>

Filter

Working with a <summarised_result> object can be difficult, especially when filtering. For example, it is hard to filter by a certain result_type or, when several strata are joined together, to filter on only one of the variables. The tidy format makes filtering easier, but using it means losing the <summarised_result> class.

The omopgenerics package contains functions that help with this process: filterSettings(), filterGroup(), filterStrata(), and filterAdditional().

For instance, let’s filter result so it only has results for males:

result |>
  filterStrata(sex == "male")
#> # A tibble: 1 × 13
#>   result_id cdm_name group_name  group_level strata_name strata_level
#>       <int> <chr>    <chr>       <chr>       <chr>       <chr>       
#> 1         1 my_cdm   cohort_name cohort1     sex         male        
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Now let’s see an example using the information in settings to filter the result. In this case, we only want results from the “overall_analysis”. Since this information is in the result_type column in settings, we proceed as follows:

result |>
  filterSettings(result_type == "overall_analysis")
#> # A tibble: 1 × 13
#>   result_id cdm_name group_name group_level strata_name strata_level
#>       <int> <chr>    <chr>      <chr>       <chr>       <chr>       
#> 1         2 my_cdm   overall    overall     overall     overall     
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Utility functions for <summarised_result>

Column retrieval functions

Working with <summarised_result> objects often involves managing columns for settings, grouping, strata, and additional levels. The retrieval functions settingsColumns(), groupColumns(), strataColumns(), additionalColumns(), and tidyColumns() help you identify and manage these columns.

Let’s see the different values with our example result data:

settingsColumns(result)
#> [1] "study"
groupColumns(result)
#> [1] "cohort_name"
strataColumns(result)
#> [1] "sex"
additionalColumns(result)
#> character(0)
tidyColumns(result)
#> [1] "cdm_name"       "cohort_name"    "sex"            "variable_name" 
#> [5] "variable_level" "count"          "study"

Unite functions

The unite functions serve as the complementary tools to the split functions, allowing you to generate name-level pair columns from targeted columns within a <dataframe>.

There are three unite functions that create group, strata, and additional name-level columns from specified sets of columns: uniteGroup(), uniteStrata(), and uniteAdditional().

Example

For example, to create group_name and group_level columns from a tibble, you can use:

# Create and show mock data
data <- tibble(
  denominator_cohort_name = c("general_population", "older_than_60", "younger_than_60"),
  outcome_cohort_name = c("stroke", "stroke", "stroke")
)
head(data)
#> # A tibble: 3 × 2
#>   denominator_cohort_name outcome_cohort_name
#>   <chr>                   <chr>              
#> 1 general_population      stroke             
#> 2 older_than_60           stroke             
#> 3 younger_than_60         stroke

# Unite into group name-level columns
data |>
  uniteGroup(cols = c("denominator_cohort_name", "outcome_cohort_name"))
#> # A tibble: 3 × 2
#>   group_name                                      group_level                  
#>   <chr>                                           <chr>                        
#> 1 denominator_cohort_name &&& outcome_cohort_name general_population &&& stroke
#> 2 denominator_cohort_name &&& outcome_cohort_name older_than_60 &&& stroke     
#> 3 denominator_cohort_name &&& outcome_cohort_name younger_than_60 &&& stroke

These functions can be helpful when creating your own <summarised_result>.
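
For intuition, uniting is the inverse of splitting: the column names are pasted into the name column and the row values into the level column. The base R sketch below illustrates this under the assumption that " &&& " is the separator; unitePair() is a hypothetical helper and not the package's implementation, and the `n` column is invented for the example.

```r
# Hypothetical helper: collapse `cols` into a `<prefix>_name` and
# `<prefix>_level` pair joined by " &&& ", keeping all other columns.
unitePair <- function(data, cols, prefix = "group", sep = " &&& ") {
  keep <- data[setdiff(names(data), cols)]
  keep[[paste0(prefix, "_name")]] <- rep(paste(cols, collapse = sep), nrow(data))
  keep[[paste0(prefix, "_level")]] <- do.call(
    paste, c(unname(as.list(data[cols])), sep = sep)
  )
  keep
}

toy <- data.frame(
  denominator_cohort_name = c("general_population", "older_than_60"),
  outcome_cohort_name = c("stroke", "stroke"),
  n = c(10L, 4L),
  stringsAsFactors = FALSE
)
united <- unitePair(toy, c("denominator_cohort_name", "outcome_cohort_name"))
united
```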
