In this vignette, we explore the OmopSketch functions that report how
often each concept appears in the clinical tables. Two functions do
this: summariseConceptIdCounts() and tableConceptIdCounts(). The former
creates a summarised result with the counts for each concept in a
clinical table, and the latter displays that result in a table.
Let’s see an example of these functions. To start, we load the
essential packages and create a mock cdm using mockOmopSketch().
library(duckdb)
#> Loading required package: DBI
library(OmopSketch)
library(dplyr)
cdm <- mockOmopSketch()
cdm
#>
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
We now use the summariseConceptIdCounts() function from the OmopSketch
package to retrieve counts for each concept id and name, as well as for
each source concept id and name, in a clinical table (here,
drug_exposure).
summariseConceptIdCounts(cdm, omopTableName = "drug_exposure") |>
  select(
    group_level, variable_name, variable_level, estimate_name,
    estimate_value, additional_name, additional_level
  ) |>
  glimpse()
#> Rows: 31
#> Columns: 7
#> $ group_level <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name <chr> "glucagon Nasal Powder [Baqsimi]", "Sisymbrium offici…
#> $ variable_level <chr> "1361368", "1830282", "35604883", "35604884", "374980…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value <chr> "100", "100", "100", "100", "100", "100", "100", "100…
#> $ additional_name <chr> "source_concept_id", "source_concept_id", "source_con…
#> $ additional_level <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…
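Note that estimate_value is a character column in the output above, so sorting it directly orders the counts as text. The following base R sketch uses a small hypothetical data frame that only mimics a few of the columns shown (the concept names and values are made up) to show one way of converting the counts before ranking them.

```r
# Hypothetical rows that mimic some columns of the summarised result
result <- data.frame(
  variable_name  = c("concept A", "concept B", "concept C"),
  variable_level = c("101", "102", "103"),
  estimate_name  = "count_records",
  estimate_value = c("9", "100", "25"),
  stringsAsFactors = FALSE
)

# Convert the character counts to numbers before ordering;
# ordering the raw strings would compare them as text
result$count <- as.numeric(result$estimate_value)

# Concepts ranked from most to least frequent
top <- result[order(result$count, decreasing = TRUE), ]
top[, c("variable_name", "variable_level", "count")]
```

The same conversion applies to a real summarised result if you want to rank concepts by frequency.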
By default, the function returns the number of records
(estimate_name == "count_records") for each concept_id. Set the countBy
argument to "person" to count persons instead
(estimate_name == "count_subjects"), or to c("record", "person") to
obtain both record and person counts.
summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 62 × 3
#> variable_name estimate_name estimate_value
#> <chr> <chr> <chr>
#> 1 glucagon Nasal Powder [Baqsimi] count_records 100
#> 2 glucagon Nasal Powder [Baqsimi] count_subjec… 63
#> 3 Sisymbrium officianale whole extract 10 MG Nasa… count_records 100
#> 4 Sisymbrium officianale whole extract 10 MG Nasa… count_subjec… 63
#> 5 sumatriptan Nasal Powder [Onzetra] count_records 100
#> 6 sumatriptan Nasal Powder [Onzetra] count_subjec… 60
#> 7 sumatriptan 11 MG Nasal Powder [Onzetra] count_records 100
#> 8 sumatriptan 11 MG Nasal Powder [Onzetra] count_subjec… 59
#> 9 Bos taurus catalase preparation count_records 100
#> 10 Bos taurus catalase preparation count_subjec… 64
#> # ℹ 52 more rows
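To make the difference between the two estimates concrete, here is a minimal base R sketch on a hypothetical toy table (not the mock cdm). It assumes that a record count is the number of rows per concept and that a person count is the number of distinct person_id values per concept. It illustrates the idea and is not OmopSketch's implementation.

```r
# Toy stand-in for a clinical table (hypothetical data)
drug_exposure <- data.frame(
  person_id       = c(1, 1, 2, 3, 3, 3),
  drug_concept_id = c(10, 10, 10, 20, 20, 10)
)

# Record count: number of rows per concept
count_records <- tapply(
  drug_exposure$person_id, drug_exposure$drug_concept_id, length
)

# Person count: number of distinct persons per concept
count_subjects <- tapply(
  drug_exposure$person_id, drug_exposure$drug_concept_id,
  function(x) length(unique(x))
)

counts <- data.frame(
  concept_id     = names(count_records),
  count_records  = as.integer(count_records),
  count_subjects = as.integer(count_subjects)
)
counts
```

Concept 10 has four records from three persons, so its person count is lower than its record count. The output above shows the same pattern (100 records but about 60 subjects per concept).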
Further stratification can be applied using the interval, sex, and
ageGroup arguments. The interval argument supports "overall" (no time
stratification), "years", "quarters", or "months". Setting sex = TRUE
stratifies by sex, and ageGroup takes a named list of age ranges.
summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 1,289
#> Columns: 5
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Manic mood", "Manic symptoms co-occurrent and due to…
#> $ estimate_name <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…
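The sketch below shows, on hypothetical ages and dates, how a named list of age ranges and a time interval can partition records. It assumes both bounds of each range are inclusive (as the "<=50" label suggests), and the year, quarter, and month labels are illustrative only; the labels OmopSketch produces may differ.

```r
# Hypothetical ages and event dates
age  <- c(12, 50, 51, 87)
date <- as.Date(c("2001-02-14", "2001-11-30", "2002-05-01", "2002-08-09"))

# Same shape as the ageGroup argument: named list of c(lower, upper) ranges
age_group <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Assign an age to the first range that contains it (bounds assumed inclusive)
assign_group <- function(a, groups) {
  hit <- vapply(groups, function(r) a >= r[1] && a <= r[2], logical(1))
  if (any(hit)) names(groups)[which(hit)[1]] else NA_character_
}
group <- vapply(age, assign_group, character(1), groups = age_group)

# Illustrative time labels for yearly, quarterly, and monthly intervals
year    <- format(date, "%Y")
quarter <- paste0(year, " Q", (as.integer(format(date, "%m")) - 1) %/% 3 + 1)
month   <- format(date, "%Y-%m")

data.frame(age, group, date, year, quarter, month)
```

Each combination of age group and interval then acts as one stratum within which concepts are counted.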
We can also filter the clinical table to a specific time window by setting the dateRange argument.
summarisedResult <- summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id <int> 1
#> $ result_type <chr> "summarise_concept_id_counts"
#> $ package_name <chr> "OmopSketch"
#> $ package_version <chr> "0.4.0"
#> $ group <chr> "omop_table"
#> $ strata <chr> ""
#> $ additional <chr> "source_concept_id"
#> $ min_cell_count <chr> "0"
#> $ study_period_end <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
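As a rough picture of what restricting to a time window means, this base R sketch filters a hypothetical toy table before counting. It assumes the window is applied to the record start date with inclusive bounds; that is an assumption made for illustration, not a statement of how OmopSketch applies dateRange.

```r
# Hypothetical condition records
condition_occurrence <- data.frame(
  condition_concept_id = c(1, 1, 2, 2),
  condition_start_date = as.Date(
    c("1985-06-01", "1995-03-10", "2005-12-31", "2015-01-01")
  )
)

# Same shape as the dateRange argument: c(start, end)
date_range <- as.Date(c("1990-01-01", "2010-01-01"))

# Keep records whose start date falls inside the window (bounds assumed inclusive)
in_window <- condition_occurrence$condition_start_date >= date_range[1] &
  condition_occurrence$condition_start_date <= date_range[2]
windowed <- condition_occurrence[in_window, ]

# Record counts per concept within the window
table(windowed$condition_concept_id)
```

Only the 1995 and 2005 records remain, so each concept is counted once.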
You can also summarise concept counts on a subset of records by
specifying the sample argument.
summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 6
#> Columns: 3
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Elevated mood", "Victim of vehicular AND/OR traffic acc…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…
Finally, concept counts can be visualised using tableConceptIdCounts().
By default, it generates an interactive reactable table, but DT
datatables are also supported.
result <- summariseConceptIdCounts(cdm,
  omopTableName = "measurement",
  countBy = "record"
)
tableConceptIdCounts(result, type = "reactable")
tableConceptIdCounts(result, type = "datatable")
The display argument in tableConceptIdCounts() controls which concept
counts are shown. The default, display = "overall", shows both standard
and source concept counts.
tableConceptIdCounts(result, display = "overall")
If display = "standard", the table shows only standard concept_id and
concept_name counts.
tableConceptIdCounts(result, display = "standard")
If display = "source", the table shows only source concept_id and
concept_name counts.
tableConceptIdCounts(result, display = "source")
#> Warning: Values from `estimate_value` are not uniquely identified; output will contain
#> list-cols.
#> • Use `values_fn = list` to suppress this warning.
#> • Use `values_fn = {summary_fun}` to summarise duplicates.
#> • Use the following dplyr code to identify duplicates.
#> {data} |>
#> dplyr::summarise(n = dplyr::n(), .by = c(cdm_name, group_level,
#> source_concept_id, result_id, group_name, estimate_type, estimate_name)) |>
#> dplyr::filter(n > 1L)
If display = "missing source", the table shows only counts for concept
ids that are missing a corresponding source concept id.
tableConceptIdCounts(result, display = "missing source")
If display = "missing standard", the table shows only counts for source
concept ids that are missing a mapped standard concept id.
tableConceptIdCounts(result, display = "missing standard")
#> Warning: `result` does not contain any `summarise_concept_id_counts` data.
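The two "missing" options can be pictured with a small base R sketch on hypothetical counts. It assumes that a missing concept is encoded as concept id 0, following the usual OMOP convention. The column names here are made up for illustration and do not reflect the internals of tableConceptIdCounts().

```r
# Hypothetical counts per (standard, source) concept pair
counts <- data.frame(
  standard_concept_id = c(101, 102, 0),
  source_concept_id   = c(9001, 0, 9003),
  count_records       = c(40, 25, 10)
)

# "missing source": standard concepts with no source concept (id assumed to be 0)
missing_source <- counts[counts$source_concept_id == 0, ]

# "missing standard": source concepts with no mapped standard concept
missing_standard <- counts[counts$standard_concept_id == 0, ]

missing_source
missing_standard
```

Under this assumption, a display option whose filter matches no rows has nothing to show, which would be consistent with a warning like the one above.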