Summarising measurement use in a dataset

Introduction

In this vignette we will see how we can summarise the use of measurement concepts in our dataset as a whole. For our example we’re going to be interested in measurement concepts related to respiratory function and will use the Eunomia synthetic dataset.

First we will connect to the database and create a cdm reference.

library(duckdb)
library(omopgenerics)
library(CDMConnector)
library(dplyr)
con <- dbConnect(duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(
  con = con, cdmSchema = "main", writeSchema = "main", cdmName = "Eunomia"
)
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of Eunomia ────────────────────────────────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Now we’ll create a codelist with measurement concepts.

repiratory_function_codes <- newCodelist(list("respiratory function" = c(4052083, 4133840, 3011505)))
repiratory_function_codes
#> 
#> - respiratory function (3 codes)

For a general summary of how these codes are used in our dataset we can use summariseCodeUse() from the CodelistGenerator R package.

library(CodelistGenerator)
code_use <- summariseCodeUse(repiratory_function_codes, cdm)
tableCodeUse(code_use)
Database name: Eunomia

| Codelist name | Standard concept name | Standard concept ID | Source concept name | Source concept ID | Source concept value | Domain ID | Record count | Person count |
|---|---|---|---|---|---|---|---|---|
| respiratory function | overall | - | NA | NA | NA | NA | 8,728 | 2,096 |
| respiratory function | FEV1/FVC | 3011505 | FEV1/FVC | 3011505 | 19926-5 | measurement | 2,320 | 125 |
| respiratory function | Spirometry | 4133840 | Spirometry | 4133840 | 127783003 | measurement | 2,320 | 125 |
| respiratory function | Measurement of respiratory function | 4052083 | Measurement of respiratory function | 4052083 | 23426006 | measurement | 4,088 | 2,072 |
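
The counts in this table can be turned into quick feasibility figures, such as each concept's share of all records and the average number of records per person. The sketch below retypes the counts from the table by hand into a base R data frame rather than reading them from code_use; the name code_counts and its column names are illustrative and not part of any package.

```r
# Counts copied by hand from the tableCodeUse() output above.
# `code_counts` and its column names are illustrative only.
code_counts <- data.frame(
  concept_name = c(
    "FEV1/FVC", "Spirometry", "Measurement of respiratory function"
  ),
  concept_id = c(3011505, 4133840, 4052083),
  record_count = c(2320, 2320, 4088),
  person_count = c(125, 125, 2072)
)

# Share of all respiratory function records contributed by each concept.
code_counts$record_share <-
  code_counts$record_count / sum(code_counts$record_count)

# Average number of records per person with at least one record.
code_counts$records_per_person <-
  code_counts$record_count / code_counts$person_count

# Show the most frequently recorded concept first.
code_counts[order(-code_counts$record_count), ]
```

Person counts are not additive across concepts, because one person can have records for several concepts. This is why the overall person count (2,096) is smaller than the sum of the per-concept person counts.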

Although we now have a general summary of the use of our measurement codes, we may well want more information on these measurements to inform study feasibility and design.

MeasurementDiagnostics helps us to perform additional, measurement-specific diagnostic checks. For this we call the summariseMeasurementUse() function, which runs a series of checks.

library(MeasurementDiagnostics)

repiratory_function_measurements <- summariseMeasurementUse(cdm, repiratory_function_codes)

As with similar packages, our results are returned in the summarised_result format as defined by the omopgenerics package.

repiratory_function_measurements |> 
  glimpse()
#> Rows: 47
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
#> $ cdm_name         <chr> "Eunomia", "Eunomia", "Eunomia", "Eunomia", "Eunomia"…
#> $ group_name       <chr> "codelist_name", "codelist_name", "codelist_name", "c…
#> $ group_level      <chr> "respiratory function", "respiratory function", "resp…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "number records", "number subjects", "time", "time", …
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "min", "q25", "median", "q75", "max…
#> $ estimate_type    <chr> "integer", "integer", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "8728", "2096", "0", "0", "371", "1726.25", "33541", …
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
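
Because estimate_value is stored as character, estimates usually need converting before any arithmetic. The sketch below rebuilds by hand the first seven rows visible in the glimpse() output above and turns the time estimates into a named numeric vector using base R. It is a toy stand-in rather than the full result object, and the final variable_name and estimate_name entries, which are truncated in the printout, are assumed to be "time" and "max".

```r
# Toy stand-in for the first seven rows shown by glimpse() above.
# Only the three columns needed here are reproduced; the truncated
# final entries are ASSUMED to be "time" and "max".
timing_rows <- data.frame(
  variable_name = c("number records", "number subjects", rep("time", 5)),
  estimate_name = c("count", "count", "min", "q25", "median", "q75", "max"),
  estimate_value = c("8728", "2096", "0", "0", "371", "1726.25", "33541"),
  stringsAsFactors = FALSE
)

# Keep the "time" estimates and convert them from character to numeric.
time_rows <- timing_rows[timing_rows$variable_name == "time", ]
time_estimates <- setNames(
  as.numeric(time_rows$estimate_value),
  time_rows$estimate_name
)

time_estimates
```

The same pattern applies to the real object: filter on variable_name (and, where relevant, group_level or strata_level) first, then convert estimate_value according to estimate_type.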

The result_type column of the settings lists each of the checks performed.

settings(repiratory_function_measurements) |> 
  pull("result_type") |> 
  unique()
#> [1] "measurement_timings"          "measurement_value_as_numeric"
#> [3] "measurement_value_as_concept"
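
Each result_type is linked to rows of the result through result_id, so a single check can be isolated by looking up its result_id in the settings and filtering on it. The sketch below illustrates that lookup with base R on two small hand-made data frames. Both toy_settings and toy_result are hypothetical stand-ins, and the mapping of ids to result types is an assumption made for illustration.

```r
# Hypothetical stand-in for settings(result): one row per check.
# The id-to-type mapping is ASSUMED for illustration only.
toy_settings <- data.frame(
  result_id = 1:3,
  result_type = c(
    "measurement_timings",
    "measurement_value_as_numeric",
    "measurement_value_as_concept"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the result: several rows per result_id.
toy_result <- data.frame(
  result_id = c(1L, 1L, 2L, 2L, 2L, 3L),
  estimate_name = c("count", "count", "count", "median", "max", "count"),
  stringsAsFactors = FALSE
)

# Look up the ids belonging to one check, then keep only those rows.
keep_ids <- toy_settings$result_id[
  toy_settings$result_type == "measurement_timings"
]
timings_only <- toy_result[toy_result$result_id %in% keep_ids, ]

timings_only
```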

One of the checks summarises the numeric values associated with tests. We can quickly create a table summarising these results.

tableMeasurementValueAsNumeric(repiratory_function_measurements)
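Codelist name: respiratory function

| CDM name | Concept name | Concept ID | Domain ID | Unit concept name | Unit concept ID | Estimate name | Estimate value |
|---|---|---|---|---|---|---|---|
| Eunomia | overall | overall | overall | No matching concept | 0 | N | 8,728 |
| | | | | | | Median [Q25 - Q75] | - |
| | | | | | | Range | - |
| | | | | | | Missing value, N (%) | 8,728 (100.00%) |
| Eunomia | Measurement of respiratory function | 4052083 | Measurement | No matching concept | 0 | N | 4,088 |
| | | | | | | Median [Q25 - Q75] | - |
| | | | | | | Range | - |
| | | | | | | Missing value, N (%) | 4,088 (100.00%) |
| Eunomia | FEV1/FVC | 3011505 | Measurement | No matching concept | 0 | N | 2,320 |
| | | | | | | Median [Q25 - Q75] | - |
| | | | | | | Range | - |
| | | | | | | Missing value, N (%) | 2,320 (100.00%) |
| Eunomia | Spirometry | 4133840 | Measurement | No matching concept | 0 | N | 2,320 |
| | | | | | | Median [Q25 - Q75] | - |
| | | | | | | Range | - |
| | | | | | | Missing value, N (%) | 2,320 (100.00%) |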

Similarly, we can see a summary of the concept values associated with measurements. Here every record maps to “No matching concept”, so our respiratory function measurements have no concept value results. Note that in this synthetic dataset the numeric values are also entirely missing, as the table above shows.

tableMeasurementValueAsConcept(repiratory_function_measurements)
Codelist name: respiratory function

| CDM name | Concept name | Concept ID | Domain ID | Variable name | Value as concept name | Value as concept ID | Estimate name | Estimate value |
|---|---|---|---|---|---|---|---|---|
| Eunomia | overall | overall | overall | Value as concept name | No matching concept | 0 | N (%) | 8,728 (100.00%) |
| Eunomia | FEV1/FVC | 3011505 | Measurement | Value as concept name | No matching concept | 0 | N (%) | 2,320 (100.00%) |
| Eunomia | Spirometry | 4133840 | Measurement | Value as concept name | No matching concept | 0 | N (%) | 2,320 (100.00%) |
| Eunomia | Measurement of respiratory function | 4052083 | Measurement | Value as concept name | No matching concept | 0 | N (%) | 4,088 (100.00%) |

As well as an overview of the values of measurements, we can also see a summary of the timing between measurements for individuals in the dataset.

tableMeasurementTimings(repiratory_function_measurements)
Codelist name: respiratory function

| CDM name | Variable name | Estimate name | Estimate value |
|---|---|---|---|
| Eunomia | Number records | N | 8,728 |
| Eunomia | Number subjects | N | 2,096 |
| Eunomia | Time | Median [Q25 - Q75] | 371.00 [0.00 - 1,726.25] |
| | | Range | 0.00 to 33,541.00 |
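
To make the timing estimates easier to read, they can be rescaled and summarised further. The sketch below copies the time estimates from the table by hand and converts them to years. It assumes the estimates are expressed in days, which the table itself does not state, and the names time_days, time_years and iqr_days are illustrative.

```r
# Time estimates copied from the table above.
# ASSUMPTION: the unit is days; the table does not state it.
time_days <- c(min = 0, q25 = 0, median = 371, q75 = 1726.25, max = 33541)

# Convert to years using an average year length of 365.25 days.
time_years <- round(time_days / 365.25, 2)

# Interquartile range in the original unit.
iqr_days <- time_days[["q75"]] - time_days[["q25"]]

time_years
iqr_days
```

Under that assumption the median gap between measurements is about one year, while a lower quartile of 0 means that at least a quarter of the gaps are zero.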
