Title: | Functions to Replicate the Centers for Disease Control and Prevention's 'LTAS' Software in R |
Version: | 0.1.4 |
Description: | A suite of functions for reading in a rate file in XML format, stratifying a cohort, and calculating 'SMRs' from the stratified cohort and rate file. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Imports: | dplyr, knitr, lubridate, magrittr, purrr, readr, rlang, stringr, tidyr, XML, zoo |
Suggests: | rmarkdown, ggplot2, testthat (≥ 3.0.0), R.rsp |
VignetteBuilder: | knitr, R.rsp |
Depends: | R (≥ 2.10) |
LazyData: | true |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-08-22 18:12:22 UTC; inh4 |
Author: | Stephen Bertke [aut, cre] |
Maintainer: | Stephen Bertke <sbertke@cdc.gov> |
Repository: | CRAN |
Date/Publication: | 2024-08-22 23:00:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Checks that all strata in py_table are contained in the rate file
Description
Checks that all strata in py_table are contained in the rate file.
Usage
checkStrata(py_table, rateobj)
Arguments
py_table |
A stratified cohort created by |
rateobj |
A rate object created by |
Value
A list containing:
- the py_table with any strata not found in rateobj removed
- the observations from py_table that were removed
Examples
library(LTASR)
library(dplyr)
library(purrr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Stratify person table
py_table <- get_table(person, rateobj)
#Check Strata are in rate file
checkStrata(py_table, rateobj)
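Conceptually, this check amounts to an anti-join: any stratum in the person-time table with no matching row in the referent rates is set aside, and the rest is kept. The base R sketch below illustrates only that idea; the column names (`gender`, `race`, `agecat`, `pdays`) and the toy tables are hypothetical and do not reflect the actual structure of LTASR objects.

```r
# Hypothetical stratified person-time table (not the real py_table layout)
py <- data.frame(gender = c("M", "F", "M"),
                 race   = c("W", "W", "N"),
                 agecat = c("[20,25)", "[20,25)", "[90,Inf)"),
                 pdays  = c(3650, 1825, 400),
                 stringsAsFactors = FALSE)

# Hypothetical referent-rate strata (not the real rate object layout)
rates <- data.frame(gender = c("M", "F", "M"),
                    race   = c("W", "W", "N"),
                    agecat = c("[20,25)", "[20,25)", "[20,25)"),
                    stringsAsFactors = FALSE)

# Build one key per stratum so rows can be matched across the two tables
key <- function(df) paste(df$gender, df$race, df$agecat, sep = "|")
found <- key(py) %in% key(rates)

# Split into strata that can be matched and strata that would be dropped
check <- list(kept = py[found, ], removed = py[!found, ])
check$removed
```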
Create exp_strata object
Description
exp_strata() creates an exp_strata object that defines which variable to consider,
any lag to be applied, and the cutpoints for the strata.
Usage
exp_strata(var = character(), cutpt = numeric(), lag = 0)
Arguments
var |
character naming the variable within the history data.frame to consider. |
cutpt |
numeric vector defining the cutpoints to use to stratify the calculated cumulative exposure for variable |
lag |
numeric defining the lag, in years, to be applied to exposure variables. Default is 0 years (i.e., unlagged). Must be a whole number. |
Value
An object of class exp_strata to be used in get_table_history() or get_table_history_est().
Examples
library(LTASR)
exp1 <- exp_strata(var = 'employed',
cutpt = c(-Inf, 365, Inf),
lag = 10)
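To see what `cutpt` and `lag` mean in practice, the base R sketch below categorizes a hypothetical daily cumulative exposure series with `cut()`, first unlagged and then with a 10-year lag. It is an illustration under assumptions: it treats one year as 365 days and uses the right-closed intervals that `cut()` produces by default, which may not match how LTASR itself applies lags or handles interval boundaries.

```r
# Hypothetical daily employment indicator: employed (1) for the first 2 years, then 0
days <- 0:(15 * 365)
employed <- as.numeric(days < 2 * 365)

# Cumulative exposure (here: days employed) as of each day
cum_exp <- cumsum(employed)

# 10-year lag: the value used on day d is the cumulative exposure as of d - 10 years.
# Assumption: 1 year = 365 days; before the lag has elapsed, lagged exposure is 0.
lag_days <- 10 * 365
lagged <- c(rep(0, lag_days), head(cum_exp, -lag_days))

# Categorize with the same cut-points as the exp_strata example above
cutpt <- c(-Inf, 365, Inf)
unlagged_cat <- cut(cum_exp, breaks = cutpt)
lagged_cat   <- cut(lagged, breaks = cutpt)

# Person-days in each exposure category, unlagged vs. lagged
table(unlagged_cat)
table(lagged_cat)
```

The lag shifts person-time into the lower category: the same person contributes far more days below the 365-day cut-point once the most recent 10 years of exposure are ignored.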
Expand data through range of date values
Description
Expand a data.frame to include all dates between the start and end values defined by the parameters start and end.
Usage
expand_dates(
df,
start,
end,
md_tmplt = seq(as.Date("1/1/2015", "%m/%d/%Y"), as.Date("12/31/2015",
"%m/%d/%Y"), by = "day")
)
Arguments
df |
Input data.frame |
start |
unquoted name of the column in df containing the start date |
end |
unquoted name of the column in df containing the end date |
md_tmplt |
Date vector that defines which dates within a year to output. |
Value
A data.frame/tibble containing all variables of the input data.frame
as well as a new variable, date, with repeated rows for each date between
start and end, spaced as defined by md_tmplt.
Examples
library(LTASR)
data <- data.frame(id = 1,
start = as.Date('3/1/2015', format='%m/%d/%Y'),
end = as.Date('3/15/2015', format='%m/%d/%Y'))
expand_dates(data, start, end)
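The expansion can be approximated in base R as below. `expand_dates_sketch()` is a hypothetical helper, not the package's implementation: it takes column names as strings (the real function takes unquoted names), and it assumes `md_tmplt` is used only for its month-day pattern, so a date in the start-to-end range is kept when its month and day appear in the template.

```r
expand_dates_sketch <- function(df, start_col, end_col,
                                md_tmplt = seq(as.Date("2015-01-01"),
                                               as.Date("2015-12-31"), by = "day")) {
  keep_md <- format(md_tmplt, "%m-%d")   # month-day pairs to retain
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # every calendar day from this row's start to its end, inclusive
    d <- seq(df[[start_col]][i], df[[end_col]][i], by = "day")
    d <- d[format(d, "%m-%d") %in% keep_md]
    # repeat the input row once per retained date (zero rows if none)
    out <- df[rep(i, length(d)), , drop = FALSE]
    out$date <- d
    out
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

data <- data.frame(id = 1,
                   start = as.Date("2015-03-01"),
                   end   = as.Date("2015-03-15"))
res <- expand_dates_sketch(data, "start", "end")          # one row per day

# A template holding only the first day of each month keeps just 1 March
monthly <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)
res_monthly <- expand_dates_sketch(data, "start", "end", md_tmplt = monthly)
```

Under this sketch's month-day assumption, a template built from a non-leap year such as 2015 contains no 29 February, so that date would never be output.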
Stratify Person Table
Description
get_table reads in a data.frame/tibble containing basic demographic information
for each person of the cohort and stratifies the person-time and deaths into 5-year age,
5-year calendar period, race, and sex strata. See Details for information on how the
person file must be formatted.
Usage
get_table(persondf, rateobj, strata = dplyr::vars(), batch_size = 500)
Arguments
persondf |
data.frame like object containing one row per person with the required demographic information |
rateobj |
a rate object created by the |
strata |
any additional variables contained in persondf on which to stratify.
Must be wrapped in a |
batch_size |
a number specifying how many persons to stratify at a time. Default is 500. |
Details
The persondf tibble must contain the variables:
id,
gender (character: 'M'/'F'),
race (character: 'W'/'N'),
dob (date),
pybegin (date),
dlo (date),
vs (character: indicator identifying deaths as 'D'),
rev (numeric: values 5-10),
code (character: ICD code)
Value
A data.frame with a row for each stratum containing the number of observed
deaths within each of the defined minors/outcomes (_o1 through _oxxx)
and the number of person-days.
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Stratify person table
py_table <- get_table(person, rateobj)
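The core idea of the stratification is to split each person's at-risk interval (pybegin to dlo) at age and calendar-period boundaries and sum the days in each piece. The base R sketch below does this for one hypothetical person and 5-year age bands only. How LTASR treats birthdays and interval endpoints is not documented here, so the counting conventions (both pybegin and dlo counted, age in completed years) are assumptions.

```r
dob     <- as.Date("1950-06-15")
pybegin <- as.Date("1970-01-01")
dlo     <- as.Date("1980-12-31")

# Every at-risk day, counting both pybegin and dlo (assumption)
days <- seq(pybegin, dlo, by = "day")

# Age in completed years on each day
age_years <- function(birth, on) {
  b <- as.POSIXlt(birth)
  o <- as.POSIXlt(on)
  age <- o$year - b$year
  # subtract one if the birthday has not yet occurred in that year
  not_yet <- (o$mon < b$mon) | (o$mon == b$mon & o$mday < b$mday)
  age - not_yet
}
age <- age_years(dob, days)

# 5-year age bands, left-closed: [15,20), [20,25), ...
age_band <- cut(age, breaks = seq(0, 100, by = 5), right = FALSE)

# Person-days per age band (empty bands dropped)
pdays <- table(droplevels(age_band))
pdays
```

The same splitting would be repeated for calendar period (and race and sex are constant per person), giving the person-days for each combined stratum.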
Stratify Person Table with Time Varying Co-variate
Description
get_table_history reads in a data.frame/tibble (persondf) containing basic demographic information for
each person of the cohort as well as a data.frame/tibble (historydf) containing time-varying exposure
information, and stratifies the person-time and deaths into 5-year age, 5-year calendar period, race, sex and
exposure categories. See Details for information on how the person file and history file must be
formatted.
Usage
get_table_history(
persondf,
rateobj,
historydf,
exps = list(),
strata = dplyr::vars(),
batch_size = 500
)
Arguments
persondf |
data.frame like object containing one row per person with the required demographic information. |
rateobj |
a rate object created by the |
historydf |
data.frame like object containing one row per person and exposure period. An exposure period is a
period of time where exposure levels remain constant. See |
exps |
a list containing exp_strata objects created by |
strata |
any additional variables contained in persondf on which to stratify.
Must be wrapped in a |
batch_size |
a number specifying how many persons to stratify at a time. Default is 500. |
Details
The persondf tibble must contain the variables:
id,
gender (character: 'M'/'F'),
race (character: 'W'/'N'),
dob (date),
pybegin (date),
dlo (date),
rev (numeric: values 5-10),
code (character: ICD code)
The historydf tibble must contain the variables:
id,
begin_dt (date),
end_dt (date),
<daily exposure levels> (one or more variables giving the daily exposure level during each period)
Value
A data.frame with a row for each stratum containing the number of observed
deaths within each of the defined minors/outcomes (_o1 through _oxxx)
and the number of person-days.
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import example history file
history <- history_example %>%
mutate(begin_dt = as.Date(begin_dt, format='%m/%d/%Y'),
end_dt = as.Date(end_dt, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Define exposure of interest. Create exp_strata object. The `employed` variable
#indicates (0/1) periods of employment and will be summed each day of each exposure
#period. Therefore, this calculates duration of employment in days. The cut-points
#used below will stratify by person-time with less than and greater than a
#year of employment (365 days of employment).
exp1 <- exp_strata(var = 'employed',
cutpt = c(-Inf, 365, Inf),
lag = 0)
#Stratify cohort by employed variable.
py_table <- get_table_history(persondf = person,
rateobj = rateobj,
historydf = history,
exps = list(exp1))
#Multiple exposures can be considered.
exp1 <- exp_strata(var = 'employed',
cutpt = c(-Inf, 365, Inf),
lag = 0)
exp2 <- exp_strata(var = 'exposure_level',
cutpt = c(-Inf, 0, 10000, 20000, Inf),
lag = 10)
#Stratify cohort by employed and exposure_level variables.
py_table <- get_table_history(persondf = person,
rateobj = rateobj,
historydf = history,
exps = list(exp1, exp2))
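The history file stores exposure as periods of constant daily level, and cumulative exposure on a given day is the running sum of the daily levels up to that day. The base R sketch below expands two hypothetical exposure periods for one person into daily values, accumulates them, and applies the exposure_level cut-points used above. It assumes non-overlapping periods with both begin_dt and end_dt counted; it illustrates the idea rather than reproducing LTASR's internal algorithm.

```r
history <- data.frame(id = 1,
                      begin_dt = as.Date(c("2000-01-01", "2000-01-11")),
                      end_dt   = as.Date(c("2000-01-10", "2000-01-20")),
                      exposure_level = c(500, 1500))

# One row per day of each period, carrying that period's daily level
daily <- do.call(rbind, lapply(seq_len(nrow(history)), function(i) {
  d <- seq(history$begin_dt[i], history$end_dt[i], by = "day")
  data.frame(date = d, level = history$exposure_level[i])
}))
daily <- daily[order(daily$date), ]

# Cumulative exposure as of each day (inclusive of that day)
daily$cum_exp <- cumsum(daily$level)

# Categorize each day with the exposure_level cut-points from the example above
daily$cat <- cut(daily$cum_exp, breaks = c(-Inf, 0, 10000, 20000, Inf))
table(daily$cat)
```

Each day's category then determines which exposure stratum that person-day (and any death on it) is counted in.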
Stratify Person Table with Time Varying Co-variate
Description
get_table_history_est reads in a data.frame/tibble (persondf) containing basic demographic information for
each person of the cohort as well as a data.frame/tibble (historydf) containing time-varying exposure
information, and stratifies the person-time and deaths into 5-year age, 5-year calendar period, race, sex and
exposure categories. Additionally, average cumulative exposure values for each stratum and each exposure
variable are included. These strata are calculated more crudely, by taking regular steps (such as every 7 days)
rather than evaluating every individual day. See Details for information on how the person file and history
file must be formatted.
Usage
get_table_history_est(
persondf,
rateobj,
historydf,
exps,
strata = dplyr::vars(),
step = 7,
batch_size = 25 * step
)
Arguments
persondf |
data.frame like object containing one row per person with the required demographic information. |
rateobj |
a rate object created by the |
historydf |
data.frame like object containing one row per person and exposure period. An exposure period is a
period of time where exposure levels remain constant. See |
exps |
a list containing exp_strata objects created by |
strata |
any additional variables contained in persondf on which to stratify.
Must be wrapped in a |
step |
numeric defining number of days to jump when calculating cumulative exposure values. Exact stratification specifies a step of 1 day. |
batch_size |
a number specifying how many persons to stratify at a time. |
Details
The persondf tibble must contain the variables:
id,
gender (character: 'M'/'F'),
race (character: 'W'/'N'),
dob (date),
pybegin (date),
dlo (date),
rev (numeric: values 5-10),
code (character: ICD code)
The historydf tibble must contain the variables:
id,
begin_dt (date),
end_dt (date),
<daily exposure levels> (one or more variables giving the daily exposure level during each period)
Value
A data.frame with a row for each stratum containing the number of observed
deaths within each of the defined minors/outcomes (_o1 through _oxxx)
and the number of person-days.
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import example history file
history <- history_example %>%
mutate(begin_dt = as.Date(begin_dt, format='%m/%d/%Y'),
end_dt = as.Date(end_dt, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Define exposure of interest. Create exp_strata object. The `employed` variable
#indicates (0/1) periods of employment and will be summed each day of each exposure
#period. Therefore, this calculates duration of employment in days. The cut-points
#used below will stratify by person-time with less than and greater than a
#year of employment (365 days of employment).
exp1 <- exp_strata(var = 'employed',
cutpt = c(-Inf, 365, Inf),
lag = 0)
#Stratify cohort by employed variable.
py_table <- get_table_history_est(persondf = person,
rateobj = rateobj,
historydf = history,
exps = list(exp1))
#Multiple exposures can be considered.
exp1 <- exp_strata(var = 'employed',
cutpt = c(-Inf, 365, Inf),
lag = 0)
exp2 <- exp_strata(var = 'exposure_level',
cutpt = c(-Inf, 0, 10000, 20000, Inf),
lag = 10)
#Stratify cohort by employed and exposure_level variables.
py_table <- get_table_history_est(persondf = person,
rateobj = rateobj,
historydf = history,
exps = list(exp1, exp2))
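The `step` argument trades accuracy for speed. The base R sketch below shows the idea on a hypothetical cumulative exposure series: with a step of 7, each sampled day's category is assigned to the following seven days, so person-days near a cut-point can land in the wrong category. The sampling scheme is an assumption made for illustration; the package's own estimation may differ in detail.

```r
# Hypothetical daily cumulative exposure: employed every day for 2 years
days <- 0:(2 * 365 - 1)
cum_exp <- cumsum(rep(1, length(days)))
cutpt <- c(-Inf, 365, Inf)

# Exact: categorize every day (equivalent to a step of 1 day)
exact <- table(cut(cum_exp, breaks = cutpt))

# Approximate: evaluate every 7th day and reuse that category for the next 7 days
step <- 7
sampled <- seq(1, length(cum_exp), by = step)
approx_cat <- rep(cut(cum_exp[sampled], breaks = cutpt), each = step)
approx_cat <- approx_cat[seq_along(cum_exp)]   # trim the final partial block
approx <- table(approx_cat)

rbind(exact = as.vector(exact), approx = as.vector(approx))
```

Here six person-days (371 vs. 365) are misclassified; larger steps shift more person-time across each cut-point but require fewer evaluations.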
Example History File for Testing
Description
A tibble containing example history file data to be used for testing and demonstration of the package
Usage
history_example
Format
A data frame with 4 rows and 5 variables:
- id
unique identifier; numeric
- begin_dt
beginning date of an exposure period; character
- end_dt
ending date of an exposure period; character
- employed
a hypothetical variable indicating employment during the given exposure period; numeric (0/1)
- exposure_level
a hypothetical variable identifying daily exposure levels to be summed to calculate a cumulative exposure; numeric
Source
Internally Generated
Map ICD codes to grouped minors
Description
Map ICD codes to grouped minors
Usage
mapDeaths(persondf, rateobj)
Arguments
persondf |
Person data.frame |
rateobj |
A rate object created from |
Value
A data.frame with one row for each death observed in the person file, with the following variables:
id, code, rev: from the persondf
minor: the minor/outcome from the rate file that the death was mapped to
Examples
library(LTASR)
#Import example person file
person <- person_example
#Import default rate object
rateobj <- us_119ucod_19602021
#Check mapping of deaths to minors/outcomes
mapDeaths(person, rateobj)
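Mapping deaths to minors is essentially a lookup on ICD revision and code, with unmatched codes sent to the residual minor. The base R sketch below illustrates this with a hypothetical three-row mapping table; the minor numbers and the residual value 99 are placeholders, not the LTAS 119 assignments, and the real mapping element of a rate object may be structured differently.

```r
# Hypothetical mapping table: (revision, code) -> minor number (placeholders)
mapping <- data.frame(rev   = c(10, 10, 9),
                      code  = c("C34", "I21", "162"),
                      minor = c(1, 2, 1),
                      stringsAsFactors = FALSE)
residual <- 99   # hypothetical residual minor for unmapped codes

deaths <- data.frame(id   = c("a", "b", "c"),
                     rev  = c(10, 9, 10),
                     code = c("I21", "162", "Z99"),
                     stringsAsFactors = FALSE)

# Left join on revision and code, keeping every death
mapped <- merge(deaths, mapping, by = c("rev", "code"), all.x = TRUE, sort = FALSE)

# Deaths with no match are assigned to the residual minor
mapped$minor[is.na(mapped$minor)] <- residual
mapped <- mapped[order(mapped$id), c("id", "code", "rev", "minor")]
mapped
```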
Parses LTAS rate file in .xml format
Description
Parses LTAS rate file in .xml format
Usage
parseRate(xmlpath)
Arguments
xmlpath |
path of LTAS rate file |
Value
A list containing:
$residual: the minor number to which all unknown deaths will be assigned
$MinorDesc: a data.frame/tibble giving descriptions of minor numbers as well as how minors are mapped to majors
$mapping: a data.frame/tibble listing how each icd-code and revision will be mapped to each minor number
$age_cut: a numeric specifying cut-points for age strata
$cp_cut: a numeric specifying cut-points for calendar period strata
Example Person File for Testing
Description
A tibble containing example person file data to be used for testing and demonstration of the package
Usage
person_example
Format
A tibble with 3 observations and 9 variables:
- id
unique identifier; character
- gender
Gender/Sex; character 'M' or 'F'
- race
Race; character 'W' or 'N'
- dob
Date of Birth; character to be converted to date
- pybegin
Date to begin follow-up/at-risk accumulation; character to be converted to date
- dlo
Date last observed; character to be converted to date
- vs
Indicator identifying the vital status of each cohort member. A value of 'D' indicates an observed death; character
- rev
ICD revision of the ICD code; numeric
- code
ICD-code for the cause of death; character
Source
Internally Generated
Calculate SMRs for Custom minor groupings
Description
smr_custom collapses the minors listed in minor_grouping into a single custom outcome
and calculates the SMR for that outcome from smr_minor_table.
Usage
smr_custom(smr_minor_table, minor_grouping)
Arguments
smr_minor_table |
A data.frame/tibble as created by |
minor_grouping |
A numeric vector defining which minors to group together |
Value
A data.frame/tibble containing the expected and observed number of deaths as well as the SMR, lower CI and upper CI for the outcome defined by the user
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Stratify person table
py_table <- get_table(person, rateobj)
#Calculate SMRs for all minors
smr_minor_table <- smr_minor(py_table, rateobj)
#Calculate custom minor grouping for all deaths
smr_custom(smr_minor_table, 1:119)
#Calculate custom grouping of minors 4 through 40
smr_custom(smr_minor_table, 4:40)
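A custom grouping pools observed and expected deaths across the selected minors before forming the ratio, so the grouped SMR is the sum of observed over the sum of expected, not an average of the minor SMRs. The sketch below uses hypothetical counts and a hypothetical helper, `smr_group()`, and computes an exact Poisson 95% confidence interval through the chi-square relationship. Whether LTASR uses this particular interval is an assumption; compare with `smr_custom()` output before relying on it.

```r
# Hypothetical per-minor observed and expected deaths
minor_tab <- data.frame(minor    = 1:5,
                        observed = c(3, 0, 7, 2, 1),
                        expected = c(2.1, 0.4, 5.5, 2.8, 0.9))

smr_group <- function(tab, minors, conf = 0.95) {
  sel <- tab[tab$minor %in% minors, ]
  O <- sum(sel$observed)    # pooled observed deaths
  E <- sum(sel$expected)    # pooled expected deaths
  a <- (1 - conf) / 2
  # Exact Poisson limits for O, divided by E (assumed CI method)
  lower <- if (O == 0) 0 else qchisq(a, 2 * O) / (2 * E)
  upper <- qchisq(1 - a, 2 * (O + 1)) / (2 * E)
  data.frame(observed = O, expected = E, smr = O / E, lower = lower, upper = upper)
}

res <- smr_group(minor_tab, c(1, 3, 4))
res
```

These limits agree with the interval reported by base R's `poisson.test(O, E)`.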
Calculate SMRs for Major groupings
Description
smr_major collapses minor outcomes into "major" groupings as defined in the
rate object, rateobj.
Usage
smr_major(smr_minor_table, rateobj)
Arguments
smr_minor_table |
A data.frame/tibble as created by |
rateobj |
A rate object created by |
Value
A data.frame/tibble containing the expected and observed number of deaths
as well as SMRs, lower CI and upper CI for each major as defined in the rate object rateobj.
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Stratify person table
py_table <- get_table(person, rateobj)
#Calculate SMRs for all minors
smr_minor_table <- smr_minor(py_table, rateobj)
#Calculate SMRs for major groupings found within the rate file
smr_major(smr_minor_table, rateobj)
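Major groupings follow the same pooling logic, except that the grouping comes from the rate object (parseRate() describes its MinorDesc element as giving how minors are mapped to majors). The sketch below uses a hypothetical minor-to-major lookup and `aggregate()` to pool observed and expected deaths per major; the column names and assignments are placeholders.

```r
# Hypothetical per-minor observed and expected deaths
minor_tab <- data.frame(minor    = 1:5,
                        observed = c(3, 0, 7, 2, 1),
                        expected = c(2.1, 0.4, 5.5, 2.8, 0.9))

# Hypothetical minor-to-major lookup (placeholder assignments)
minor_desc <- data.frame(minor = 1:5, major = c(1, 1, 2, 2, 3))

# Attach each minor's major, then sum observed and expected within each major
joined <- merge(minor_tab, minor_desc, by = "minor")
by_major <- aggregate(cbind(observed, expected) ~ major, data = joined, FUN = sum)
by_major$smr <- by_major$observed / by_major$expected
by_major
```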
Calculate SMRs for Minors
Description
smr_minor calculates SMRs for all minor groupings found within the rate
object, rateobj, for the stratified cohort py_table.
Usage
smr_minor(py_table, rateobj)
Arguments
py_table |
A stratified cohort created by |
rateobj |
A rate object created by |
Value
A data.frame/tibble containing the expected and observed number of deaths
as well as SMRs, lower CI and upper CI for each minor found in the rate object rateobj.
Examples
library(LTASR)
library(dplyr)
#Import example person file
person <- person_example %>%
mutate(dob = as.Date(dob, format='%m/%d/%Y'),
pybegin = as.Date(pybegin, format='%m/%d/%Y'),
dlo = as.Date(dlo, format='%m/%d/%Y'))
#Import default rate object
rateobj <- us_119ucod_19602021
#Stratify person table
py_table <- get_table(person, rateobj)
#Calculate SMRs for all minors
smr_minor(py_table, rateobj)
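For each minor, the expected count is the sum over strata of the cohort's person-time multiplied by the referent rate for that stratum, and the SMR is observed divided by expected. The base R sketch below works this through for one minor with hypothetical strata. It assumes rates are expressed per 100,000 person-years and that person-days convert to person-years at 365.25 days per year; both are assumptions about units, not documented LTASR conventions.

```r
# Hypothetical strata for a single minor: cohort person-days and referent rates
strata <- data.frame(agecat   = c("[40,45)", "[45,50)", "[50,55)"),
                     pdays    = c(365250, 730500, 365250),
                     rate     = c(10, 20, 40),   # per 100,000 person-years (assumed)
                     observed = c(0, 1, 1))

pyears   <- strata$pdays / 365.25              # person-years per stratum
expected <- sum(pyears * strata$rate / 1e5)    # expected deaths for this minor
observed <- sum(strata$observed)
smr      <- observed / expected
c(observed = observed, expected = expected, smr = smr)
```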
119 UCOD U.S. Death Rate, 1960-2021
Description
A list containing referent underlying cause of death (UCOD) rate information for the US population from 1960-2021 for the 119 minor/outcome LTAS groupings
Usage
us_119ucod_19602021
Format
A list with 4 elements:
- residual
the minor/outcome number to which unknown/uncategorized outcomes will be mapped
- MinorDesc
a data.frame containing descriptions for each minor and major grouping
- mapping
a tibble detailing which minor number each icd-code and revision combination will be mapped to
- rates
the population referent rate for each minor for each gender/race/calendar period/age stratum
Source
Available upon request from sbertke@cdc.gov
119 UCOD U.S. Death Rate, 1960-2022
Description
A list containing referent underlying cause of death (UCOD) rate information for the US population from 1960-2022 for the 119 minor/outcome LTAS groupings
Usage
us_119ucod_recent
Format
A list with 4 elements:
- residual
the minor/outcome number to which unknown/uncategorized outcomes will be mapped
- MinorDesc
a data.frame containing descriptions for each minor and major grouping
- mapping
a tibble detailing which minor number each icd-code and revision combination will be mapped to
- rates
the population referent rate for each minor for each gender/race/calendar period/age stratum
Source
Available upon request from sbertke@cdc.gov