This vignette shows how to prepare <incidence2> objects from the incidence2 package for use with cfr, using the prepare_data() method for the <incidence2> class.
If detailed individual-level data that include deaths and recoveries are available, alternative methods for severity estimation could be used (e.g. calculating the CFR directly from the subset of cases with a known outcome). However, there may be situations where only deaths are recorded, in which case the methods described here provide an option for CFR calculation.
We first load the libraries we require: cfr, incidence2, and outbreaks, which provides linelist data from a simulated Ebola outbreak.
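The packages named above can be loaded as follows. This is a minimal sketch that assumes all three packages are already installed.

```r
# load the packages used in this vignette
library(cfr)
library(incidence2)
library(outbreaks)
```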
Aggregated case data, such as the COVID-19 dataset provided by incidence2, can be converted into an <incidence2> object using incidence2::incidence(), and then handled by prepare_data().
# get data bundled with the {incidence2} package
covid_uk <- covidregionaldataUK
# view the data
head(covid_uk)
#>         date          region region_code cases_new cases_total deaths_new
#> 1 2020-01-30   East Midlands   E12000004        NA          NA         NA
#> 2 2020-01-30 East of England   E12000006        NA          NA         NA
#> 3 2020-01-30         England   E92000001         2           2         NA
#> 4 2020-01-30          London   E12000007        NA          NA         NA
#> 5 2020-01-30      North East   E12000001        NA          NA         NA
#> 6 2020-01-30      North West   E12000002        NA          NA         NA
#>   deaths_total recovered_new recovered_total hosp_new hosp_total tested_new
#> 1           NA            NA              NA       NA         NA         NA
#> 2           NA            NA              NA       NA         NA         NA
#> 3           NA            NA              NA       NA         NA         NA
#> 4           NA            NA              NA       NA         NA         NA
#> 5           NA            NA              NA       NA         NA         NA
#> 6           NA            NA              NA       NA         NA         NA
#>   tested_total
#> 1           NA
#> 2           NA
#> 3           NA
#> 4           NA
#> 5           NA
#> 6           NA
Note that the grouping structure of this dataset, given by the “region” variable, is present in the <incidence2> object.
prepare_data() respects grouping structure when present, and returns a dataset with one additional column for each grouping variable.
# convert to incidence2 object
covid_uk_incidence <- incidence(
  covid_uk,
  date_index = "date",
  groups = "region",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)
#> Warning in incidence(): `cases_new` contains NA values. Consider imputing these
#> and calling `incidence()` again.
# View head of prepared data
# Note that NAs are replaced with 0s by default (`fill_NA = TRUE`);
# retaining NAs would cause issues with CFR functions such as cfr_static()
head(
  prepare_data(
    covid_uk_incidence,
    cases_variable = "cases_new",
    deaths_variable = "deaths_new"
  )
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#>         date          region cases deaths
#> 1 2020-01-30   East Midlands     0      0
#> 2 2020-01-30 East of England     0      0
#> 3 2020-01-30         England     2      0
#> 4 2020-01-30          London     0      0
#> 5 2020-01-30      North East     0      0
#> 6 2020-01-30      North West     0      0
In this example, the “region” column is added to the data, allowing for disease severity to be calculated separately for each region if needed.
Users who wish to override grouping variables in their data are advised to do this when converting their data into an <incidence2> object, and to be aware of how incidence2 aggregates case and death counts, including how it deals with NAs; see incidence2::incidence() for more details.
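One way to drop the grouping is sketched below, under the assumption that a single region is of interest. The data are first subset to the “England” rows, and `groups` is then omitted from the call to incidence(), so the prepared data should carry no “region” column. The object names (`covid_england`, `covid_england_incidence`, `covid_england_prepared`) are illustrative only.

```r
library(cfr)
library(incidence2)

# keep a single region so that no grouping variable is needed
covid_england <- covidregionaldataUK[
  covidregionaldataUK$region == "England",
]

# omit `groups` so the <incidence2> object has no grouping structure
covid_england_incidence <- incidence(
  covid_england,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)

# prepare the ungrouped data for use with cfr
covid_england_prepared <- prepare_data(
  covid_england_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)

head(covid_england_prepared)
```

Subsetting before aggregation matters here because the dataset contains an “England” row alongside individual English regions, so simply omitting `groups` on the full data would add these counts together.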
Users who prepare data while maintaining grouping structure should take care to apply cfr_*() functions to their data by group, as cfr_*() functions cannot currently handle grouped data.
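A possible way to apply a cfr_*() function by group is sketched below using base R's split() and lapply(). The helper `cfr_by_group()` is hypothetical and not part of cfr; it assumes the prepared data has the columns “date”, “cases”, and “deaths” plus one grouping column, as in the output shown above. The usage section only runs if cfr and incidence2 are installed.

```r
# Hypothetical helper (not part of cfr): apply a function to each group
cfr_by_group <- function(data, group_col, cfr_fn, ...) {
  stopifnot(is.data.frame(data), group_col %in% names(data))
  # split() returns one data frame per value of the grouping column
  pieces <- split(data, data[[group_col]], drop = TRUE)
  # pass only the columns that cfr_*() functions expect
  lapply(
    pieces,
    function(df) cfr_fn(df[, c("date", "cases", "deaths")], ...)
  )
}

# Usage with the regional COVID-19 data, if the packages are available
if (requireNamespace("cfr", quietly = TRUE) &&
      requireNamespace("incidence2", quietly = TRUE)) {
  covid_uk_incidence <- incidence2::incidence(
    incidence2::covidregionaldataUK,
    date_index = "date",
    groups = "region",
    counts = c("cases_new", "deaths_new"),
    count_names_to = "count_variable"
  )
  covid_uk_prepared <- cfr::prepare_data(
    covid_uk_incidence,
    cases_variable = "cases_new",
    deaths_variable = "deaths_new"
  )
  # a named list with one cfr_static() result per region
  cfr_by_region <- cfr_by_group(covid_uk_prepared, "region", cfr::cfr_static)
  print(cfr_by_region[["England"]])
}
```

Returning a named list keeps each region's estimate separate; the results could be combined afterwards, for example with do.call(rbind, cfr_by_region), if a single table is preferred.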