cthist

This package provides functions for mass-downloading and interpreting historical clinical trial registry entry data.

How to install

To install the stable version of cthist through CRAN:

install.packages("cthist")
library(cthist)

If you want the most recent development version of cthist, you will need to install devtools first, and then install the package from GitHub:

install.packages("devtools")
library(devtools)
install_github("bgcarlisle/cthist")
library(cthist)

Functions provided by cthist

This package provides five functions for downloading and interpreting historical clinical trial data from ClinicalTrials.gov.

Download clinical trial version dates:

## Get all the dates and status updates when the registry entries for
## NCT02110043 and NCT03281616 changed

clinicaltrials_gov_dates(c("NCT02110043", "NCT03281616"))
## A tibble: 10 × 5
##    nctid       version_number total_versions version_date overall_status       
##    <chr>                <int>          <int> <chr>        <chr>                
##  1 NCT02110043              0              8 2014-04-08   RECRUITING           
##  2 NCT02110043              1              8 2014-09-22   RECRUITING           
##  3 NCT02110043              2              8 2014-10-13   RECRUITING           
##  4 NCT02110043              3              8 2016-03-15   RECRUITING           
##  5 NCT02110043              4              8 2016-12-20   RECRUITING           
##  6 NCT02110043              5              8 2017-07-04   RECRUITING           
##  7 NCT02110043              6              8 2017-07-26   ACTIVE_NOT_RECRUITING
##  8 NCT02110043              7              8 2021-05-20   COMPLETED            
##  9 NCT03281616              0              2 2017-09-11   COMPLETED            
## 10 NCT03281616              1              2 2017-09-18   COMPLETED            

## Get all the dates when NCT02110043 had a change in overall status

clinicaltrials_gov_dates("NCT02110043", status_change_only=TRUE)
## A tibble: 3 × 5
##   nctid       version_number total_versions version_date overall_status       
##   <chr>                <int>          <int> <chr>        <chr>                
## 1 NCT02110043              0              8 2014-04-08   RECRUITING           
## 2 NCT02110043              6              8 2017-07-26   ACTIVE_NOT_RECRUITING
## 3 NCT02110043              7              8 2021-05-20   COMPLETED            
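
The printed output above shows version_date as a character column, so date arithmetic needs an explicit conversion first. The following is a minimal base-R sketch of that step. The data frame is typed in by hand from the output above so that the example runs without network access, and the column name days_since_previous is introduced here purely for illustration; it is not a column returned by cthist.

```r
## Rebuild the status-change rows shown above by hand (no download needed)
dates <- data.frame(
  nctid = "NCT02110043",
  version_number = c(0L, 6L, 7L),
  version_date = c("2014-04-08", "2017-07-26", "2021-05-20"),
  overall_status = c("RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED"),
  stringsAsFactors = FALSE
)

## Convert the character dates to Date so that they can be subtracted
dates$version_date <- as.Date(dates$version_date)

## Days elapsed since the previous row (NA for the first row);
## `days_since_previous` is an illustrative name, not a cthist column
dates$days_since_previous <- c(NA, as.numeric(diff(dates$version_date)))

print(dates)
```

The same conversion should apply to a tibble returned directly by clinicaltrials_gov_dates, assuming its columns match the printed output.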

Download clinical trial registry entry version data:

## Get version number 4 of NCT02110043

version_data <- clinicaltrials_gov_version("NCT02110043", 4)

## Get the 2nd item (enrolment) for that version
version_data$enrol
## [1] 22

## Get the 3rd item (enrolment type) for that version
version_data$enroltype
## [1] "ESTIMATED"

Mass-download clinical trial registry entry versions:

## Download all data for all versions of NCT02110043 and store in
## variable `versions`

versions <- clinicaltrials_gov_download("NCT02110043")

Mass-download clinical trial registry entry versions for many trials and save to disk:

## Download all data for all versions of NCT02110043 and NCT03281616
## and save to versions.csv

clinicaltrials_gov_download(c("NCT02110043", "NCT03281616"), "versions.csv")

Extract publications indexed on downloaded trial versions

The function clinicaltrials_gov_download downloads a data frame of versions of a trial’s history, with the references column containing a nested JSON-encoded data frame of the publications that were indexed by ClinicalTrials.gov.

The function extract_publications interprets a data frame of the type returned by clinicaltrials_gov_download and returns a new data frame that contains only publications of the type specified (“RESULT”, “BACKGROUND”, or “DERIVED”).

This function returns one row for every publication of the specified type that was indexed on ClinicalTrials.gov, for every version of the trial registry record contained in the data frame provided.

## Download only the latest clinical trial registry entries for the
## specified NCT numbers and extract PMIDs for indexed RESULT
## publications

library(dplyr)  # provides %>% and select()

clinicaltrials_gov_download(
  c("NCT05784103", "NCT05780281"),
  latest=TRUE
) %>%
  extract_publications(type="RESULT") %>%
  select(nctid, pmid)

# A tibble: 2 × 2
  nctid       pmid    
  <chr>       <chr>   
1 NCT05784103 28183823
2 NCT05780281 34928698
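
To make the phrase "nested JSON-encoded data frame" concrete, here is a hedged sketch of decoding one such cell by hand with the jsonlite package. The JSON string is a hand-written stand-in with placeholder values, and its field names (pmid, type, citation) are assumptions for illustration rather than the documented schema of the references column. This is not how extract_publications is implemented; it only illustrates the idea.

```r
library(jsonlite)

## A hand-written stand-in for one cell of the `references` column;
## field names and values are placeholders, not real registry data
references_json <- '[
  {"pmid": "11111111", "type": "BACKGROUND", "citation": "Example A"},
  {"pmid": "22222222", "type": "RESULT", "citation": "Example B"}
]'

## Decode the JSON array of objects into a data frame
refs <- fromJSON(references_json)

## Keep only the publications of the requested type
result_refs <- refs[refs$type == "RESULT", , drop = FALSE]
print(result_refs$pmid)
```

In practice extract_publications does this work across every row of the downloaded data frame, so decoding by hand should only be needed for inspecting a single cell.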

Calculate overall status lengths

The function clinicaltrials_gov_download downloads a data frame of versions of a trial’s history, with the overall_status column indicating the status of the trial on the date the entry is updated, which is specified in the version_date column.

The function overall_status_lengths interprets a data frame of the type returned by clinicaltrials_gov_download and returns a new data frame that lists, for each NCT number, every overall status that the trial passed through and the number of days it spent in each. Optionally, the count can be restricted to a specified timeframe.

## Download the clinical trial registry entries for the specified NCT
## number(s) and calculate the number of days that each registry entry
## spends in a reported overall status within a prescribed time
## interval of interest (in this case, the years 2020-2022, inclusive)

library(magrittr)  # provides the %>% pipe

clinicaltrials_gov_download(
    c("NCT04338971", "NCT03461211")
) %>%
    overall_status_lengths(
        start_date = "2020-01-01",
        end_date = "2022-12-31"
    )

# A tibble: 5 × 3
# Groups:   nctid [2]
  nctid       overall_status          days    
  <chr>       <chr>                   <drtn>  
1 NCT03461211 ACTIVE_NOT_RECRUITING   406 days
2 NCT03461211 COMPLETED               668 days
3 NCT03461211 ENROLLING_BY_INVITATION  21 days
4 NCT04338971 COMPLETED               488 days
5 NCT04338971 WITHHELD                511 days
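
The calculation described above can be pictured as clipping each status interval to the window of interest and summing the days per status. The base-R sketch below illustrates that idea using the status-change dates for NCT02110043 shown earlier, typed in by hand. It is not the package's implementation: it assumes that each status holds until the next version date, that the last status holds until the end of the window, and that the end date is not counted as an extra day.

```r
## Hand-typed from the clinicaltrials_gov_dates() output shown earlier
versions <- data.frame(
  version_date = as.Date(c("2014-04-08", "2017-07-26", "2021-05-20")),
  overall_status = c("RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED"),
  stringsAsFactors = FALSE
)

## Window of interest (illustrative values)
window_start <- as.Date("2020-01-01")
window_end <- as.Date("2022-12-31")

## Each status holds from its version date until the next version date;
## the last status is assumed to hold until the end of the window
interval_start <- versions$version_date
interval_end <- c(versions$version_date[-1], window_end)

## Clip every interval to the window; intervals outside it get 0 days
clipped_start <- pmax(interval_start, window_start)
clipped_end <- pmin(interval_end, window_end)
days <- pmax(as.numeric(clipped_end - clipped_start), 0)

## Total days per status inside the window
status_days <- tapply(days, versions$overall_status, sum)
print(status_days)
```

Under these assumptions the per-status totals add up to the length of the window for a trial that was registered before the window started.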

What data is extracted?

Variable                            Data type
Version number (0, 1, 2, etc.)      Double
Version date (ISO-8601)             Date
Overall status                      Character
Start date                          Date
Start date precision                Character
Primary completion date             Date
Primary completion date precision   Date
Primary completion date type        Character
Enrolment                           Double
Enrolment type                      Character
Inclusion and exclusion criteria    Character (HTML)
Outcome measures                    JSON-encoded table
Overall contacts                    JSON-encoded table
Central contacts                    JSON-encoded table
Responsible party                   JSON-encoded table
Lead sponsor                        JSON-encoded table
Collaborators                       JSON-encoded table
Locations                           JSON-encoded table
“Why stopped?”                      Character
Results posted                      Logical
References                          JSON-encoded table
Organization study ID               Character
Secondary IDs                       JSON-encoded table

Note regarding ClinicalTrials.gov July 2023 website re-write

For cthist versions 2.0.0 and later, the download method has been updated to work with the new version of ClinicalTrials.gov. Because the updated website presents data differently from the way they were scraped from the old version, some output will differ from that of earlier releases. For example, the overall status field is now in all-caps.
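
If you need to compare data downloaded with an older release against newer downloads, one option is to convert the old-style status strings to the all-caps form before merging. The helper below is a hypothetical sketch, not part of cthist, and the old-style strings are assumed examples of the earlier format. It is a heuristic only: check the result against the statuses actually present in your data, since not every value is guaranteed to map this way.

```r
## Assumed examples of old-style status strings (pre-2023 website)
old_status <- c("Recruiting", "Active, not recruiting", "Completed")

## Hypothetical helper: upper-case the string and replace each run of
## non-letter characters with a single underscore
normalise_status <- function(x) {
  toupper(gsub("[^A-Za-z]+", "_", trimws(x)))
}

new_style <- normalise_status(old_status)
print(new_style)
```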

DRKS.de functions deprecated

Update as of 2022-12-11

DRKS.de has recently been updated in a manner that makes scraping data more difficult, so the functions related to DRKS.de have been deprecated, at least temporarily, while I assess the changes.

Note on use

Please note that this package is provided under the AGPL v3. You may use it for any purpose; however, if you modify it, you must provide access to your modified version, or you will be in violation of the terms of the license.

Citing cthist

@Article{bgcarlisle-cthist,
  title          = {Analysis of Clinical Trial Registry Entry Histories Using the Novel {{R}} Package cthist},
  author         = {Carlisle, Benjamin Gregory},
  date           = {2022-07-01},
  journaltitle   = {PLOS ONE},
  shortjournal   = {PLOS ONE},
  volume         = {17},
  number         = {7},
  pages          = {e0270909},
  publisher      = {{Public Library of Science}},
  issn           = {1932-6203},
  doi            = {10.1371/journal.pone.0270909},
  url            = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0270909}
}

Please open an issue in the issue tracker above if you find a bug, need this package to download some historical trial data that it currently does not capture, or if you would like to collaborate on a project that uses this tool.

If you used my package in your research and you found it useful, I would take it as a kindness if you cited it.

Best,

Benjamin Gregory Carlisle PhD
