| Title: | Unified Framework for Data Quality Control | 
| Version: | 0.1.0 | 
| Maintainer: | Luis Garcez <luisgarcez1@gmail.com> | 
| Description: | An easy framework to set up a quality control workflow on a dataset. Includes a range of functions for establishing an adaptable data quality control process. | 
| Imports: | dplyr, stringr, janitor, openxlsx, readxl | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.1 | 
| URL: | https://github.com/luisgarcez11/qualitycontrol | 
| BugReports: | https://github.com/luisgarcez11/qualitycontrol/issues | 
| Suggests: | knitr, rmarkdown, testthat | 
| Depends: | R (≥ 2.10) | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2022-11-25 13:16:49 UTC; jjferreira-admin | 
| Author: | Luis Garcez  | 
| Repository: | CRAN | 
| Date/Publication: | 2022-11-28 09:30:02 UTC | 
Amyotrophic lateral sclerosis Example dataset
Description
An example dataset related to amyotrophic lateral sclerosis (ALS).
Usage
als_data
Format
A list
subjid | Subject ID
p1 | ALSFRS-R 1
p2 | ALSFRS-R 2
p3 | ALSFRS-R 3
p4 | ALSFRS-R 4
p5 | ALSFRS-R 5
p6 | ALSFRS-R 6
p7 | ALSFRS-R 7
p8 | ALSFRS-R 8
p9 | ALSFRS-R 9
x1r | ALSFRS-R R1
x2r | ALSFRS-R R2
x3r | ALSFRS-R R3
age_at_baseline | Age at baseline
age_at_onset | Age at onset
onset | Region of onset
baseline_date | Baseline date
death_date | Death date
An example dataset containing a Quality Control mapping
Description
An example dataset containing a Quality Control mapping
Usage
als_data_qc_mapping
Format
A list of 3 tibbles.
missing | Table with all the 'missing' tests.
inconsistencies | Table with all the 'inconsistencies' tests.
range | Table with all the 'out of range' tests.
QC dataset using a specific variable mapping
Description
QC dataset using a specific variable mapping
Usage
qc_data(data, qc_mapping, output_file = NULL)
Arguments
data | 
 A data frame, data frame extension (e.g. a   | 
qc_mapping | 
 A list of data frame or data frame extension (e.g. a   | 
output_file | 
 (optional) File path ended in   | 
Value
A data frame containing all the findings.
Examples
qc_data(als_data, als_data_qc_mapping)
Read Quality Control mapping file
Description
read_qc_mapping reads an .xlsx file that contains
the QC mapping.
Usage
read_qc_mapping(path)
Arguments
path | 
 Path to the Excel file to be read. The file should contain 3 tabs named missing, inconsistencies and range. Each tab corresponds to one QC mapping table. QC mapping  
 The columns specified above should contain specific values: 
  | 
Value
A list containing all the QC mapping tables
Test if variable values are duplicated
Description
Test if variable values are duplicated
Usage
test_duplicated(data, variable)
Arguments
data | 
 data to be tested.  | 
variable | 
 The variable to be tested.  | 
Value
A data frame containing all the findings regarding the applied test.
Examples
test_duplicated(als_data, 'subjid')
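To make the idea behind a duplicate test concrete, the following base R sketch flags repeated identifiers in a small toy data frame. It is only an illustration under stated assumptions: the data frame toy and its values are hypothetical, and the code does not reproduce the implementation or the output format of test_duplicated.

```r
# Toy data: hypothetical values, not taken from als_data.
toy <- data.frame(
  subjid = c("S01", "S02", "S02", "S03"),
  age    = c(54, 61, 61, 47)
)

# Flag every row whose subjid occurs more than once
# (both the first occurrence and the later ones).
is_dup <- duplicated(toy$subjid) | duplicated(toy$subjid, fromLast = TRUE)

# Keep only the flagged rows as the "findings" of this sketch.
dup_rows <- toy[is_dup, ]
print(dup_rows)
```

Combining duplicated() with fromLast = TRUE is a design choice that reports all copies of a repeated value, which is usually more helpful when deciding which record to keep.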
Test the inconsistencies between variables on a dataset
Description
Test the inconsistencies between variables on a dataset
Usage
test_inconsistencies(data, variable1, variable2, relation)
Arguments
data | 
 data to be tested.  | 
variable1 | 
 The first variable in the comparison.  | 
variable2 | 
 The second variable in the comparison.  | 
relation | 
 String such as 'greater_than', 'greater_than_or_equal', 'lower_than_or_equal' and 'lower_than'.  | 
Value
A data frame containing all the findings regarding the applied test.
Examples
test_inconsistencies(als_data, 'baseline_date', 'death_date', relation = 'lower_than')
test_inconsistencies(als_data, 'age_at_baseline', 'age_at_onset', relation = 'greater_than')
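As a rough illustration of what an inconsistency check between two variables involves, the base R sketch below compares two date columns in a toy data frame. The data, the choice to skip rows with missing values, and the reading of 'lower_than' as "the first variable should be lower than the second" are all assumptions made for this sketch; it is not the package's implementation.

```r
# Toy data: hypothetical values, not taken from als_data.
toy <- data.frame(
  subjid        = c("S01", "S02", "S03"),
  baseline_date = as.Date(c("2015-03-01", "2016-07-15", "2018-01-10")),
  death_date    = as.Date(c("2017-05-20", "2016-02-01", NA))
)

# Assumed expected relation: baseline_date is lower than death_date.
# Rows with a missing value on either side are skipped, not flagged.
both_present <- !is.na(toy$baseline_date) & !is.na(toy$death_date)
violates <- both_present & !(toy$baseline_date < toy$death_date)

# Keep the identifier and the two compared values for each violation.
findings <- toy[violates, c("subjid", "baseline_date", "death_date")]
print(findings)
```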
Test the variable missingness on a dataset
Description
Test the variable missingness on a dataset
Usage
test_missing(data, variable)
Arguments
data | 
 data to be tested.  | 
variable | 
 The variable to be tested.  | 
Value
A data frame containing all the findings regarding the applied test.
Examples
test_missing(als_data, 'p8')
test_missing(als_data, 'p1')
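A missingness check can be pictured as selecting the records where the variable is NA. The base R sketch below shows this on a hypothetical toy data frame; it assumes that only NA counts as missing (empty strings, for instance, are not considered) and does not reproduce the output of test_missing.

```r
# Toy data: hypothetical values, not taken from als_data.
toy <- data.frame(
  subjid = c("S01", "S02", "S03"),
  p8     = c(3, NA, 4)
)

# Select the identifiers of rows where p8 is missing.
# drop = FALSE keeps the result as a data frame even with one column.
missing_rows <- toy[is.na(toy$p8), "subjid", drop = FALSE]
print(missing_rows)
```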
Test the range of a variable on a dataset
Description
Test the range of a variable on a dataset
Usage
test_range(
  data,
  variable,
  type,
  categories = NULL,
  lower_value = NULL,
  upper_value = NULL
)
Arguments
data | 
 data to be tested.  | 
variable | 
 The variable to be tested.  | 
type | 
 String such as 'categorical', 'date' or 'numeric'  | 
categories | 
 Only to be filled if   | 
lower_value | 
 Only to be filled if   | 
upper_value | 
 Only to be filled if   | 
Value
A data frame containing all the findings regarding the applied test.
Examples
test_range(als_data, 'onset', type = 'categorical',
           categories = c('bulbar', 'respiratory', 'spinal'))
test_range(als_data, 'age_at_baseline', type = 'numeric',
           lower_value = 20, upper_value = 100)
test_range(als_data, 'age_at_onset', type = 'numeric',
           lower_value = 20, upper_value = 100)
test_range(als_data, 'baseline_date', type = 'date',
           lower_value = '2000-01-01', upper_value = '2022-01-01')
test_range(als_data, 'death_date', type = 'date',
           lower_value = '2000-01-01', upper_value = '2022-01-01')
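The three range types can be sketched in base R as follows. This is a minimal illustration on hypothetical toy data, assuming that the bounds are inclusive and that missing values are not reported as out of range; whether test_range treats bounds and missing values the same way is not stated here, so the sketch should not be read as its implementation.

```r
# Toy data: hypothetical values, not taken from als_data.
toy <- data.frame(
  subjid          = c("S01", "S02", "S03", "S04"),
  onset           = c("bulbar", "spinal", "limb", NA),
  age_at_baseline = c(54, 17, 102, 66),
  baseline_date   = as.Date(c("2010-05-01", "1999-12-31",
                              "2021-06-30", "2023-02-14"))
)

# Categorical: the value must be one of the allowed categories.
allowed <- c("bulbar", "respiratory", "spinal")
bad_category <- !is.na(toy$onset) & !(toy$onset %in% allowed)

# Numeric: the value must lie within [20, 100] (inclusive bounds assumed).
bad_numeric <- !is.na(toy$age_at_baseline) &
  (toy$age_at_baseline < 20 | toy$age_at_baseline > 100)

# Date: the value must lie within the two dates (inclusive bounds assumed).
lower <- as.Date("2000-01-01")
upper <- as.Date("2022-01-01")
bad_date <- !is.na(toy$baseline_date) &
  (toy$baseline_date < lower | toy$baseline_date > upper)

# Subject IDs flagged by each check.
print(toy$subjid[bad_category])
print(toy$subjid[bad_numeric])
print(toy$subjid[bad_date])
```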