Title: | Make and Apply Customized Rounding Specifications for Tables |
Version: | 0.0.5 |
Description: | Translate double and integer valued data into character values formatted for tabulation in manuscripts or other types of academic reports. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | testthat, covr, knitr, rmarkdown, magrittr, dplyr, tidyr, tibble, gt |
Imports: | glue, stringi |
URL: | https://github.com/bcjaeger/table.glue |
BugReports: | https://github.com/bcjaeger/table.glue/issues |
VignetteBuilder: | knitr |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2024-11-21 19:34:44 UTC; bjaeger |
Author: | Byron Jaeger |
Maintainer: | Byron Jaeger <bjaeger@wakehealth.edu> |
Repository: | CRAN |
Date/Publication: | 2024-11-21 21:20:02 UTC |
table.glue: Make and Apply Customized Rounding Specifications for Tables
Description
Translate double and integer valued data into character values formatted for tabulation in manuscripts or other types of academic reports.
Author(s)
Maintainer: Byron Jaeger bjaeger@wakehealth.edu (ORCID)
Convert table data to inline list
Description
Convert table data to inline list
Usage
as_inline(data, tbl_variables, tbl_values)
Arguments
data |
a data frame. |
tbl_variables |
column names that will be used to form groups in the table |
tbl_values |
column names that contain table values. |
Value
a list of tbl_values values for each permutation of tbl_variables
Note
Variables in tbl_variables that have missing values will have those missing values converted into an explicit category named variable_missing, where 'variable' is the name of the variable.
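The idea of an inline list can be sketched in base R. This is only an illustration under assumptions: it does not call as_inline(), the structure that as_inline() actually returns may differ, and the toy data and the name inline_height are made up for this sketch.

```r
# Toy data: one grouping variable with a missing value (hypothetical).
example_data <- data.frame(
  sex = c("female", "male", NA),
  height = c("158 (154 - 161)", "178 (175 - 188)", "170 (160 - 180)"),
  stringsAsFactors = FALSE
)

# Make the missing group explicit, mirroring the 'variable_missing' naming
# described in the Note.
example_data$sex[is.na(example_data$sex)] <- "sex_missing"

# Split the table values by group so each value can be accessed by name.
inline_height <- split(example_data$height, example_data$sex)

inline_height$female       # "158 (154 - 161)"
inline_height$sex_missing  # "170 (160 - 180)"
```

Accessing values by name in this way is what makes such a list convenient for inline reporting in a manuscript.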
Examples
example_data <- data.frame(
  sex = c("female", "male"),
  height = c("158 (154 - 161)", "178 (175 - 188)")
)
as_inline(example_data, tbl_variables = 'sex', tbl_values = 'height')
car_data <- mtcars
car_data$car_name <- rownames(mtcars)
as_inline(car_data, tbl_variables = 'car_name', tbl_values = 'mpg')
Bracket helpers
Description
If you have table values that take the form point estimate (uncertainty estimate), you can use these functions to access specific parts of the table value.
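To make the structure of such a table value concrete, here is a base R sketch that splits one value into its point estimate and its bracketed part. It uses plain regular expressions rather than the bracket helpers, so it is not the package's implementation; the variable names are chosen for illustration only.

```r
tbl_value <- "12.1 (95% CI: 9.1, 15.1)"

# Text before the opening bracket, with surrounding whitespace removed.
point_estimate <- trimws(sub("\\(.*$", "", tbl_value))

# The bracketed part, including the brackets themselves.
bracketed <- regmatches(tbl_value, regexpr("\\(.*\\)", tbl_value))

point_estimate  # "12.1"
bracketed       # "(95% CI: 9.1, 15.1)"
```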
Usage
bracket_drop(x, bracket_left = "(", bracket_right = ")")
bracket_extract(
  x,
  bracket_left = "(",
  bracket_right = ")",
  drop_bracket = FALSE
)
bracket_insert_left(x, string, bracket_left = "(", bracket_right = ")")
bracket_insert_right(x, string, bracket_left = "(", bracket_right = ")")
bracket_point_estimate(x, bracket_left = "(", bracket_right = ")")
bracket_lower_bound(
  x,
  bracket_left = "(",
  separator = ",",
  bracket_right = ")"
)
bracket_upper_bound(
  x,
  bracket_left = "(",
  separator = ",",
  bracket_right = ")"
)
Arguments
x |
a character vector where each value contains a point estimate and confidence limits. |
bracket_left |
a character value specifying what symbol is used to bracket the left hand side of the confidence interval |
bracket_right |
a character value specifying what symbol is used to bracket the right hand side of the confidence interval |
drop_bracket |
a logical value ( |
string |
a character value of a string that will be inserted into the left or right side of the bracket. |
separator |
a character value specifying what symbol is used to separate the lower and upper bounds of the interval. |
Value
a character value with length equal to the length of x.
Examples
tbl_value <- "12.1 (95% CI: 9.1, 15.1)"
bracket_drop(tbl_value)
bracket_point_estimate(tbl_value)
bracket_extract(tbl_value, drop_bracket = TRUE)
bracket_lower_bound(tbl_value)
bracket_upper_bound(tbl_value)
Format values left of decimal
Description
Values to the left of the decimal are generally called 'big' since they are larger than values to the right of the decimal. format_big() lets you update the settings of a rounding_specification object (see round_spec) so that values left of the decimal will be printed with a specific format (see examples).
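For comparison, the same kind of digit grouping can be sketched in base R with formatC(), whose big.mark and big.interval arguments play a similar role to mark and interval. This is a base R analogue, not a description of how format_big() is implemented.

```r
big_x <- 1234567

# formatC() groups digits left of the decimal using big.mark and big.interval.
formatted <- formatC(big_x, format = "d", big.mark = "|", big.interval = 3L)

formatted  # "1|234|567"
```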
Usage
format_big(rspec, mark = ",", interval = 3L)
Arguments
rspec |
a rounding_specification object (see round_spec). |
mark |
a character value used to separate number groups to the left of the decimal point. See prettyNum for more details on this. Set this input to '' to negate its effect. |
interval |
a numeric value indicating the size of number groups for numbers left of the decimal. |
Value
an object of class rounding_specification.
Examples
big_x <- 1234567
rspec <- format_big(round_spec(), mark = '|', interval = 3)
table_value(big_x, rspec) # returns "1|234|567"
Format decimal symbol
Description
format_decimal() lets you update the settings of a rounding_specification object (see round_spec) so that the decimal is represented by a user-specified mark.
Usage
format_decimal(rspec, mark = ".")
Arguments
rspec |
a rounding_specification object (see round_spec). |
mark |
a character value used to represent the decimal point. |
Value
an object of class rounding_specification.
See Also
Other formatting helpers:
format_small()
Examples
small_x <- 0.1234567
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 7)
rspec <- format_decimal(rspec, mark = '*')
table_value(small_x, rspec)
Format missing values
Description
format_missing() updates a rounding_specification object so that missing values are printed as the user specifies.
Usage
format_missing(rspec, replace_na_with)
Arguments
rspec |
a rounding_specification object (see round_spec). |
replace_na_with |
a character value that replaces missing values. |
Value
an object of class rounding_specification.
Examples
rspec <- round_spec()
rspec <- format_missing(rspec, 'oh no!')
table_value(x = c(pi, NA), rspec)
Format values right of decimal
Description
Values to the right of the decimal are generally called 'small' since they are smaller than values to the left of the decimal. format_small() lets you update the settings of a rounding_specification object (see round_spec) so that values right of the decimal will be printed with a specific format (see examples).
Usage
format_small(rspec, mark = "", interval = 5L)
Arguments
rspec |
a rounding_specification object (see round_spec). |
mark |
a character value used to separate number groups to the right of the decimal point. See prettyNum for more details on this. Set this input to '' to negate its effect. |
interval |
a numeric value indicating the size of number groups for numbers right of the decimal. |
Value
an object of class rounding_specification.
See Also
Other formatting helpers:
format_decimal()
Examples
small_x <- 0.1234567
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 7)
rspec <- format_small(rspec, mark = '*', interval = 1)
table_value(small_x, rspec)
NHANES blood pressure data
Description
The US National Health and Nutrition Examination Survey (NHANES) was designed to assess the health and nutritional status of the non-institutionalized US population and was conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention (1). Since 1999-2000, NHANES has been conducted in two-year cycles using a multistage probability sampling design to select participants. Each cycle is independent with different participants recruited.
Usage
nhanes
Format
A data frame with columns:
- exam
NHANES exam: 2013-2014, 2015-2016, or 2017-2018
- seqn
survey participant identifier
- psu
primary sampling unit
- strata
survey strata
- wts_mec_2yr
2 year mobile examination weights
- exam_status
exam status. Participants either completed both the NHANES interview and exam or just the interview.
- age
participant's age, in years
- sex
participant's sex
- race_ethnicity
participant's race and ethnicity
- education
participant's education
- pregnant
pregnancy status
- bp_sys_mmhg
participant's systolic blood pressure, mm Hg
- bp_dia_mmhg
participant's diastolic blood pressure, mm Hg
- n_msr_sbp
the number of valid systolic blood pressure readings
- n_msr_dbp
the number of valid diastolic blood pressure readings
- bp_high_aware
was participant ever told they had high blood pressure by a medical professional?
- meds_bp
is participant currently using medication to lower their blood pressure?
Note
Blood pressure measurements
The same protocol was followed to measure systolic and diastolic blood pressure (SBP and DBP) in each NHANES cycle. After survey participants had rested 5 minutes, their BP was measured by a trained physician using a mercury sphygmomanometer and an appropriately sized cuff. Three BP measurements were obtained at 30 second intervals.
Source
NHANES website, https://www.cdc.gov/nchs/nhanes/index.htm
References
National health and nutrition examination survey homepage, available at https://www.cdc.gov/nchs/nhanes/index.htm. Accessed on 09/07/2020.
Examples
nhanes
Set rules for rounding ties
Description
Rounding a number x to the nearest integer requires some tie-breaking rule for those cases when x is exactly half-way between two integers, that is, when the fraction part of x is exactly 0.5. The round_half_up() function implements a tie-breaking rule that consistently rounds half units upward. Although this creates a slight bias toward larger rounded outputs, it is widely used in many disciplines. The round_half_even() function breaks ties by rounding to the nearest even unit.
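The arithmetic behind the two tie-breaking rules can be sketched in base R. The helper round_half_up_sketch() below is hypothetical and is not the package's implementation; it assumes that "upward" means away from zero for negative inputs, and it ignores floating point edge cases where a value that prints as a tie is not stored as an exact tie.

```r
# Hypothetical helper: round half away from zero at a given number of digits.
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

ties <- c(0.5, 1.5, 2.5)

round(ties)                 # 0 2 2 (base R rounds these ties to even)
round_half_up_sketch(ties)  # 1 2 3
```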
Usage
round_half_up(rspec)
round_half_even(rspec)
Arguments
rspec |
a rounding_specification object (see round_spec). |
Value
an object of class rounding_specification.
See Also
Other rounding helpers:
round_using_magnitude()
Examples
# note base R behavior rounds to even:
round(0.5) # --> 0
round(1.5) # --> 2
round(2.5) # --> 2
# make rspec that rounds up
rspec <- round_half_up(round_spec())
rspec <- round_using_decimal(rspec, digits = 0)
# check
table_value(0.5, rspec) # --> 1
table_value(1.5, rspec) # --> 2
table_value(2.5, rspec) # --> 3
# make rspec that rounds even
rspec <- round_half_even(round_spec())
rspec <- round_using_decimal(rspec, digits = 0)
# check
table_value(0.5, rspec) # --> 0
table_value(1.5, rspec) # --> 2
table_value(2.5, rspec) # --> 2
Make a rounding specification
Description
round_spec() creates a rounding specification object with default settings. The settings of a rounding specification object can be updated using functions in the round_ (see round_half_up, round_half_even, round_using_signif, round_using_decimal, and round_using_magnitude) and format_ (see format_missing, format_big, format_small, and format_decimal) families.
Usage
round_spec(force_default = FALSE)
Arguments
force_default |
a logical value. If |
Details
Rounding specifications are meant to be passed into the table_glue and table_value functions. The specification can also be passed into table_ functions implicitly by saving a rounding specification into the global options.
The round_spec() function intentionally uses no input arguments. This is to encourage users to develop rounding specifications using the round_ and format_ families in conjunction with the pipe (%>%) operator.
Value
an object of class rounding_specification.
Examples
rspec <- round_spec()
table_value(x = pi, rspec)
Set rules for rounding numbers
Description
These functions update a rounding_specification object (see round_spec) so that a particular approach to rounding is applied:
- round to a dynamic decimal place based on the magnitude of the rounded number (round_using_magnitude())
- round to a specific number of significant digits (round_using_signif())
- round to a specific decimal place (round_using_decimal())
Usage
round_using_magnitude(rspec, digits = c(2, 1, 0), breaks = c(1, 10, Inf))
round_using_signif(rspec, digits = 2)
round_using_decimal(rspec, digits = 1)
Arguments
rspec |
a rounding_specification object (see round_spec). |
digits |
for |
breaks |
(only relevant if rounding based on magnitude) a positive, monotonically increasing numeric vector designating rounding boundaries. |
Details
digits and breaks must be used in coordination with each other when rounding based on magnitude. For example, using breaks = c(1, 10, Inf) and digits = c(2, 1, 0), numbers whose absolute value is < 1 are rounded to 2 decimal places, numbers whose absolute value is >= 1 and < 10 are rounded to 1 decimal place, and numbers whose absolute value is >= 10 are rounded to 0 decimal places. The use of magnitude to guide rounding rules is extremely flexible and can be used for many different applications (e.g., see table_pvalue). Rounding by magnitude is similar in some ways to rounding to a set number of significant digits but not entirely the same (see examples).
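The way breaks and digits pair up can be sketched in base R. The helper round_by_magnitude_sketch() is hypothetical and only illustrates the lookup; it is not the package's code, and it relies on formatC() for rounding, so it does not apply the package's tie-breaking rules or big-mark formatting.

```r
# Hypothetical helper: choose the number of decimals from each value's magnitude.
round_by_magnitude_sketch <- function(x,
                                      digits = c(2, 1, 0),
                                      breaks = c(1, 10, Inf)) {
  # findInterval() returns 0 when |x| < breaks[1], 1 when
  # breaks[1] <= |x| < breaks[2], and so on; adding 1 gives an index into digits.
  d <- digits[findInterval(abs(x), breaks) + 1]
  # Format each value with its own number of decimal places.
  unname(mapply(function(v, k) formatC(v, format = "f", digits = k), x, d))
}

round_by_magnitude_sketch(c(0.1234, 3.14159, 31.4159))
# "0.12" "3.1" "31"
```

Each entry of digits applies to values below the break in the same position, which is why the two vectors need the same length.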
Value
an object of class rounding_specification.
See Also
Other rounding helpers:
round_half_up()
Examples
x <- c(pi, exp(1))
x <- c(x, x*10, x*100, x*1000)
# make one specification using each rounding approach
specs <- list(
  magnitude = round_using_magnitude(round_spec()),
  decimal = round_using_decimal(round_spec()),
  signif = round_using_signif(round_spec())
)
# apply all three rounding specifications to x
# notice how the rounding specifications are in agreement
# for smaller values of x but their answers are different
# for larger values of x.
sapply(specs, function(rspec) table_value(x, rspec))
# output:
# magnitude decimal signif
# [1,] "3.1" "3.1" "3.1"
# [2,] "2.7" "2.7" "2.7"
# [3,] "31" "31.4" "31.0"
# [4,] "27" "27.2" "27.0"
# [5,] "314" "314.2" "310.0"
# [6,] "272" "271.8" "270.0"
# [7,] "3,142" "3,141.6" "3,100.0"
# [8,] "2,718" "2,718.3" "2,700.0"
Round estimates and their corresponding errors
Description
Though they are not easy to find in print, there are some general conventions for rounding numbers. When rounding a summary statistic such as the mean or median, the number of rounded digits shown should be governed by the precision of the statistic. For instance, authors are usually asked to present means plus or minus standard deviations in published research, or regression coefficients plus or minus the standard error. The convention applied here is to
find the place of the first significant digit of the error
round the estimate to that place
round the error to 1 additional place
present the combination in a form such as estimate (error) or estimate +/- error
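The first three steps of this convention can be sketched in base R. The helper round_estimate_error() is hypothetical and is not the package's implementation; it only shows the arithmetic, using the two worked values from the Examples section, and it leaves out formatting of the final string.

```r
# Hypothetical helper illustrating the convention; not the package's code.
round_estimate_error <- function(estimate, error) {
  # Place of the first significant digit of the error, as a power of 10:
  # 0 = ones, 1 = tens, -1 = tenths, and so on.
  place <- floor(log10(abs(error)))
  list(
    estimate = round(estimate, digits = -place),
    error = round(error, digits = -place + 1)
  )
}

ex1 <- round_estimate_error(72.17986, 9.364132)
ex1$estimate  # 72
ex1$error     # 9.4

ex2 <- round_estimate_error(72347.23, 23994.06)
ex2$estimate  # 70000
ex2$error     # 24000
```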
Usage
table_ester(
  estimate,
  error,
  form = "{estimate} ± {error}",
  majority_rule = FALSE
)
table_estin(
  estimate,
  lower,
  upper,
  form = "{estimate} ({lower}, {upper})",
  majority_rule = FALSE
)
Arguments
estimate |
a numeric vector of estimate values. |
error |
a numeric vector of error values. All errors should be >0. |
form |
a character value that indicates how the error and estimate
should be formatted together. Users can specify anything they like
as long as they use the terms |
majority_rule |
a logical value. If |
lower |
the lower-bound of an interval for the estimate. |
upper |
the upper-bound of an interval for the estimate. |
Value
a character vector
References
Blackstone, Eugene H. "Rounding numbers" (2016): The Journal of Thoracic and Cardiovascular Surgery. DOI: https://doi.org/10.1016/j.jtcvs.2016.09.003
See Also
Other table helpers:
table_glue(), table_pvalue(), table_value()
Examples
# ---- examples are taken from Blackstone, 2016 ----
# Example 1: ----
# Mean age is 72.17986, and the standard deviation (SD) is 9.364132.
## Steps:
## - Nine is the first significant figure of the SD.
## - Nine is in the ones place. Thus...
## + round the mean to the ones place (i.e., round(x, digits = 0))
## + round the SD to the tenths place (i.e., round(x, digits = 1))
table_ester(estimate = 72.17986, error = 9.364132)
# > [1] 72 +/- 9.4
# an estimated lower and upper bound for 95% confidence limits
lower <- 72.17986 - 1.96 * 9.364132
upper <- 72.17986 + 1.96 * 9.364132
table_estin(estimate = 72.17986, lower = lower, upper = upper,
form = "{estimate} (95% CI: {lower}, {upper})")
# > [1] "72 (95% CI: 54, 91)"
# Example 2: ----
# Mean cost is $72,347.23, and the standard deviation (SD) is $23,994.06.
## Steps:
## - Two is the first significant figure of the SD.
## - Two is in the ten thousands place. Thus...
## + round mean to the 10-thousands place (i.e., round(x, digits = -4))
## + round SD to the thousands place (i.e., round(x, digits = -3))
table_ester(estimate = 72347.23, error = 23994.06)
# > [1] "70,000 +/- 24,000"
# an estimated lower and upper bound for 95% confidence limits
lower <- 72347.23 - 1.96 * 23994.06
upper <- 72347.23 + 1.96 * 23994.06
table_estin(estimate = 72347.23, lower = lower, upper = upper,
form = "{estimate} (95% CI: {lower} - {upper})")
# > [1] "70,000 (95% CI: 30,000 - 120,000)"
Expressive rounding for table values
Description
Expressive rounding for table values
Usage
table_glue(..., rspec = NULL, .sep = "", .envir = parent.frame())
Arguments
... |
strings to round and format. Multiple inputs are concatenated together. Named arguments are not supported. |
rspec |
a |
.sep |
Separator used to separate elements |
.envir |
environment to evaluate each expression in. |
Value
a character vector of length equal to the vectors supplied in ...
See Also
Other table helpers:
table_ester(), table_pvalue(), table_value()
Examples
x <- runif(10)
y <- runif(10)
table_glue("{x} / {y} = {x/y}")
table_glue("{x}", "({100 * y}%)", .sep = ' ')
df = data.frame(x = 1:10, y=1:10)
table_glue("{x} / {y} = {as.integer(x/y)}", .envir = df)
table_glue("{x} / {y} = {as.integer(x/y)}")
with(df, table_glue("{x} / {y} = {as.integer(x/y)}"))
mtcars$car <- rownames(mtcars)
# use the default rounding specification
table_glue(
  "the {car} gets ~{mpg} miles/gallon and weighs ~{wt} thousand lbs",
  .envir = mtcars[1:3, ]
)
# use your own rounding specification
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 1)
table_glue(
  "the {car} gets ~{mpg} miles/gallon and weighs ~{wt} thousand lbs",
  rspec = rspec,
  .envir = mtcars[1:3, ]
)
Round p-values
Description
When presenting p-values, journals tend to request a lot of finessing. table_pvalue() is meant to do almost all of the finessing for you. The part it does not do is interpret the p-value. For that, please see the guideline on interpretation of p-values by the American Statistical Association (Wasserstein, 2016). The six main statements on p-value usage are included in the "Interpreting p-values" section below.
Usage
table_pvalue(
  x,
  round_half_to = "even",
  decimals_outer = 3L,
  decimals_inner = 2L,
  alpha = 0.05,
  bound_inner_low = 0.01,
  bound_inner_high = 0.99,
  bound_outer_low = 0.001,
  bound_outer_high = 0.999,
  miss_replace = "--",
  drop_leading_zero = TRUE
)
Arguments
x |
a vector of numeric values. All values should be > 0 and < 1. |
round_half_to |
a character value indicating how to break ties when the rounded unit is exactly halfway between two rounding points. See round_half_even and round_half_up for details. Valid inputs are 'even' and 'up'. |
decimals_outer |
number of decimals to print when p > bound_outer_high or p < bound_outer_low. |
decimals_inner |
number of decimals to print when
|
alpha |
a numeric value indicating the significance level, i.e. the probability that you will make the mistake of rejecting the null hypothesis when it is true. |
bound_inner_low |
the lower bound of the inner range. |
bound_inner_high |
the upper bound of the inner range. |
bound_outer_low |
the lowest value printed. Values lower than the threshold will be printed as <threshold. |
bound_outer_high |
the highest value printed. Values higher than the threshold will be printed as >threshold. |
miss_replace |
a character value that replaces missing values. |
drop_leading_zero |
a logical value. If |
Value
a character vector
Interpreting p-values
The American Statistical Association (ASA) defines the p-value as follows:
A p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
It then provides six principles to guide p-value usage:
P-values can indicate how incompatible the data are with a specified statistical model. A p-value provides one approach to summarizing the incompatibility between a particular set of data and a proposed model for the data. The most common context is a model, constructed under a set of assumptions, together with a so-called "null hypothesis". Often the null hypothesis postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. Practices that reduce data analysis or scientific inference to mechanical "bright-line" rules (such as "p < 0.05") for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not immediately become "true" on one side of the divide and "false" on the other. Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis. Pragmatic considerations often require binary, "yes-no" decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of "statistical significance" (generally interpreted as "p<0.05") as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.
Proper inference requires full reporting and transparency. P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and "p-hacking," leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. Similarly, identical estimated effects will have different p-values if the precision of the estimates differs.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.
References
Wasserstein, Ronald L., and Nicole A. Lazar. "The ASA statement on p-values: context, process, and purpose." (2016): The American Statistician: 129-133. DOI: https://doi.org/10.1080/00031305.2016.1154108
See Also
Other table helpers:
table_ester(), table_glue(), table_value()
Examples
# Guideline by the American Medical Association Manual of Style:
# Round p-values to 2 or 3 digits after the decimal point depending
# on the number of zeros. For example,
## - Change .157 to .16.
## - Change .037 to .04.
## - Don't change .047 to .05, because it will no longer be significant.
## - Keep .003 as is because 2 zeros after the decimal are fine.
## - Change .0003 or .00003 or .000003 to <.001
#
# In addition, the guideline states that "expressing P to more than 3
# significant digits does not add useful information." You may or may not
# agree with this guideline (I do not agree with parts of it),
# but you will (hopefully) appreciate `table_pvalue()` automating these
# recommendations if you submit papers to journals associated with
# the American Medical Association.
pvals_ama <- c(0.157, 0.037, 0.047, 0.003, 0.0003, 0.00003, 0.000003)
table_pvalue(pvals_ama)
# > [1] ".16" ".04" ".047" ".003" "<.001" "<.001" "<.001"
# `table_pvalue()` will fight valiantly to keep your p-value < alpha if
# it is < alpha. If it's >= alpha, `table_pvalue()` treats it normally.
pvals_close <- c(0.04998, 0.05, 0.050002)
table_pvalue(pvals_close)
# > [1] ".04998" ".05" ".05"
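The AMA-style rules listed in the comments above can be sketched in base R. The helper format_p_sketch() is hypothetical and much simpler than table_pvalue(): it assumes fixed thresholds of 0.001 and 0.01, it does not handle missing values or an upper bound, and it keeps at most three decimals, so it will not reproduce every output of table_pvalue() (for example, values just below alpha such as 0.04998).

```r
# Hypothetical helper: a simplified version of the AMA-style rules.
format_p_sketch <- function(p, alpha = 0.05) {
  fmt <- function(v, k) formatC(v, format = "f", digits = k)
  out <- vapply(p, function(v) {
    if (v < 0.001) return("<0.001")
    if (v < 0.01) return(fmt(v, 3))
    two <- fmt(v, 2)
    # Keep a third decimal when two decimals would push p up to alpha.
    if (v < alpha && as.numeric(two) >= alpha) fmt(v, 3) else two
  }, character(1))
  # Drop the leading zero, e.g. "0.16" -> ".16" and "<0.001" -> "<.001".
  sub("0.", ".", out, fixed = TRUE)
}

format_p_sketch(c(0.157, 0.037, 0.047, 0.003, 0.0003))
# ".16" ".04" ".047" ".003" "<.001"
```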
General rounding for tables
Description
table_value() casts numeric vectors into character vectors. The main purpose of table_value() is to round and format numeric data for presentation.
Usage
table_value(x, rspec = NULL)
Arguments
x |
a vector of numeric values. |
rspec |
a |
Value
a vector of character values (rounded numbers).
See Also
Other table helpers:
table_ester(), table_glue(), table_pvalue()
Examples
table_value(0.123)
table_value(1.23)
table_value(12.3)
with(mtcars, table_value(disp))