| Title: | Jensen-Shannon Divergence Estimation, Confidence Intervals, and Distribution Plots |
| Version: | 0.1.0 |
| Description: | Estimates Jensen-Shannon divergence (JSD) for quantifying distributional differences between two groups on a given variable. Supports both continuous and discrete variables, with tools for point estimation, bootstrap confidence intervals, and visualization of raw group-specific distributions. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.0.0) |
| Imports: | graphics, grDevices, stats |
| Suggests: | survival, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-26 08:58:52 UTC; lwj |
| Author: | Yueqin Hu [aut, cre], Yiran Zhou [aut], Wenjuan Liu [aut] |
| Maintainer: | Yueqin Hu <yueqinhu@bnu.edu.cn> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-30 18:30:02 UTC |
Validate logarithm base
Description
Validate logarithm base
Usage
check_base(base)
Arguments
| base | Logarithm base. |
Value
Invisibly TRUE.
Validate confidence level
Description
Validate confidence level
Usage
check_conf_level(conf_level)
Arguments
| conf_level | Confidence level. |
Value
Invisibly TRUE.
Detect variable type for JSD
Description
Detect variable type for JSD
Usage
detect_type(x, y, discrete_max_unique = 10, tol = 1e-08)
Arguments
| x | First vector. |
| y | Second vector. |
| discrete_max_unique | Maximum number of unique values for integer-like numeric data to be treated as discrete. |
| tol | Tolerance for integer-like detection. |
Value
One of "continuous" or "discrete".
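The arguments above suggest a rule of the form "integer-like values with few distinct levels are discrete". The following is a minimal sketch of such a rule, not the package's implementation: the helper name `guess_type_sketch` is hypothetical, and treating all non-numeric input as discrete is an assumption.

```r
# Hypothetical helper; NOT the package's detect_type() implementation.
guess_type_sketch <- function(x, y, discrete_max_unique = 10, tol = 1e-08) {
  z <- c(x, y)
  # Assumption: non-numeric input (factor, character, logical) is discrete.
  if (!is.numeric(z)) return("discrete")
  z <- z[is.finite(z)]
  # A value is "integer-like" if it lies within tol of a whole number.
  integer_like <- all(abs(z - round(z)) < tol)
  if (integer_like && length(unique(z)) <= discrete_max_unique) {
    "discrete"
  } else {
    "continuous"
  }
}

guess_type_sketch(c(1, 2, 3, 2), c(2, 3, 3, 1))  # few integer-like values
guess_type_sketch(c(0.5, 1.2), c(3.3, 4.1))      # fractional values
guess_type_sketch(1:50, 1:50)                    # integer-like, but 50 distinct values
```

Under this rule the third call is classified as continuous because the number of distinct values exceeds `discrete_max_unique`.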
Fixed integration range for continuous JSD
Description
Fixed integration range for continuous JSD
Usage
fixed_range(x, y, qrange = c(0.001, 0.999), extend = 3)
Arguments
| x | Numeric vector for group 1. |
| y | Numeric vector for group 2. |
| qrange | Quantile range used to determine the main data span. |
| extend | Extension multiplier based on IQR. |
Value
Named numeric vector with elements 'L' and 'U'.
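One plausible reading of `qrange` and `extend` is sketched below: take the pooled quantiles as the main span and pad each side by `extend` times the pooled IQR. The manual does not state the exact formula, so both the pooling and the padding rule are assumptions, and `range_sketch` is a hypothetical name.

```r
# Hypothetical helper; the package's exact rule is not documented here.
range_sketch <- function(x, y, qrange = c(0.001, 0.999), extend = 3) {
  z <- c(x, y)
  z <- z[is.finite(z)]
  # Main data span from the pooled sample quantiles.
  q <- stats::quantile(z, probs = qrange, names = FALSE)
  # Assumption: pad both ends by extend * IQR of the pooled sample.
  pad <- extend * stats::IQR(z)
  c(L = q[1] - pad, U = q[2] + pad)
}

range_sketch(0:10, 5:15)
```

Using one shared range for both groups keeps the two density estimates on the same grid, which is what makes a pointwise comparison possible.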
Estimate Jensen-Shannon divergence
Description
Unified front-end for JSD estimation for continuous and discrete variables.
Usage
jsd(x, y, type = c("auto", "continuous", "discrete"), base = 2, ...)
Arguments
| x | First vector. |
| y | Second vector. |
| type | One of "auto", "continuous", or "discrete". |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| ... | Additional arguments passed to the type-specific estimator. |
Value
An object of class "jsd_estimate".
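For reference, the standard definition of the Jensen-Shannon divergence between two distributions P and Q is given below. The manual does not spell out the formula, so the notation (P, Q, M, and the log base b) is introduced here for illustration only.

```latex
\[
\mathrm{JSD}(P \,\|\, Q)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q),
\]
\[
D_{\mathrm{KL}}(P \,\|\, M) = \sum_{k} p_k \log_b \frac{p_k}{m_k}
\quad \text{(discrete)},
\qquad
D_{\mathrm{KL}}(P \,\|\, M) = \int p(t) \log_b \frac{p(t)}{m(t)} \, dt
\quad \text{(continuous)}.
\]
```

With b = 2 the divergence is measured in bits and lies in [0, 1]; with b = e it is measured in nats and lies in [0, ln 2].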
Bootstrap confidence interval for Jensen-Shannon divergence
Description
Unified front-end for JSD confidence interval estimation for continuous and discrete variables.
Usage
jsd_ci(
  x,
  y,
  type = c("auto", "continuous", "discrete"),
  B = 1000,
  conf_level = 0.95,
  base = 2,
  seed = NULL,
  ...
)
Arguments
| x | First vector. |
| y | Second vector. |
| type | One of "auto", "continuous", or "discrete". |
| B | Number of bootstrap replicates. |
| conf_level | Confidence level. Defaults to 0.95. |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| seed | Optional random seed. |
| ... | Additional arguments passed to the type-specific bootstrap estimator. |
Value
An object of class "jsd_ci".
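To make the roles of `B`, `conf_level`, and `seed` concrete, here is a minimal percentile-bootstrap sketch in base R. It assumes each group is resampled independently with replacement and that the interval is taken from the empirical quantiles of the replicates; the manual does not say which interval type the package uses. The names `boot_ci_sketch` and `jsd_counts` are hypothetical.

```r
# Hypothetical helpers; NOT the package's jsd_ci() implementation.
boot_ci_sketch <- function(x, y, stat, B = 1000, conf_level = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  reps <- vapply(seq_len(B), function(b) {
    # Resample each group independently, with replacement.
    xb <- x[sample.int(length(x), replace = TRUE)]
    yb <- y[sample.int(length(y), replace = TRUE)]
    # A failed draw is recorded as NA instead of stopping the loop.
    tryCatch(stat(xb, yb), error = function(e) NA_real_)
  }, numeric(1))
  alpha <- 1 - conf_level
  ci <- stats::quantile(reps, c(alpha / 2, 1 - alpha / 2),
                        na.rm = TRUE, names = FALSE)
  list(estimate = stat(x, y), lower = ci[1], upper = ci[2],
       n_failed = sum(is.na(reps)))
}

# Compact base-2 JSD on empirical proportions, used as the statistic.
jsd_counts <- function(x, y) {
  lev <- sort(unique(c(x, y)))
  p <- as.numeric(table(factor(x, levels = lev))) / length(x)
  q <- as.numeric(table(factor(y, levels = lev))) / length(y)
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

x <- c(rep("a", 30), rep("b", 10))
y <- c(rep("a", 10), rep("b", 30))
res <- boot_ci_sketch(x, y, jsd_counts, B = 200, seed = 1)
res
```

Because JSD is bounded below by zero, bootstrap replicates tend to be biased upward when the two groups are similar, so a percentile interval of this kind should be read with that in mind.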
Estimate JSD for continuous variables
Description
Computes Jensen-Shannon divergence (JSD) between two numeric vectors using kernel density estimation (KDE) and numerical integration.
Usage
jsd_continuous(
  x,
  y,
  L = NULL,
  U = NULL,
  base = 2,
  bw = "nrd0",
  kernel = "gaussian",
  grid_n = 4096,
  qrange = c(0.001, 0.999),
  extend = 3,
  eps = 1e-12,
  renormalize = TRUE,
  na_rm = TRUE
)
Arguments
| x | Numeric vector for group 1. |
| y | Numeric vector for group 2. |
| L | Optional lower integration bound. |
| U | Optional upper integration bound. |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| bw | Bandwidth passed to stats::density(). |
| kernel | Kernel passed to stats::density(). |
| grid_n | Number of grid points used for KDE. |
| qrange | Quantile range used when 'L' and 'U' are not supplied. |
| extend | Extension multiplier for the automatically chosen range. |
| eps | Small constant for numerical stability. |
| renormalize | Logical; renormalize estimated densities over the grid? |
| na_rm | Logical; remove missing values? |
Value
An object of class "jsd_estimate".
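The KDE-plus-integration approach described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: both densities are evaluated on one shared grid over explicit bounds `L` and `U`, they are always renormalized on that grid, and `eps` is added inside the log ratio (how the package actually uses `eps` is not documented). The name `kde_jsd_sketch` is hypothetical.

```r
# Hypothetical helper; NOT the package's jsd_continuous() implementation.
kde_jsd_sketch <- function(x, y, L, U, base = 2, grid_n = 4096, eps = 1e-12) {
  # Evaluate both kernel density estimates on the same grid over [L, U].
  fx <- stats::density(x, bw = "nrd0", kernel = "gaussian",
                       n = grid_n, from = L, to = U)
  fy <- stats::density(y, bw = "nrd0", kernel = "gaussian",
                       n = grid_n, from = L, to = U)
  grid <- fx$x
  # Trapezoidal rule on the grid.
  trapz <- function(g, v) sum(diff(g) * (v[-1] + v[-length(v)]) / 2)
  # Renormalize so each density integrates to 1 over [L, U].
  p <- fx$y / trapz(grid, fx$y)
  q <- fy$y / trapz(grid, fy$y)
  m <- (p + q) / 2
  # Assumption: eps guards the log ratio where a density is (near) zero.
  kl_integrand <- function(a, b) a * log((a + eps) / (b + eps), base = base)
  0.5 * trapz(grid, kl_integrand(p, m)) + 0.5 * trapz(grid, kl_integrand(q, m))
}

set.seed(1)
x <- rnorm(500)
y <- rnorm(500, mean = 2)
kde_jsd_sketch(x, y, L = -6, U = 8)
```

The estimate depends on the bandwidth and on the integration bounds, which is why the documented function exposes `bw`, `L`, `U`, `qrange`, and `extend`.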
Bootstrap confidence interval for continuous JSD
Description
Bootstrap confidence interval for continuous JSD
Usage
jsd_continuous_ci(
  x,
  y,
  B = 1000,
  conf_level = 0.95,
  L = NULL,
  U = NULL,
  base = 2,
  bw = "nrd0",
  kernel = "gaussian",
  grid_n = 4096,
  qrange = c(0.001, 0.999),
  extend = 3,
  eps = 1e-12,
  renormalize = TRUE,
  seed = NULL,
  na_rm = TRUE,
  na_rm_failed = TRUE
)
Arguments
| x | Numeric vector for group 1. |
| y | Numeric vector for group 2. |
| B | Number of bootstrap replicates. |
| conf_level | Confidence level. Defaults to 0.95. |
| L | Optional lower integration bound. |
| U | Optional upper integration bound. |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| bw | Bandwidth passed to stats::density(). |
| kernel | Kernel passed to stats::density(). |
| grid_n | Number of grid points used for KDE. |
| qrange | Quantile range used when 'L' and 'U' are not supplied. |
| extend | Extension multiplier for the automatically chosen range. |
| eps | Small constant for numerical stability. |
| renormalize | Logical; renormalize estimated densities over the grid? |
| seed | Optional random seed. |
| na_rm | Logical; remove missing values? |
| na_rm_failed | Logical; drop failed bootstrap draws when summarizing? |
Value
An object of class "jsd_ci".
Estimate JSD for discrete variables
Description
Computes Jensen-Shannon divergence (JSD) between two discrete variables using empirical probability mass functions.
Usage
jsd_discrete(
  x,
  y,
  support = NULL,
  base = 2,
  eps = 1e-12,
  add_smoothing = FALSE,
  na_rm = TRUE
)
Arguments
| x | Vector for group 1. Can be numeric, factor, character, or logical. |
| y | Vector for group 2. Can be numeric, factor, character, or logical. |
| support | Optional support values. If 'NULL', the union of observed values in 'x' and 'y' is used. |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| eps | Small constant for numerical stability. |
| add_smoothing | Logical; apply add-one (Laplace) smoothing by adding 1 to each cell count? |
| na_rm | Logical; remove missing values? |
Value
An object of class "jsd_estimate".
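The empirical-PMF estimator described above can be sketched in a few lines of base R. This is a sketch under assumptions rather than the package's code: values are compared as character strings, zero cells contribute zero to the sum (the usual 0 log 0 = 0 convention) instead of relying on `eps`, and `pmf_jsd_sketch` is a hypothetical name.

```r
# Hypothetical helper; NOT the package's jsd_discrete() implementation.
pmf_jsd_sketch <- function(x, y, support = NULL, base = 2, add_smoothing = FALSE) {
  x <- as.character(x)
  y <- as.character(y)
  # Default support: union of the values observed in either group.
  if (is.null(support)) support <- sort(unique(c(x, y)))
  support <- as.character(support)
  # Cell counts on the common support (NA values are dropped by factor()).
  cx <- as.numeric(table(factor(x, levels = support)))
  cy <- as.numeric(table(factor(y, levels = support)))
  if (add_smoothing) {
    cx <- cx + 1  # add-one (Laplace) smoothing
    cy <- cy + 1
  }
  p <- cx / sum(cx)
  q <- cy / sum(cy)
  m <- (p + q) / 2
  # Convention: cells with zero probability contribute 0 to the sum.
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b, base = base), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

pmf_jsd_sketch(c("a", "a", "b"), c("a", "b", "b"))
pmf_jsd_sketch(c("a", "a"), c("b", "b"))                        # disjoint groups
pmf_jsd_sketch(c("a", "a"), c("b", "b"), add_smoothing = TRUE)  # pulled below the maximum
```

With disjoint groups and base 2 the sketch returns the maximum value of 1; add-one smoothing gives every cell positive mass, so the same data then yield a smaller value.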
Bootstrap confidence interval for discrete JSD
Description
Bootstrap confidence interval for discrete JSD
Usage
jsd_discrete_ci(
  x,
  y,
  B = 1000,
  conf_level = 0.95,
  support = NULL,
  base = 2,
  eps = 1e-12,
  add_smoothing = FALSE,
  seed = NULL,
  na_rm = TRUE,
  na_rm_failed = TRUE
)
Arguments
| x | Vector for group 1. |
| y | Vector for group 2. |
| B | Number of bootstrap replicates. |
| conf_level | Confidence level. Defaults to 0.95. |
| support | Optional support values. |
| base | Logarithm base. Defaults to 2. Use 'exp(1)' for nats. |
| eps | Small constant for numerical stability. |
| add_smoothing | Logical; apply add-one (Laplace) smoothing by adding 1 to each cell count? |
| seed | Optional random seed. |
| na_rm | Logical; remove missing values? |
| na_rm_failed | Logical; drop failed bootstrap draws when summarizing? |
Value
An object of class "jsd_ci".
Build support for discrete variables
Description
Build support for discrete variables
Usage
make_support(x, y, support = NULL)
Arguments
| x | First vector. |
| y | Second vector. |
| support | Optional user-specified support. |
Value
A character vector of support values.
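A minimal sketch of how such a support could be built is shown below. It assumes values are compared as character strings and that a user-specified support must cover every observed value; whether the package raises an error in that case is not documented, and `support_sketch` is a hypothetical name.

```r
# Hypothetical helper; NOT the package's make_support() implementation.
support_sketch <- function(x, y, support = NULL) {
  observed <- unique(c(as.character(x), as.character(y)))
  observed <- observed[!is.na(observed)]
  # Default: sorted union of observed values (sorted as strings, so "10" < "2").
  if (is.null(support)) return(sort(observed))
  support <- as.character(support)
  # Assumption: observed values outside the supplied support are an error.
  outside <- setdiff(observed, support)
  if (length(outside) > 0) {
    stop("observed values not in support: ", paste(outside, collapse = ", "))
  }
  support
}

support_sketch(c(1, 2), c(2, 3))
support_sketch(c("lo", "hi"), c("hi", NA), support = c("lo", "mid", "hi"))
```

Supplying the support explicitly is useful when a category is possible but unobserved, because it fixes the set of cells that both groups are tabulated on.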
Plot two continuous distributions
Description
Plot two continuous distributions
Usage
plot_continuous(
  x,
  y,
  group_names = c("Group 1", "Group 2"),
  bins = 30,
  style = c("both", "hist", "density"),
  main = "Two-group raw distributions",
  xlab = "Value",
  ylab = "Density",
  col_x = rgb(0.2, 0.4, 0.8, 0.4),
  col_y = rgb(0.8, 0.2, 0.2, 0.4),
  line_col_x = "#2F5FB3",
  line_col_y = "#CC3333",
  lwd = 2,
  na_rm = TRUE,
  show_jsd = TRUE,
  jsd_digits = 3
)
Arguments
| x | Numeric vector for group 1. |
| y | Numeric vector for group 2. |
| group_names | Group labels. |
| bins | Approximate number of histogram bins. |
| style | One of "both", "hist", or "density". |
| main | Plot title. |
| xlab | X-axis label. |
| ylab | Y-axis label. |
| col_x | Fill color for group 1. |
| col_y | Fill color for group 2. |
| line_col_x | Line color for group 1 density. |
| line_col_y | Line color for group 2 density. |
| lwd | Line width for density curves. |
| na_rm | Logical; remove missing values? |
| show_jsd | Logical; whether to display JSD on the plot. |
| jsd_digits | Number of digits for displayed JSD. |
Value
Invisibly returns plotting data.
Plot two discrete distributions with overlap
Description
Plot two discrete distributions with overlap
Usage
plot_discrete(
  x,
  y,
  support = NULL,
  group_names = c("Group 1", "Group 2"),
  main = "Two-group discrete distributions",
  xlab = "Value",
  ylab = "Proportion",
  col_x = adjustcolor("#2F5FB3", alpha.f = 0.2),
  col_y = adjustcolor("#CC3333", alpha.f = 0.2),
  overlap_col = adjustcolor("grey55", alpha.f = 0.35),
  line_col_x = "#2F5FB3",
  line_col_y = "#CC3333",
  lwd = 2,
  pch = 16,
  cex_pt = 1.1,
  las = 1,
  bar_width = 0.2,
  show_jsd = TRUE,
  jsd_digits = 3,
  na_rm = TRUE
)
Arguments
| x | Vector for group 1. |
| y | Vector for group 2. |
| support | Optional support values. |
| group_names | Group labels. |
| main | Plot title. |
| xlab | X-axis label. |
| ylab | Y-axis label. |
| col_x | Color for group 1. |
| col_y | Color for group 2. |
| overlap_col | Fill color for overlap bars. |
| line_col_x | Line color for group 1. |
| line_col_y | Line color for group 2. |
| lwd | Line width. |
| pch | Point character. |
| cex_pt | Point size. |
| las | Axis label style for x-axis. |
| bar_width | Width of overlap bars. |
| show_jsd | Logical; whether to display JSD on the plot. |
| jsd_digits | Number of digits for displayed JSD. |
| na_rm | Logical; remove missing values? |
Value
Invisibly returns plotting data.
Plot two-group distributions
Description
Unified front-end for plotting the raw distributions of two groups, for either continuous or discrete variables.
Usage
plot_dist(x, y, type = c("auto", "continuous", "discrete"), ...)
Arguments
| x | First vector. |
| y | Second vector. |
| type | One of "auto", "continuous", or "discrete". |
| ... | Additional arguments passed to the type-specific plotting function. |
Value
Invisibly returns plotting data.
Log with package-supported base
Description
Log with package-supported base
Usage
safe_log_base(x, base = 2)
Arguments
| x | Numeric vector. |
| base | Logarithm base. Defaults to 2. Must be either 2 or exp(1). |
Value
Numeric vector.
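The restriction to base 2 or exp(1) can be illustrated with a small base R sketch. The helper name `log_base_sketch`, the tolerance-based comparison via `all.equal()`, and the error message are assumptions made for illustration, not the package's code.

```r
# Hypothetical helper; NOT the package's safe_log_base() implementation.
log_base_sketch <- function(x, base = 2) {
  # Compare with a tolerance so that exp(1) is matched despite rounding.
  if (isTRUE(all.equal(base, 2))) return(log2(x))       # bits
  if (isTRUE(all.equal(base, exp(1)))) return(log(x))   # nats
  stop("base must be 2 or exp(1)")
}

log_base_sketch(8)                  # log base 2
log_base_sketch(exp(2), exp(1))     # natural log
```

Any other base, such as 10, is rejected with an error in this sketch.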
Numerical trapezoidal integration
Description
Numerical trapezoidal integration
Usage
trapz_num(x, y)
Arguments
| x | Numeric grid values. |
| y | Numeric function values on the grid. |
Value
A numeric scalar.
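The trapezoidal rule sums, over adjacent grid points, the interval width times the average of the two function values. A minimal base R sketch follows; the name `trapz_sketch` is hypothetical and the input checks are an assumption.

```r
# Hypothetical helper; NOT the package's trapz_num() implementation.
trapz_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # Width of each interval times the mean of its two endpoint values.
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

grid <- seq(0, 1, length.out = 101)
trapz_sketch(grid, 2 * grid)          # integral of 2t on [0, 1]; exact for linear functions

z <- seq(-6, 6, length.out = 2001)
trapz_sketch(z, dnorm(z))             # standard normal density; close to 1
```

The rule works on unevenly spaced grids as well, since each interval uses its own width from `diff(x)`.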
Validate paired inputs
Description
Validate paired inputs
Usage
validate_xy(x, y, min_n = 1, na_rm = TRUE, finite_only = FALSE)
Arguments
| x | First input vector. |
| y | Second input vector. |
| min_n | Minimum required sample size after filtering. |
| na_rm | Logical; remove missing values? |
| finite_only | Logical; keep only finite values? |
Value
A list with cleaned x and y.
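The filtering implied by `na_rm`, `finite_only`, and `min_n` can be sketched as follows. This assumes each vector is filtered separately and that a group left with fewer than `min_n` values triggers an error; the helper name `clean_xy_sketch` and the error message are hypothetical.

```r
# Hypothetical helper; NOT the package's validate_xy() implementation.
clean_xy_sketch <- function(x, y, min_n = 1, na_rm = TRUE, finite_only = FALSE) {
  clean <- function(v) {
    if (na_rm) v <- v[!is.na(v)]               # drops NA and NaN
    if (finite_only) v <- v[is.finite(v)]      # additionally drops Inf and -Inf
    v
  }
  x <- clean(x)
  y <- clean(y)
  # Assumption: too few remaining observations in either group is an error.
  if (length(x) < min_n || length(y) < min_n) {
    stop("each group needs at least ", min_n, " usable observation(s)")
  }
  list(x = x, y = y)
}

clean_xy_sketch(c(1, NA, Inf), c(2, 3, NaN))
clean_xy_sketch(c(1, NA, Inf), c(2, 3, NaN), finite_only = TRUE)
```

In the first call `Inf` is kept because only missing values are removed; the second call also drops it.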