Title: Jensen-Shannon Divergence Estimation, Confidence Intervals, and Distribution Plots
Version: 0.1.0
Description: Estimates Jensen-Shannon divergence (JSD) for quantifying distributional differences between two groups on a given variable. Supports both continuous and discrete variables, with tools for point estimation, bootstrap confidence intervals, and visualization of raw group-specific distributions.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.3.3
Depends: R (≥ 4.0.0)
Imports: graphics, grDevices, stats
Suggests: survival, testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2026-03-26 08:58:52 UTC; lwj
Author: Yueqin Hu [aut, cre], Yiran Zhou [aut], Wenjuan Liu [aut]
Maintainer: Yueqin Hu <yueqinhu@bnu.edu.cn>
Repository: CRAN
Date/Publication: 2026-03-30 18:30:02 UTC

Validate logarithm base

Description

Checks that 'base' is a supported logarithm base (2 or exp(1)).

Usage

check_base(base)

Arguments

base

Logarithm base.

Value

Invisibly TRUE.


Validate confidence level

Description

Checks that 'conf_level' is a valid confidence level, i.e. a number strictly between 0 and 1.

Usage

check_conf_level(conf_level)

Arguments

conf_level

Confidence level.

Value

Invisibly TRUE.


Detect variable type for JSD

Description

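Determines whether two vectors should be treated as continuous or discrete for JSD estimation. Integer-like numeric data with at most 'discrete_max_unique' unique values are treated as discrete.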

Usage

detect_type(x, y, discrete_max_unique = 10, tol = 1e-08)

Arguments

x

First vector.

y

Second vector.

discrete_max_unique

Maximum number of unique values for integer-like numeric data to be treated as discrete.

tol

Tolerance for integer-like detection.

Value

One of "continuous" or "discrete".
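The rule described above can be sketched in base R as follows. This is a hypothetical illustration, not the package's code: the name 'detect_type_sketch' is invented, and treating every non-numeric input (factor, character, logical) as discrete is an assumption.

```r
# Hypothetical sketch of the type-detection rule; not the package's implementation.
detect_type_sketch <- function(x, y, discrete_max_unique = 10, tol = 1e-08) {
  # Assumption: factor, character, and logical inputs are always discrete
  if (!is.numeric(x) || !is.numeric(y)) return("discrete")
  v <- c(x, y)
  v <- v[is.finite(v)]
  # Integer-like: every value lies within 'tol' of a whole number
  integer_like <- all(abs(v - round(v)) < tol)
  if (integer_like && length(unique(v)) <= discrete_max_unique) {
    "discrete"
  } else {
    "continuous"
  }
}

detect_type_sketch(c(1, 2, 3), c(2, 3, 4))      # few integer values: "discrete"
detect_type_sketch(c(0.5, 1.7), c(2.2, 3.1))    # non-integer values: "continuous"
```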


Fixed integration range for continuous JSD

Description

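Computes a fixed integration range [L, U] for continuous JSD from 'x' and 'y', based on the 'qrange' quantiles of the data and extended by a multiple ('extend') of the IQR.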

Usage

fixed_range(x, y, qrange = c(0.001, 0.999), extend = 3)

Arguments

x

Numeric vector for group 1.

y

Numeric vector for group 2.

qrange

Quantile range used to determine the main data span.

extend

Extension multiplier based on IQR.

Value

Named numeric vector with elements 'L' and 'U', the lower and upper integration bounds.
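A minimal sketch of such a range computation, assuming the quantiles are taken on the pooled data and each end is padded by 'extend' times the pooled IQR. The manual does not give the exact formula, so both choices and the name 'fixed_range_sketch' are assumptions.

```r
# Hypothetical sketch; the pooling and IQR padding are assumptions.
fixed_range_sketch <- function(x, y, qrange = c(0.001, 0.999), extend = 3) {
  pooled <- c(x, y)
  pooled <- pooled[is.finite(pooled)]
  # Main data span from the pooled quantiles
  q <- stats::quantile(pooled, probs = qrange, names = FALSE)
  # Pad both ends by a multiple of the pooled interquartile range
  pad <- extend * stats::IQR(pooled)
  c(L = q[1] - pad, U = q[2] + pad)
}

set.seed(1)
fixed_range_sketch(rnorm(100), rnorm(100, mean = 1))
```

Using one range for both groups keeps the two density estimates on a common grid, which the JSD integral requires.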


Estimate Jensen-Shannon divergence

Description

Unified front end for JSD estimation. Depending on 'type', it dispatches to jsd_continuous() or jsd_discrete(); with '"auto"', the type is inferred from the data.

Usage

jsd(x, y, type = c("auto", "continuous", "discrete"), base = 2, ...)

Arguments

x

First vector.

y

Second vector.

type

One of '"auto"', '"continuous"', or '"discrete"'.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

...

Additional arguments passed to the type-specific estimator.

Value

An object of class '"jsd_estimate"'.
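For reference, the quantity being estimated is the standard Jensen-Shannon divergence. The manual does not write out the formula, so the block below is the textbook definition, not a description of the package internals. P and Q are the two group distributions, M is their mixture, and b is the 'base' argument.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M)

\mathrm{KL}(P \,\|\, M) = \sum_{k} p_k \log_b \frac{p_k}{m_k} \quad \text{(discrete)}, \qquad
\mathrm{KL}(P \,\|\, M) = \int p(t) \log_b \frac{p(t)}{m(t)}\, dt \quad \text{(continuous)}

0 \le \mathrm{JSD}(P \,\|\, Q) \le \log_b 2
```

With the default base = 2 the divergence is measured in bits and lies in [0, 1]; with base = exp(1) it is in nats and lies in [0, ln 2].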


Bootstrap confidence interval for Jensen-Shannon divergence

Description

Unified front end for bootstrap confidence intervals for JSD. Depending on 'type', it dispatches to jsd_continuous_ci() or jsd_discrete_ci(); with '"auto"', the type is inferred from the data.

Usage

jsd_ci(
  x,
  y,
  type = c("auto", "continuous", "discrete"),
  B = 1000,
  conf_level = 0.95,
  base = 2,
  seed = NULL,
  ...
)

Arguments

x

First vector.

y

Second vector.

type

One of '"auto"', '"continuous"', or '"discrete"'.

B

Number of bootstrap replicates.

conf_level

Confidence level. Defaults to 0.95.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

seed

Optional random seed.

...

Additional arguments passed to the type-specific bootstrap estimator.

Value

An object of class '"jsd_ci"'.
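The manual does not state which bootstrap interval is reported. The sketch below assumes a percentile bootstrap with each group resampled independently, and mimics 'na_rm_failed' by recording failed replicates as NA. The name 'boot_ci_sketch' and the placeholder statistic 'stat' are hypothetical; in the package the per-replicate statistic is the JSD estimate.

```r
# Hypothetical percentile-bootstrap sketch; the package's interval method may differ.
boot_ci_sketch <- function(x, y, stat, B = 1000, conf_level = 0.95,
                           seed = NULL, na_rm_failed = TRUE) {
  if (!is.null(seed)) set.seed(seed)
  est <- stat(x, y)
  reps <- vapply(seq_len(B), function(b) {
    # Resample each group independently, with replacement
    xb <- x[sample.int(length(x), replace = TRUE)]
    yb <- y[sample.int(length(y), replace = TRUE)]
    # A failed replicate becomes NA instead of aborting the loop
    tryCatch(as.numeric(stat(xb, yb)), error = function(e) NA_real_)
  }, numeric(1))
  if (na_rm_failed) reps <- reps[!is.na(reps)]
  alpha <- 1 - conf_level
  ci <- if (length(reps) == 0 || anyNA(reps)) {
    c(NA_real_, NA_real_)
  } else {
    stats::quantile(reps, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  }
  list(estimate = est, lower = ci[1], upper = ci[2], n_ok = sum(!is.na(reps)))
}

set.seed(42)
x <- rnorm(50); y <- rnorm(50, mean = 1)
boot_ci_sketch(x, y, function(a, b) mean(b) - mean(a), B = 200, seed = 1)
```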


Estimate JSD for continuous variables

Description

Computes Jensen-Shannon divergence (JSD) between two numeric vectors using kernel density estimation (KDE) and numerical integration.

Usage

jsd_continuous(
  x,
  y,
  L = NULL,
  U = NULL,
  base = 2,
  bw = "nrd0",
  kernel = "gaussian",
  grid_n = 4096,
  qrange = c(0.001, 0.999),
  extend = 3,
  eps = 1e-12,
  renormalize = TRUE,
  na_rm = TRUE
)

Arguments

x

Numeric vector for group 1.

y

Numeric vector for group 2.

L

Optional lower integration bound.

U

Optional upper integration bound.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

bw

Bandwidth passed to [stats::density()].

kernel

Kernel passed to [stats::density()].

grid_n

Number of grid points used for KDE.

qrange

Quantile range used when 'L' and 'U' are not supplied.

extend

Extension multiplier for the automatically chosen range.

eps

Small constant for numerical stability.

renormalize

Logical; renormalize the estimated densities so that each integrates to 1 over the grid?

na_rm

Logical; remove missing values?

Value

An object of class '"jsd_estimate"'.
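A minimal sketch of the KDE-plus-integration approach, assuming fixed bounds L and U are supplied (no automatic range), densities are floored at 'eps' and then renormalized, and the integrals use the trapezoidal rule. It illustrates the technique only; the name 'jsd_kde_sketch' is hypothetical and the package's internal steps may differ.

```r
# Hypothetical sketch of KDE-based JSD; not the package's implementation.
jsd_kde_sketch <- function(x, y, L, U, base = 2, bw = "nrd0",
                           kernel = "gaussian", grid_n = 4096, eps = 1e-12) {
  # KDEs of both groups evaluated on the same grid over [L, U]
  dx <- stats::density(x, bw = bw, kernel = kernel, n = grid_n, from = L, to = U)
  dy <- stats::density(y, bw = bw, kernel = kernel, n = grid_n, from = L, to = U)
  grid <- dx$x
  trapz <- function(g, f) sum(diff(g) * (f[-1] + f[-length(f)]) / 2)
  # Floor at eps to avoid log(0), then renormalize so each density integrates to 1
  p <- pmax(dx$y, eps); p <- p / trapz(grid, p)
  q <- pmax(dy$y, eps); q <- q / trapz(grid, q)
  m <- (p + q) / 2
  kl_pm <- trapz(grid, p * log(p / m))
  kl_qm <- trapz(grid, q * log(q / m))
  # Convert from nats to the requested base
  (kl_pm + kl_qm) / 2 / log(base)
}

set.seed(7)
jsd_kde_sketch(rnorm(200), rnorm(200, mean = 3), L = -6, U = 9)
```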


Bootstrap confidence interval for continuous JSD

Description

Computes a bootstrap confidence interval for the continuous JSD estimated by jsd_continuous().

Usage

jsd_continuous_ci(
  x,
  y,
  B = 1000,
  conf_level = 0.95,
  L = NULL,
  U = NULL,
  base = 2,
  bw = "nrd0",
  kernel = "gaussian",
  grid_n = 4096,
  qrange = c(0.001, 0.999),
  extend = 3,
  eps = 1e-12,
  renormalize = TRUE,
  seed = NULL,
  na_rm = TRUE,
  na_rm_failed = TRUE
)

Arguments

x

Numeric vector for group 1.

y

Numeric vector for group 2.

B

Number of bootstrap replicates.

conf_level

Confidence level. Defaults to 0.95.

L

Optional lower integration bound.

U

Optional upper integration bound.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

bw

Bandwidth passed to [stats::density()].

kernel

Kernel passed to [stats::density()].

grid_n

Number of grid points used for KDE.

qrange

Quantile range used when 'L' and 'U' are not supplied.

extend

Extension multiplier for the automatically chosen range.

eps

Small constant for numerical stability.

renormalize

Logical; renormalize estimated densities over the grid?

seed

Optional random seed.

na_rm

Logical; remove missing values?

na_rm_failed

Logical; drop failed bootstrap draws when summarizing?

Value

An object of class '"jsd_ci"'.


Estimate JSD for discrete variables

Description

Computes Jensen-Shannon divergence (JSD) between two discrete variables using empirical probability mass functions.

Usage

jsd_discrete(
  x,
  y,
  support = NULL,
  base = 2,
  eps = 1e-12,
  add_smoothing = FALSE,
  na_rm = TRUE
)

Arguments

x

Vector for group 1. Can be numeric, factor, character, or logical.

y

Vector for group 2. Can be numeric, factor, character, or logical.

support

Optional support values. If 'NULL', the union of observed values in 'x' and 'y' is used.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

eps

Small constant for numerical stability.

add_smoothing

Logical; add 1 to each cell count (add-one, or Laplace, smoothing)?

na_rm

Logical; remove missing values?

Value

An object of class '"jsd_estimate"'.
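A minimal sketch of JSD from empirical probability mass functions, assuming values are compared as character strings on the union of observed values (as in make_support()) and that zero-probability cells contribute nothing (the 0 log 0 = 0 convention) instead of being offset by 'eps'. The name 'jsd_pmf_sketch' is hypothetical.

```r
# Hypothetical sketch of discrete JSD; not the package's implementation.
jsd_pmf_sketch <- function(x, y, support = NULL, base = 2, add_smoothing = FALSE) {
  x <- as.character(x); y <- as.character(y)
  support <- if (is.null(support)) union(x, y) else as.character(support)
  # Cell counts on the common support (NA and out-of-support values are dropped)
  nx <- as.numeric(table(factor(x, levels = support)))
  ny <- as.numeric(table(factor(y, levels = support)))
  if (add_smoothing) { nx <- nx + 1; ny <- ny + 1 }
  p <- nx / sum(nx); q <- ny / sum(ny); m <- (p + q) / 2
  # Zero-probability cells contribute 0 (0 * log 0 = 0 convention)
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2 / log(base)
}

jsd_pmf_sketch(c(1, 1), c(2, 2))                    # disjoint supports: 1 bit
jsd_pmf_sketch(c(1, 1), c(2, 2), base = exp(1))     # the same in nats: log(2)
```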


Bootstrap confidence interval for discrete JSD

Description

Computes a bootstrap confidence interval for the discrete JSD estimated by jsd_discrete().

Usage

jsd_discrete_ci(
  x,
  y,
  B = 1000,
  conf_level = 0.95,
  support = NULL,
  base = 2,
  eps = 1e-12,
  add_smoothing = FALSE,
  seed = NULL,
  na_rm = TRUE,
  na_rm_failed = TRUE
)

Arguments

x

Vector for group 1.

y

Vector for group 2.

B

Number of bootstrap replicates.

conf_level

Confidence level. Defaults to 0.95.

support

Optional support values. If 'NULL', the union of observed values in 'x' and 'y' is used.

base

Logarithm base. Defaults to 2. Use 'exp(1)' for nats.

eps

Small constant for numerical stability.

add_smoothing

Logical; add 1 to each cell count (add-one, or Laplace, smoothing)?

seed

Optional random seed.

na_rm

Logical; remove missing values?

na_rm_failed

Logical; drop failed bootstrap draws when summarizing?

Value

An object of class '"jsd_ci"'.


Build support for discrete variables

Description

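Builds the set of support values for two discrete vectors: the user-specified 'support' if given, otherwise the union of values observed in 'x' and 'y'.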

Usage

make_support(x, y, support = NULL)

Arguments

x

First vector.

y

Second vector.

support

Optional user-specified support.

Value

A character vector of support values.


Plot two continuous distributions

Description

Plots the raw distributions of two numeric vectors as overlaid histograms, kernel density curves, or both (see 'style').

Usage

plot_continuous(
  x,
  y,
  group_names = c("Group 1", "Group 2"),
  bins = 30,
  style = c("both", "hist", "density"),
  main = "Two-group raw distributions",
  xlab = "Value",
  ylab = "Density",
  col_x = rgb(0.2, 0.4, 0.8, 0.4),
  col_y = rgb(0.8, 0.2, 0.2, 0.4),
  line_col_x = "#2F5FB3",
  line_col_y = "#CC3333",
  lwd = 2,
  na_rm = TRUE,
  show_jsd = TRUE,
  jsd_digits = 3
)

Arguments

x

Numeric vector for group 1.

y

Numeric vector for group 2.

group_names

Group labels.

bins

Approximate number of histogram bins.

style

One of '"both"', '"hist"', or '"density"'.

main

Plot title.

xlab

X-axis label.

ylab

Y-axis label.

col_x

Fill color for group 1.

col_y

Fill color for group 2.

line_col_x

Line color for group 1 density.

line_col_y

Line color for group 2 density.

lwd

Line width for density curves.

na_rm

Logical; remove missing values?

show_jsd

Logical; whether to display JSD on the plot.

jsd_digits

Number of digits for displayed JSD.

Value

Invisibly returns plotting data.
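A minimal base-graphics sketch of the '"both"' style, assuming shared histogram breaks for the two groups and default 'stats::density()' settings for the curves. The name 'plot_two_continuous_sketch' and the returned list layout are hypothetical; the colors reuse the documented defaults.

```r
# Hypothetical sketch of overlaid histograms plus KDE curves; not the package's code.
plot_two_continuous_sketch <- function(x, y, bins = 30,
                                       col_x = grDevices::rgb(0.2, 0.4, 0.8, 0.4),
                                       col_y = grDevices::rgb(0.8, 0.2, 0.2, 0.4)) {
  # Shared breaks so the two histograms are directly comparable
  br <- pretty(range(c(x, y)), n = bins)
  hx <- graphics::hist(x, breaks = br, plot = FALSE)
  hy <- graphics::hist(y, breaks = br, plot = FALSE)
  dx <- stats::density(x); dy <- stats::density(y)
  ymax <- max(hx$density, hy$density, dx$y, dy$y)
  # Density-scaled histograms (freq = FALSE) so they share an axis with the KDEs
  plot(hx, freq = FALSE, col = col_x, border = NA, ylim = c(0, ymax),
       main = "Two-group raw distributions", xlab = "Value")
  plot(hy, freq = FALSE, col = col_y, border = NA, add = TRUE)
  graphics::lines(dx, col = "#2F5FB3", lwd = 2)
  graphics::lines(dy, col = "#CC3333", lwd = 2)
  invisible(list(hist_x = hx, hist_y = hy, dens_x = dx, dens_y = dy))
}

set.seed(1)
out <- plot_two_continuous_sketch(rnorm(100), rnorm(100, mean = 1))
```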


Plot two discrete distributions with overlap

Description

Plots the proportion of each support value in two discrete vectors, with bars marking the overlap between the two distributions.

Usage

plot_discrete(
  x,
  y,
  support = NULL,
  group_names = c("Group 1", "Group 2"),
  main = "Two-group discrete distributions",
  xlab = "Value",
  ylab = "Proportion",
  col_x = adjustcolor("#2F5FB3", alpha.f = 0.2),
  col_y = adjustcolor("#CC3333", alpha.f = 0.2),
  overlap_col = adjustcolor("grey55", alpha.f = 0.35),
  line_col_x = "#2F5FB3",
  line_col_y = "#CC3333",
  lwd = 2,
  pch = 16,
  cex_pt = 1.1,
  las = 1,
  bar_width = 0.2,
  show_jsd = TRUE,
  jsd_digits = 3,
  na_rm = TRUE
)

Arguments

x

Vector for group 1.

y

Vector for group 2.

support

Optional support values. If 'NULL', the union of observed values in 'x' and 'y' is used.

group_names

Group labels.

main

Plot title.

xlab

X-axis label.

ylab

Y-axis label.

col_x

Color for group 1.

col_y

Color for group 2.

overlap_col

Fill color for overlap bars.

line_col_x

Line color for group 1.

line_col_y

Line color for group 2.

lwd

Line width.

pch

Point character.

cex_pt

Point size.

las

Orientation style of the x-axis tick labels (see 'las' in [graphics::par()]).

bar_width

Width of overlap bars.

show_jsd

Logical; whether to display JSD on the plot.

jsd_digits

Number of digits for displayed JSD.

na_rm

Logical; remove missing values?

Value

Invisibly returns plotting data.
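The quantities behind such a plot can be sketched as follows, assuming the overlap bar at each support value has height min(p, q). That reading of "overlap" is an assumption, and 'discrete_overlap_sketch' is a hypothetical name. Under this assumption the total overlap equals 1 minus the total variation distance between the two distributions.

```r
# Hypothetical sketch of per-value proportions and overlap; not the package's code.
discrete_overlap_sketch <- function(x, y, support = NULL) {
  x <- as.character(x); y <- as.character(y)
  support <- if (is.null(support)) union(x, y) else as.character(support)
  nx <- as.numeric(table(factor(x, levels = support)))
  ny <- as.numeric(table(factor(y, levels = support)))
  px <- nx / sum(nx); py <- ny / sum(ny)
  # Assumed overlap: the smaller of the two proportions at each value
  data.frame(value = support, prop_x = px, prop_y = py, overlap = pmin(px, py))
}

d <- discrete_overlap_sketch(c(1, 1, 2, 3), c(2, 3, 3, 3))
sum(d$overlap)   # 0.5, i.e. 1 minus the total variation distance of 0.5
```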


Plot two-group distributions

Description

Unified front-end for plotting continuous or discrete two-group raw distributions.

Usage

plot_dist(x, y, type = c("auto", "continuous", "discrete"), ...)

Arguments

x

First vector.

y

Second vector.

type

One of '"auto"', '"continuous"', or '"discrete"'.

...

Additional arguments passed to the type-specific plotting function.

Value

Invisibly returns plotting data.


Log with package-supported base

Description

Computes the logarithm of 'x' in one of the package-supported bases (2 or exp(1)).

Usage

safe_log_base(x, base = 2)

Arguments

x

Numeric vector.

base

Logarithm base. Defaults to 2. Must be either 2 or exp(1).

Value

Numeric vector.
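A minimal sketch of the base restriction stated above. The name 'log_base_sketch' and the use of all.equal() to recognise exp(1) are assumptions.

```r
# Hypothetical sketch; accepts only base 2 or base e, as documented.
log_base_sketch <- function(x, base = 2) {
  if (identical(base, 2) || identical(base, 2L)) return(log2(x))
  # Tolerant comparison for the natural base
  if (isTRUE(all.equal(base, exp(1)))) return(log(x))
  stop("'base' must be 2 or exp(1)", call. = FALSE)
}

log_base_sketch(8)              # log base 2
log_base_sketch(exp(2), exp(1)) # natural log
```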


Numerical trapezoidal integration

Description

Approximates the integral of 'y' over the grid 'x' using the trapezoidal rule.

Usage

trapz_num(x, y)

Arguments

x

Numeric grid values.

y

Numeric function values on the grid.

Value

A numeric scalar.
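The trapezoidal rule sums, over adjacent grid points, the interval width times the mean of the two function values. A minimal sketch, with the hypothetical name 'trapz_sketch':

```r
# Hypothetical sketch of trapezoidal integration on a (possibly uneven) grid.
trapz_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # Width of each interval times the mean of its two end heights
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

g <- seq(0, 1, length.out = 1001)
trapz_sketch(g, g^2)   # close to 1/3
```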


Validate paired inputs

Description

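Cleans the two input vectors, optionally removing missing and non-finite values, and checks that each retains at least 'min_n' observations.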

Usage

validate_xy(x, y, min_n = 1, na_rm = TRUE, finite_only = FALSE)

Arguments

x

First input vector.

y

Second input vector.

min_n

Minimum required sample size after filtering.

na_rm

Logical; remove missing values?

finite_only

Logical; keep only finite values?

Value

A list with cleaned x and y.
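A minimal sketch of this kind of validation. The name 'validate_xy_sketch', the filtering order, and raising an error when a group falls below 'min_n' are assumptions.

```r
# Hypothetical sketch of input cleaning and size checks; not the package's code.
validate_xy_sketch <- function(x, y, min_n = 1, na_rm = TRUE, finite_only = FALSE) {
  if (na_rm) { x <- x[!is.na(x)]; y <- y[!is.na(y)] }
  # finite_only is meaningful for numeric input only (drops Inf, -Inf, NaN)
  if (finite_only) { x <- x[is.finite(x)]; y <- y[is.finite(y)] }
  if (length(x) < min_n || length(y) < min_n) {
    stop(sprintf("each group needs at least %d observation(s)", min_n), call. = FALSE)
  }
  list(x = x, y = y)
}

validate_xy_sketch(c(1, NA, Inf), c(2, 3), finite_only = TRUE)
```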
