Type: Package
Title: Kernel Density Estimation with Parametric Starts and Asymmetric Kernels
Version: 1.1.1
Description: Handles univariate non-parametric density estimation with parametric starts and asymmetric kernels in a simple and flexible way. Kernel density estimation with parametric starts involves fitting a parametric density to the data before making a correction with kernel density estimation, see Hjort & Glad (1995) <doi:10.1214/aos/1176324627>. Asymmetric kernels make kernel density estimation more efficient on bounded intervals such as (0, 1) and the positive half-line. Supported asymmetric kernels are the gamma kernel of Chen (2000) <doi:10.1023/A:1004165218295>, the beta kernel of Chen (1999) <doi:10.1016/S0167-9473(99)00010-9>, and the copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>. User-supplied kernels, parametric starts, and bandwidths are supported.
License: MIT + file LICENSE
URL: https://github.com/JonasMoss/kdensity
BugReports: https://github.com/JonasMoss/kdensity/issues
Encoding: UTF-8
Suggests: extraDistr, SkewHyperbolic, testthat, covr, knitr, rmarkdown
Imports: assertthat, univariateML, EQL
RoxygenNote: 7.3.2
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-03-04 11:59:23 UTC; jmoss
Author: Jonas Moss ORCID iD [aut, cre], Martin Tveten [ctb]
Maintainer: Jonas Moss <jonas.gjertsen@gmail.com>
Repository: CRAN
Date/Publication: 2025-03-04 11:50:04 UTC

Add a new bw to bw_environment.

Description

Add a new bw to bw_environment.

Usage

add_bw(bw_str, bw)

Arguments

bw_str

A string giving the name of the density.

bw

The bw function.

Value

None.


Add a new kernel to kernels_environment.

Description

Add a new kernel to kernels_environment.

Usage

add_kernel(kernel_str, kernel)

Arguments

kernel_str

A string giving the name of the density.

kernel

The kernel function.

Value

None.


Add a new parametric start to starts_environment.

Description

Add a new parametric start to starts_environment.

Usage

add_start(start_str, start)

Arguments

start_str

A string giving the name of the density.

start

The parametric start function.

Value

None.


Bandwidth Selectors

Description

The available options for bandwidth selectors, passed as the bw argument to kdensity.

Arguments

x

The input data.

kernel_str

A string specifying the kernel, e.g. "gaussian."

start_str

A string specifying the parametric start, e.g. "normal".

support

The domain of definition for the kernel. (-Inf, Inf) for symmetric kernels.

Details

The bandwidth functions are not exported. They are members of the environment bw_environments, and can be accessed by kdensity:::bw_environments.

Bandwidth selectors

⁠"nrd0", "nrd", "bcv", "SJ"⁠: Bandwidth selectors from stats. They are documented in ⁠[bandwidth][stats::bandwidth] stats:bandwidth⁠. "nrd0" is the standard bandwidth selector for symmetric kernels with constant parametric starts.

"ucv": Unbiased cross validation. The standard option for asymmetric kernels.

"RHE": Selector for parametric starts with a symmetric kernel, based on a reference rule with Hermite polynomials. Described in Hjort & Glad (1995). The default method in kdensity when a parametric start is supplied and the kernel is symmetric.

"JH": Selector for the Gaussian copula kernel, based on normal reference rule. Described in Jones & Henderson. The default method when the gcopula kernel is used in kdensity.

Structure

The bandwidth selector is a function of four arguments: The data x, a kernel string kernel, a start string start, and a support vector support. To obtain the functions associated with these strings, use get_kernel and get_start. The function should return a double.

References

Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984. Hjort, Nils Lid, and Ingrid K. Glad. "Nonparametric density estimation with a parametric start." The Annals of Statistics (1995): 882-904.

See Also

kdensity(), stats::bandwidth.kernel() for the bandwidth selectors of stats::density(). In addition, kernels(); parametric_starts()

Examples

## Not a serious bandwidth function.
silly_width <- function(x, kernel = NULL, start = NULL, support = NULL) {
  rexp(1)
}
kdensity(mtcars$mpg, start = "gumbel", bw = silly_width)

Get bandwidth functions from string.

Description

Get bandwidth functions from string.

Usage

get_bw(bw_str)

Arguments

bw_str

a string specifying the density of interest.

Value

a bandwidth function.


Helper function that gets a kernel function for kdensity.

Description

Helper function that gets a kernel function for kdensity.

Usage

get_kernel(kernel_str)

Arguments

kernel_str

a string specifying which kernel to use.

Value

a kernel function of the format k(u) with integral normalized to 1.


Fill in missing kernel, start or support given the supplied values.

Description

This function takes the supplied values of kernel, start, and support and fills in the non-supplied ones. It also handles inconsistencies, such as providing a support on (-Inf, Inf) but a kernel on (0, Inf).

Usage

get_kernel_start_support(kernel, start, support)

Arguments

kernel

Supplied kernel; string or list.

start

Supplied parametric start; string or list.

support

Binary vector.

Details

The kernel and start parameters are either strings or adhering to the kernel/start list structure. support is a numeric vector of length two.

Value

a list with members kernel, kernel_str, start, start_str, and support.


Supplies a plotting range from a kdensity object.

Description

Supplies a plotting range from a kdensity object.

Usage

get_range(obj)

Arguments

obj

A kdensity object.

Value

S vector of size 1000, used for plotting.


Get a bandwidth string when 'bw' is unspecified.

Description

Get a bandwidth string when 'bw' is unspecified.

Usage

get_standard_bw(kernel_str, start_str, support)

Arguments

kernel_str

a kernel string

start_str

a parametric start string.

support

the support.

Value

a bandwidth string.


Get densities and estimators from strings.

Description

Get densities and estimators from strings.

Usage

get_start(start_str)

Arguments

start_str

A string specifying the density of interest.

Value

A list of two functions.


Parametrically guided kernel density estimation

Description

kdensity computes a parametrically guided kernel density estimate for univariate data. It supports asymmetric kernels and parametric starts through the kernel and start arguments.

Usage

kdensity(
  x,
  bw = NULL,
  adjust = 1,
  kernel = NULL,
  start = NULL,
  support = NULL,
  na.rm = FALSE,
  normalized = TRUE,
  tolerance = 0.01
)

Arguments

x

Numeric vector containing the data.

bw

A bandwidth function. Can be either a string, a custom-made function, or a double. The supported bandwidth functions are documented in bandwidths().

adjust

An adjustment constant, so that h = adjust*bw*sd, where sd varies with the chosen kernel.

kernel

The kernel function. Can be chosen from the list of built-in kernels or be custom-made. See kernels() for details.

start

Parametric start. Can be chosen from the list of built-in parametric starts or be custom-made. See parametric_starts() for details.

support

The support of the data. Must be compatible with the supplied x and the supplied start and kernel. Is used to find the normalization constant, see normalized.

na.rm

Logical; if TRUE, NAs will be removed from x.

normalized

Logical; if TRUE, the density is normalized.

tolerance

Numeric; the relative error to tolerate in normalization.

Details

The default values for bw, kernel, start, and support are interdependent, and are chosen to make sense. E.g., the default value for support when start = beta is c(0, 1).

The start argument defaults to uniform, which corresponds to ordinary kernel density estimation. The typical default value for kernel is gaussian.

If normalized is FALSE and start != "uniform", the resulting density will not integrate to 1 in general.

Value

kdensity returns an S3 function object of base::class() "kdensity". This is a callable function with the following elements, accessible by '$':

x

The data supplied in x.

⁠bw_str, bw, adjust, h⁠

The bandwidth function, the resulting bandwidth, the adjust argument, and the adjusted bandwidth.

⁠kernel_str, kernel, start, start_str, support⁠

Name of the kernel, the kernel object, name of the parametric start, the start object, and the support of the density.

⁠data.name, n, range, has.na, na.rm, normalized⁠

Name of the data, number of observations, the range of the data, whether the data x contained NA values, whether na.rm is TRUE or not, and whether the density is normalized.

call

The call to kdensity.

estimates

Named numeric vector containing the parameter estimates from the parametric start.

logLik

The log-likelihood of the parametric starts. Is NA for the uniform start.

References

Hjort, Nils Lid, and Ingrid K. Glad. "Nonparametric density estimation with a parametric start." The Annals of Statistics (1995): 882-904.

Jones, M. C., and D. A. Henderson. "Miscellanea kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984.

Chen, Song Xi. "Probability density function estimation using gamma kernels." Annals of the Institute of Statistical Mathematics 52.3 (2000): 471-480.

Silverman, Bernard W. Density estimation for statistics and data analysis. Vol. 26. CRC press, 1986.

See Also

The stats package function stats::density().

Examples

## Use gamma kernels to model positive data, the concentration of
## theophylline

concentration <- Theoph$conc + 0.001
plot(kdensity(concentration, start = "gamma", kernel = "gamma", adjust = 1 / 3),
  ylim = c(0, 0.15), lwd = 2, main = "Concentration of theophylline"
)
lines(kdensity(concentration, start = "gamma", kernel = "gaussian"),
  lty = 2, col = "grey", lwd = 2
)
lines(kdensity(concentration, start = "gaussian", kernel = "gaussian"),
  lty = 3, col = "blue", lwd = 2
)
lines(kdensity(concentration, start = "gaussian", kernel = "gamma", adjust = 1 / 3),
  lty = 4, col = "red", lwd = 2
)
rug(concentration)

## Using a density and and estimator from another package.

skew_hyperbolic <- list(
  density   = SkewHyperbolic::dskewhyp,
  estimator = function(x) SkewHyperbolic::skewhypFit(x, printOut = FALSE)$param,
  support   = c(-Inf, Inf)
)

kde <- kdensity(diff(LakeHuron), start = skew_hyperbolic)
plot(kde,
  lwd = 2, col = "blue",
  main = "Annual differences in water level (ft) of Lake Huron, 1875 - 1972"
)
lines(kde, plot_start = TRUE, lty = 2, lwd = 2) # Plots the skew hyperbolic density.
rug(diff(LakeHuron))

kde$estimates # Also: coef(kde)
# Displays the parameter estimates:
#        mu     delta      beta        nu
# -1.140713  3.301112  2.551657 26.462469


Kernel functions

Description

Kernel functions are an important part of kdensity. This document lists the available built-in functions and the structure of them. Any kernel in the list can be used in kdensity by using kernel = "kernel" for the intended kernel.

Details

Be careful combining kernels with compact support with parametric starts, as the normalizing integral typically fails to converge. Use gaussian instead.

Symmetric kernels

⁠gaussian, normal⁠: The Gaussian kernel. The default argument when starts is supported on R. ⁠epanechnikov, rectangular (uniform), triangular, biweight, cosine, optcosine⁠: Standard symmetric kernels, also used in stats::density(). ⁠tricube, triweight⁠: Standard symmetric kernels. Not supported by stats::density(). laplace: Uses the Laplace density, also known as the double exponential density.

Asymmetric kernels

⁠gamma, gamma_biased⁠: The gamma kernel of Chen (2000). For use on the positive half-line. gamma is the recommended biased-corrected kernel. gcopula: The Gaussian copula kernel of Jones & Henderson (2007). For use on the unit interval. ⁠beta, beta_biased⁠: The beta kernel of Chen (1999). For use on the unit interval. beta is the recommended bias-corrected kernel.

Structure

A kernel is a list containing two mandatory elements and one optional element. The mandatory element 'kernel' is the kernel function. It takes arguments ⁠y, x, h⁠, where x is the data supplied to kdensity and y is the point of evaluation. h is the bandwidth. Internally, the kernel function is evaluated as 1/h*kernel(y, x, h). It should be vectorized in x, but vectorization in y is not needed.

The second mandatory element is support, stating the domain of definition for the kernel. This is used to distinguish kernels on the unit interval / positive half-line from kernels on R.

sd is used for symmetric kernels, and states the standard error of the kernel. This is used to make kernels comparable to the Gaussian kernel when calculating bandwidths.

References

Chen, Song Xi. "Probability density function estimation using gamma kernels." Annals of the Institute of Statistical Mathematics 52.3 (2000): 471-480. Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984. Chen, Song Xi. "Beta kernel estimators for density functions." Computational Statistics & Data Analysis 31.2 (1999): 131-145.

See Also

kdensity(); parametric_starts(); bandwidths().

Examples

gaussian <- list(
  kernel = function(y, x, h) stats::dnorm((y - x) / h),
  sd = 1,
  support = c(-Inf, Inf)
)

gcopula <- list(
  kernel = function(y, x, h) {
    rho <- 1 - h^2
    inside <- rho^2 * (qnorm(y)^2 + qnorm(x)^2) - 2 * rho * qnorm(y) * qnorm(x)
    exp(-inside / (2 * (1 - rho^2)))
  },
  support = c(0, 1)
)


Merges two lists.

Description

Merges two lists.

Usage

listmerge(x, y, type = c("merge", "template"))

Arguments

x

A list of default arguments.

y

A list of supplied arguments

type

If merge, the list will be merge with y having priority; if template, named the elements of y not in x will be discarded after merging.

Value

A merged list where conflicts are solved in favour of y. Does not preserve ordering.


Parametric starts

Description

A parametric start is a density function with an associated estimator which is used as a starting point in kdensity. Several parametric starts are implemented, all with maximum likelihood estimation. Custom-made parametric starts are possible, see the Structure section.

Structure

The parametric start contains three elements: The density function, an estimation function, and the support of the density. The parameters of the density function must partially match the parameters of the estimator function. The estimator function takes one argument, a numeric vector, which is passed from kdensity.

Supported parametric starts

kdensity supports more than 20 built-in starts from the univariateML::univariateML package, see univariateML::univariateML_models for a list. Densities with variable support, power, are not supported. The pareto density has its support fixed to ⁠(1,Inf)⁠. The options ⁠uniform, constant⁠ makes kdensity estimate a kernel density without parametric starts.

See Also

kdensity(); kernels(); bandwidths()

Examples

start_exponential <- list(
  density = stats::dexp,
  estimator = function(data) {
    c(rate = 1 / mean(data))
  },
  support = c(0, Inf)
)

start_inverse_gaussian <- list(
  density = extraDistr::dwald,
  estimator = function(data) {
    c(
      mu = mean(data),
      lambda = mean(1 / data - 1 / mean(data))
    )
  },
  support = c(0, Inf)
)


Plot, Lines and Points Methods for Kernel Density Estimation

Description

The plot method for kdensity objects.

Usage

## S3 method for class 'kdensity'
plot(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)

## S3 method for class 'kdensity'
lines(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)

## S3 method for class 'kdensity'
points(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)

Arguments

x

a kdensity object.

range

range of x values.

plot_start

logical; if TRUE, plots the parametric start instead of the kernel density estimate.

zero_line

logical; if TRUE, add a base line at y = 0.

...

further plotting parameters.

Value

None.

See Also

kdensity()

Examples

## Using the data set "precip" to eye-ball the similarity between
## a kernel fit, a parametric fit, and a kernel with parametric start fit.
kde_gamma <- kdensity(precip, kernel = "gaussian", start = "gamma")
kde <- kdensity(precip, kernel = "gaussian", start = "uniform")

plot(kde_gamma, main = "Annual Precipitation in US Cities")
lines(kde_gamma, plot_start = TRUE, lty = 2)
lines(kde, lty = 3)
rug(precip)

Helper function for the plot methods.

Description

A helper function for the plot methods that does most of the work under the hood.

Usage

plot_helper(
  x,
  range = NULL,
  plot_start = FALSE,
  zero_line = TRUE,
  ptype = c("plot", "lines", "points"),
  ...
)

Arguments

x

A kdensity object.

range

An optional range vector; like x in plot.default.

plot_start

Logical; if TRUE, plots the parametric start only.

zero_line

Logical; if TRUE, adds a line at y = 0.

ptype

The kind of plot to make

...

Passed to plot.default.

Value

None.


Recycles arguments.

Description

Recycles arguments.

Usage

recycle(..., prototype)

Arguments

...

A list of arguments to be recycled.

prototype

an optional argument. If given, repeats all arguments up to the length of the prototype. If an element of the list has the name, it is used. If not, the variable itself is used.

Details

Recycles arguments so that all vectors are equally long. If a prototype is given, each vector will have the same size as the prototype.


Checks compatibility between supports.

Description

The supplied support must never be larger than the support of the parametric start / kernel.

Usage

support_compatible(kernel, start, support)

Arguments

kernel, start, support

The kernel, start and support to check.

Value

None.

mirror server hosted at Truenetwork, Russian Federation.