Type: | Package |
Title: | Similarity and Distance Quantification Between Probability Functions |
Version: | 0.9.0 |
Date: | 2024-11-12 |
Maintainer: | Hajk-Georg Drost <hajk-georg.drost@tuebingen.mpg.de> |
Description: | Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions. |
Depends: | R (≥ 3.1.2) |
Imports: | Rcpp, KernSmooth, poorman |
License: | GPL-2 |
LinkingTo: | Rcpp |
URL: | https://drostlab.github.io/philentropy/, https://github.com/drostlab/philentropy |
Suggests: | testthat, knitr, rmarkdown, microbenchmark |
VignetteBuilder: | knitr |
BugReports: | https://github.com/drostlab/philentropy/issues |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | yes |
Packaged: | 2024-11-12 20:58:32 UTC; hdrost001 |
Author: | Hajk-Georg Drost |
Repository: | CRAN |
Date/Publication: | 2024-11-12 21:40:02 UTC |
Shannon's Conditional-Entropy H(X | Y)
Description
Compute Shannon's Conditional-Entropy based on the chain rule H(X | Y)
= H(X,Y) - H(Y)
based on a given joint-probability vector P(X,Y)
and
probability vector P(Y)
.
Usage
CE(xy, y, unit = "log2")
Arguments
xy |
a numeric joint-probability vector |
y |
a numeric probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
Details
This function might be useful to fastly compute Shannon's Conditional-Entropy for any given joint-probability vector and probability vector.
Value
Shannon's Conditional-Entropy in bit.
Note
Note that the probability vector P(Y) must be the probability
distribution of random variable Y ( P(Y) for which H(Y) is computed ) and
furthermore used for the chain rule computation of H(X | Y) = H(X,Y) -
H(Y)
.
Author(s)
Hajk-Georg Drost
References
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
See Also
Examples
CE(1:10/sum(1:10),1:10/sum(1:10))
Shannon's Entropy H(X)
Description
Compute the Shannon's Entropy H(X) = - \sum P(X) * log2(P(X))
based on a
given probability vector P(X)
.
Usage
H(x, unit = "log2")
Arguments
x |
a numeric probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
Details
This function might be useful to fastly compute Shannon's Entropy for any given probability vector.
Value
a numeric value representing Shannon's Entropy in bit.
Author(s)
Hajk-Georg Drost
References
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
See Also
Examples
H(1:10/sum(1:10))
Shannon's Joint-Entropy H(X,Y)
Description
This funciton computes Shannon's Joint-Entropy H(X,Y) = - \sum \sum P(X,Y) *
log2(P(X,Y))
based on a given joint-probability vector P(X,Y)
.
Usage
JE(x, unit = "log2")
Arguments
x |
a numeric joint-probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
Value
a numeric value representing Shannon's Joint-Entropy in bit.
Author(s)
Hajk-Georg Drost
References
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
See Also
H
, CE
, KL
, JSD
, gJSD
, distance
Examples
JE(1:100/sum(1:100))
Jensen-Shannon Divergence
Description
This function computes a divergence matrix or divergence value based on the Jensen-Shannon Divergence with equal weights.
Please be aware that when aiming to compute the Jensen-Shannon Distance (rather than Divergence), you will need to apply the link{sqrt}
on the JSD()
output.
Usage
JSD(x, test.na = TRUE, unit = "log2", est.prob = NULL)
Arguments
x |
a numeric |
test.na |
a boolean value specifying whether input vectors shall be tested for NA values. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
Details
Function to compute the Jensen-Shannon Divergence JSD(P || Q) between two
probability distributions P and Q with equal weights \pi_1
=
\pi_2
= 1/2
.
The Jensen-Shannon Divergence JSD(P || Q) between two probability distributions P and Q is defined as:
JSD(P || Q) = 0.5 * (KL(P || R) + KL(Q || R))
where R = 0.5 * (P + Q)
denotes the mid-point of the probability
vectors P and Q, and KL(P || R), KL(Q || R) denote the Kullback-Leibler
Divergence of P and R, as well as Q and R.
General properties of the Jensen-Shannon Divergence:
-
1)
JSD is non-negative. -
2)
JSD is a symmetric measure JSD(P || Q) = JSD(Q || P). -
3)
JSD = 0, if and only if P = Q.
Value
a divergence value or matrix based on JSD computations.
Author(s)
Hajk-Georg Drost
References
Lin J. 1991. "Divergence Measures Based on the Shannon Entropy". IEEE Transactions on Information Theory. (33) 1: 145-151.
Endres M. and Schindelin J. E. 2003. "A new metric for probability distributions". IEEE Trans. on Info. Thy. (49) 3: 1858-1860.
See Also
Examples
# Jensen-Shannon Divergence between P and Q
P <- 1:10/sum(1:10)
Q <- 20:29/sum(20:29)
x <- rbind(P,Q)
JSD(x)
# Jensen-Shannon Divergence between P and Q using different log bases
JSD(x, unit = "log2") # Default
JSD(x, unit = "log")
JSD(x, unit = "log10")
# Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count
P.count <- 1:10
Q.count <- 20:29
x.count <- rbind(P.count,Q.count)
JSD(x.count, est.prob = "empirical")
# Example: Divergence Matrix using JSD-Divergence
Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
# compute the KL matrix of a given probability matrix
JSDMatrix <- JSD(Prob)
# plot a heatmap of the corresponding JSD matrix
heatmap(JSDMatrix)
Kullback-Leibler Divergence
Description
This function computes the Kullback-Leibler divergence of two probability distributions P and Q.
Usage
KL(x, test.na = TRUE, unit = "log2", est.prob = NULL, epsilon = 1e-05)
Arguments
x |
a numeric |
test.na |
a boolean value indicating whether input vectors should be tested for NA values. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
est.prob |
method to estimate probabilities from a count vector. Default: est.prob = NULL. |
epsilon |
a small value to address cases in the KL computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Details
KL(P||Q) = \sum P(P) * log2(P(P) / P(Q)) = H(P,Q) - H(P)
where H(P,Q) denotes the joint entropy of the probability distributions P and Q and H(P) denotes the entropy of probability distribution P. In case P = Q then KL(P,Q) = 0 and in case P != Q then KL(P,Q) > 0.
The KL divergence is a non-symmetric measure of the directed divergence between two probability distributions P and Q. It only fulfills the positivity property of a distance metric.
Because of the relation KL(P||Q) = H(P,Q) - H(P), the Kullback-Leibler divergence of two probability distributions P and Q is also named Cross Entropy of two probability distributions P and Q.
Value
The Kullback-Leibler divergence of probability vectors.
Author(s)
Hajk-Georg Drost
References
Cover Thomas M. and Thomas Joy A. 2006. Elements of Information Theory. John Wiley & Sons.
See Also
Examples
# Kulback-Leibler Divergence between P and Q
P <- 1:10/sum(1:10)
Q <- 20:29/sum(20:29)
x <- rbind(P,Q)
KL(x)
# Kulback-Leibler Divergence between P and Q using different log bases
KL(x, unit = "log2") # Default
KL(x, unit = "log")
KL(x, unit = "log10")
# Kulback-Leibler Divergence between count vectors P.count and Q.count
P.count <- 1:10
Q.count <- 20:29
x.count <- rbind(P.count,Q.count)
KL(x, est.prob = "empirical")
# Example: Distance Matrix using KL-Distance
Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
# compute the KL matrix of a given probability matrix
KLMatrix <- KL(Prob)
# plot a heatmap of the corresponding KL matrix
heatmap(KLMatrix)
Shannon's Mutual Information I(X,Y)
Description
Compute Shannon's Mutual Information based on the identity I(X,Y) =
H(X) + H(Y) - H(X,Y)
based on a given joint-probability vector P(X,Y)
and probability vectors P(X)
and P(Y)
.
Usage
MI(x, y, xy, unit = "log2")
Arguments
x |
a numeric probability vector |
y |
a numeric probability vector |
xy |
a numeric joint-probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
Details
This function might be useful to fastly compute Shannon's Mutual Information for any given joint-probability vector and probability vectors.
Value
Shannon's Mutual Information in bit.
Author(s)
Hajk-Georg Drost
References
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
See Also
Examples
MI( x = 1:10/sum(1:10), y = 20:29/sum(20:29), xy = 1:10/sum(1:10) )
Additive symmetric chi-squared distance (lowlevel function)
Description
The lowlevel function for computing the additive_symm_chi_sq distance.
Usage
additive_symm_chi_sq(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
additive_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
AVG distance (lowlevel function)
Description
The lowlevel function for computing the avg distance.
Usage
avg(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
avg(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Bhattacharyya distance (lowlevel function)
Description
The lowlevel function for computing the bhattacharyya distance.
Usage
bhattacharyya(P, Q, testNA, unit, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
bhattacharyya(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
unit = "log2", epsilon = 0.00001)
Kernel Density Estimation
Description
This function implements an interface to the kernel density estimation functions provided by the KernSmooth package.
Usage
binned.kernel.est(
data,
kernel = "normal",
bandwidth = NULL,
canonical = FALSE,
scalest = "minim",
level = 2L,
gridsize = 401L,
range.data = range(data),
truncate = TRUE
)
Arguments
data |
a numeric vector containing the sample on which the kernel density estimate is to be constructed. |
kernel |
character string specifying the smoothing kernel |
bandwidth |
the kernel bandwidth smoothing parameter. |
canonical |
a logical value indicating whether canonically scaled kernels should be used |
scalest |
estimate of scale.
|
level |
number of levels of functional estimation used in the plug-in rule. |
gridsize |
the number of equally-spaced points over which binning is performed to obtain kernel functional approximation. |
range.data |
vector containing the minimum and maximum values of |
truncate |
logical value indicating whether data with x values outside the range specified by |
Author(s)
Hajk-Georg Drost
References
Matt Wand (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-14.
Henry Deng and Hadley Wickham (2011). Density estimation in R. http://vita.had.co.nz/papers/density-estimation.pdf.
Canberra distance (lowlevel function)
Description
The lowlevel function for computing the canberra distance.
Usage
canberra(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
canberra(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Chebyshev distance (lowlevel function)
Description
The lowlevel function for computing the chebyshev distance.
Usage
chebyshev(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
chebyshev(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Clark squared distance (lowlevel function)
Description
The lowlevel function for computing the clark_sq distance.
Usage
clark_sq(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
clark_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Cosine distance (lowlevel function)
Description
The lowlevel function for computing the cosine_dist distance.
Usage
cosine_dist(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
cosine_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Czekanowski distance (lowlevel function)
Description
The lowlevel function for computing the czekanowski distance.
Usage
czekanowski(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
czekanowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Dice distance (lowlevel function)
Description
The lowlevel function for computing the dice_dist distance.
Usage
dice_dist(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
dice_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Distance Diversity between Probability Density Functions
Description
This function computes all distance values between two probability density functions that are available in getDistMethods
and returns a vector storing the corresponding distance measures. This vector is named distance diversity vector.
Usage
dist.diversity(x, p, test.na = FALSE, unit = "log2")
Arguments
x |
a numeric |
p |
power of the Minkowski distance. |
test.na |
a boolean value indicating whether input vectors should be tested for NA values. Faster computations if |
unit |
a character string specifying the logarithm unit that should be used to compute distances that depend on log computations. Options are:
|
Author(s)
Hajk-Georg Drost
Examples
dist.diversity(rbind(1:10/sum(1:10), 20:29/sum(20:29)), p = 2, unit = "log2")
Distances and Similarities between Many Probability Density Functions
Description
This functions computes the distance/dissimilarity between two sets of probability density functions.
Usage
dist_many_many(
dists1,
dists2,
method,
p = NA_real_,
testNA = TRUE,
unit = "log",
epsilon = 1e-05
)
Arguments
dists1 |
a numeric matrix storing distributions in its rows. |
dists2 |
a numeric matrix storing distributions in its rows. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Value
A matrix of distance values
Examples
set.seed(2020-08-20)
M1 <- t(replicate(10, sample(1:10, size = 10) / 55))
M2 <- t(replicate(10, sample(1:10, size = 10) / 55))
result <- dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
Distances and Similarities between One and Many Probability Density Functions
Description
This functions computes the distance/dissimilarity between one probability density functions and a set of probability density functions.
Usage
dist_one_many(
P,
dists,
method,
p = NA_real_,
testNA = TRUE,
unit = "log",
epsilon = 1e-05
)
Arguments
P |
a numeric vector storing the first distribution. |
dists |
a numeric matrix storing distributions in its rows. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Value
A vector of distance values
Examples
set.seed(2020-08-20)
P <- 1:10 / sum(1:10)
M <- t(replicate(100, sample(1:10, size = 10) / 55))
dist_one_many(P, M, method = "euclidean", testNA = FALSE)
Distances and Similarities between Two Probability Density Functions
Description
This functions computes the distance/dissimilarity between two probability density functions.
Usage
dist_one_one(
P,
Q,
method,
p = NA_real_,
testNA = TRUE,
unit = "log",
epsilon = 1e-05
)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Value
A single distance value
Examples
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
Distances and Similarities between Probability Density Functions
Description
This functions computes the distance/dissimilarity between two probability density functions.
Usage
distance(
x,
method = "euclidean",
p = NULL,
test.na = TRUE,
unit = "log",
epsilon = 1e-05,
est.prob = NULL,
use.row.names = FALSE,
as.dist.obj = FALSE,
diag = FALSE,
upper = FALSE,
mute.message = FALSE
)
Arguments
x |
a numeric |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
test.na |
a boolean value indicating whether input vectors should be tested for |
unit |
a character string specifying the logarithm unit that should be used to compute distances that depend on log computations. |
epsilon |
a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
use.row.names |
a logical value indicating whether or not row names from
the input matrix shall be used as rownames and colnames of the output distance matrix. Default value is |
as.dist.obj |
shall the return value or matrix be an object of class |
diag |
if |
upper |
if |
mute.message |
a logical value indicating whether or not messages printed by |
Details
Here a distance is defined as a quantitative degree of how far two mathematical objects are apart from eachother (Cha, 2007).
This function implements the following distance/similarity measures to quantify the distance between probability density functions:
L_p Minkowski family
Euclidean :
d = sqrt( \sum | P_i - Q_i |^2)
Manhattan :
d = \sum | P_i - Q_i |
Minkowski :
d = ( \sum | P_i - Q_i |^p)^1/p
Chebyshev :
d = max | P_i - Q_i |
L_1 family
Sorensen :
d = \sum | P_i - Q_i | / \sum (P_i + Q_i)
Gower :
d = 1/d * \sum | P_i - Q_i |
Soergel :
d = \sum | P_i - Q_i | / \sum max(P_i , Q_i)
Kulczynski d :
d = \sum | P_i - Q_i | / \sum min(P_i , Q_i)
Canberra :
d = \sum | P_i - Q_i | / (P_i + Q_i)
Lorentzian :
d = \sum ln(1 + | P_i - Q_i |)
Intersection family
Intersection :
s = \sum min(P_i , Q_i)
Non-Intersection :
d = 1 - \sum min(P_i , Q_i)
Wave Hedges :
d = \sum | P_i - Q_i | / max(P_i , Q_i)
Czekanowski :
d = \sum | P_i - Q_i | / \sum | P_i + Q_i |
Motyka :
d = \sum min(P_i , Q_i) / (P_i + Q_i)
Kulczynski s :
d = 1 / \sum | P_i - Q_i | / \sum min(P_i , Q_i)
Tanimoto :
d = \sum (max(P_i , Q_i) - min(P_i , Q_i)) / \sum max(P_i , Q_i)
; equivalent to SoergelRuzicka :
s = \sum min(P_i , Q_i) / \sum max(P_i , Q_i)
; equivalent to 1 - Tanimoto = 1 - Soergel
Inner Product family
Inner Product :
s = \sum P_i * Q_i
Harmonic mean :
s = 2 * \sum (P_i * Q_i) / (P_i + Q_i)
Cosine :
s = \sum (P_i * Q_i) / sqrt(\sum P_i^2) * sqrt(\sum Q_i^2)
Kumar-Hassebrook (PCE) :
s = \sum (P_i * Q_i) / (\sum P_i^2 + \sum Q_i^2 - \sum (P_i * Q_i))
Jaccard :
d = 1 - \sum (P_i * Q_i) / (\sum P_i^2 + \sum Q_i^2 - \sum (P_i * Q_i))
; equivalent to 1 - Kumar-HassebrookDice :
d = \sum (P_i - Q_i)^2 / (\sum P_i^2 + \sum Q_i^2)
Squared-chord family
Fidelity :
s = \sum sqrt(P_i * Q_i)
Bhattacharyya :
d = - ln \sum sqrt(P_i * Q_i)
Hellinger :
d = 2 * sqrt( 1 - \sum sqrt(P_i * Q_i))
Matusita :
d = sqrt( 2 - 2 * \sum sqrt(P_i * Q_i))
Squared-chord :
d = \sum ( sqrt(P_i) - sqrt(Q_i) )^2
Squared L_2 family (
X
^2 squared family)Squared Euclidean :
d = \sum ( P_i - Q_i )^2
Pearson
X
^2 :d = \sum ( (P_i - Q_i )^2 / Q_i )
Neyman
X
^2 :d = \sum ( (P_i - Q_i )^2 / P_i )
Squared
X
^2 :d = \sum ( (P_i - Q_i )^2 / (P_i + Q_i) )
Probabilistic Symmetric
X
^2 :d = 2 * \sum ( (P_i - Q_i )^2 / (P_i + Q_i) )
Divergence :
X
^2 :d = 2 * \sum ( (P_i - Q_i )^2 / (P_i + Q_i)^2 )
Clark :
d = sqrt ( \sum (| P_i - Q_i | / (P_i + Q_i))^2 )
Additive Symmetric
X
^2 :d = \sum ( ((P_i - Q_i)^2 * (P_i + Q_i)) / (P_i * Q_i) )
Shannon's entropy family
Kullback-Leibler :
d = \sum P_i * log(P_i / Q_i)
Jeffreys :
d = \sum (P_i - Q_i) * log(P_i / Q_i)
K divergence :
d = \sum P_i * log(2 * P_i / P_i + Q_i)
Topsoe :
d = \sum ( P_i * log(2 * P_i / P_i + Q_i) ) + ( Q_i * log(2 * Q_i / P_i + Q_i) )
Jensen-Shannon :
d = 0.5 * ( \sum P_i * log(2 * P_i / P_i + Q_i) + \sum Q_i * log(2 * Q_i / P_i + Q_i))
Jensen difference :
d = \sum ( (P_i * log(P_i) + Q_i * log(Q_i) / 2) - (P_i + Q_i / 2) * log(P_i + Q_i / 2) )
Combinations
Taneja :
d = \sum ( P_i + Q_i / 2) * log( P_i + Q_i / ( 2 * sqrt( P_i * Q_i)) )
Kumar-Johnson :
d = \sum (P_i^2 - Q_i^2)^2 / 2 * (P_i * Q_i)^1.5
Avg(L_1, L_n) :
d = \sum | P_i - Q_i| + max{ | P_i - Q_i |} / 2
In cases where
x
specifies a count matrix, the argumentest.prob
can be selected to first estimate probability vectors from input count vectors and second compute the corresponding distance measure based on the estimated probability vectors.The following probability estimation methods are implemented in this function:
-
est.prob = "empirical"
: relative frequencies of counts.
Value
The following results are returned depending on the dimension of x
:
in case
nrow(x)
= 2 : a single distance value.in case
nrow(x)
> 2 : a distancematrix
storing distance values for all pairwise probability vector comparisons.
Note
According to the reference in some distance measure computations invalid computations can occur when dealing with 0 probabilities.
In these cases the convention is treated as follows:
division by zero - case
0/0
: when the divisor and dividend become zero,0/0
is treated as0
.division by zero - case
n/0
: when only the divisor becomes0
, the corresponsning0
is replaced by a small\epsilon = 0.00001
.log of zero - case
0 * log(0)
: is treated as0
.log of zero - case
log(0)
: zero is replaced by a small\epsilon = 0.00001
.
Author(s)
Hajk-Georg Drost
References
Sung-Hyuk Cha. (2007). Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 4: 1.
See Also
getDistMethods
, estimate.probability
, dist.diversity
Examples
# Simple Examples
# receive a list of implemented probability distance measures
getDistMethods()
## compute the euclidean distance between two probability vectors
distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)), method = "euclidean")
## compute the euclidean distance between all pairwise comparisons of probability vectors
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
distance(ProbMatrix, method = "euclidean")
# compute distance matrix without testing for NA values in the input matrix
distance(ProbMatrix, method = "euclidean", test.na = FALSE)
# alternatively use the colnames of the input data for the rownames and colnames
# of the output distance matrix
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
rownames(ProbMatrix) <- paste0("Example", 1:3)
distance(ProbMatrix, method = "euclidean", use.row.names = TRUE)
# Specialized Examples
CountMatrix <- rbind(1:10, 20:29, 30:39)
## estimate probabilities from a count matrix
distance(CountMatrix, method = "euclidean", est.prob = "empirical")
## compute the euclidean distance for count data
## NOTE: some distance measures are only defined for probability values,
distance(CountMatrix, method = "euclidean")
## compute the Kullback-Leibler Divergence with different logarithm bases:
### case: unit = log (Default)
distance(ProbMatrix, method = "kullback-leibler", unit = "log")
### case: unit = log2
distance(ProbMatrix, method = "kullback-leibler", unit = "log2")
### case: unit = log10
distance(ProbMatrix, method = "kullback-leibler", unit = "log10")
Divergence squared distance (lowlevel function)
Description
The lowlevel function for computing the divergence_sq distance.
Usage
divergence_sq(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
divergence_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Estimate Probability Vectors From Count Vectors
Description
This function takes a numeric count vector and returns estimated probabilities of the corresponding counts.
The following probability estimation methods are implemented in this function:
-
method = "empirical"
: generates the relative frequency of the datax/sum(x)
.
Usage
estimate.probability(x, method = "empirical")
Arguments
x |
a numeric vector storing count values. |
method |
a character string specifying the estimation method tht should be used to estimate probabilities from input counts. |
Value
a numeric probability vector.
Author(s)
Hajk-Georg Drost
Examples
# generate a count vector
x <- runif(100)
# generate a probability vector from corresponding counts
x.prob <- estimate.probability(x, method = 'empirical')
Euclidean distance (lowlevel function)
Description
The lowlevel function for computing the euclidean distance.
Usage
euclidean(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Fidelity distance (lowlevel function)
Description
The lowlevel function for computing the fidelity distance.
Usage
fidelity(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
fidelity(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Generalized Jensen-Shannon Divergence
Description
This function computes the Generalized Jensen-Shannon Divergence of a probability matrix.
Usage
gJSD(x, unit = "log2", weights = NULL, est.prob = NULL)
Arguments
x |
a probability matrix. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
weights |
a numeric vector specifying the weights for each distribution in |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
Details
Function to compute the Generalized Jensen-Shannon Divergence
JSD_{\pi_1,...,\pi_n}(P_1, ..., P_n) = H(\sum_{i = 1}^n \pi_i * P_i) - \sum_{i = 1}^n \pi_i*H(P_i)
where \pi_1,...,\pi_n
denote the weights selected for the probability vectors P_1,...,P_n
and H(P_i)
denotes the Shannon Entropy of probability vector P_i
.
Value
The Jensen-Shannon divergence between all possible combinations of comparisons.
Author(s)
Hajk-Georg Drost
See Also
Examples
# define input probability matrix
Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
# compute the Generalized JSD comparing the PS probability matrix
gJSD(Prob)
# Generalized Jensen-Shannon Divergence between three vectors using different log bases
gJSD(Prob, unit = "log2") # Default
gJSD(Prob, unit = "log")
gJSD(Prob, unit = "log10")
# Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count
P.count <- 1:10
Q.count <- 20:29
R.count <- 30:39
x.count <- rbind(P.count, Q.count, R.count)
gJSD(x.count, est.prob = "empirical")
Get method names for distance
Description
This function returns the names of the methods that can be applied
to compute distances between probability density functions using the distance
function.
Usage
getDistMethods()
Author(s)
Hajk-Georg Drost
Examples
getDistMethods()
Gower distance (lowlevel function)
Description
The lowlevel function for computing the gower distance.
Usage
gower(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
gower(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Harmonic mean distance (lowlevel function)
Description
The lowlevel function for computing the harmonic_mean_dist distance.
Usage
harmonic_mean_dist(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
harmonic_mean_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Hellinger distance (lowlevel function)
Description
The lowlevel function for computing the hellinger distance.
Usage
hellinger(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
hellinger(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Inner product distance (lowlevel function)
Description
The lowlevel function for computing the inner_product distance.
Usage
inner_product(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
inner_product(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Intersection distance (lowlevel function)
Description
The lowlevel function for computing the intersection_dist distance.
Usage
intersection_dist(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
intersection_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Jaccard distance (lowlevel function)
Description
The lowlevel function for computing the jaccard distance.
Usage
jaccard(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
jaccard(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Jeffreys distance (lowlevel function)
Description
The lowlevel function for computing the jeffreys distance.
Usage
jeffreys(P, Q, testNA, unit, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
jeffreys(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
unit = "log2", epsilon = 0.00001)
Jensen difference (lowlevel function)
Description
The lowlevel function for computing the jensen_difference distance.
Usage
jensen_difference(P, Q, testNA, unit)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Author(s)
Hajk-Georg Drost
Examples
jensen_difference(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
Jensen-Shannon distance (lowlevel function)
Description
The lowlevel function for computing the jensen_shannon distance.
Usage
jensen_shannon(P, Q, testNA, unit)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Author(s)
Hajk-Georg Drost
Examples
jensen_shannon(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
K-Divergence (lowlevel function)
Description
The lowlevel function for computing the k_divergence distance.
Usage
k_divergence(P, Q, testNA, unit)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Author(s)
Hajk-Georg Drost
Examples
k_divergence(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
Kulczynski_d distance (lowlevel function)
Description
The lowlevel function for computing the kulczynski_d distance.
Usage
kulczynski_d(P, Q, testNA, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
kulczynski_d(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
testNA = FALSE, epsilon = 0.00001)
kullback-Leibler distance (lowlevel function)
Description
The lowlevel function for computing the kullback_leibler_distance distance.
Usage
kullback_leibler_distance(P, Q, testNA, unit, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
kullback_leibler_distance(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
unit = "log2", epsilon = 0.00001)
Kumar hassebrook distance (lowlevel function)
Description
The lowlevel function for computing the kumar_hassebrook distance.
Usage
kumar_hassebrook(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
kumar_hassebrook(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Kumar-Johnson distance (lowlevel function)
Description
The lowlevel function for computing the kumar_johnson distance.
Usage
kumar_johnson(P, Q, testNA, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
testNA = FALSE, epsilon = 0.00001)
Linear Correlation
Description
This function computed the linear correlation between two vectors or a correlation matrix for an input matrix.
The following methods to compute linear correlations are implemented in this function:
Usage
lin.cor(x, y = NULL, method = "pearson", test.na = FALSE)
Arguments
x |
a numeric |
y |
a numeric |
method |
the method to compute the linear correlation between |
test.na |
a boolean value indicating whether input data should be checked for |
Details
-
method = "pearson"
: Pearson's correlation coefficient (centred). -
method = "pearson2"
: Pearson's uncentred correlation coefficient. -
method = "sq_pearson"
. Squared Pearson's correlation coefficient. -
method = "kendall"
: Kendall's correlation coefficient. -
method = "spearman"
: Spearman's correlation coefficient.
Further Details:
-
Pearson's correlation coefficient (centred) :
Author(s)
Hajk-Georg Drost
Lorentzian distance (lowlevel function)
Description
The low-level function for computing the lorentzian distance.
Usage
lorentzian(P, Q, testNA, unit)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Author(s)
Hajk-Georg Drost
Examples
lorentzian(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
Manhattan distance (lowlevel function)
Description
The lowlevel function for computing the manhattan distance.
Usage
manhattan(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
manhattan(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Matusita distance (lowlevel function)
Description
The lowlevel function for computing the matusita distance.
Usage
matusita(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
matusita(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Minkowski distance (lowlevel function)
Description
The lowlevel function for computing the minkowski distance.
Usage
minkowski(P, Q, n, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
n |
index for the minkowski exponent. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
minkowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), n = 2, testNA = FALSE)
Motyka distance (lowlevel function)
Description
The lowlevel function for computing the motyka distance.
Usage
motyka(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
motyka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Neyman chi-squared distance (lowlevel function)
Description
The lowlevel function for computing the neyman_chi_sq distance.
Usage
neyman_chi_sq(P, Q, testNA, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
neyman_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
testNA = FALSE, epsilon = 0.00001)
Pearson chi-squared distance (lowlevel function)
Description
The lowlevel function for computing the pearson_chi_sq distance.
Usage
pearson_chi_sq(P, Q, testNA, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
pearson_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
testNA = FALSE, epsilon = 0.00001)
Probability symmetric chi-squared distance (lowlevel function)
Description
The lowlevel function for computing the prob_symm_chi_sq distance.
Usage
prob_symm_chi_sq(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
prob_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Ruzicka distance (lowlevel function)
Description
The lowlevel function for computing the ruzicka distance.
Usage
ruzicka(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
ruzicka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Soergel distance (lowlevel function)
Description
The lowlevel function for computing the soergel distance.
Usage
soergel(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
soergel(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Sorensen distance (lowlevel function)
Description
The lowlevel function for computing the sorensen distance.
Usage
sorensen(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
sorensen(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Squared chi-squared distance (lowlevel function)
Description
The lowlevel function for computing the squared_chi_sq distance.
Usage
squared_chi_sq(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
squared_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Squared chord distance (lowlevel function)
Description
The lowlevel function for computing the squared_chord distance.
Usage
squared_chord(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
squared_chord(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Squared euclidean distance (lowlevel function)
Description
The lowlevel function for computing the squared_euclidean distance.
Usage
squared_euclidean(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
squared_euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Taneja difference (lowlevel function)
Description
The lowlevel function for computing the taneja distance.
Usage
taneja(P, Q, testNA, unit, epsilon)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Author(s)
Hajk-Georg Drost
Examples
taneja(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
unit = "log2", epsilon = 0.00001)
Tanimoto distance (lowlevel function)
Description
The lowlevel function for computing the tanimoto distance.
Usage
tanimoto(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
tanimoto(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Topsoe distance (lowlevel function)
Description
The lowlevel function for computing the topsoe distance.
Usage
topsoe(P, Q, testNA, unit)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Author(s)
Hajk-Georg Drost
Examples
topsoe(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
Wave hedges distance (lowlevel function)
Description
The lowlevel function for computing the wave_hedges distance.
Usage
wave_hedges(P, Q, testNA)
Arguments
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Author(s)
Hajk-Georg Drost
Examples
wave_hedges(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)