philentropy — Information Theory and Distance Quantification with R

CRAN status rstudio mirror downloads

🧭 Similarity and Distance Quantification between Probability Functions

Describe and understand the world through data.

Data collection and data comparison are the foundations of scientific research.
Mathematics provides the abstract framework to describe patterns we observe in nature and Statistics provides the framework to quantify the uncertainty of these patterns.

In statistics, natural patterns are described in the form of probability distributions that either follow fixed patterns (parametric distributions) or more dynamic ones (non-parametric distributions).

The philentropy package implements fundamental distance and similarity measures to quantify distances between probability density functions as well as traditional information theory measures.
In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.

🧡 This project is born out of my passion for statistics and I hope it will be useful to those who share it with me.


⚙️ Installation

# install philentropy version 0.10.0 from CRAN
install.packages("philentropy")

Or get the latest developer version:

# install.packages("devtools")
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)

🧾 Citation

HG Drost (2018).
Philentropy: Information Theory and Distance Quantification with R.
Journal of Open Source Software, 3(26), 765.
https://doi.org/10.21105/joss.00765

🪶 I am developing philentropy in my spare time and would be very grateful if you would consider citing the paper above if it was useful for your research. These citations help me continue maintaining and extending the package.


🧩 Quick Start

library(philentropy)

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)

distance(rbind(P, Q), method = "jensen-shannon")
#> jensen-shannon using unit 'log'.
#> jensen-shannon 
#>     0.01041993
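The divergence value depends on the logarithm unit. As a small sketch (using the `unit` argument of `distance()`), you can convert between nats and bits via the change-of-base rule:

```r
library(philentropy)

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)

# Jensen-Shannon divergence in nats (unit = "log") and in bits (unit = "log2")
jsd_nat <- distance(rbind(P, Q), method = "jensen-shannon", unit = "log")
jsd_bit <- distance(rbind(P, Q), method = "jensen-shannon", unit = "log2")

# change of base: bits = nats / ln(2)
all.equal(as.numeric(jsd_bit), as.numeric(jsd_nat) / log(2))
```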

💡 Tip: Got a large matrix (rows = samples, cols = features)?
Use distance(X, method="cosine", mute.message=TRUE) to compute the full pairwise matrix quickly and quietly.
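For example, a minimal sketch with a hypothetical random matrix `X` (rows rescaled to sum to 1, since many measures expect probability vectors):

```r
library(philentropy)

set.seed(2020)
# hypothetical data: 5 samples (rows) x 10 features (columns)
X <- matrix(runif(50), nrow = 5)
X <- X / rowSums(X)  # rescale each row to a probability vector

# full 5 x 5 pairwise matrix, computed without the per-call unit message
D <- distance(X, method = "cosine", mute.message = TRUE)
dim(D)
```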


📘 Tutorials


🧪 When should I use which distance?

| Goal | Recommended methods |
|------|---------------------|
| 🔁 Clustering / similarity | `cosine`, `correlation`, `euclidean` |
| 📊 Probability or compositional data | `jensen-shannon`, `hellinger`, `kullback-leibler` |
| 🧬 Sparse counts / binary | `canberra`, `jaccard`, `sorensen` |
| ⚖️ Scale-invariant | `manhattan`, `chebyshev` |

Run `getDistMethods()` to explore all 46 implemented measures.
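For instance, a small sketch comparing the same pair of vectors under several measures (using the `mute.message` argument, as in the tip above, to suppress the per-call unit message):

```r
library(philentropy)

x <- rbind(P = c(0.1, 0.2, 0.7), Q = c(0.2, 0.2, 0.6))

# list all implemented method names
methods <- getDistMethods()
head(methods)

# compare the same pair under a few different measures
sapply(c("euclidean", "manhattan", "cosine"),
       function(m) distance(x, method = m, mute.message = TRUE))
```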


🧮 Examples

library(philentropy)
philentropy::getDistMethods()
# define probability density functions P and Q
P <- 1:10/sum(1:10)
Q <- 20:29/sum(20:29)

x <- rbind(P, Q)
philentropy::distance(x, method = "jensen-shannon")
#> jensen-shannon using unit 'log'.
#> jensen-shannon 
#>     0.02628933

Alternatively, compute all available distances:

philentropy::dist.diversity(x, p = 2, unit = "log2")
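Since `dist.diversity()` returns a named vector with one value per distance measure, the result can be inspected directly; a small sketch:

```r
library(philentropy)

x <- rbind(P = 1:10 / sum(1:10), Q = 20:29 / sum(20:29))

# one value per implemented distance measure (Minkowski power p = 2, bit units)
d <- dist.diversity(x, p = 2, unit = "log2")

# measures under which P and Q appear closest
head(sort(d))
```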

🌟 Papers using philentropy (highlights)

Selected venues in which philentropy has been applied:

- Nature / Cell / Science
- Nature Methods / Nature Communications / Cell family
- Other disciplines (selected)

🎓 philentropy has been used in dozens of peer-reviewed publications to quantify distances, divergences, and similarities in complex biological and computational datasets.


🧠 Important Functions

Distance Measures

- `distance()` : compute distances/similarities between probability vectors with a chosen `method`
- `getDistMethods()` : list all implemented distance methods
- `dist.diversity()` : compute all available distances for a given input matrix
- `estimate.probability()` : estimate probability vectors from count vectors

Information Theory

- `H()` : Shannon entropy
- `JE()` : joint entropy
- `CE()` : conditional entropy
- `MI()` : mutual information
- `KL()` : Kullback-Leibler divergence
- `JSD()` : Jensen-Shannon divergence
- `gJSD()` : generalized Jensen-Shannon divergence
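A brief sketch of the information-theory side (assuming each function's documented defaults, e.g. `unit = "log2"`):

```r
library(philentropy)

P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

H(P)              # Shannon entropy of P (bits by default)
KL(rbind(P, Q))   # Kullback-Leibler divergence KL(P || Q)
JSD(rbind(P, Q))  # Jensen-Shannon divergence between P and Q
```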

🗞️ NEWS

Find the current status and version history in the
👉 NEWS section.


🧩 Appendix — full references
