Type: | Package |
Title: | Approximate and Exact Optimal Transport Methods |
Version: | 1.2 |
Date: | 2025-01-06 |
Maintainer: | Eric Dunipace <edunipace@mail.harvard.edu> |
Description: | R and C++ functions to perform exact and approximate optimal transport. All C++ methods can be linked to other R packages via their header files. |
License: | GPL (== 3.0) |
Imports: | Rcpp (≥ 1.0.3), stats |
LinkingTo: | Rcpp, RcppEigen, RcppCGAL, BH |
BugReports: | https://github.com/ericdunipace/approxOT/issues |
Suggests: | testthat (≥ 2.1.0), transport |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
SystemRequirements: | C++17 |
URL: | https://github.com/ericdunipace/approxOT |
NeedsCompilation: | yes |
Packaged: | 2025-01-09 00:44:28 UTC; eifer |
Author: | Eric Dunipace |
Repository: | CRAN |
Date/Publication: | 2025-01-09 12:30:06 UTC |
An R package to perform exact and approximate optimal transport.
Description
R and C++ functions to perform exact and approximate optimal transport. All C++ methods are linkable to other R packages via their header files.
Author(s)
Eric Dunipace
See Also
Useful links:
Report bugs at https://github.com/ericdunipace/approxOT/issues
Transform transportation plan to transportation matrix
Description
Transform transportation plan to transportation matrix
Usage
## S3 method for class 'transport.plan'
as.matrix(x, ...)
Arguments
x |
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan()] |
... |
Unused arguments |
Value
A matrix specifying the minimal joint distribution between samples. Margins will be equal to the marginal distributions of the samples
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
observation.orientation = "colwise",
method = "hilbert")
trans_matrix <- as.matrix(trans_plan)
print(trans_matrix)
Transform transportation matrix to transportation plan
Description
Transform transportation matrix to transportation plan
Usage
as.transport.plan(transport_matrix, ...)
Arguments
transport_matrix |
A matrix that is a transportation matrix, i.e. the minimal joint distribution for two samples. |
... |
Unused arguments |
Value
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan]
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
observation.orientation = "colwise",
method = "hilbert")
trans_matrix <- as.matrix(trans_plan$tplan)
tplan2 <- as.transport.plan(trans_matrix)
all.equal(tplan2, trans_plan$tplan)
Calculate cost matrix
Description
Calculate cost matrix
Usage
cost_calc(X, Y, ground_p)
Arguments
X |
matrix of values in first sample. Observations should be by column, not rows. |
Y |
matrix of Values in second sample. Observations should be by column, not rows. |
ground_p |
power of the Lp norm to use in cost calculation. |
Value
matrix of costs
Examples
X <- matrix(rnorm(10*100), 10, 100)
Y <- matrix(rnorm(10*100), 10, 100)
# the Euclidean distance
cost <- cost_calc(X, Y, ground_p = 2)
Covert the 2-dimensional index to 1-dimensional index
Description
Covert the 2-dimensional index to 1-dimensional index
Usage
dist_2d_to_1d(i, j, n, m)
Arguments
i |
Index of row |
j |
Index of column |
n |
Total number of rows |
m |
Total number of columns |
Value
a 1d index for easy matrix entry
One-dimensional optimal transport for measures with more general mass
Description
One-dimensional optimal transport for measures with more general mass
Usage
general_1d_transport(
X,
Y,
a = NULL,
b = NULL,
method = c("hilbert", "univariate")
)
Arguments
X |
Data for sample one. Should be a vector if method is "univariate" or a matrix if method is "hilbert" |
Y |
Data for sample two Should be a vector if method is "univariate" or a matrix if method is "hilbert" |
a |
Empirical measure for sample one. |
b |
Empirical measure for sample two. |
method |
One of "hilbert" or "univariate" |
Value
An optimal transportation plan as a list with slots "from", "to", and "mass"
Examples
set.seed(23423)
n <- 100
d <- 10
x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d)
y <- matrix(stats::rnorm(n*d), n, d)
trans <- general_1d_transport(t(x), t(y))
Get order along the Hilbert curve
Description
Get order along the Hilbert curve
Usage
hilbert.projection(X, Sigma = NULL)
Arguments
X |
matrix of values. Observations are unique by rows. |
Sigma |
Covariance of the data. If provided, uses a Mahalanobis distance. |
Value
Index of orders
Examples
X <- matrix(rnorm(10*3), 3, 10)
idx <- hilbert.projection(X)
print(idx)
Returns orders along the Hilbert space-filling Curve
Description
Returns orders along the Hilbert space-filling Curve
Usage
hilbert_proj_(A)
Arguments
A |
a matrix of data-values of class Eigen::MatrixXd |
Value
An integer vector of orders
Check if function is a transport.plan
Description
Check if function is a transport.plan
Usage
is.transport.plan(tplan)
Arguments
tplan |
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan] |
Value
Logical
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
observation.orientation = "colwise",
method = "hilbert")
print(is.transport.plan(trans_plan))
Round transportation matrix to feasible set
Description
Round transportation matrix to feasible set
Usage
round_transport_matrix(transport_matrix, mass_x, mass_y)
Arguments
transport_matrix |
A transportation matrix returned by an approximate method |
mass_x |
The distribution of the first margin |
mass_y |
The distribution of the second margin |
Value
Returns a transportation matrix projected to the feasible set.
Return the dual potentials for the Sinkhorn distance
Description
Return the dual potentials for the Sinkhorn distance
Usage
sinkhorn_pot(
mass_x,
mass_y,
p = 2,
cost = NULL,
cost_a = NULL,
cost_b = NULL,
...
)
Arguments
mass_x |
The empirical distribution of the first sample |
mass_y |
The empirical distribution of the second sample |
p |
The power to raise the cost by |
cost |
The cost matrix between first and second samples |
cost_a |
The cost matrix for the first sample |
cost_b |
The cost matrix for the second sample |
... |
Additional arguments including
|
Value
A list with slots "f" and "g", the potentials of the rows and margins, respectively.
Function returning supported optimal transportation methods.
Description
Function returning supported optimal transportation methods.
Usage
transport_options()
Details
The currently supported methods are
exact, networkflow: Utilize the networkflow algorithm to solve the exact optimal transport problem
shortsimplex: Use the shortsimplex algorithm to solve the exact optimal transport problem
sinkhorn: Use Sinkhorn's algorithm to solve the approximate optimal transport problem
sinkhorn_log: Use Sinkhorn's algorithm on a log-scale for added stability to solve the approximate optimal transport problem
greenkhorn: Use the Greenkhorn algorithm to solve the approximate optimal transport problem
hilbert: Use hilbert sorting to perform approximate optimal transport
rank: use the average covariate ranks to perform approximate optimal transport
univariate: Use appropriate optimal transport methods for univariate data
swapping: Utilize the swapping algorithm to perform approximate optimal transport
sliced: Use the sliced optimal transport distance
Value
Returns a vector of supported transport methods
Optimal transport plans
Description
Optimal transport plans
Usage
transport_plan(
X,
Y,
a = NULL,
b = NULL,
p = 2,
ground_p = 2,
observation.orientation = c("rowwise", "colwise"),
method = transport_options(),
...
)
Arguments
X |
The covariate data of the first sample. |
Y |
The covariate data of the second sample. |
a |
Optional. Empirical measure of the first sample |
b |
Optional. Empirical measure of the second sample |
p |
The power of the Wasserstein distance |
ground_p |
The power of the Lp norm |
observation.orientation |
Are observations by row ("rowwise") or column ("colwise"). |
method |
Which transportation method to use. See [transport_options][transport_options] |
... |
Additional arguments for various methods
|
Value
a list with slots "tplan" and "cost". "tplan" is the optimal transport plan and "cost" is the optimal transport distance.
Examples
set.seed(203987)
n <- 100
d <- 10
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
transx <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
observation.orientation = "colwise",
method = "hilbert")
Optimal transport plans given a pre-specified cost
Description
Optimal transport plans given a pre-specified cost
Usage
transport_plan_given_C(
mass_x,
mass_y,
p = 2,
cost = NULL,
method = "exact",
cost_a = NULL,
cost_b = NULL,
...
)
Arguments
mass_x |
The empirical measure of the first sample |
mass_y |
The empirical measure of the second sample. |
p |
The power of the Wasserstein distance |
cost |
Specify the cost matrix in advance. |
method |
The transportation method to use, one of "exact", "networkflow","shortsimplex", "sinkhorn", "greenkhorn" |
cost_a |
The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b |
The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... |
Additional arguments for various methods
|
Value
A transportation plan as an object of class "transport.plan", which is a list with slots "from","to", and "mass".
Examples
n <- 32
d <- 5
set.seed(293897)
A <- matrix(stats::rnorm(n*d),nrow=d,ncol=n)
B <- matrix(stats::rnorm(n*d),nrow=d,ncol=n)
transp.meth <- "sinkhorn"
niter <- 1e2
test <- transport_plan_given_C(rep(1/n,n),
rep(1/n,n), 2, cost = cost_calc(A,B,2),
"sinkhorn", niter = niter)
Multimarginal optimal transport plans
Description
Multimarginal optimal transport plans
Usage
transport_plan_multimarg(
...,
p = 2,
ground_p = 2,
observation.orientation = c("rowwise", "colwise"),
method = c("hilbert", "univariate", "sliced"),
nsim = 1000
)
Arguments
... |
Either data matrices as separate arguments or a list of data matrices. Arguments after the data must be specified by name. |
p |
The power of the Wasserstein distance to use |
ground_p |
The power of the Euclidean distance to use |
observation.orientation |
Are observations by rows or columns |
method |
One of "hilbert", "univariate", or "sliced" |
nsim |
Number of simulations to use for the sliced method |
Value
transport plan
Examples
set.seed(23423)
n <- 100
d <- 10
p <- ground_p <- 2 #euclidean cost, p = 2
x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d)
y <- matrix(stats::rnorm(n*d), n, d)
z <- matrix(stats::rnorm((n +455)*d), n +455, d)
# make data a list
data <- list(x,y,z)
tplan <- transport_plan_multimarg(data, p = p, ground_p = ground_p,
observation.orientation = "rowwise", method = "hilbert")
#' #transpose data works too
datat <- lapply(data, t)
tplan2 <- transport_plan_multimarg(datat, p = p, ground_p = ground_p,
observation.orientation = "colwise",method = "hilbert")
Calculate the Wasserstein distance
Description
Calculate the Wasserstein distance
Usage
wasserstein(
X = NULL,
Y = NULL,
a = NULL,
b = NULL,
cost = NULL,
tplan = NULL,
p = 2,
ground_p = 2,
method = transport_options(),
cost_a = NULL,
cost_b = NULL,
...
)
Arguments
X |
The covariate data of the first sample. |
Y |
The covariate data of the second sample. |
a |
Optional. Empirical measure of the first sample |
b |
Optional. Empirical measure of the second sample |
cost |
Specify the cost matrix in advance. |
tplan |
Give a transportation plan with slots "from", "to", and "mass", like that returned by the [tranportation_plan()] function. |
p |
The power of the Wasserstein distance |
ground_p |
The power of the Lp norm |
method |
Which transportation method to use. See [transport_options()] |
cost_a |
The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b |
The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... |
Additional arguments for various methods:
|
Value
The p-Wasserstein distance, a numeric value
Examples
set.seed(11289374)
n <- 100
z <- stats::rnorm(n)
w <- stats::rnorm(n)
uni <- approxOT::wasserstein(X = z, Y = w,
p = 2, ground_p = 2,
observation.orientation = "colwise",
method = "univariate")