Type: Package
Title: Unified Interface to Distance, Dissimilarity, Similarity Matrices
Version: 0.2.0
Description: Provides a high level API to interface over sources storing distance, dissimilarity and similarity matrices, with matrix style extraction, replacement and other utilities. Currently, only the in-memory dist object backend is supported.
URL: https://github.com/talegari/disto
BugReports: https://github.com/talegari/disto/issues
Imports: proxy (≥ 0.4.19), dplyr (≥ 0.7.4), assertthat (≥ 0.2.0), fastmatch (≥ 1.1.0), tidyr (≥ 0.8.0), factoextra (≥ 1.0.5), ggplot2 (≥ 2.2.1), broom (≥ 0.4.4), fastcluster (≥ 1.1.25), pbapply (≥ 1.3.4)
Depends: R (≥ 3.4.0)
License: GPL-3
Encoding: UTF-8
RoxygenNote: 6.0.1
Suggests: knitr (≥ 1.15.1), rmarkdown (≥ 1.4)
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2018-08-02 12:39:49 UTC; srikanth
Author: KS Srikanth [aut, cre]
Maintainer: KS Srikanth <sri.teach@gmail.com>
Repository: CRAN
Date/Publication: 2018-08-02 12:50:02 UTC

Constructor for class 'disto'

Description

Create a mapping to data sources storing distances (symmetric), dissimilarities (non-symmetric), similarities and so on.

Provides a high level API to interface over backends storing distance, dissimilarity and similarity matrices, with matrix style extraction, replacement and other utilities. Currently, only the in-memory dist object backend is supported.

Usage

disto(..., backend = "dist")

Arguments

...

Arguments for a backend. See details

backend

(string) Specify a backend. Currently supported: 'dist'

Details

This is a wrapper to create a 'disto' handle over different backends storing distances, dissimilarities, similarities etc. with minimal data overhead, much like a database connection. Named arguments passed through ... set up the backend. For the 'dist' backend, the examples below pass objectname, the name of the dist object as a string.

Value

Object of class 'disto' which is a thin wrapper on a list
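
The "handle" idea described above can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's internals: make_handle() and handle_get() are hypothetical names, and the only thing taken from this page is that the object is referred to by name and that the handle is a thin list.

```r
# Hypothetical sketch of a handle that refers to a dist object by name.
# make_handle() and handle_get() are illustrative, not part of disto.
make_handle <- function(objectname, env = parent.frame()) {
  stopifnot(is.character(objectname), length(objectname) == 1L)
  stopifnot(exists(objectname, envir = env))
  # Store only the name and the environment, never a copy of the data.
  structure(list(name = objectname, env = env), class = "handle_sketch")
}

# Look the object up on demand.
handle_get <- function(h) get(h$name, envir = h$env)

temp <- stats::dist(iris[, 1:4])
h    <- make_handle("temp")

names(unclass(h))            # the handle holds two small fields
attr(handle_get(h), "Size")  # the distances are still reached through it
```

Because the handle stores a name rather than the values, creating it costs almost nothing regardless of how large the dist object is.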

Author(s)

Srikanth KS

See Also

Useful links:

https://github.com/talegari/disto

Report bugs at https://github.com/talegari/disto/issues

Examples

temp <- stats::dist(iris[,1:4])
dio   <- disto(objectname = "temp")
dio
unclass(dio)

In-place replacement of values

Description

For dist backend see: dist_replace.

Usage

## S3 replacement method for class 'disto'
x[i, j, k] <- value

Arguments

x

object of class 'disto'

i

(integer vector) row index

j

(integer vector) column index

k

(integer vector) direct index

value

(integer/numeric vector) Values to replace

Value

Invisible disto object. Note that this function is called for its side effect.

Examples

temp       <- stats::dist(iris[,1:4])
dio        <- disto(objectname = "temp")
names(dio) <- paste0("a", 1:150)
dio

dio[1, 2] <- 10
dio[1,2]

dio[1:10, 2:11] <- 100
dio[1:10, 2:11, product = "inner"]

dio[paste0("a", 1:5), paste0("a", 6:10)] <- 101
dio[paste0("a", 1:5), paste0("a", 6:10), product = "inner"]

Extract from a disto object in matrix style extraction

Description

Extract values from a disto object, either in matrix style (i, j) or via direct indexing (k). The 'product' argument selects between outer (matrix output, the default) and inner (vector output) extraction. For dist backend see: dist_extract.

Usage

## S3 method for class 'disto'
x[i, j, k, product = "outer"]

Arguments

x

object of class 'disto'

i

(integer vector) row indexes

j

(integer vector) column indexes

k

(integer vector) direct indexes

product

(string) One among: "inner", "outer"

Value

When product is 'outer', returns a matrix. Else, a vector.
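
The difference between the two product types can be seen on a plain matrix. The sketch below uses only base R on the full matrix form of a dist object; it illustrates the semantics and makes no claim about how the package computes the result.

```r
m <- as.matrix(stats::dist(iris[, 1:4]))
i <- 1:3
j <- 4:6

# "outer": every i is paired with every j, giving a 3 x 3 matrix.
outer_res <- m[i, j]

# "inner": i and j are paired element-wise, giving (1,4), (2,5), (3,6).
inner_res <- m[cbind(i, j)]

dim(outer_res)
length(inner_res)

# The inner result is the diagonal of the outer result.
all.equal(unname(diag(outer_res)), unname(inner_res))
```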

Examples

temp <- stats::dist(iris[,1:4])
dio <- disto(objectname = "temp")
dio
names(dio) <- paste0("a", 1:150)

dio[1, 2]
dio[2, 1]
dio[c("a1", "a10"), c("a5", "a72")]
dio[c("a1", "a10"), c("a5", "a72"), product = "inner"]
dio[k = c(1,3,5)]

Extract a single value from disto object

Description

Extract a single value from a disto object, either in matrix style (i, j) or via direct indexing (k). This does not support using names. This is faster than extraction with '['. For dist backend see: dist_extract.

Usage

## S3 method for class 'disto'
x[[i, j, k]]

Arguments

x

object of class 'disto'

i

(integer vector) row index

j

(integer vector) column index

k

(integer vector) direct index

Value

(A real number) Distance value

Examples

temp <- stats::dist(iris[,1:4])
dio  <- disto(objectname = "temp")
dio

dio[[1, 2]]
dio[[2, 1]]
dio[[k = 3]]

Set names/labels

Description

Set names/labels of the underlying distance storing backend

Usage

## S3 replacement method for class 'disto'
names(x) <- value

Arguments

x

disto object

value

A character vector

Value

invisible disto object

Examples

temp <- stats::dist(iris[,1:4])
dio <- disto(objectname = "temp")
dio
names(dio) <- paste0("a", 1:150)

Convert a disto object to dataframe

Description

Convert the underlying data of a disto object to a dataframe in long format (3 columns: item1, item2, distance). This might be a costly operation and should be used with caution.

Usage

## S3 method for class 'disto'
as.data.frame(x, ...)

Arguments

x

object of class disto

...

arguments for tidy

Value

a dataframe in long format
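
A long-format table of this shape can be built by hand in base R. The sketch below is an assumption-laden illustration: the column names follow the description above, but which item of a pair ends up in item1 versus item2, and the row order, may differ from what the package returns.

```r
d <- stats::dist(iris[1:5, 1:4])
n <- attr(d, "Size")

# Row/column positions of the lower triangle, in the same column-major
# order in which a dist object stores its values.
idx <- which(lower.tri(matrix(NA, n, n)), arr.ind = TRUE)

long <- data.frame(
  item1    = idx[, "row"],
  item2    = idx[, "col"],
  distance = as.vector(d)
)

nrow(long)   # one row per pair: n * (n - 1) / 2
head(long)
```

The row count grows quadratically with the number of items, which is why the conversion can be costly.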

Examples

temp <- stats::dist(iris[,1:4])
dio  <- disto(objectname = "temp")
dio
head(as.data.frame(dio))

Matrix like apply function for disto object

Description

Apply function for data underlying disto object

Usage

dapply(x, margin = 1, fun, subset, nproc = 1)

Arguments

x

disto object

margin

(one among 1 or 2) dimension to apply function along

fun

Function to apply over the margin

subset

(integer vector) Row/Column numbers along the margin

nproc

Number of parallel processes (unix only)

Value

Simplified output of 'sapply' like function

Examples

temp <- dist(iris[,1:4])
dio  <- disto(objectname = "temp")

# function to pick indexes of 5 nearest neighbors
# an efficient alternative with Rcpp is required
udf <- function(x){
  dim(x) <- NULL
  order(x)[1:6]
}
hi <- dapply(dio, 1, udf)[-1, ]
dim(hi)
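
For comparison, the same nearest-neighbour computation can be written with base apply() on the full matrix. This is only a baseline sketch: it materialises the whole 150 x 150 matrix, which is the overhead a function like dapply is meant to avoid.

```r
m <- as.matrix(stats::dist(iris[, 1:4]))

# For each row, order() ranks all points by distance. Position 1 is the
# point itself or an exact duplicate (distance 0), so keep ranks 2 to 6.
nn <- apply(m, 1, function(x) order(x)[2:6])

# apply() simplifies like sapply: one column per point, 5 neighbours each.
dim(nn)
```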


Matrix style extraction from dist object

Description

Matrix style extraction from a dist object. Supports 'inner' and 'outer' (default) products.

Usage

dist_extract(object, i, j, k, product = "outer")

Arguments

object

dist object

i

(integer vector) row positions

j

(integer vector) column positions

k

(integer vector) positions

product

(string) One among: 'inner', 'outer'(default)

Details

In k-mode, both i and j should be missing and k should not be missing. In ij-mode, k should be missing and both i and j are optional. If i or j are missing, they are interpreted as all values of i or j (similar to matrix or dataframe subsetting). If i and j are of unequal length, the smaller one is recycled.
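
The two modes address the same stored values. The base R sketch below shows this on a dist object and its matrix form; it does not call dist_extract itself.

```r
d <- stats::dist(iris[, 1:4])
m <- as.matrix(d)

# k-mode: a direct position in the stored lower-triangle vector.
by_k <- as.vector(d)[2]

# ij-mode: the same entry addressed by row and column (order is irrelevant
# because the matrix is symmetric).
by_ij <- m[3, 1]
by_ji <- m[1, 3]

c(by_k, by_ij, by_ji)

# A missing j means "all columns", as in ordinary matrix subsetting.
dim(m[1:10, ])
```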

Value

A matrix or vector of distances when product is 'outer' and 'inner' respectively

Examples

# examples for dist_extract

# create a dist object
temp <- dist(iris[,1:4])
attr(temp, "Labels") <- outer(letters, letters, paste0)[1:150]
head(temp)
max(temp)
as.matrix(temp)[1:5, 1:5]


dist_extract(temp, 1, 1)
dist_extract(temp, 1, 2)
dist_extract(temp, 2, 1)
dist_extract(temp, "aa", "ba")

dist_extract(temp, 1:10, 11:20)
dim(dist_extract(temp, 1:10, ))
dim(dist_extract(temp, , 1:10))
dist_extract(temp, 1:10, 11:20, product = "inner")
length(dist_extract(temp, 1:10, , product = "inner"))
length(dist_extract(temp, , 1:10, product = "inner"))

dist_extract(temp, c("aa", "ba", "ca"), c("ca", "da", "fa"))
dist_extract(temp, c("aa", "ba", "ca"), c("ca", "da", "fa"), product = "inner")

dist_extract(temp, k = 1:3) # product is always inner when k is specified

Vectorized version of dist_ij_k_

Description

Convert ij indexes to k indexes for a dist object

Usage

dist_ij_k(i, j, size)

Arguments

i

row indexes

j

column indexes

size

value of size attribute of the dist object

Value

k indexes
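
The conversion follows from the storage layout documented for stats::dist: the lower triangle is stored column by column. A minimal sketch of that arithmetic is given below. ij_to_k() is a hypothetical helper, not the package's function, and rejecting diagonal positions (i equal to j) is this sketch's choice.

```r
# Hypothetical helper illustrating the index arithmetic for a dist object.
ij_to_k <- function(i, j, size) {
  stopifnot(all(i != j), all(i >= 1), all(j >= 1))
  stopifnot(all(i <= size), all(j <= size))
  lo <- pmin(i, j)   # column of the lower-triangle entry
  hi <- pmax(i, j)   # row of the lower-triangle entry
  # Entries in earlier columns, plus the offset within column `lo`.
  size * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

d <- stats::dist(iris[, 1:4])
n <- attr(d, "Size")

i <- c(3, 1, 150)
j <- c(1, 3, 149)
k <- ij_to_k(i, j, n)
k

# The k positions pick the same values as matrix-style indexing.
all.equal(as.vector(d)[k], as.matrix(d)[cbind(i, j)])
```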


Convert ij index to k index

Description

Convert ij index to k index for a dist object

Usage

dist_ij_k_(i, j, size)

Arguments

i

row index

j

column index

size

value of size attribute of the dist object

Value

k index


Vectorized version of dist_k_ij_

Description

Convert kth indexes to ij indexes of a dist object

Usage

dist_k_ij(k, size)

Arguments

k

kth indexes

size

value of size attribute of the dist object

Value

ij indexes as 2*n matrix where n is length of k vector
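
The reverse mapping can be sketched with a lookup over the lower triangle. k_to_ij() below is a hypothetical helper for illustration only: it builds a size x size logical matrix, so it is far less efficient than a closed-form conversion, and placing i in the first row and j in the second is an assumption about the output layout.

```r
# Hypothetical helper: map direct positions k back to (i, j) pairs.
k_to_ij <- function(k, size) {
  stopifnot(all(k >= 1), all(k <= size * (size - 1) / 2))
  # Lower-triangle positions in column-major order, matching dist storage.
  lower <- which(lower.tri(matrix(NA, size, size)), arr.ind = TRUE)
  # Transpose so the result has 2 rows and one column per k.
  unname(t(lower[k, , drop = FALSE]))
}

res <- k_to_ij(c(1, 2, 4), size = 4)
res
```

For size 4, positions 1, 2 and 4 correspond to the pairs (2,1), (3,1) and (3,2).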


Convert kth index to ij index

Description

Convert kth index to ij index of a dist object

Usage

dist_k_ij_(k, size)

Arguments

k

kth index

size

value of size attribute of the dist object

Value

ij index as a length two integer vector


Replace values in dist

Description

Replace values of a dist object using either ij or position (k) indexing

Usage

dist_replace(object, i, j, value, k)

Arguments

object

dist object

i

(integer vector) row positions

j

(integer vector) column positions

value

(integer/numeric vector) Values to replace

k

(integer vector) positions

Details

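There are two modes to specify the positions: ij-mode, where i and j give row and column positions and k is missing, and k-mode, where k gives direct positions and i and j are missing.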

Value

dist object
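
What a k-mode replacement amounts to can be shown in base R, since a dist object is a numeric vector with attributes. This sketch uses plain vector assignment rather than dist_replace, and it works on a copy so the original stays intact.

```r
d  <- stats::dist(iris[, 1:4])
d2 <- d

# k-mode: overwrite the stored value at position 2.
d2[2] <- 101

class(d2)            # attributes, including the class, are kept

# Position 2 is the pair (3, 1), so both mirrored cells change.
as.matrix(d2)[3, 1]
as.matrix(d2)[1, 3]
```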

Examples


# create a dist object
d <- dist(iris[,1:4])
attr(d, "Labels") <- outer(letters, letters, paste0)[1:150]
head(d)
max(d)
as.matrix(d)[1:5, 1:5]

# replacement in ij-mode
d <- dist_replace(d, 1, 2, 100)
dist_extract(d, 1, 2, product = "inner")
d <- dist_replace(d, "ca", "ba", 102)
dist_extract(d, "ca", "ba", product = "inner")

d <- dist_replace(d, 1:5, 6:10, 11:15)
dist_extract(d, 1:5, 6:10, product = "inner")
d <- dist_replace(d, c("ca", "da"), c("aa", "ba"), 102)
dist_extract(d, c("ca", "da"), c("aa", "ba"), product = "inner")

# replacement in k-mode
d <- dist_replace(d, k = 2, value = 101)
dist_extract(d, k = 2)
dist_extract(d, 3, 1, product = "inner") # extracting k=2 in ij-mode

dist_subset

Description

Compute subset faster than regular '[[' on a dist object. This function is taken from the proxy package, which does not export it.

Usage

dist_subset(x, subset, ...)

Arguments

x

dist object

subset

index of the subset. This has to be unique.

...

additional arguments

Value

returns a dist subset
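
For reference, the result of subsetting a dist object can be reproduced in base R by going through the full matrix. This is the slow baseline, shown only to make the expected output concrete; it says nothing about how dist_subset is implemented.

```r
d    <- stats::dist(iris[, 1:4])
keep <- c(1, 5, 10)   # unique indexes of the items to retain

# Expand to a matrix, keep the chosen rows and columns, convert back.
sub <- stats::as.dist(as.matrix(d)[keep, keep])

attr(sub, "Size")
as.vector(sub)
```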


Constructor of disto with dist backend

Description

Constructor of disto with dist backend

Usage

disto_dist(arguments)

Arguments

arguments

to construct disto object

Details

to be used by disto constructor function

Value

returns a list


Get names/labels

Description

Get names/labels of the underlying distance storing backend

Usage

## S3 method for class 'disto'
names(x)

Arguments

x

disto object

Value

A character vector

Examples

temp <- stats::dist(iris[,1:4])
dio <- disto(objectname = "temp")
dio
names(dio) <- paste0("a", 1:150)

Plot a disto object

Description

Various plotting options for subsets of disto objects

Usage

## S3 method for class 'disto'
plot(x, ...)

Arguments

x

object of class disto

...

Additional arguments. See details.

Details

Among the additional arguments, type selects the kind of plot. The examples below use type = "heatmap" and type = "dendrogram".

Value

ggplot object

Examples

temp <- stats::dist(iris[,1:4])
dio  <- disto(objectname = "temp")
plot(dio, type = "heatmap")
plot(dio, type = "dendrogram")

Print method for disto class

Description

Print method for disto class

Usage

## S3 method for class 'disto'
print(x, ...)

Arguments

x

object of class disto

...

currently not in use

Value

invisible NULL. Function writes backend type and size to terminal as a message.

Examples

temp <- stats::dist(iris[,1:4])
dio   <- disto(objectname = "temp")
print(dio)

Obtain size of the disto object

Description

Obtain size of the disto object

Usage

size(disto, ...)

Arguments

disto

object of class disto

...

currently not in use

Value

Integer vector of length 1
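
For the dist backend, the corresponding quantity on the underlying object is its Size attribute, the number of items rather than the number of stored distances. The base R sketch below shows the relation between the two; that size() returns exactly this attribute is an assumption.

```r
d <- stats::dist(iris[, 1:4])

n <- attr(d, "Size")   # number of items
n

# A dist object stores one value per unordered pair of items.
length(d) == n * (n - 1) / 2
```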

Examples

temp <- stats::dist(iris[,1:4])
dio   <- disto(objectname = "temp")
size(dio)

Summary method for disto class

Description

Summary method for disto class

Usage

## S3 method for class 'disto'
summary(object, ...)

Arguments

object

object of class disto

...

currently not in use

Value

invisibly returns the tidy output of summary as a dataframe.

Examples

temp <- stats::dist(iris[,1:4])
dio   <- disto(objectname = "temp")
dio
summary(dio)
