Help for package piecenorms

Title:

Calculate a Piecewise Normalised Score Using Class Intervals

Version:

1.1.0

Author:

David Hammond [aut, cre]

URL:

https://github.com/david-hammond/piecenorms

BugReports:

https://github.com/david-hammond/piecenorms/issues

Maintainer:

David Hammond <anotherdavidhammond@gmail.com>

Description:

Provides an implementation of piecewise normalisation techniques useful when dealing with the communication of skewed and highly skewed data. It also provides utilities that recommends a normalisation technique based on the distribution of the data.

License:

MIT + file LICENSE

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.3.2

Imports:

dplyr, rlang, scales, R6, classInt, univariateML, COINr, stats, vdiffr

Depends:

R (≥ 2.10)

NeedsCompilation:

Packaged:

2024-07-24 05:49:58 UTC; david

Repository:

CRAN

Date/Publication:

2024-07-29 17:20:08 UTC

piecenorms: Calculate a Piecewise Normalised Score Using Class Intervals

Description

piecenorms has been built to calculate normalised data piecewise using class intervals. This is useful in communication of highly skewed data.

Details

For highly skewed data, the package classInt provides a series of options for selecting class intervals. The classInts can be used as the breaks for calculating the piecewise normalisation function piecenorm. The function also allows the user to select their own breaks manually.

For any call to piecenorm, the user provides a vector of observations, a vector of breaks and a direction for the normalisation. The data is then cut into classes and normalised within its class.

Number of Bins:

n = \text{length}(\text{brks}) - 1

Normalisation Class Intervals:

\left(\frac{i-1}{n}, \frac{i}{n}\right] \forall i \in \{1:n\}

In cases where there is only one bin defined as c(min(obs), max(obs)), the function piecenorm resolves to standard minmax normalisation.

The piecenorms package also provides a normalisr R6 class that

Classifies data into a likely distribution family
Provides a recommendation of an appropriate normalisation technique
Provides functionality to apply this normalisation technique to a new data set

This is useful when the user would like to analyse how distributions have changed over time.

Note

As with any non-linear transformation, piecewise normalization preserves ordinal invariance within each class but does not preserve global relative magnitudes. However, it does maintain relative magnitudes within each class. On the other hand, more standard techniques like min-max normalization preserves both ordinal invariance and global relative magnitudes.

Definitions of each are as follows:

Ordinal Invariance: The property that the order of the data points is preserved. If one normalized value is larger than another, it reflects the same order as in the original data.
Non-Preservation of Relative Magnitudes (Global): This refers to the loss of the proportionality of the original data values when normalized. If one value is twice as large as another in the original data, this relationship might not be preserved in the normalized data.
Ordinal Invariance: The property that the order of the data points is preserved. If one normalized value is larger than another, it reflects the same order as in the original data.

Author(s)

Maintainer: David Hammond anotherdavidhammond@gmail.com

Helper function to check for outliers

Description

Helper function to check for outliers

Usage

.check_for_outliers(data)

Arguments

data

Observed data

Value

numeric()

Classified observed data into a distribution class.

Description

Based on a series of statistical tests, uses bootstrapping to classify observed data into one of the following distributions: Binary, Uniform, Normal, Lognormal, Weibull, Pareto, Exponential and Power.

Usage

.classify_distribution(x, potential_distrs)

Arguments

x

A numeric vector of observations

Value

character

Helper function to classify a single sample

Description

Helper function to classify a single sample

Usage

.classify_sample(sample, potential_distrs)

Arguments

sample

sample observations

potential_distrs

The types of distributions to fit

Value

character

Description

Helper function to check for recommendations

Usage

.recommend(x, distr, outliers, classint_pref, nclasses, potential_distrs)

Arguments

x

The observations

distr

The likely distribution

outliers

Does the data have IQR outliers

classint_pref

The preferred classInt style

nclasses

The number of desired classes for classInt

potential_distrs

The types of distributions to fit

Value

list with the following description:

norm: character() the recommended normalisation technique
breaks: numeric The recommended breaks
mdl: the univariateML model

Creates a recommended classInt based on the type of distribution.

Description

Creates a recommended classInt based on the type of distribution.

Details

Creates a normalisr R6 class for recommending a classInt based on the shape of the distribution of the observed data

Public fields

data: (numeric())
Original observations
outliers: (logical())
Logical vector indicating is observations are outliers
quantiles: (numeric())
Vector of quantiles
fitted_distribution: (character())
Suggested distribution
normalisation: (character())
Recommended class interval style based on distribution
breaks: (numeric())
Recommended breaks for classes
number_of_classes: (numeric())
Number of classes identified
normalised_data: (numeric())
Normalised values based on recommendations
polarity: (numeric(1))
Which direction should the normalisation occur
percentiles: (numeric())
Observation percentiles
fittedmodel: (character())
Fitted univariate model
model: (univariateML())
Fitted univariate model parameters

Methods

Method `new()`

Creates a new instance of this R6 class.

Create a new normalisr object.

Usage

normalisr$new(
  x,
  polarity = 1,
  classint_preference = "jenks",
  num_classes = NULL,
  potential_distrs = c("unif", "power", "norm", "lnorm", "weibull", "pareto", "exp")
)

Arguments

x

A numeric vector of observations

polarity

Which direction should the normalisation occur, defaults to 1 but can either be:

1:: Lowest value is normalised to 0, highest value is normalised to 1
-1: Highest value is normalised to 0, lowest value is normalised to 1

classint_preference

Preference for classInt breaks (see ?classInt::classIntervals)

num_classes

Preference for number of classInt breaks, defaults to Sturges number (see
?grDevices::nclass.Sturges)

potential_distrs

The types of distributions to fit, defaults to c("unif", "power", "norm", "lnorm", "weibull", "pareto", "exp")

Returns

A new normalisr object.

Method `print()`

Prints the normalisr

Usage

normalisr$print()

Method `plot()`

Plots the normalised values against the original

Usage

normalisr$plot()

Method `hist()`

Histogram of normalised values against the original

Usage

normalisr$hist()

Method `setManualBreaks()`

Allows user to set manual breaks

Usage

normalisr$setManualBreaks(brks)

Arguments

brks: User Defined Breaks

Method `applyto()`

Applies the normalisation model to new data

Usage

normalisr$applyto(x)

Arguments

x: A numeric vector of observations

Method `as.data.frame()`

Returns a data frame of the normalisation

Usage

normalisr$as.data.frame()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

normalisr$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

set.seed(12345)

# Binary distribution test
x <- sample(c(0,1), 100, replace = TRUE)
y <- sample(c(0,1), 100, replace = TRUE)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Uniform distribution test
x <- runif(100)
y <- runif(100)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)


# Normal distribution tests
x <- rnorm(100)
y <- rnorm(100)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Lognormal distribution tests
x <- rlnorm(100)
y <- rlnorm(100)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Lognormal distribution tests with 5 classes
x <- rlnorm(100)
y <- rlnorm(100)
mdl <- normalisr$new(x, num_classes = 5)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Exponential distribution test
x <- exp(1:100)
y <- exp(1:100)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Poisson distribution test
x <- rpois(100, lambda = 0.5)
y <- rpois(100, lambda = 0.5)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Weibull distribution test
x <- rweibull(100, shape = 0.5)
y <- rweibull(100, shape = 0.5)
mdl <- normalisr$new(x)
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

# Set user defined breaks
mdl$setManualBreaks(c(5,10))
print(mdl)
mdl$plot()
mdl$hist()
head(mdl$as.data.frame())
mdl$applyto(y)

Get piecewse normalised values from a vector of observations

Description

Get piecewse normalised values from a vector of observations

Usage

piecenorm(obs, breaks, polarity = 1)

Arguments

obs

A vector of observations.

breaks

The breaks to normalise to.

polarity

Which direction should the normalisation occur.

Value

Vector of normalised observations

Examples

obs <- exp(1:10)
breaks <- c(min(obs), 8, 20, 100, 1000, 25000)
y <- piecenorm(obs, breaks)
plot(obs, y, type = 'l',
xlab = "Original Values",
ylab = "Normalised Values")

piecenorms: Calculate a Piecewise Normalised Score Using Class Intervals

Description

Details

Note

Author(s)

See Also

Helper function to check for outliers

Description

Usage

Arguments

Value

Classified observed data into a distribution class.

Description

Usage

Arguments

Value

Helper function to classify a single sample

Description

Usage

Arguments

Value

Helper function to check for recommendations

Description

Usage

Arguments

Value

Creates a recommended classInt based on the type of distribution.

Description

Details

Public fields

Methods

Public methods

Method new()

Usage

Arguments

Returns

Method print()

Usage

Method plot()

Usage

Method hist()

Usage

Method setManualBreaks()

Usage

Arguments

Method applyto()

Usage

Arguments

Method as.data.frame()

Usage

Method clone()

Usage

Arguments

Examples

Get piecewse normalised values from a vector of observations

Description

Usage

Arguments

Value

Examples

Method `new()`

Method `print()`

Method `plot()`

Method `hist()`

Method `setManualBreaks()`

Method `applyto()`

Method `as.data.frame()`

Method `clone()`