Help for package ztils

Type:

Package

Title:

Various Common Statistical Utilities

Version:

1.0.0

Description:

Utilities for simplifying common statistical operations including probability density functions, cumulative distribution functions, Kolmogorov-Smirnov tests, principal component analysis plots, and prediction plots.

License:

MIT + file LICENSE

URL:

https://zachpeagler.github.io/ztils/, https://github.com/zachpeagler/ztils

BugReports:

https://github.com/zachpeagler/ztils/issues

Encoding:

UTF-8

RoxygenNote:

7.3.2

Depends:

R (≥ 3.3.0)

Imports:

ggplot2, MASS, scico, vegan

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-04-09 17:22:52 UTC; Zach

Author:

Zach Peagler [aut, cre, cph]

Maintainer:

Zach Peagler <zachpeagler00@gmail.com>

Repository:

CRAN

Date/Publication:

2025-04-10 14:30:01 UTC

glm_pseudor2

Description

A function for calculating the pseudo R^2 of a glm object

Usage

glm_pseudor2(mod)

Arguments

mod

The model for which to calculate the pseudo R^2

Value

The pseudo R^2 value of the model

Examples

gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)
glm_pseudor2(gmod)
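
As a rough illustration of what a pseudo R^2 can look like, the sketch below uses one common deviance-based definition, 1 - (residual deviance / null deviance). That glm_pseudor2 uses exactly this formula is an assumption; the documentation above does not state which definition it implements.

```r
# Same model as in the example above (gaussian family by default)
gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)

# Deviance-based pseudo R^2 (assumed definition, not confirmed by the docs):
# the share of the null deviance that the fitted model accounts for
pseudo_r2 <- 1 - gmod$deviance / gmod$null.deviance
pseudo_r2
```

For a gaussian glm like this one, the deviance-based value coincides with the ordinary R^2 of the equivalent lm fit.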

Multiple Cumulative Distribution Functions for Continuous Variables

Description

This function gets the cumulative distribution function for selected distributions against a continuous, non-negative input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", "cauchy", "t", "weibull", "logistic", and "all".

Usage

multicdf_cont(var, seq_length = 50, distributions = "all")

Arguments

var

The variable of which to get the CDF

seq_length

The length of the sequence over which to fit the distributions

distributions

The distributions to fit var against

Value

A dataframe with x, the real cumulative density, and the CDF of each desired distribution, with a number of rows equal to seq_length + 1.

Examples

multicdf_cont(iris$Petal.Length)

multicdf_cont(iris$Sepal.Length,
              100,
              c("normal", "lognormal")
              )
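
The general idea of comparing an empirical CDF with fitted theoretical CDFs can be sketched in base R plus 'MASS'. This is not the package's implementation: the grid construction (min to max of the variable, seq_length + 1 points) and the column names are assumptions made for illustration.

```r
library(MASS)  # fitdistr() for maximum-likelihood fits

x <- iris$Petal.Length
seq_length <- 50

# Assumed grid: seq_length + 1 evenly spaced points over the observed range
grid <- seq(min(x), max(x), length.out = seq_length + 1)

# Fit each candidate distribution to the data
fit_norm  <- fitdistr(x, "normal")
fit_lnorm <- fitdistr(x, "lognormal")

# Empirical CDF next to the fitted theoretical CDFs
cdf_df <- data.frame(
  x         = grid,
  empirical = ecdf(x)(grid),
  normal    = pnorm(grid,
                    mean = fit_norm$estimate[["mean"]],
                    sd   = fit_norm$estimate[["sd"]]),
  lognormal = plnorm(grid,
                     meanlog = fit_lnorm$estimate[["meanlog"]],
                     sdlog   = fit_lnorm$estimate[["sdlog"]])
)
head(cdf_df)
```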

multicdf_plot

Description

This function extends 'multicdf_cont' and gets the cumulative distribution functions (CDFs) for selected distributions against a continuous variable. Possible distributions include any combination of "normal", "lognormal", "gamma", "exponential", and "all" (which uses all of the prior distributions). It then plots them using 'ggplot2' and a 'scico' palette, using var_name for the plot labels if specified; otherwise it uses var.

Usage

multicdf_plot(
  var,
  seq_length = 50,
  distributions = "all",
  palette = "oslo",
  var_name = NULL
)

Arguments

var

The variable for which to plot CDFs

seq_length

The number of points over which to fit var

distributions

The distributions to fit var against

palette

The color palette to use on the graph

var_name

The variable name to use for var in the plot labels

Value

A plot showing the CDF of the selected variable against the selected distributions over the selected sequence length

Examples

multicdf_plot(iris$Sepal.Length)

multicdf_plot(iris$Sepal.Length,
              seq_length = 100,
              distributions = c("normal", "lognormal", "gamma"),
              palette = "bilbao",
              var_name = "Sepal Length (cm)"
              )

Multiple Kolmogorov-Smirnov Tests for Continuous Variables

Description

This function gets the distance and p-value from a Kolmogorov-Smirnov test for selected distributions against a continuous input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", and "all".

Usage

multiks_cont(var, distributions = "all")

Arguments

var

The variable to perform KS tests against

distributions

The distributions to test var against

Value

A dataframe with the distance and p-value for each KS test performed

Examples

multiks_cont(iris$Sepal.Length)

multiks_cont(iris$Sepal.Length, c("normal", "lognormal"))
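
A minimal sketch of the underlying idea, assuming each distribution is first fitted with MASS::fitdistr and then tested with stats::ks.test. Whether multiks_cont works exactly this way is not stated above, and the column names here are illustrative only.

```r
library(MASS)  # fitdistr()

x <- iris$Sepal.Length

# Fit the candidate distributions to the data
fit_norm  <- fitdistr(x, "normal")
fit_lnorm <- fitdistr(x, "lognormal")

# One-sample KS tests against the fitted distributions.
# ks.test() warns about ties in this data, so warnings are suppressed here.
ks_norm <- suppressWarnings(
  ks.test(x, "pnorm",
          mean = fit_norm$estimate[["mean"]],
          sd   = fit_norm$estimate[["sd"]])
)
ks_lnorm <- suppressWarnings(
  ks.test(x, "plnorm",
          meanlog = fit_lnorm$estimate[["meanlog"]],
          sdlog   = fit_lnorm$estimate[["sdlog"]])
)

# Collect distance (D statistic) and p-value per distribution
ks_df <- data.frame(
  distribution = c("normal", "lognormal"),
  distance     = c(unname(ks_norm$statistic), unname(ks_lnorm$statistic)),
  p_value      = c(ks_norm$p.value, ks_lnorm$p.value)
)
ks_df
```

Because the parameters are estimated from the same data being tested, p-values obtained this way tend to be optimistic and are best read as a relative guide between candidate distributions.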

Multiple Probability Density Functions for Continuous Variables

Description

This function gets the probability density functions for selected distributions against a continuous, non-negative input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", and "all".

Usage

multipdf_cont(var, seq_length = 50, distributions = "all")

Arguments

var

The variable of which to get the PDF.

seq_length

The length of the sequence over which to fit the distributions

distributions

The distributions to fit var against

Value

A dataframe with x, the real density, and the PDF of each desired distribution, with a number of rows equal to seq_length + 1.

Examples

multipdf_cont(iris$Petal.Length)

multipdf_cont(iris$Sepal.Length, 100, c("normal", "lognormal"))
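
The same comparison can be sketched for densities. The code below assumes a kernel density estimate stands in for the "real density" and that the grid spans the observed range with seq_length + 1 points; both are assumptions for illustration, not the package's confirmed behavior.

```r
library(MASS)  # fitdistr()

x <- iris$Petal.Length
seq_length <- 50

# Kernel density estimate evaluated at seq_length + 1 evenly spaced points
dens <- density(x, n = seq_length + 1, from = min(x), to = max(x))

# Fit candidate distributions to the data
fit_norm <- fitdistr(x, "normal")
fit_exp  <- fitdistr(x, "exponential")

# Estimated density next to the fitted theoretical PDFs
pdf_df <- data.frame(
  x            = dens$x,
  real_density = dens$y,
  normal       = dnorm(dens$x,
                       mean = fit_norm$estimate[["mean"]],
                       sd   = fit_norm$estimate[["sd"]]),
  exponential  = dexp(dens$x, rate = fit_exp$estimate[["rate"]])
)
head(pdf_df)
```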

multipdf_plot

Description

This function extends 'multipdf_cont' and gets the probability density functions (PDFs) for selected distributions against a continuous variable. Possible distributions include any combination of "normal", "lognormal", "gamma", "exponential", and "all" (which uses all of the prior distributions). It then plots them using 'ggplot2' and a 'scico' palette, using var_name for the plot labels if specified; otherwise it uses var.

Usage

multipdf_plot(
  var,
  seq_length = 50,
  distributions = "all",
  palette = "oslo",
  var_name = NULL
)

Arguments

var

The variable for which to plot PDFs

seq_length

The number of points over which to fit var

distributions

The distributions to fit var against

palette

The color palette to use on the graph

var_name

The variable name to use for var in the plot labels. If no name is provided, the function uses the column name supplied in var

Value

A plot showing the PDF of the selected variable against the selected distributions over the selected sequence length

Examples

multipdf_plot(iris$Sepal.Length)

multipdf_plot(iris$Sepal.Length,
              seq_length = 100,
              distributions = c("normal", "lognormal", "gamma"),
              palette = "bilbao",
              var_name = "Sepal Length (cm)"
              )

No extremes

Description

This function returns a dataframe subsetted to exclude observations with extreme values in the specified variable. Extremes are values below the first quartile minus 3 times the interquartile range, or above the third quartile plus 3 times the interquartile range.

Usage

no_extremes(data, var)

Arguments

data

The data to subset

var

The variable to subset by.

Value

A dataframe without entries containing extremes in the selected variable.

Examples

no_extremes(iris, Sepal.Length)

No outliers

Description

This function returns a dataframe subsetted to exclude observations with outlying values in the specified variable. Outliers are values below the first quartile minus 1.5 times the interquartile range, or above the third quartile plus 1.5 times the interquartile range.

Usage

no_outliers(data, var)

Arguments

data

The data to subset

var

The variable to subset by

Value

A dataframe without entries containing outliers in the selected variable.

Examples

no_outliers(iris, Sepal.Length)
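
The interquartile-range rule can be written out in base R as below. The helper drop_beyond_fences is hypothetical and is not part of the package; it assumes the usual quartile fences, with multiplier k = 1.5 for outliers (and k = 3 for the stricter rule that no_extremes describes).

```r
# Hypothetical helper: keep rows whose value lies within
# [Q1 - k * IQR, Q3 + k * IQR]
drop_beyond_fences <- function(data, values, k = 1.5) {
  q   <- quantile(values, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  keep <- !is.na(values) &
    values >= q[[1]] - k * iqr &
    values <= q[[2]] + k * iqr
  data[keep, , drop = FALSE]
}

# Toy data: ten ordinary values and one very large one
toy <- data.frame(v = c(1:10, 100))

trimmed_outliers <- drop_beyond_fences(toy, toy$v, k = 1.5)
trimmed_extremes <- drop_beyond_fences(toy, toy$v, k = 3)
```

With this toy data the value 100 lies beyond both fences, so both calls drop it and keep the other ten rows.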

Principal Component Analysis Data

Description

This function uses a dataframe, PCA variables, and a scaled boolean to generate a dataframe with principal components as columns.

Usage

pca_data(data, pcavars, scaled = FALSE)

Arguments

data

The dataframe to add principal components to.

pcavars

The variables to include in the principal component analysis

scaled

A boolean (TRUE or FALSE) indicating if the pcavars are already scaled

Value

A dataframe containing the original data with the principal components added as columns.

Examples

pca_data(iris, iris[,c(1:4)], FALSE)
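
To show the shape of such a result, the sketch below uses stats::prcomp as a stand-in. The method pca_data uses internally is not stated above, so the use of prcomp and the choice to scale the variables because scaled = FALSE are both assumptions.

```r
pcavars <- iris[, 1:4]

# Variables are not pre-scaled, so center and scale them before the PCA
pca <- prcomp(pcavars, center = TRUE, scale. = TRUE)

# Append the principal component scores (PC1..PC4) to the original data
iris_pca <- cbind(iris, pca$x)
head(iris_pca)
```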

Principal Component Analysis Plot

Description

This function uses a group, PCA variables, and a scaled boolean to generate a biplot using 'ggplot2' and 'scico'.

Usage

pca_plot(group, pcavars, scaled = FALSE, palette = "oslo")

Arguments

group

The group variable (column)

pcavars

The variables to include in the principal component analysis

scaled

A boolean (TRUE or FALSE) indicating if the pcavars are already scaled

palette

A color palette to use on the plot, with each group assigned to a color.

Value

A plot showing PC1 on the x axis, PC2 on the y axis, colored by group, with vectors and labels showing the individual pca variables.

Examples

pca_plot(iris$Species, iris[,c(1:4)])

pca_plot(iris$Species, iris[,c(1:4)], FALSE, "bilbao")

Prediction Plot

Description

This function uses a model, a dataframe, and supplied predictor, response, and group variables to generate predictions from the model over a user-defined number of points. Predictions can cover either the confidence interval or the prediction interval, and a mathematical correction can optionally be applied. It then plots both the real data and the specified interval using 'ggplot2', with a color palette chosen from the 'scico' palettes.

Usage

predict_plot(
  mod,
  data,
  rvar,
  pvar,
  group = NULL,
  length = 50,
  interval = "confidence",
  correction = "normal",
  palette = "oslo"
)

Arguments

mod

the model used for predictions

data

the data used to render the "real" points on the graph and for aggregating groups to determine prediction limits (should be the same as the data used in the model)

rvar

the response variable (y variable / variable the model is predicting)

pvar

the predictor variable (x variable / variable the model will predict against)

group

the group; should be a factor; one response curve will be made for each group

length

the length of the variable over which to predict (higher = more resolution, essentially)

interval

the type of interval to predict ("confidence" or "prediction")

correction

the type of correction to apply to the prediction ("normal", "exponential", or "logit")

palette

the color palette used to color the graph, with each group corresponding to a color

Value

A plot showing the real data and the model's predicted 95% CI or PI over a number of groups, with optional corrections.

Examples

## Example 1
mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
predict_plot(mod1, iris, Sepal.Length, Petal.Length, Species)
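
The prediction step behind a plot like this can be sketched with stats::predict alone. The layout of the prediction grid (50 points per group, spanning each group's observed predictor range) is an assumption about how the limits are aggregated, and the plotting and correction steps are left out.

```r
mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

groups <- levels(iris$Species)

# Assumed grid: 50 evenly spaced predictor values per group,
# limited to the range observed within that group
newdata <- do.call(rbind, lapply(groups, function(g) {
  rng <- range(iris$Petal.Length[iris$Species == g])
  data.frame(
    Petal.Length = seq(rng[1], rng[2], length.out = 50),
    Species      = factor(g, levels = groups)
  )
}))

# Fitted values with a 95% confidence interval;
# interval = "prediction" gives the wider prediction interval instead
pred <- predict(mod1, newdata, interval = "confidence", level = 0.95)

# Columns: Petal.Length, Species, fit, lwr, upr
pred_df <- cbind(newdata, pred)
head(pred_df)
```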