Help for package ggpca

Title:

Publication-Ready PCA, t-SNE, and UMAP Plots

Version:

0.1.3

Description:

Provides tools for creating publication-ready dimensionality reduction plots, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). This package helps visualize high-dimensional data with options for custom labels, density plots, and faceting, using the 'ggplot2' framework Wickham (2016) <doi:10.1007/978-3-319-24277-4>.

License:

GPL-3

Imports:

config (≥ 0.3.2), golem (≥ 0.4.1), shiny (≥ 1.8.1.1), rlang, Rtsne, cowplot, dplyr, ggplot2, umap

Encoding:

UTF-8

RoxygenNote:

7.3.2

Suggests:

knitr, tibble, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-02-04 01:38:37 UTC; Bach

Author:

Yaoxiang Li [cre, aut]

Maintainer:

Yaoxiang Li <liyaoxiang@outlook.com>

Repository:

CRAN

Date/Publication:

2025-02-04 03:20:02 UTC

Create publication-ready PCA, t-SNE, or UMAP plots

Description

This function generates a dimensionality reduction plot (PCA, t-SNE, or UMAP) from high-dimensional data, with options for custom labels, titles, density plots, and faceting.

Usage

ggpca(
  data,
  metadata_cols,
  mode = c("pca", "tsne", "umap"),
  scale = TRUE,
  x_pc = "PC1",
  y_pc = "PC2",
  color_var = NULL,
  ellipse = TRUE,
  ellipse_level = 0.9,
  ellipse_type = "norm",
  ellipse_alpha = 0.9,
  point_size = 3,
  point_alpha = 0.6,
  facet_var = NULL,
  tsne_perplexity = 30,
  umap_n_neighbors = 15,
  density_plot = "none",
  color_palette = "Set1",
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  subtitle = NULL,
  caption = NULL
)

Arguments

data

A data frame containing the data to be plotted. Must include both feature columns (numeric) and metadata columns (categorical).

metadata_cols

A character vector of column names or a numeric vector of column indices for the metadata columns. These columns are used for grouping and faceting.

mode

The dimensionality reduction method to use. One of "pca" (Principal Component Analysis), "tsne" (t-Distributed Stochastic Neighbor Embedding), or "umap" (Uniform Manifold Approximation and Projection).

scale

Logical indicating whether to scale features before PCA (default: TRUE). Ignored when mode is "tsne" or "umap".

x_pc

Name of the principal component or dimension to plot on the x-axis (default: "PC1" for PCA).

y_pc

Name of the principal component or dimension to plot on the y-axis (default: "PC2" for PCA).

color_var

(Optional) Name of the column used to color points in the plot. If NULL, no color is applied. Supports both discrete and continuous variables. Default: NULL.

ellipse

Logical indicating whether to add confidence ellipses for groups (only supported for PCA and only if color_var is discrete; default: TRUE).

ellipse_level

Confidence level for ellipses (default: 0.9).

ellipse_type

Type of ellipse to plot, e.g., "norm" for normal distribution (default: "norm").

ellipse_alpha

Transparency level for ellipses, where 0 is fully transparent and 1 is fully opaque (default: 0.9).

point_size

Size of the points in the plot (default: 3).

point_alpha

Transparency level for the points, where 0 is fully transparent and 1 is fully opaque (default: 0.6).

facet_var

Formula for faceting the plot (e.g., Category ~ .), which splits the plot into panels by group (default: NULL).

tsne_perplexity

Perplexity parameter for t-SNE, which balances local and global aspects of the data (default: 30).

umap_n_neighbors

Number of neighbors for UMAP, which determines the local structure (default: 15).

density_plot

Controls whether to add density plots for the x, y, or both axes. Accepts one of "none", "x", "y", or "both" (default: "none").

color_palette

Name of the RColorBrewer palette used to color discrete variables, e.g., "Set1" or "Set2" (default: "Set1").

xlab

Custom x-axis label (default: NULL, will be auto-generated based on the data).

ylab

Custom y-axis label (default: NULL, will be auto-generated based on the data).

title

Plot title (default: NULL).

subtitle

Plot subtitle (default: NULL).

caption

Plot caption (default: NULL).
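A note on tsne_perplexity, offered as a hedged sketch rather than documented ggpca behavior: the 'Rtsne' package, which ggpca imports, rejects a perplexity that is too large for the number of samples (it requires 3 * perplexity to be at most n - 1). Assuming ggpca passes tsne_perplexity through to 'Rtsne' unchanged, the hypothetical helper below (not part of ggpca) computes the largest whole-number perplexity a dataset of a given size can support.

```r
# Hypothetical helper, not part of ggpca.
# Assumption: the perplexity reaches Rtsne unchanged, and Rtsne requires
# 3 * perplexity <= n_samples - 1.
max_tsne_perplexity <- function(n_samples) {
  floor((n_samples - 1) / 3)
}

# Check whether a requested perplexity fits the data before plotting
n_samples <- 100
requested <- 30
perplexity_ok <- requested <= max_tsne_perplexity(n_samples)
```

Under this assumption, the default of 30 needs at least 91 samples; smaller datasets call for a smaller tsne_perplexity.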

Value

A ggplot2 object representing the dimensionality reduction plot, including scatter plots, optional density plots, and faceting options. The plot can be further customized using ggplot2 functions.

Author(s)

Yaoxiang Li

Examples


# Load dataset
pca_data <- read.csv(system.file("extdata", "example.csv", package = "ggpca"))

# PCA example
p_pca_y_group <- ggpca(
  pca_data,
  metadata_cols = 1:6,
  mode = "pca",
  color_var = "group",
  ellipse = TRUE,
  density_plot = "y",
  title = "PCA with Y-axis Density Plot",
  subtitle = "Example dataset, colored by group",
  caption = "Data source: Example dataset"
)
print(p_pca_y_group)

# t-SNE example
p_tsne_time <- ggpca(
  pca_data,
  metadata_cols = 1:6,
  mode = "tsne",
  color_var = "time",
  tsne_perplexity = 30,
  title = "t-SNE Plot of Example Dataset",
  subtitle = "Colored by time",
  caption = "Data source: Example dataset"
)
print(p_tsne_time)
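
To make the roles of metadata_cols, scale, x_pc, and y_pc more concrete, the following is a minimal base R sketch of the kind of computation a PCA plot rests on. It assumes stats::prcomp() and the built-in iris dataset. The internals of ggpca are not documented here and may differ, and the variable names are illustrative only.

```r
# Assumption: PCA via stats::prcomp(); ggpca's own implementation may differ.
features <- iris[, 1:4]               # numeric feature columns
metadata <- iris[, 5, drop = FALSE]   # categorical metadata column (Species)

# Center and scale the features, analogous to scale = TRUE
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Scores for the two plotted components, joined back to the metadata
scores <- cbind(metadata, as.data.frame(pca$x[, c("PC1", "PC2")]))

# Percentage of variance explained, commonly shown in axis labels
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
axis_labels <- sprintf("PC%d (%.1f%%)", 1:2, var_explained[1:2])
```

The scores data frame holds one row per sample, with the metadata column available for coloring or faceting and PC1 and PC2 as plotting coordinates.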

Process Missing Values in a Data Frame

Description

This function filters columns in a data frame based on a specified threshold for missing values and performs imputation on remaining non-metadata columns using half of the minimum value found in each column. Metadata columns are specified by the user and are exempt from filtering and imputation.

Usage

process_missing_value(data, missing_threshold = 25, metadata_cols = NULL)

Arguments

data

A data frame containing the data to be processed.

missing_threshold

A numeric value giving the percentage of missing values at which a column is removed. Default is 25.

metadata_cols

A vector of either column names or indices that should be treated as metadata and thus exempt from missing value filtering and imputation. If NULL, no columns are treated as metadata.

Value

A data frame with filtered and imputed columns as necessary.

Examples


data <- data.frame(
  A = c(1, 2, NA, 4),
  B = c(NA, NA, NA, 4),
  C = c(1, 2, 3, 4)
)
# Process missing values while treating column 'C' as metadata
processed_data <- process_missing_value(data, missing_threshold = 50, metadata_cols = "C")
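
The filtering and half-minimum imputation described above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's source: half_min_impute is a hypothetical name, and the sketch assumes a column is dropped only when its missing percentage is strictly greater than the threshold.

```r
# Hypothetical helper, not part of ggpca: drop sparse feature columns, then
# replace remaining NAs with half of each column's minimum.
half_min_impute <- function(data, missing_threshold = 25, metadata_cols = NULL) {
  # Resolve metadata columns to names so that indices and names both work
  if (is.numeric(metadata_cols)) metadata_cols <- names(data)[metadata_cols]
  feature_cols <- setdiff(names(data), metadata_cols)

  # Percentage of missing values in each feature column
  pct_missing <- vapply(
    data[feature_cols],
    function(x) mean(is.na(x)) * 100,
    numeric(1)
  )

  # Assumption: drop a column only if its missing percentage is strictly
  # greater than the threshold
  keep <- feature_cols[pct_missing <= missing_threshold]
  data <- data[, intersect(names(data), c(metadata_cols, keep)), drop = FALSE]

  # Impute remaining NAs in the kept feature columns with min / 2
  for (col in keep) {
    x <- data[[col]]
    if (anyNA(x)) {
      x[is.na(x)] <- min(x, na.rm = TRUE) / 2
      data[[col]] <- x
    }
  }
  data
}

df <- data.frame(
  A = c(1, 2, NA, 4),
  B = c(NA, NA, NA, 4),
  C = c(1, 2, 3, 4)
)
result <- half_min_impute(df, missing_threshold = 50, metadata_cols = "C")
```

In this sketch, column B (75% missing) is dropped, the missing value in column A is replaced by 0.5 (half of its minimum, 1), and metadata column C is left untouched.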

Run the Shiny Application

Description

This function launches the Shiny application with the specified user interface and server function. The function does not return a value but starts the Shiny app, allowing users to interact with it.

Usage

run_app(
  onStart = NULL,
  options = list(),
  enableBookmarking = NULL,
  uiPattern = "/",
  ...
)

Arguments

onStart

A function that will be called before the app is actually run. This is only needed for shinyAppObj, since in the shinyAppDir case, a global.R file can be used for this purpose.

options

Named options that should be passed to the runApp call (these can be any of the following: "port", "launch.browser", "host", "quiet", "display.mode" and "test.mode"). You can also specify width and height parameters which provide a hint to the embedding environment about the ideal height/width for the app.

enableBookmarking

Can be one of "url", "server", or "disable". The default value, NULL, will respect the setting from any previous calls to enableBookmarking(). See enableBookmarking() for more information on bookmarking your app.

uiPattern

A regular expression that will be applied to each GET request to determine whether the ui should be used to handle the request. Note that the entire request path must match the regular expression in order for the match to be considered successful.

...

Arguments to pass to 'golem_opts'. See '?golem::get_golem_options' for more details.

Value

No return value, called for side effects.