| Type: | Package | 
| Title: | Principal Component BiSulfite | 
| Version: | 0.1.1 | 
| Description: | A system for fast, accurate, and flexible whole genome bisulfite sequencing (WGBS) data analysis of two-condition comparisons. Principal Component BiSulfite, 'PCBS', assigns methylated loci eigenvector values from the treatment-delineating principal component in lieu of running millions of pairwise statistical tests, which dramatically increases analysis flexibility and reduces computational requirements. Methods: https://katlande.github.io/PCBS/articles/Differential_Methylation.html. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | R (≥ 2.10) | 
| URL: | https://github.com/katlande/PCBS | 
| BugReports: | https://github.com/katlande/PCBS/issues | 
| Imports: | ggplot2, tibble, ggrepel, dplyr, data.table | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-27 16:54:50 UTC; kathrynlande | 
| Author: | Kathryn Lande [aut, cre, cph] | 
| Maintainer: | Kathryn Lande <kathryn.lande@mail.mcgill.ca> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-08-27 18:00:02 UTC | 
Check if DMR calling seed number is ovcompressed.
Description
Identifies if seed number to use for DMR calling causes overcompression.
Usage
CheckOvercompression(ranks, CpG_cutoff, values, max.dmr.size, return.plot)
Arguments
ranks | 
 Rank data frame from getPCRanks.  | 
CpG_cutoff | 
 NULL or numeric. If NULL, seed numbers tested will be input of the values argument. If numeric, seed numbers tested will be CpG_cutoff*values argument. Recommended to us rankDist estimate if not null  | 
values | 
 Numeric vector, either seed numbers to test if CpG_cutoff=NULL or multipliers if CpG_cutoff is numeric  | 
max.dmr.size | 
 Automatic=5000. Maximum DMR expansion size in downstream analysis. Note: pipeline is optimized for 5000bp max DMR size, it is not recommended to play with this value.  | 
return.plot | 
 T/F, whether to return a plot or a numeric representing the best seed number for downstream analysis  | 
Value
If return.plot=T, a grob plotting input seed number vs. compressed seed number is returned. Otherwise, a numeric is returned containing the largest tested input value without detectable overcompression.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
CheckOvercompression(ranks, 980)
Identify your best principle component.
Description
Defines the best principle component to use for downstream analysis.
Usage
DefineBestPC(mat, IDs, filter_thresh, return.plot)
Arguments
mat | 
 Bismark2Matrix.R output file, or data frame object  | 
IDs | 
 A character vector of IDs containing the common names for compared conditions. E.g., for samples trt1, trt2 vs. ctl1, ctl2, IDs=c("trt", "ctl")  | 
filter_thresh | 
 A coverage threshold for filtering, where CpG coverage of all samples must be larger than this value  | 
return.plot | 
 T/F, whether to return a PCA plot or a numeric representing the best principle component for downstream analysis  | 
Value
If return.plot=T, a grob plotting a PCA of percent methylation of all samples is returned. Otherwise, a numeric representing the best principal component to use for PCBS analysis is returned.
Examples
DefineBestPC(eigen, IDs = c("trt", "ctl"))
Call DMRs from WGBS data.
Description
DMR Calling.
Usage
Get_Novel_DMRs(ranks, nSeeds, chromDictObj, DMR_resolution, 
QueryLimit, minCpGs, minZ, perms)
Arguments
ranks | 
 Rank data frame from getPCRanks.  | 
nSeeds | 
 Integer, number of input seeds for DMR expansion.  | 
chromDictObj | 
 chromDict() output. If null, chromDict() is run internally.  | 
DMR_resolution | 
 Automatic=NULL. Integer, number of bases to increase the DMR by with each expansion. If NULL, QueryLimit/25.  | 
QueryLimit | 
 Automatic=5000. Maximum DMR expansion size (bp)  | 
minCpGs | 
 Automatic=15. Minimum CpGs in a DMR region, regions with fewer CpGs will be discarded.  | 
minZ | 
 Automatic=1. Absolute Z score threshold for DMR calling; internal value. Not recommended to play with this setting.  | 
perms | 
 Automatic=1000. Number of permutations to use when defining the null distribution. Increasing this value largely influences computational time with minimal return  | 
Value
Returns a data.frame of all novel DMRs.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
Add a mean methylation column to a data.frame of regions.
Description
Using a chromDictMeth() output object, quickly calculate the mean methylation difference across set of regions.
Usage
MethyDiff_Set(chromDictMeth, regions)
Arguments
chromDictMeth | 
 chromDictMeth() output object  | 
regions | 
 
  | 
Value
Returns the regions object with a mean percent methylation column.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
MethyDiff_Set(chromDictMethylDiff, regions)
Get the mean methylation difference across a specified region.
Description
Using a chromDictMeth() output object, quickly calculate the mean methylation difference across a user-specified region.
Usage
MethylDiff(chromDictMeth, chrom, start, end)
Arguments
chromDictMeth | 
 chromDictMeth() output object  | 
chrom | 
 
  | 
start | 
 
  | 
end | 
 
  | 
Value
Returns a list of data.tables for each chromosome, for faster analysis.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
MethylDiff(chromDictMethylDiff, "chr3", 4920450, 4923267)
PCBS ggplot theme.
Description
Custom theme for ggplot used by all PCBS output figures.
Usage
Ol_Reliable()
Value
Theme for ggplot objects used by PCBS.
Examples
df <- data.frame(A=c(1,2,3), B=c(1,2,3))
ggplot2::ggplot(df, ggplot2::aes(x=A, y=B))+
ggplot2::geom_point()+
Ol_Reliable()
Add ranks to eigenvector scores.
Description
Defines the best principle component to use for downstream analysis.
Usage
addRanks(ranks)
Arguments
ranks | 
 getPCRanks output data frame.  | 
Value
The input data.frame with rank order and absolute rank order columns.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"))
ranks <- addRanks(ranks)
Check rank cut-off values manually.
Description
Plots a score vs. rank plot with a manually chosen rank cut-off for manual k selection.
Usage
checkRank(ranks, cutoff)
Arguments
ranks | 
 getPCRanks output data frame  | 
cutoff | 
 integer, rank value to check  | 
Value
Returns a grob plotting the input cutoff on a plot of absolute eigenvector score vs. absolute rank order.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
test_50 <- checkRank(ranks, 50) # set cut-off to 50
test_500 <- checkRank(ranks, 500) # set cut-off to 500
Convert a rank object into a chromDict.
Description
Internal to many functions; creates a chromDict for faster computing times. chromDict can be run separately to speed up functions run iteratively. A chromDict is a list of chromosome-specific data.tables generated from ranks.
Usage
chromDict(ranks)
Arguments
ranks | 
 getPCRanks output data frame  | 
Value
Returns a list of data.tables for each chromosome, for faster analysis. Used internall by many PCBS functions.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
chromDictObj <- chromDict(ranks)
Create a chromDict of percent methylation difference at all sites.
Description
chromDicts are lists of keyed data.tables that enable very fast computing times. Much like the primary PCBS function chromDict() which makes a chromDict of all locus ranks, the chromDictMeth() function makes is a list of chromosome-specific data.tables containing percent methylation differences at all loci..
Usage
chromDictMeth(mat, IDs, filter_thresh)
Arguments
mat | 
 
  | 
IDs | 
 
  | 
filter_thresh | 
 
  | 
Value
Returns a list of data.tables for each chromosome, for faster analysis.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
Simulated WGBS data for PCBS vignettes
Description
simulated WGBS data with added DMRs and DMLs
Usage
eigen
Format
A data frame with 50000 observations on the following 13 variables.
cpgIDa character vector, chrom:locus
trt1_PercMetha numeric vector, percent methylated of sample trt1
trt1_nCpGa numeric vector, depth of sample trt1
trt2_PercMetha numeric vector, percent methylated of sample trt2
trt2_nCpGa numeric vector, depth of sample trt2
trt3_PercMetha numeric vector, percent methylated of sample trt3
trt3_nCpGa numeric vector, depth of sample trt3
ctl1_PercMetha numeric vector, percent methylated of sample ctl1
ctl1_nCpGa numeric vector, depth of sample ctl1
ctl2_PercMetha numeric vector, percent methylated of sample ctl2
ctl2_nCpGa numeric vector, depth of sample ctl2
ctl3_PercMetha numeric vector, percent methylated of sample ctl3
ctl3_nCpGa numeric vector, depth of sample ctl3
Source
generated through simulation
Get CpG eigenvector scores from a principle component.
Description
Returns eigenvector scores for input CpG sites.
Usage
getPCRanks(mat, IDs, PC, filter_thresh)
Arguments
mat | 
 Bismark2Matrix.R output file, or data frame object  | 
IDs | 
 A character vector of IDs containing the common names for compared conditions. E.g., for samples trt1 & trt2 vs. ctl1 & ctl2, IDs=c("trt1", "ctl")  | 
PC | 
 Integer, which principle component to use. Use to DefineBestPC if unsure.  | 
filter_thresh | 
 Integer, a coverage threshold for filtering, where CpG coverage of all samples must be larger than this value.  | 
Value
Returns a data.frame of eigenvector scores for all loci.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
Calculated methylation significance in a set of regions.
Description
Returns p-values and Z-scores for CpGs in a set of regions, compared to a local null background distribution.
Usage
getRegionScores(ranks, regions,  chromDictObj)
Arguments
ranks | 
 getPCRanks output data.frame, only necessary if chromDictObj=NULL  | 
regions | 
 A three-column dataframe containing a set of regions to test. Columns = chrom, start, end.  | 
chromDictObj | 
 chromDict() output object, recommended input instead of ranks.  | 
Value
Returns a data.frame with significance scores for all input regions.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMLs <- addRanks(ranks)
# data.frame of regions to test:
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
                
getRegionScores(DMLs, regions)
Nested DMR calling function within Get_Novel_DMRs()
Description
Expands DMRs from collapsed seeds iteratively.
Usage
get_all_DMRs(chromDictObj, seeds, res=40, max.dmr.size=3000, min.dmr.cpgs=10,
min.absZscore, null)
Arguments
chromDictObj | 
 chromDict() output.  | 
seeds | 
 compressed seeds  | 
res | 
 Get_Novel_DMRs arg DMR_resolution  | 
max.dmr.size | 
 Get_Novel_DMRs arg QueryLimit  | 
min.dmr.cpgs | 
 Get_Novel_DMRs arg minCpGs  | 
min.absZscore | 
 Get_Novel_DMRs arg minZ  | 
null | 
 null distribution  | 
Value
Returns a list with two indicies, representing intermediate DMR calls within the Get_Novel_DMRs() function and a list of background regions..
Find y value of linear regression given x.
Description
Find y value of linear regression given x.
Usage
lin(x, l)
Arguments
x | 
 x coordinate  | 
l | 
 linear model of lm()  | 
Value
Returns a column numeric representing the y coordinate at the input x of the linear model l.
PC-Intersect nested function.
Description
Finds the intersection point of two linear regression models, lm().
Usage
lmIntx(fit1, fit2, rnd=2)
Arguments
fit1 | 
 lm() model 1  | 
fit2 | 
 lm() model 2  | 
rnd | 
 number of significant figures  | 
Value
Returns a 2 column data.frame of one row, containing the x and y coordinates of the intersection point of the input models.
Make a metagene from mean percent methylation differences.
Description
Uses mean binned percent methylation differences across a set of regions to draw a metagene.
Usage
methylDiff_metagene(chromDictMethObj, regions, bin, title,
xaxis, yaxis, return.data, linecol, value)
Arguments
chromDictMethObj | 
 chromDictMeth() output object  | 
regions | 
 A three-column data.frame containing a set of regions to test. Columns = chrom, start, end.  | 
bin | 
 integer, number of bins to use in metagenes. Default=100.  | 
title | 
 Output plot title  | 
xaxis | 
 Output plot x-axis title  | 
yaxis | 
 Output plot y-axis title  | 
return.data | 
 T/F, whether to return a plot, or data that can be run with plot_metagene() or multiple_metagenes().  | 
linecol | 
 Colour for line, auto="red"  | 
value | 
 Name of the plotted metric in chromDictMethObj. Only needs to be set explicity for abnormal use cases where chromDictMethObj contains a non-rank value output by chromDict().  | 
Value
If return.data=F, returns a grob containing a metagene plot. Otherwise, returns a list of two data.frames containing metagene and metagene standard error plotting information.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
 
methylDiff_metagene(chromDictMethylDiff, regions)
                     
Plot multiple metagene data objects on a single plot.
Description
Plots multiple metagene object using the raw data generated by score_metagene().
Usage
multiple_metagenes(data_list, set_names, title, xaxis, yaxis, legend.title, col, se_alpha)
Arguments
data_list | 
 List of score_metagene() raw data output  | 
set_names | 
 Character vector of names for score_metagene() object  | 
title | 
 Output plot title  | 
xaxis | 
 Output plot x-axis title  | 
yaxis | 
 Output plot y-axis title  | 
legend.title | 
 T/F, whether to show legend title  | 
col | 
 Vector of colours to use for lines  | 
se_alpha | 
 0-1, alpha value for standard error shading  | 
Value
Returns a grob containing a plot of the input metagene data.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# Select all significantly hypomethylated DMRs:
hypo_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore < 0,] 
# Select all significantly hypermethylated DMRs:
hyper_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore > 0,]
# select chrom, start, and end of all hyper DMRs
regions_hypo <- hypo_DMRs[c(1:3)] 
regions_hyper <- hyper_DMRs[c(1:3)] 
# return.data = T returns raw data instead of a plot:
hyper_metagene <- score_metagene(ranks, regions_hyper, return.data = TRUE)
hypo_metagene <- score_metagene(ranks, regions_hypo, return.data = TRUE)
# The multiple_metagenes function plots multiple metagenes 
# using a list of raw data objects from score_metagene().
multiple_metagenes(data_list = list(hyper_metagene, hypo_metagene),
set_names = c("Hyper DMRs", "Hypo DMRs"),
title="Metagenes of DMR Regions", legend.title = FALSE)
Nested DMR calling function within within get_all_DMRs(), Get_Novel_DMRs()
Description
Expands one DMR from a single point.
Usage
oneSeed(chroms, seed, resolution, max.size, mincpgs, null_list, Zlim=1)
Arguments
chroms | 
 chromDict() output.  | 
seed | 
 seed value input  | 
resolution | 
 Get_Novel_DMRs arg DMR_resolution  | 
max.size | 
 Get_Novel_DMRs arg QueryLimit  | 
mincpgs | 
 Get_Novel_DMRs arg minCpGs  | 
null_list | 
 get_all_DMRs arg null  | 
Zlim | 
 Get_Novel_DMRs arg minZ  | 
Value
returns a list of two indices, containing a data.frame with the output from a single compressed seed expansion and a data.frame of locus information around the seed expansion, intended for use within the Get_Novel_DMRs() function.
Generate a metagene plot from raw metagene data.
Description
Plots a metagene object using the raw data generated by score_metagene().
Usage
plot_metagene(data, title, xaxis, yaxis, linecol)
Arguments
data | 
 list, score_metagene() raw data output  | 
title | 
 Output plot title  | 
xaxis | 
 Output plot x-axis title  | 
yaxis | 
 Output plot y-axis title  | 
linecol | 
 Colour for line, auto="red"  | 
Value
Returns a grob containing a plot of the input metagene data.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# Select all significantly hypomethylated DMRs:
hypo_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore < 0,] 
# select chrom, start, and end of all hyper DMRs
regions_hypo <- hypo_DMRs[c(1:3)] 
# return.data = T returns raw data instead of a plot:
hypo_metagene <- score_metagene(ranks, regions_hypo, return.data = TRUE)
plot_metagene(hypo_metagene)
Identify the best rank cut-off for significant CpGs.
Description
Automated rank cut-off estimator for input CpGs.
Usage
rankDist(ranks, draw_intersects, noise_perc, mode, return.plot)
Arguments
ranks | 
 getPCRanks output data frame.  | 
draw_intersects | 
 T/F whether to draw intersect lines if return.plot=T  | 
noise_perc | 
 Automatic=0.5, numeric between 0 and 1. Fraction of ranks to use to model the background noise. Not recommended to play with this value. Increasing/decreasing returns a looser/stricter threshold, respectively.  | 
mode | 
 "intersect" or "strict", determine cut-off with "intersect" or "strict" method. "Strict" is recommended for sets with lower variability  | 
return.plot | 
 T/F, whether to return a plot or a numeric  | 
Value
If return.plot=T, a grob plotting the estimated cutoff on a plot of absolute eigenvector score vs. absolute rank order is returned. Otherwise, a numeric of the estimated cut-off is returned.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
rankDist(ranks, mode="intersect")
Make a metagene from PC Scores.
Description
Uses mean binned PC scores across a set of regions to draw a metagene.
Usage
score_metagene(ranks, regions, bin, title, xaxis, yaxis, 
chromDictObj, return.data, linecol)
Arguments
ranks | 
 getPCRanks output data.frame  | 
regions | 
 A three-column data.frame containing a set of regions to test. Columns = chrom, start, end.  | 
bin | 
 integer, number of bins to use in metagenes. Default=100.  | 
title | 
 Output plot title  | 
xaxis | 
 Output plot x-axis title  | 
yaxis | 
 Output plot y-axis title  | 
chromDictObj | 
 Optional chromDictObject made from chromDict(), runs internally if set to NULL (default). Scripts that run this function multiple times will be sped up by setting this option.  | 
return.data | 
 T/F, whether to return a plot, or data that can be run with plot_metagene() or multiple_metagenes().  | 
linecol | 
 Colour for line, auto="red"  | 
Value
If return.data=F, returns a grob containing a metagene plot. Otherwise, returns a list of two data.frames containing metagene and metagene standard error plotting information.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# select chrom, start, and end of all hyper DMRs:
hyper_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore > 0,]
regions_hyper <- hyper_DMRs[c(1:3)] 
score_metagene(ranks, regions_hyper)
Standard error of a vector.
Description
Takes the standard error of a vector.
Usage
se(x)
Arguments
x | 
 numeric vector.  | 
Value
Returns a numeric, representing the standard error of the input vector.
Examples
x <- sample(1:100, 20)
se(x)
Component of PCBS ggplot theme.
Description
Wrapper to title x-axis text in ggplot objects.
Usage
tilt()
Value
Theme for ggplot objects that cleanly rotates x-axis text.
Nested DMR calling function within within Get_Novel_DMRs()
Description
Trims the edges off of DMR expansions.
Usage
trimDMR(df, region, min.dmr.cpgs, max.dmr.size, null_summary, null_values)
Arguments
df | 
 DMR expansion output dataframe.  | 
region | 
 get_all_DMRs() region output  | 
min.dmr.cpgs | 
 Get_Novel_DMRs arg minCpGs  | 
max.dmr.size | 
 Get_Novel_DMRs arg QueryLimit  | 
null_summary | 
 null distribution conrtainer  | 
null_values | 
 null distribution conrtainer  | 
Value
Returns a data.frame of all trimmed DMRs for use within the Get_Novel_DMRs() function.