Type: | Package |
Title: | Disk-Based Constrained Change-Point Detection |
Version: | 2024.10.1 |
Maintainer: | Toby Dylan Hocking <toby.hocking@r-project.org> |
Description: | Disk-based implementation of Functional Pruning Optimal Partitioning with up-down constraints <doi:10.18637/jss.v101.i10> for single-sample peak calling (independently for each sample and genomic problem), can handle huge data sets (10^7 or more). |
License: | GPL-3 |
Depends: | R (≥ 2.10) |
Imports: | data.table (≥ 1.9.8) |
Suggests: | testthat, ggplot2, future.apply, future, knitr, markdown |
VignetteBuilder: | knitr |
URL: | https://github.com/tdhock/PeakSegDisk |
BugReports: | https://github.com/tdhock/PeakSegDisk/issues |
NeedsCompilation: | yes |
Packaged: | 2024-10-02 02:31:59 UTC; tdhock |
Author: | Toby Dylan Hocking [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2024-10-02 04:30:07 UTC |
Reads aligned to hg19 from two ChIP-seq experiments
Description
These data are raw aligned reads which have been mapped to the human genome, hg19. One is sample ID McGill0004, experiment H3K36me3, chr9, chunk H3K36me3_AM_immune/8. The other is sample ID McGill0002, experiment H3K4me3, chr2, chunk H3K4me3_PGP_immune/7. The chunk ID numbers refer to parts of the McGill benchmark data set, https://rcdata.nau.edu/genomic-ml/chip-seq-chunk-db/
Usage
data("ChIPreads")
Format
A data frame with 40396 observations on the following 4 variables.
experiment
either H3K36me3 or H3K4me3
chrom
either chr9 or chr2
chromStart
0-based start position of read
chromEnd
1-based end position of read
count
number of times a read occured with the given chromStart/end in this sample/experiment
Details
Peak detection algorithm are typically run on a sequence of non-negative integer count data, one data point for each genomic position. These data are useful for proving that peak detection methods are robust to different sequences: (1) spatially correlated, non-independent aligned read coverage; (2) un-correlated, independent representations such as first or last read.
A small ChIP-seq data set in which peaks can be found using PeakSegFPOP
Description
The data come from an H3K27ac ChIP-seq experiment which was aligned to the human reference genome (hg19), aligned read counts were used to produce the coverage data; looking at these data in a genome browser was used to produce the labels. ChIP-seq means Chromatin Immunoprecipitation followed by high-throughput DNA sequencing; it is an assay used to characterize genome-wide DNA-protein interactions. In this experiment the protein of interest is histone H3, with the specific modification K27ac (hence the name H3K27ac). Large counts (peaks) therefore indicate regions of the reference genome with high likelihood of interaction between DNA and that specific protein, in the specific Monocyte sample tested.
Usage
data("Mono27ac")
Format
A list of 2 data.tables: coverage has 4 columns (chrom, chromStart, chromEnd, count=number of aligned reads at each position on chrom:chromStart-chromEnd); labels has 4 columns (chrom, chromStart, chromEnd, annotation=label at chrom:chromStart-chromEnd). chrom refers to the chromosome on whcih the data were gathered (chr11), chromStart is the 0-based position before the first base of the data/label, chromEnd is the 1-based position which is the last base of the data/label. Therefore, each chromEnd on each row should be equal to the chromStart of the next row.
Source
UCI Machine Learning Repository, chipseq data set, problem directory H3K27ac-H3K4me3_TDHAM_BP/samples/Mono1_H3K27ac/S001YW_NCMLS/problems/chr11:60000-580000 Links: https://archive.ics.uci.edu/ml/datasets/chipseq for the UCI web page; https://github.com/tdhock/feature-learning-benchmark for a more detailed explanation.
PeakSeg penalized solver for data.frame
Description
Write data frame to disk then run PeakSegFPOP_dir
solver.
Usage
PeakSegFPOP_df(count.df,
pen.num, base.dir = tempdir())
Arguments
count.df |
data.frame with columns count, chromStart, chromEnd. These data
will be saved via |
pen.num |
Non-negative numeric scalar. |
base.dir |
base.dir/chrXX-start-end/coverage.bedGraph will be written, where chrXX is the chrom column, start is the first chromStart position, and end is the last chromEnd position. |
Value
List of solver results, same as PeakSegFPOP_dir
.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
## Simulate a sequence of Poisson count data.
sim.seg <- function(seg.mean, size.mean=15){
seg.size <- rpois(1, size.mean)
rpois(seg.size, seg.mean)
}
set.seed(1)
seg.mean.vec <- c(1.5, 3.5, 0.5, 4.5, 2.5)
z.list <- lapply(seg.mean.vec, sim.seg)
z.rep.vec <- unlist(z.list)
## Plot the simulated data sequence.
if(require(ggplot2)){
count.df <- data.frame(
position=seq_along(z.rep.vec),
count=z.rep.vec)
gg.count <- ggplot()+
geom_point(aes(
position, count),
shape=1,
data=count.df)
gg.count
}
## Plot the true changes.
n.segs <- length(seg.mean.vec)
seg.size.vec <- sapply(z.list, length)
seg.end.vec <- cumsum(seg.size.vec)
change.vec <- seg.end.vec[-n.segs]+0.5
change.df <- data.frame(
changepoint=change.vec)
gg.change <- gg.count+
geom_vline(aes(
xintercept=changepoint),
data=change.df)
gg.change
## Plot the run-length encoding of the same data.
z.rle.vec <- rle(z.rep.vec)
chromEnd <- cumsum(z.rle.vec$lengths)
coverage.df <- data.frame(
chrom="chrUnknown",
chromStart=c(0L, chromEnd[-length(chromEnd)]),
chromEnd,
count=z.rle.vec$values)
gg.rle <- gg.change+
geom_segment(aes(
chromStart+0.5, count, xend=chromEnd+0.5, yend=count),
data=coverage.df)
gg.rle
## Fit a peak model and plot the segment means.
fit <- PeakSegDisk::PeakSegFPOP_df(coverage.df, 10.5)
gg.rle+
geom_segment(aes(
chromStart+0.5, mean, xend=chromEnd+0.5, yend=mean),
color="green",
data=fit$segments)
## Default plot method shows data as geom_step.
(gg <- plot(fit))
## Plot data as points to verify the step representation.
gg+
geom_point(aes(
position, count),
color="grey",
shape=1,
data=count.df)
PeakSeg penalized solver with caching
Description
Main function/interface for the PeakSegDisk package.
Run the low-level solver, PeakSegFPOP_file
,
on one genomic segmentation problem
directory, and read the result files into R. Actually, this
function will first check if the result files are already present
(and consistent), and if so, it will simply read them into R
(without running PeakSegFPOP_file
) – this is a caching mechanism
that can save a lot of time.
To run the algo on an integer vector, use PeakSegFPOP_vec
;
for a data.frame, use PeakSegFPOP_df
.
To compute the optimal model for a given number of peaks,
use sequentialSearch_dir
.
Usage
PeakSegFPOP_dir(problem.dir,
penalty.param, db.file = NULL)
Arguments
problem.dir |
Path to a directory like sampleID/problems/chrXX-start-end which
contains a coverage.bedGraph file with the aligned read counts for
one genomic segmentation problem. This must be a plain text file
with the following four columns: chrom (character chromosome
name), chromStart (integer start position), chromEnd (integer end
position), count (integer aligned read count on chrom from
chromStart+1 to chromEnd); see also
https://genome.ucsc.edu/goldenPath/help/bedgraph.html. Note that
the standard coverage.bedGraph file name is required; for full
flexibility the user can run the algo on an arbitrarily named file
via |
penalty.param |
non-negative numeric penalty parameter (larger values for fewer peaks), or character scalar which can be interpreted as such. 0 means max peaks, Inf means no peaks. |
db.file |
character scalar: file for writing temporary cost function database – there will be a lot of disk writing to this file. Default NULL means to write the same disk where the input bedGraph file is stored; another option is tempfile() which may result in speedups if the input bedGraph file is on a slow network disk and the temporary storage is a fast local disk. |
Details
Finds the optimal change-points using the Poisson loss
and the PeakSeg constraint (changes in mean alternate between
non-decreasing and non-increasing). For N
data points, the
functional pruning algorithm is O(\log N)
memory. It is
O(N \log N)
time and disk space. It computes the
exact solution to the optimization problem in
vignette("Examples", package="PeakSegDisk")
.
Value
Named list of two data.tables:
segments |
has one row for every segment in the optimal model, |
loss |
has one row and contains the following columns: |
- penalty
same as input parameter
- segments
number of segments in optimal model
- peaks
number of peaks in optimal model
- bases
number of positions described in bedGraph file
- bedGraph.lines
number of lines in bedGraph file
- total.loss
total Poisson loss =
\sum_i m_i-z_i*\log(m_i)
= mean.pen.cost*bases-penalty*peaks- mean.pen.cost
mean penalized cost = (total.loss+penalty*peaks)/bases
- equality.constraints
number of adjacent segment means that have equal values in the optimal solution
- mean.intervals
mean number of intervals/candidate changepoints stored in optimal cost functions – useful for characterizing the computational complexity of the algorithm
- max.intervals
maximum number of intervals
- megabytes
disk usage of *.db file
- seconds
timing of PeakSegFPOP_file
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
data(Mono27ac, package="PeakSegDisk", envir=environment())
data.dir <- file.path(
tempfile(),
"H3K27ac-H3K4me3_TDHAM_BP",
"samples",
"Mono1_H3K27ac",
"S001YW_NCMLS",
"problems",
"chr11-60000-580000")
dir.create(data.dir, recursive=TRUE, showWarnings=FALSE)
write.table(
Mono27ac$coverage, file.path(data.dir, "coverage.bedGraph"),
col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
## Compute one model with penalty=1952.6
(fit <- PeakSegDisk::PeakSegFPOP_dir(data.dir, 1952.6))
summary(fit)#same as fit$loss
## Visualize that model.
ann.colors <- c(
noPeaks="#f6f4bf",
peakStart="#ffafaf",
peakEnd="#ff4c4c",
peaks="#a445ee")
if(require(ggplot2)){
lab.min <- Mono27ac$labels[1, chromStart]
lab.max <- Mono27ac$labels[.N, chromEnd]
plist <- coef(fit)
gg <- ggplot()+
theme_bw()+
geom_rect(aes(
xmin=chromStart/1e3, xmax=chromEnd/1e3,
ymin=-Inf, ymax=Inf,
fill=annotation),
color="grey",
alpha=0.5,
data=Mono27ac$labels)+
scale_fill_manual("label", values=ann.colors)+
geom_step(aes(
chromStart/1e3, count),
color="grey50",
data=Mono27ac$coverage)+
geom_segment(aes(
chromStart/1e3, mean,
xend=chromEnd/1e3, yend=mean),
color="green",
size=1,
data=plist$segments)+
geom_vline(aes(
xintercept=chromEnd/1e3, linetype=constraint),
color="green",
data=plist$changes)+
scale_linetype_manual(
values=c(
inequality="dotted",
equality="solid"))
print(gg)
print(gg+coord_cartesian(xlim=c(lab.min, lab.max)/1e3, ylim=c(0, 10)))
## Default plotting method only shows model.
print(gg <- plot(fit))
## Data can be added on top of model.
print(
gg+
geom_step(aes(
chromStart, count),
color="grey50",
data=Mono27ac$coverage)
)
}
PeakSegFPOP using disk storage
Description
Run the PeakSeg Functional Pruning Optimal Partitioning algorithm,
using a file on disk to store the O(N) function piece lists,
each of size O(log N).
This is a low-level function that just runs the algo
and produces the result files (without reading them into R),
so normal users are recommended to instead use PeakSegFPOP_dir
,
which calls this function then reads the result files into R.
Usage
PeakSegFPOP_file(bedGraph.file,
pen.str, db.file = NULL)
Arguments
bedGraph.file |
character scalar: tab-delimited tabular text file with four columns: chrom, chromStart, chromEnd, coverage. The algorithm creates a large temporary file in the same directory, so make sure that there is disk space available on that device. |
pen.str |
character scalar that can be converted to a numeric scalar via
as.numeric: non-negative penalty. More penalty means fewer
peaks. "0" and "Inf" are OK. Character is required rather than
numeric, so that the user can reliably find the results in the
output files, which are in the same directory as |
db.file |
character scalar: file for writing temporary cost function database – there will be a lot of disk writing to this file. Default NULL means to write the same disk where the input bedGraph file is stored; another option is tempfile() which may result in speedups if the input bedGraph file is on a slow network disk and the temporary storage is a fast local disk. |
Value
A named list of input parameters, and the temporary cost function database file size in megabytes.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
r <- function(chrom, chromStart, chromEnd, coverage){
data.frame(chrom, chromStart, chromEnd, coverage)
}
four <- rbind(
r("chr1", 0, 10, 2),
r("chr1", 10, 20, 10),
r("chr1", 20, 30, 14),
r("chr1", 30, 40, 13))
dir.create(prob.dir <- tempfile())
coverage.bedGraph <- file.path(prob.dir, "coverage.bedGraph")
write.table(
four, coverage.bedGraph,
sep="\t", row.names=FALSE, col.names=FALSE)
pstr <- "10.5"
result.list <- PeakSegDisk::PeakSegFPOP_file(coverage.bedGraph, pstr)
dir(prob.dir)
## segments file can be read to see optimal segment means.
outf <- function(suffix){
paste0(coverage.bedGraph, "_penalty=", pstr, suffix)
}
segments.bed <- outf("_segments.bed")
seg.df <- read.table(segments.bed)
names(seg.df) <- col.name.list$segments
seg.df
## loss file can be read to see optimal Poisson loss, etc.
loss.tsv <- outf("_loss.tsv")
loss.df <- read.table(loss.tsv)
names(loss.df) <- col.name.list$loss
loss.df
PeakSeg penalized solver for integer vector
Description
Convert integer data vector to run-length encoding,
then run PeakSegFPOP_df
.
Usage
PeakSegFPOP_vec(count.vec,
pen.num)
Arguments
count.vec |
integer vector, noisy non-negatve count data to segment. |
pen.num |
Non-negative numeric scalar. |
Value
List of solver results, same as PeakSegFPOP_dir
.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
## Simulate a sequence of Poisson data.
sim.seg <- function(seg.mean, size.mean=15){
seg.size <- rpois(1, size.mean)
rpois(seg.size, seg.mean)
}
set.seed(1)
seg.mean.vec <- c(1.5, 3.5, 0.5, 4.5, 2.5)
z.list <- lapply(seg.mean.vec, sim.seg)
z.rep.vec <- unlist(z.list)
## Plot the simulated data.
if(require(ggplot2)){
count.df <- data.frame(
position=seq_along(z.rep.vec),
count=z.rep.vec)
gg.count <- ggplot()+
geom_point(aes(
position, count),
shape=1,
data=count.df)
print(gg.count)
## Plot the true changepoints.
n.segs <- length(seg.mean.vec)
seg.size.vec <- sapply(z.list, length)
seg.end.vec <- cumsum(seg.size.vec)
change.vec <- seg.end.vec[-n.segs]+0.5
change.df <- data.frame(
changepoint=change.vec)
gg.change <- gg.count+
geom_vline(aes(
xintercept=changepoint),
data=change.df)
print(gg.change)
## Fit a peak model and plot it.
fit <- PeakSegDisk::PeakSegFPOP_vec(z.rep.vec, 10.5)
print(
gg.change+
geom_segment(aes(
chromStart+0.5, mean, xend=chromEnd+0.5, yend=mean),
color="green",
data=fit$segments)
)
## A pathological data set.
z.slow.vec <- 1:length(z.rep.vec)
fit.slow <- PeakSegDisk::PeakSegFPOP_vec(z.slow.vec, 10.5)
rbind(fit.slow$loss, fit$loss)
}
coef PeakSegFPOP df
Description
Create a list of data tables describing PeakSegFPOP model and data.
Usage
## S3 method for class 'PeakSegFPOP_df'
coef(object,
...)
Arguments
object |
object |
... |
... |
Value
list of data tables with named elements segments, loss, data, changes, peaks.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
coef PeakSegFPOP dir
Description
Compute changes and peaks to display/plot.
Usage
## S3 method for class 'PeakSegFPOP_dir'
coef(object,
...)
Arguments
object |
object |
... |
... |
Value
model list with additional named elements peaks and changes.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
col name list
Description
Named list of character vectors (column names of bed/bedGraph/tsv
files), used to read data files, which do not contain a header /
column names. Each name corresponds to a data/file type, and each
value is a character vector of column names expected in that
file. loss is for the coverage.bedGraph_penalty=VALUE_loss.tsv
file generated by PeakSegFPOP_file
; segments is for the
coverage.bedGraph_penalty=VALUE_segments.bed generated by
PeakSegFPOP_file
; coverage is for the coverage.bedGraph file which
is used as input to PeakSegFPOP_file
.
Usage
"col.name.list"
Examples
library(PeakSegDisk)
r <- function(chrom, chromStart, chromEnd, coverage){
data.frame(chrom, chromStart, chromEnd, coverage)
}
four <- rbind(
r("chr1", 0, 10, 2),
r("chr1", 10, 20, 10),
r("chr1", 20, 30, 14),
r("chr1", 30, 40, 13))
write.table(
four, tmp <- tempfile(),
sep="\t", row.names=FALSE, col.names=FALSE)
read.table(
tmp, col.names=col.name.list$coverage)
pstr <- "10.5"
PeakSegFPOP_file(tmp, pstr)
outf <- function(suffix){
paste0(tmp, "_penalty=", pstr, "_", suffix)
}
fread.first(outf("segments.bed"), col.name.list$segments)
fread.first(outf("loss.tsv"), col.name.list$loss)
Quickly read first line
Description
Read the first line of a text file. Useful for quickly checking if
the coverage.bedGraph_penalty=VALUE_segments.bed file is
consistent with the coverage.bedGraph_penalty=VALUE_loss.tsv
file. (used by the PeakSegFPOP_dir
caching mechanism)
Usage
fread.first(file.name,
col.name.vec)
Arguments
file.name |
Name of file to read. |
col.name.vec |
Character vector of column names. |
Value
Data table with one row.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
library(PeakSegDisk)
r <- function(chrom, chromStart, chromEnd, coverage){
data.frame(chrom, chromStart, chromEnd, coverage)
}
four <- rbind(
r("chr1", 0, 10, 2),
r("chr1", 10, 20, 10),
r("chr1", 20, 30, 14),
r("chr1", 30, 40, 13))
write.table(
four, tmp <- tempfile(),
sep="\t", row.names=FALSE, col.names=FALSE)
pstr <- "10.5"
PeakSegFPOP_file(tmp, pstr)
outf <- function(suffix){
paste0(tmp, "_penalty=", pstr, "_", suffix)
}
segments.bed <- outf("segments.bed")
first.seg.line <- fread.first(segments.bed, col.name.list$segments)
last.seg.line <- fread.last(segments.bed, col.name.list$segments)
loss.tsv <- outf("loss.tsv")
loss.row <- fread.first(loss.tsv, col.name.list$loss)
seg.bases <- first.seg.line$chromEnd - last.seg.line$chromStart
loss.row$bases == seg.bases
Quickly read last line
Description
Read the last line of a text file. Useful for quickly checking if
the coverage.bedGraph_penalty=VALUE_segments.bed file is
consistent with the coverage.bedGraph_penalty=VALUE_loss.tsv
file. (used by the PeakSegFPOP_dir
caching mechanism)
Usage
fread.last(file.name,
col.name.vec)
Arguments
file.name |
Name of file to read. |
col.name.vec |
Character vector of column names. |
Value
Data table with one row.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
library(PeakSegDisk)
r <- function(chrom, chromStart, chromEnd, coverage){
data.frame(chrom, chromStart, chromEnd, coverage)
}
four <- rbind(
r("chr1", 0, 10, 2),
r("chr1", 10, 20, 10),
r("chr1", 20, 30, 14),
r("chr1", 30, 40, 13))
write.table(
four, tmp <- tempfile(),
sep="\t", row.names=FALSE, col.names=FALSE)
pstr <- "10.5"
PeakSegFPOP_file(tmp, pstr)
outf <- function(suffix){
paste0(tmp, "_penalty=", pstr, "_", suffix)
}
segments.bed <- outf("segments.bed")
first.seg.line <- fread.first(segments.bed, col.name.list$segments)
last.seg.line <- fread.last(segments.bed, col.name.list$segments)
loss.tsv <- outf("loss.tsv")
loss.row <- fread.first(loss.tsv, col.name.list$loss)
seg.bases <- first.seg.line$chromEnd - last.seg.line$chromStart
loss.row$bases == seg.bases
plot PeakSegFPOP df
Description
Plot a PeakSeg model with attached data.
Usage
## S3 method for class 'PeakSegFPOP_df'
plot(x,
...)
Arguments
x |
x |
... |
... |
Value
a ggplot.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
plot PeakSegFPOP dir
Description
Plot a PeakSeg model with attached data.
Usage
## S3 method for class 'PeakSegFPOP_dir'
plot(x,
...)
Arguments
x |
x |
... |
... |
Value
a ggplot.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Compute PeakSeg model with given number of peaks
Description
Compute the most likely peak model with at most the number of
peaks given by peaks.int
. This function repeated calls
PeakSegFPOP_dir
with different penalty values, until either
(1) it finds the peaks.int
model, or (2) it concludes that there
is no peaks.int
model, in which case it returns the next simplest
model (with fewer peaks than peaks.int
).
The first pair of penalty values (0, Inf) is run in parallel
via the user-specified future plan,
if the future.apply package is available.
Usage
sequentialSearch_dir(problem.dir,
peaks.int, verbose = 0)
Arguments
problem.dir |
problemID directory in which coverage.bedGraph has already been computed. If there is a labels.bed file then the number of incorrect labels will be computed in order to find the target interval of minimal error penalty values. |
peaks.int |
int: target number of peaks. |
verbose |
numeric verbosity: if >0 then |
Value
Same result list from PeakSegFPOP_dir
, with an additional
component "others" describing the other models that were computed
before finding the optimal model with peaks.int
(or fewer)
peaks. Additional loss columns are as follows: under=number of
peaks in smaller model during binary search; over=number of peaks
in larger model during binary search; iteration=number of times
PeakSegFPOP has been run.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
## Create simple 6 point data set discussed in supplementary
## materials. GFPOP/GPDPA computes up-down model with 2 peaks, but
## neither CDPA (PeakSegDP::cDPA) nor PDPA (jointseg)
r <- function(chrom, chromStart, chromEnd, coverage){
data.frame(chrom, chromStart, chromEnd, coverage)
}
supp <- rbind(
r("chr1", 0, 1, 3),
r("chr1", 1, 2, 9),
r("chr1", 2, 3, 18),
r("chr1", 3, 4, 15),
r("chr1", 4, 5, 20),
r("chr1", 5, 6, 2)
)
data.dir <- file.path(tempfile(), "chr1-0-6")
dir.create(data.dir, recursive=TRUE)
write.table(
supp, file.path(data.dir, "coverage.bedGraph"),
sep="\t", row.names=FALSE, col.names=FALSE)
## register a parallel future plan to compute the first two
## penalties in parallel during the sequential search.
if(interactive() && requireNamespace("future"))future::plan("multisession")
## Compute optimal up-down model with 2 peaks via sequential search.
fit <- PeakSegDisk::sequentialSearch_dir(data.dir, 2L)
if(require(ggplot2)){
ggplot()+
theme_bw()+
geom_point(aes(
chromEnd, coverage),
data=supp)+
geom_segment(aes(
chromStart+0.5, mean,
xend=chromEnd+0.5, yend=mean),
data=fit$segments,
color="green")
}
summary PeakSegFPOP dir
Description
Summary of PeakSegFPOP_dir
object
.
Usage
## S3 method for class 'PeakSegFPOP_dir'
summary(object,
...)
Arguments
object |
object |
... |
... |
Value
Data table with one row and columns describing model summary.
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
wc2int
Description
Convert wc output to integer number of lines.
Usage
wc2int(wc.output)
Arguments
wc.output |
Character scalar: output from wc. |
Value
integer
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Write bedGraph file
Description
Write a data.frame in R to a bedGraph file on disk. This must be a plain text file with the following four columns: chrom (character chromosome name), chromStart (integer start position), chromEnd (integer end position), count (integer aligned read count on chrom from chromStart+1 to chromEnd); see also https://genome.ucsc.edu/goldenPath/help/bedgraph.html
Usage
writeBedGraph(count.df,
coverage.bedGraph)
Arguments
count.df |
data.frame with four columns: chrom, chromStart, chromEnd, count. |
coverage.bedGraph |
file path where data will be saved in plain text / bedGraph format. |
Value
NULL (same as write.table).
Author(s)
Toby Dylan Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
library(PeakSegDisk)
data(Mono27ac, envir=environment())
coverage.bedGraph <- file.path(
tempfile(),
"H3K27ac-H3K4me3_TDHAM_BP",
"samples",
"Mono1_H3K27ac",
"S001YW_NCMLS",
"problems",
"chr11-60000-580000",
"coverage.bedGraph")
dir.create(
dirname(coverage.bedGraph),
recursive=TRUE, showWarnings=FALSE)
writeBedGraph(Mono27ac$coverage, coverage.bedGraph)
fread.first(coverage.bedGraph, col.name.list$coverage)
fread.last(coverage.bedGraph, col.name.list$coverage)