Title: | RNAseq Visualization Automation |
Version: | 0.0.5 |
Description: | Automate downstream visualization & pathway analysis in RNAseq analysis. 'RVA' is a collection of functions that efficiently visualize RNAseq differential expression analysis results from summary statistics tables. It also uses Fisher's exact test to evaluate gene set or pathway enrichment in a convenient and efficient manner. |
Maintainer: | Xingpeng Li <xingpeng.li@pfizer.com> |
URL: | https://github.com/THERMOSTATS/RVA |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.2 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Imports: | GSVAdata (≥ 1.22.0), clusterProfiler (≥ 3.15.1), data.table (≥ 1.12.8), edgeR (≥ 3.28.1), org.Hs.eg.db (≥ 3.10.0), ComplexHeatmap (≥ 2.2.0), GSEABase (≥ 1.48.0), circlize (≥ 0.4.10), dplyr (≥ 1.0.0), ggplot2 (≥ 3.3.2), ggpubr (≥ 0.4.0), grid (≥ 3.6.1), gridExtra (≥ 2.3), haven (≥ 2.3.1), msigdbr (≥ 7.1.1), plotly (≥ 4.9.2.1), purrr (≥ 0.3.4), rWikiPathways (≥ 1.6.1), stringr (≥ 1.4.0), tibble, tidyr (≥ 1.1.0), XML, rlang |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2021-11-01 20:48:52 UTC; lix410 |
Author: | Xingpeng Li |
Repository: | CRAN |
Date/Publication: | 2021-11-01 21:40:02 UTC |
This is data to be included in the package
Description
Example disease gene set, included in the package.
Usage
Sample_disease_gene_set
Format
An example disease gene set, given as a summary statistics table (data frame) whose row names are gene IDs. The summary statistics can come from any comparison; this example uses disease vs. healthy.
- logFC
log2 fold change from comparison
- AveExpr
Average expression for this gene
- P.Value
p value
- adj.P.Val
adjusted p value or FDR
...
This is data to be included in the package
Description
Example summary statistics table, included in the package.
Usage
Sample_summary_statistics_table
Format
An example summary statistics table (data frame) whose row names are gene IDs.
- logFC
log2 fold change from comparison
- AveExpr
Average expression for this gene
- P.Value
p value
- adj.P.Val
adjusted p value or FDR
...
This is data to be included in the package
Description
Second example summary statistics table, included in the package.
Usage
Sample_summary_statistics_table1
Format
A second example summary statistics table (data frame) whose row names are gene IDs.
- logFC
log2 fold change from comparison
- AveExpr
Average expression for this gene
- P.Value
p value
- adj.P.Val
adjusted p value or FDR
...
This is data to be included in the package
Description
Example gene set collection (c2BroadSets), included in the package.
Usage
c2BroadSets
Format
GeneSetCollection
- Genesetcollection
GeneSetCollection from BroadCollection
Calculate pathway scores
Description
Calculate pathway scores
Usage
cal.pathway.scores(
data,
pathway.db,
gene.id.type,
FCflag,
FDRflag,
FC.cutoff,
FDR.cutoff,
OUT.Directional = NULL,
IS.list = FALSE,
customized.pathways,
...
)
Arguments
data |
A summary statistics table (data.frame) or |
pathway.db |
pathway database used |
gene.id.type |
The gene ID format of the input data. |
FCflag |
The column name (character) of fold change information, assuming the FC is log2 transformed. Default = "logFC". |
FDRflag |
The column name (character) of adjusted p value or FDR. Default = "adj.P.Val". |
FC.cutoff |
The fold change cutoff (numeric) selected to subset summary statistics table. Default = 1.5. |
FDR.cutoff |
The FDR cutoff selected (numeric) to subset summary statistics table. Default = 0.05. |
OUT.Directional |
Logical; whether to output the directional or non-directional pathway analysis result. Default = NULL. |
IS.list |
Logical; whether the input is a list. Default = FALSE. |
customized.pathways |
Customized pathways, as a two-column data frame, to be used in the analysis. |
... |
Additional parameters to pass on. |
Value
Returns a dataframe.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Calculate CFB
Description
This function calculates the change from baseline.
Usage
calc.cfb(data, annot, baseline.flag, baseline.val)
Arguments
data |
A data frame with subject ID, annotation flag, gene ID, and CPM value (from count tables) columns. |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
baseline.flag |
A character vector of column names. These columns in |
baseline.val |
A character vector of values. This vector must be the
same length as |
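The change-from-baseline calculation can be sketched in base R. This is a minimal illustration, not the calc.cfb implementation: it assumes CFB is the difference between each sample's value and the same subject's baseline value for the same gene, supports a single baseline.flag column only, and relies on hypothetical column names (subject_id, gene, value).

```r
# Hypothetical sketch of change from baseline (CFB); not the RVA implementation.
# Assumes long-format input with columns subject_id, gene, value and a
# single baseline column (e.g. "day").
calc_cfb_sketch <- function(df, baseline.flag = "day", baseline.val = "0") {
  is.base <- as.character(df[[baseline.flag]]) == baseline.val
  base <- df[is.base, c("subject_id", "gene", "value")]
  names(base)[names(base) == "value"] <- "baseline"
  # Attach each subject/gene baseline to every visit of that subject/gene
  out <- merge(df, base, by = c("subject_id", "gene"))
  out$cfb <- out$value - out$baseline
  out
}

df <- data.frame(
  subject_id = rep(c("S1", "S2"), each = 4),
  gene       = rep(c("g1", "g2"), times = 4),
  day        = rep(rep(c("0", "7"), each = 2), times = 2),
  value      = c(5, 2, 7, 1, 4, 3, 4, 6)
)
res <- calc_cfb_sketch(df)
res[res$day == "7", c("subject_id", "gene", "cfb")]
```

If value holds log2 CPM, the difference is a log2 fold change from baseline.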
This is data to be included in the package
Description
Example count table, included in the package.
Usage
count_table
Format
An example count table whose row names are gene IDs; each column is a sample.
- counttable
count table
...
DL Pathways DB
Description
Download gene database for enrichment.
Usage
dlPathwaysDB(pathway.db, customized.pathways = NULL, ...)
Arguments
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
User-provided pathways to add to the analysis. |
... |
Additional parameters to pass on. |
Value
Returns a dataframe.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Get CPM Colors
Description
This function creates the color gradient for the CPM data.
Usage
get.cpm.colors(data)
Arguments
data |
The CPM dataset. |
Create a ggplot object for the number of differentially expressed genes at different FDR and fold change cutoffs.
Description
This function processes the data frame from the plot_cutoff_single function and produces a ggplot object that depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
Usage
get.cutoff.df(datin, pvalues, FCs, FCflag = "logFC", FDRflag = "adj.P.Val")
Arguments
datin |
Dataframe from plot_cutoff_single. |
pvalues |
A set of p-value (FDR) cutoffs to be checked. |
FCs |
A set of fold change cutoffs to be checked. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
Create a ggplot object for the number of differentially expressed genes at different FDR and fold change cutoffs.
Description
This function processes the data frame from the plot_cutoff_single function and produces a ggplot object that depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
Usage
get.cutoff.ggplot(df, FCflag, FDRflag)
Arguments
df |
Dataframe from plot_cutoff_single. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
Create plotly object for number of DE genes at different cutoff combinations
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, to produce an interactive visual object that depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
Usage
make.cutoff.plotly(df)
Arguments
df |
Summary statistics table from limma or DESeq2, where each row is a gene. |
Multi Plot
Description
Multi plot is for directional and non-directional plots
Usage
multiPlot(allID, backup.d.sig, nd.res, ...)
Arguments
allID |
A vector of all pathway IDs from the directional and non-directional enriched datasets. |
backup.d.sig |
A data frame-type object with directional pathway data before any cutoffs are applied. |
nd.res |
A data frame-type object with non-directional pathway data before any cutoffs are applied. |
... |
Additional variables to pass on. |
Details
multiPlot handles the directional and non-directional plots when one of the plots contains no observations.
Value
Returns ggplot.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Null Return
Description
The function takes in a boolean value and a numeric value, which it uses to decide what to output.
Usage
nullreturn(IS.list, type = 1)
Arguments
IS.list |
Indicator of whether the input is a list. |
type |
If type = 1 (default), return the directional null plot; if type = 2, return the non-directional null plot. |
Details
nullreturn returns NULL for single data frame inputs that do not satisfy the threshold values, and an empty data frame for list inputs that do not satisfy the cutoffs.
Value
The function returns either a data frame or NULL.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Check number of DE genes at different cutoff combinations
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, to evaluate the number of differentially expressed genes at different FDR and fold change cutoffs.
Usage
plot_cutoff(
data = data,
comp.names = NULL,
FCflag = "logFC",
FDRflag = "adj.P.Val",
FCmin = 1.2,
FCmax = 2,
FCstep = 0.1,
p.min = 0,
p.max = 0.2,
p.step = 0.01,
plot.save.to = NULL,
gen.3d.plot = TRUE,
gen.plot = TRUE
)
Arguments
data |
Summary statistics table, or a list of summary statistics tables, from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
FCflag |
The column name of the log2FC in the summary statistics table. Default = "logFC". |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. Default = "adj.P.Val". |
FCmin |
The starting fold change cutoff; the smallest fold change cutoff evaluated will be FCmin + FCstep. Default = 1.2. |
FCmax |
The maximum fold change cutoff to be checked. Default = 2. |
FCstep |
The step by which the fold change cutoff increases from the minimum to the maximum. Default = 0.1. |
p.min |
The starting FDR cutoff; the smallest FDR cutoff evaluated will be p.min + p.step. Default = 0. |
p.max |
The maximum FDR cutoff to be checked. Default = 0.2. |
p.step |
The step by which the FDR cutoff increases from the minimum to the maximum. Default = 0.01. |
plot.save.to |
The file path where the plot of the simplified cutoff combinations (FDR of 0.01, 0.05, 0.1, and 0.2) is saved. |
gen.3d.plot |
Whether to generate a 3D plotly object to visualize the result; applies only to single data frame input. Default = TRUE. |
gen.plot |
Whether to generate a plot to visualize the result. Default = TRUE. |
Details
The function takes the summary statistics and returns a list of 3 objects: a table of the number of DE genes for different combinations of FDR and fold change cutoffs, a ggplot object depicting a simplified version of the cutoff combinations, and a plotly 3D visualization object depicting a high-resolution view of the cutoff combinations. By default, the fold change cutoff ranges from 1.2 to 2 in steps of 0.1, and the FDR cutoff ranges from 0 to 0.2 in steps of 0.01.
Value
If the input data is a list, a multi-facet ggplot object containing each of the summary statistics tables is returned; if the input data is a data frame, the function returns a list of 3 elements:
df.sub |
A data frame containing the FDR cutoff (1st column), the fold change cutoff (2nd column), and the number of genes (3rd column). |
plot3d |
A plotly object showing a 3D surface of all possible cutoff selections and the corresponding number of DE genes. |
gp |
A ggplot object showing the simplified cutoff combination result. |
References
Xingpeng Li & Olya Besedina, RVA - RNAseq Visualization Automation tool.
Examples
plot_cutoff(Sample_summary_statistics_table)
plot_cutoff(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1),
comp.names = c("A", "B"))
Create a table of the number of DE genes at different cutoff combinations
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a table containing gene counts for each combination of p-value and fold change cutoffs.
Usage
plot_cutoff_single(datin, FCflag, FDRflag, FCs, pvalues)
Arguments
datin |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
FCs |
A set of fold change cutoffs to be checked. |
pvalues |
A set of p-value (FDR) cutoffs to be checked. |
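The per-combination gene counts described above can be sketched as a grid over FDR and fold change cutoffs. count_de_grid is a hypothetical helper, not the package's code; it assumes a gene counts as DE when its FDR is below the cutoff and its absolute log2 fold change exceeds log2 of the fold change cutoff, and the strict inequalities are also an assumption.

```r
# Hypothetical sketch, not the RVA implementation: count DE genes for every
# combination of FDR cutoff and fold change cutoff.
count_de_grid <- function(datin, FCflag = "logFC", FDRflag = "adj.P.Val",
                          FCs, pvalues) {
  grid <- expand.grid(FDR = pvalues, FC = FCs)
  # Assumed rule: FDR below the cutoff and |log2 FC| above log2(FC cutoff)
  grid$n.genes <- mapply(function(p, fc) {
    sum(datin[[FDRflag]] < p & abs(datin[[FCflag]]) > log2(fc), na.rm = TRUE)
  }, grid$FDR, grid$FC)
  grid
}

toy <- data.frame(logFC     = c(1.2, -0.8, 0.1, 2.0, -1.5),
                  adj.P.Val = c(0.001, 0.03, 0.5, 0.08, 0.04),
                  row.names = paste0("gene", 1:5))
grid_res <- count_de_grid(toy, FCs = c(1.5, 2), pvalues = c(0.05, 0.1))
grid_res
```

Each row of the result pairs one FDR cutoff with one fold change cutoff, which is the long layout a 3D surface or faceted plot can consume directly.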
Plot gene expression
Description
This is the function to process the gene count table to show gene expression variations over time or across groups.
Usage
plot_gene(
data = ~dat,
anno = ~meta,
gene.names = c("AAAS", "A2ML1", "AADACL3"),
ct.table.id.type = "ENSEMBL",
gene.id.type = "SYMBOL",
treatment = "Treatment",
sample.id = "sample_id",
time = "day",
log.option = TRUE,
plot.save.to = NULL,
input.type = "count"
)
Arguments
data |
Count table in the format of dataframe with gene id as row.names. |
anno |
Annotation table that provides design information. |
gene.names |
Genes to be visualized, in the format of character vector. |
ct.table.id.type |
The gene id format in |
gene.id.type |
The gene id format of |
treatment |
The column name to specify treatment groups. |
sample.id |
The column name to specify sample IDs. |
time |
The column name to specify different time points. |
log.option |
Logical option; whether to log2 transform the CPM on the y-axis. Default = TRUE. |
plot.save.to |
The file path where the gene expression plot is saved. |
input.type |
One of |
Details
The function takes the gene counts and returns a ggplot that shows gene expression variation over time or group.
Value
The function returns a ggplot object.
References
Xingpeng Li, Tatiana Gelaf Romer & Aliyah Olaniyan, RVA - RNAseq Visualization Automation tool.
Examples
plot_gene(data = count_table,
anno = sample_annotation)
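For reference, CPM values of the kind plot_gene visualizes can be computed from raw counts as below. This is a sketch under stated assumptions: to_cpm is a hypothetical helper, and the log2(CPM + 1) transform is one common choice; edgeR::cpm(log = TRUE) uses its own prior-count handling, so its values differ slightly.

```r
# Hypothetical helper: counts per million with an optional log2 transform.
# The prior of 1 is an assumption; edgeR::cpm(log = TRUE) uses its own
# prior-count scheme, so values will differ slightly.
to_cpm <- function(counts, log = TRUE, prior = 1) {
  lib.size <- colSums(counts)                    # total counts per sample
  cpm <- sweep(counts, 2, lib.size, "/") * 1e6   # scale each column to 1e6
  if (log) log2(cpm + prior) else cpm
}

counts <- matrix(c(10, 0, 90, 20, 30, 50), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
to_cpm(counts, log = FALSE)
to_cpm(counts)
```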
Plot a CFB Heatmap
Description
An alias for plot_heatmap.expr(data = cpm, annot = annot, fill = "CFB", ...).
Usage
plot_heatmap.cfb(cpm, annot, title = "RVA CFB Heatmap", ...)
Arguments
cpm |
CPM data. |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
title |
A title for the heatmap. Default = "RVA CFB Heatmap". |
... |
Additional parameters to pass on. |
Plot a CPM Heatmap
Description
An alias for plot_heatmap.expr(data = cpm, annot = annot, fill = "CPM", ...).
Usage
plot_heatmap.cpm(cpm, annot, title = "RVA CPM Heatmap", ...)
Arguments
cpm |
CPM data. |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
title |
A title for the heatmap. Default = "RVA CPM Heatmap". |
... |
Additional parameters to pass on. |
Plot Heatmap From Raw CPM
Description
Create a heatmap with either CFB or CPM averaged across individual samples.
Usage
plot_heatmap.expr(
data = ~count,
annot = ~meta,
sample.id = "sample_id",
annot.flags = c("day", "Treatment", "tissue"),
ct.table.id.type = "ENSEMBL",
gene.id.type = "SYMBOL",
gene.names = NULL,
gene.count = 10,
title = "RVA Heatmap",
fill = "CFB",
baseline.flag = "day",
baseline.val = "0",
plot.save.to = NULL,
input.type = "count"
)
Arguments
data |
A wide-format dataframe with geneid rownames, sample column
names, and fill data matching |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
sample.id |
The column name to specify sample ID. |
annot.flags |
A vector of column names corresponding to column names
in |
ct.table.id.type |
The gene id format in |
gene.id.type |
The gene id format of |
gene.names |
A character vector or list of ensembl IDs for which to
display gene information. If |
gene.count |
The number of genes to include, where genes are selected
based on ranking by values in |
title |
A title for the heatmap. Default = "RVA Heatmap". |
fill |
One of |
baseline.flag |
A character vector of column names. If |
baseline.val |
A character vector of values. This vector must be the
same length as |
plot.save.to |
The address to save the heatmap plot. |
input.type |
One of |
Details
The function takes raw CPM data and returns a list containing a data frame of values based on the fill parameter and a heatmap plot.
Value
The function returns a list with 2 items:
df.sub |
A data frame of change-from-baseline values (fill = "CFB" in this example) for each gene ID, split by each combination of treatment group and time point. |
gp |
A Heatmap object from ComplexHeatmap which can be plotted |
References
Xingpeng Li, Tatiana Gelaf Romer & Aliyah Olaniyan, RVA - RNAseq Visualization Automation tool.
Examples
plot <- plot_heatmap.expr(data = count_table[,1:20],annot = sample_annotation[1:20,])
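Averaging per-sample values within each treatment and time point, as the heatmap description outlines, can be sketched with tapply(). The column names and toy data are hypothetical, and this is not the plot_heatmap.expr implementation. The resulting gene-by-group matrix is the kind of input ComplexHeatmap::Heatmap() accepts.

```r
# Hypothetical sketch: average per-sample CFB values within each
# treatment/time group and reshape to a gene x group matrix.
long <- data.frame(
  gene      = rep(c("g1", "g2"), times = 4),
  sample_id = rep(c("s1", "s2", "s3", "s4"), each = 2),
  Treatment = rep(c("A", "A", "B", "B"), each = 2),
  day       = "7",
  cfb       = c(1, -1, 3, 0, 2, 2, 4, -2)
)
# One column per treatment/time combination, e.g. "A_7"
group <- paste(long$Treatment, long$day, sep = "_")
mat <- tapply(long$cfb, list(long$gene, group), mean)
mat
```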
Pathway analysis and visualization
Description
This function performs pathway enrichment analysis (and visualization) with rWikiPathways (also KEGG, REACTOME, and Hallmark) from a summary statistics table generated by differential expression analysis, such as limma or DESeq2.
Usage
plot_pathway(
data = ~df,
comp.names = NULL,
gene.id.type = "ENSEMBL",
FC.cutoff = 1.2,
FDR.cutoff = 0.05,
FCflag = "logFC",
FDRflag = "adj.P.Val",
Fisher.cutoff = 0.1,
Fisher.up.cutoff = 0.1,
Fisher.down.cutoff = 0.1,
plot.save.to = NULL,
pathway.db = "rWikiPathways",
customized.pathways = NULL,
...
)
Arguments
data |
A summary statistics table (data.frame) or |
comp.names |
A character vector containing the comparison names corresponding to the same order of the |
gene.id.type |
The gene id format in |
FC.cutoff |
The fold change cutoff (numeric) selected to subset the summary statistics table. Default = 1.2. |
FDR.cutoff |
The FDR cutoff selected (numeric) to subset summary statistics table. Default = 0.05. |
FCflag |
The column name (character) of fold change information, assuming the FC is log2 transformed. Default = "logFC". |
FDRflag |
The column name (character) of adjusted p value or FDR. Default = "adj.P.Val". |
Fisher.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with all determined
Differentially Expressed (DE) genes by |
Fisher.up.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with the upregulated gene set. |
Fisher.down.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with the downregulated gene set. |
plot.save.to |
The file path where the pathway enrichment plot is saved. |
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
Customized pathways as a two-column data frame (columns named "gs_name" and "entrez_gene") to be used in the analysis. |
... |
Additional variables to pass on. |
Details
The function takes the summary statistics table and uses the user-selected cutoffs, based on check.cutoff, to perform pathway enrichment analysis.
Value
The function returns a list of 5 objects:
1 |
result table from directional pathway enrichment analysis |
2 |
result table from non-directional pathway enrichment analysis |
3 |
plot from directional pathway enrichment analysis |
4 |
plot from non-directional pathway enrichment analysis |
5 |
plot combining both directional and non-directional plot |
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Examples
result <- plot_pathway(data = Sample_summary_statistics_table,
gene.id.type = "ENSEMBL",
FC.cutoff = 1.5,
FDR.cutoff = 0.05,
pathway.db = "rWikiPathways_aug_2020"
)
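The Fisher's exact test behind pathway enrichment can be sketched in base R. This is an illustration, not RVA's code: fisher_pathway_sketch is hypothetical, it takes pathways in the two-column gs_name / entrez_gene layout described for customized.pathways, and the one-sided over-representation test and Benjamini-Hochberg adjustment are assumptions.

```r
# Hypothetical sketch of Fisher's exact test enrichment; not the RVA source.
# pathways: data frame with columns gs_name and entrez_gene.
fisher_pathway_sketch <- function(de.genes, universe, pathways) {
  sets <- split(pathways$entrez_gene, pathways$gs_name)
  p <- vapply(sets, function(pw) {
    # 2 x 2 table: DE (TRUE/FALSE) by in-pathway (TRUE/FALSE)
    in.de <- factor(universe %in% de.genes, levels = c(TRUE, FALSE))
    in.pw <- factor(universe %in% pw, levels = c(TRUE, FALSE))
    fisher.test(table(in.de, in.pw), alternative = "greater")$p.value
  }, numeric(1))
  data.frame(pathway = names(sets), p.value = p,
             FDR = p.adjust(p, method = "BH"), row.names = NULL)
}

universe <- as.character(1:100)
de <- as.character(1:10)
pathways <- data.frame(
  gs_name     = rep(c("PW_A", "PW_B"), each = 8),
  entrez_gene = as.character(c(1:8, 51:58))
)
enr <- fisher_pathway_sketch(de, universe, pathways)
enr
```

For a directional analysis, the same function could be run separately on the up- and down-regulated gene sets and compared against Fisher.up.cutoff and Fisher.down.cutoff, while the non-directional result would use all DE genes with Fisher.cutoff.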
Plot qqplot
Description
This function generates a QQ-plot object with a confidence interval from a summary statistics table generated by differential expression analysis, such as limma or DESeq2.
Usage
plot_qq(
data = data,
comp.names = NULL,
p.value.flag = "P.Value",
ci = 0.95,
plot.save.to = NULL
)
Arguments
data |
Summary statistics table, or a list containing multiple summary statistics tables, from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
p.value.flag |
The column name of |
ci |
Confidence level of the interval. Default = 0.95. |
plot.save.to |
The file path where the QQ-plot is saved, e.g. "~/address_to_folder/qqplot.png". Default = NULL. |
Details
The function produces a QQ-plot to evaluate the result from differential expression analysis. The output is a ggplot object.
Value
The function returns a ggplot object of the QQ-plot.
References
Xingpeng Li & Tatiana Gelaf Romer & Olya Besedina, RVA - RNAseq Visualization Automation tool.
Examples
plot_qq(data = Sample_summary_statistics_table)
plot_qq(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1),
comp.names = c("A","B"))
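A QQ-plot with a confidence band can be built from the fact that, under the null, the k-th smallest of n uniform p-values follows a Beta(k, n - k + 1) distribution. The sketch below computes the plotting coordinates on the -log10 scale; it is a common construction and an assumption about how plot_qq derives its interval, not its actual code. Its ci argument plays the role of plot_qq's ci.

```r
# Sketch of QQ-plot coordinates with a pointwise confidence band (assumed
# construction, not the plot_qq source). Under the null, the k-th smallest
# of n uniform p-values follows Beta(k, n - k + 1).
qq_sketch <- function(p, ci = 0.95) {
  p <- sort(p[!is.na(p)])
  n <- length(p)
  k <- seq_len(n)
  data.frame(
    expected = -log10(ppoints(n)),
    observed = -log10(p),
    lower    = -log10(qbeta((1 + ci) / 2, k, n - k + 1)),
    upper    = -log10(qbeta((1 - ci) / 2, k, n - k + 1))
  )
}

set.seed(1)
qq <- qq_sketch(runif(500), ci = 0.95)
head(qq)
```

The columns could then be drawn with ggplot2, for example geom_point() for observed vs. expected and geom_ribbon() for the band.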
Plot volcanoplot
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and shows it as a volcano plot, with the option to highlight a gene set (such as disease-related genes from a Disease vs. Healthy comparison).
Usage
plot_volcano(
data = data,
comp.names = NULL,
geneset = NULL,
geneset.FCflag = "logFC",
highlight.1 = NULL,
highlight.2 = NULL,
upcolor = "#FF0000",
downcolor = "#0000FF",
plot.save.to = NULL,
xlim = c(-4, 4),
ylim = c(0, 12),
FCflag = "logFC",
FDRflag = "adj.P.Val",
highlight.FC.cutoff = 1.5,
highlight.FDR.cutoff = 0.05,
title = "Volcano plot",
xlab = "log2 Fold Change",
ylab = "log10(FDR)"
)
Arguments
data |
Summary statistics table, or a list containing multiple summary statistics tables, from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
geneset |
A summary statistics table containing the genes to be highlighted; the gene name format (in row names) must be consistent with the main summary statistics table. For example, this could be the summary statistics table from a Disease vs. Healthy comparison, containing only the subsetted significant genes to be highlighted. |
geneset.FCflag |
The column name of fold change in |
highlight.1 |
A vector of gene names to be highlighted. The gene name format must be consistent with the main summary statistics table. |
highlight.2 |
A vector of gene names to be highlighted. The gene name format must be consistent with the main summary statistics table. |
upcolor |
The color of the gene names in |
downcolor |
The color of the gene names in |
plot.save.to |
The file name and address where to save the volcano plot, e.g. "~/address_to_folder/volcano_plot.png". |
xlim |
Range of x axis. Default = c(-4, 4). |
ylim |
Range of y axis. Default = c(0, 12). |
FCflag |
Column name of log2FC in the summary statistics table. Default = "logFC". |
FDRflag |
Column name of FDR in the summary statistics table. Default = "adj.P.Val". |
highlight.FC.cutoff |
Fold change cutoff shown as a line on the plot. Default = 1.5. |
highlight.FDR.cutoff |
FDR cutoff shown as shading on the plot. Default = 0.05. |
title |
The plot title. Default "Volcano plot". |
xlab |
The label for x-axis. Default "log2 Fold Change". |
ylab |
The label for y-axis. Default "log10(FDR)". |
Details
The function takes the summary statistics table and returns a ggplot, with the option to highlight genes, e.g. disease signature genes, the genes which are up-regulated and down-regulated in diseased subjects.
Value
The function returns a volcano plot as a ggplot object.
References
Xingpeng Li & Tatiana Gelaf Romer & Olya Besedina, RVA - RNAseq Visualization Automation tool.
Examples
plot_volcano(data = Sample_summary_statistics_table,
geneset = Sample_disease_gene_set)
plot_volcano(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1),
comp.names = c("A", "B"),
geneset = Sample_disease_gene_set)
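The gene classification underlying a volcano plot can be sketched as follows. volcano_df is a hypothetical helper; it assumes a gene is up- or down-regulated when its FDR is below the FDR cutoff and its absolute log2 fold change is at least log2 of the fold change cutoff, with -log10(FDR) on the y-axis. The thresholds mirror highlight.FC.cutoff and highlight.FDR.cutoff but the rule is not taken from the package source.

```r
# Hypothetical sketch: label genes for a volcano plot. The thresholds mirror
# highlight.FC.cutoff / highlight.FDR.cutoff but the rule is an assumption.
volcano_df <- function(df, FCflag = "logFC", FDRflag = "adj.P.Val",
                       FC.cutoff = 1.5, FDR.cutoff = 0.05) {
  fc  <- df[[FCflag]]
  sig <- df[[FDRflag]] < FDR.cutoff
  data.frame(
    gene   = rownames(df),
    x      = fc,
    y      = -log10(df[[FDRflag]]),   # larger y = more significant
    status = ifelse(sig & fc >= log2(FC.cutoff), "up",
             ifelse(sig & fc <= -log2(FC.cutoff), "down", "ns"))
  )
}

toy <- data.frame(logFC     = c(1.2, -0.8, 0.1, 2.0, -1.5),
                  adj.P.Val = c(0.001, 0.03, 0.5, 0.08, 0.04),
                  row.names = paste0("gene", 1:5))
v <- volcano_df(toy)
v
```

Coloring points by status with upcolor and downcolor values then gives the usual red/blue volcano layout.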
Pretty Graphs
Description
Handles the special case of a list input in which at least one treatment has signal but the others do not.
Usage
prettyGraphs(vizdf, ...)
Arguments
vizdf |
A data frame of enriched pathways. |
... |
Additional variables to pass on. |
Details
prettyGraphs is meant for cases where one of the input treatments meets the cutoff but one or more of the other treatments do not. Handling this case prevents ggplot from throwing errors.
Value
Returns a dataframe.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Create a message about fold change and pvalues used to produce the plot.
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a message about the p-value and fold change cutoffs used.
Usage
produce.cutoff.message(
data,
FCmin,
FCmax,
FCstep,
FDRflag,
p.min,
p.max,
p.step
)
Arguments
data |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FCmin |
The starting fold change cutoff; the smallest fold change cutoff evaluated will be FCmin + FCstep (plot_cutoff default = 1.2). |
FCmax |
The maximum fold change cutoff to be checked (plot_cutoff default = 2). |
FCstep |
The step by which the fold change cutoff increases from the minimum to the maximum (plot_cutoff default = 0.1). |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
p.min |
The starting FDR cutoff; the smallest FDR cutoff evaluated will be p.min + p.step (plot_cutoff default = 0). |
p.max |
The maximum FDR cutoff to be checked (plot_cutoff default = 0.2). |
p.step |
The step by which the FDR cutoff increases from the minimum to the maximum (plot_cutoff default = 0.01). |
Create a warning about pvalue or FDR minimum value
Description
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a warning about the minimum p-value or FDR.
Usage
produce.cutoff.warning(data, FDRflag)
Arguments
data |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
Reformat Ensembl GeneIDs
Description
This function removes the version number from input Ensembl gene IDs.
Usage
reformat.ensembl(logcpm, ct.table.id.type)
Arguments
logcpm |
The input count table transformed into log counts per million. |
ct.table.id.type |
The gene id format in |
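Stripping the version suffix from Ensembl IDs amounts to dropping a trailing dot and digits, so ENSG00000141510.16 becomes ENSG00000141510. A minimal sketch with a hypothetical helper name, not the reformat.ensembl source:

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix.
strip_ensembl_version <- function(ids) sub("\\.[0-9]+$", "", ids)

ids <- c("ENSG00000141510.16", "ENSG00000012048", "ENSG00000139618.15")
strip_ensembl_version(ids)
# returns "ENSG00000141510" "ENSG00000012048" "ENSG00000139618"
```

IDs without a version suffix pass through unchanged, so the same call could be applied to rownames(logcpm) regardless of which ID style the count table uses.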
This is data to be included in the package
Description
Example sample annotation table, included in the package.
Usage
sample_annotation
Format
Sample annotation document
- sample_id
sample name
- tissue
tissue for comparison
- subject_id
subject id
- day
time points
...
This is data to be included in the package
Description
Example CPM table, included in the package.
Usage
sample_count_cpm
Format
An example CPM table whose row names are gene IDs; each column is a sample.
- counttable
CPM table
...
Second Cutoff Error
Description
The function takes a list of data frames, comparison names, and a type, and outputs a data frame formatted for ggplot.
Usage
secondCutoffErr(df, comp.names, TypeQ = 1)
Arguments
df |
A list of dataframes. |
comp.names |
A character vector containing the comparison names, in the same order as the |
TypeQ |
If TypeQ = 1 (default), return the directional null plot; if TypeQ = 2, return the non-directional null plot. |
Details
secondCutoffErr is meant for list inputs, in cases where, after the data are filtered, one of the comparison IDs is left out, which would otherwise adversely affect the ggplot.
Value
Returns a dataframe.
References
Xingpeng Li & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Transform GeneIDs
Description
This function converts gene IDs from one ID type to another.
Usage
## S3 method for class 'geneid'
transform(gene.names, from = ~gene.id.type, to = ~ct.table.id.type)
Arguments
gene.names |
Genes, in the format of a character vector, to be transformed. |
from |
The gene id format of |
to |
The new gene id format should be one of: ACCNUM, ALIAS, ENSEMBL, ENSEMBLPROT, ENSEMBLTRANS, ENTREZID, ENZYME, EVIDENCE, EVIDENCEALL, GENENAME, GO, GOALL, IPI, MAP, OMIM, ONTOLOGY, ONTOLOGYALL, PATH, PFAM, PMID, PROSITE, REFSEQ, SYMBOL, UCSCKG, UNIGENE, UNIPROT. |
Validate Foldchange
Description
This function ensures the fold change minimum, maximum, and step are valid.
Usage
validate.FC(FCmin, FCmax, FCstep)
Arguments
FCmin |
The starting fold change cutoff; the smallest fold change cutoff evaluated will be FCmin + FCstep (plot_cutoff default = 1.2). |
FCmax |
The maximum fold change cutoff to be checked (plot_cutoff default = 2). |
FCstep |
The step by which the fold change cutoff increases from the minimum to the maximum (plot_cutoff default = 0.1). |
Details
Specifically it checks that the FCmax is greater than the FCmin, that at least 1 FCstep can fit within the FCmax and FCmin, that FCmax and FCmin values are non-negative, and that FCstep is positive.
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
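The checks listed under validate.FC translate directly into a few guard clauses. A sketch with a hypothetical function name; the error messages are illustrative, not the package's.

```r
# Hypothetical sketch of the fold change range checks; not the RVA source.
validate_fc_sketch <- function(FCmin, FCmax, FCstep) {
  if (FCmin < 0 || FCmax < 0) stop("FCmin and FCmax must be non-negative.")
  if (FCstep <= 0) stop("FCstep must be positive.")
  if (FCmax <= FCmin) stop("FCmax must be greater than FCmin.")
  # At least one step must fit between the minimum and maximum
  if (FCmin + FCstep > FCmax) stop("FCstep does not fit between FCmin and FCmax.")
  invisible(TRUE)
}

validate_fc_sketch(1.2, 2, 0.1)                # passes silently
try(validate_fc_sketch(1.9, 2, 0.5))           # error: step too large
```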
Validate Annotation Table
Description
Ensure that an annotation has all of the required columns.
Usage
validate.annot(
data,
annot,
annot.flags,
sample.id,
fill = "CPM",
baseline.flag = NULL,
baseline.val = NULL
)
Arguments
data |
The input count data. |
annot |
The annotation dataframe. |
annot.flags |
The vector of annotation flags passed by the user. |
sample.id |
Sample id label to check if in annot. |
fill |
The fill value indicated by the user, "CFB" or "CPM". |
baseline.flag |
The baseline.flag passed by the user. |
baseline.val |
The baseline value passed by the user. |
Details
The function will check the following:
- The annot.flags values are columns in annot.
- If fill = "cfb", the baseline.flag and baseline.val parameters are validated.
- sample.id is a column in annot.
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Validate Baseline Values
Description
Ensures that the user-input baseline.val and baseline.flag parameters are valid with respect to the annot data frame.
Usage
validate.baseline(annot, baseline.val, baseline.flag)
Arguments
annot |
The annotation dataframe. |
baseline.val |
The baseline value passed by the user. |
baseline.flag |
The baseline.flag passed by the user. |
Details
Specifically, validates that the baseline.flag value(s) are columns in annot, and that the baseline.val value(s) occur at least once in their respective baseline.flag columns.
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
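The baseline checks can be sketched the same way. validate_baseline_sketch is hypothetical; it checks that each baseline.flag is a column of annot and that the matching baseline.val occurs in it, and it assumes baseline.val and baseline.flag must have equal length.

```r
# Hypothetical sketch of baseline validation; not the RVA source.
validate_baseline_sketch <- function(annot, baseline.val, baseline.flag) {
  if (length(baseline.val) != length(baseline.flag))
    stop("baseline.val and baseline.flag must have the same length.")
  missing.cols <- setdiff(baseline.flag, colnames(annot))
  if (length(missing.cols) > 0)
    stop("Columns not found in annot: ", paste(missing.cols, collapse = ", "))
  # Each baseline value must occur at least once in its column
  for (i in seq_along(baseline.flag)) {
    if (!baseline.val[i] %in% as.character(annot[[baseline.flag[i]]]))
      stop("Value '", baseline.val[i], "' not found in column '",
           baseline.flag[i], "'.")
  }
  invisible(TRUE)
}

annot <- data.frame(sample_id = c("s1", "s2", "s3"), day = c(0, 7, 14))
validate_baseline_sketch(annot, "0", "day")     # passes silently
try(validate_baseline_sketch(annot, "3", "day")) # error: value not found
```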
Check Summary Statistics Required Column Types
Description
The FCflag and FDRflag columns must be numeric.
Usage
validate.col.types(datin, name = 1, flags)
Arguments
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
flags |
|
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Validate Comp Names
Description
When a list of data frames is used as input, this function ensures that the number of comparison names equals the number of data frames.
Usage
validate.comp.names(comp.names, data)
Arguments
comp.names |
A character vector containing the comparison names, in the same order as the |
data |
Summary statistics table (data.frame) from limma or DESeq2, whose row names are gene IDs. |
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai, RVA - RNAseq Visualization Automation tool.
Validate Data Input
Description
Ensures that the data input has the required formatting.
Usage
validate.data(data)
Arguments
data |
The wide-format dataframe with input data. |
Details
Specifically, checks that data has row names and that all other columns can be coerced to numeric.
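A base R sketch of this validation is shown below. The helper `check_data_sketch` is hypothetical. It assumes every column holds sample values, and it treats the default row names "1", "2", ... that data.frame() fills in as missing row names.

```r
# Hypothetical sketch of the data input check; not RVA's implementation.
check_data_sketch <- function(data) {
  rn <- rownames(data)
  # data.frame() fills in "1", "2", ... when no row names are given,
  # so those default names are treated as missing here
  if (is.null(rn) || identical(rn, as.character(seq_len(nrow(data))))) {
    stop("data must have row names (e.g. gene IDs)")
  }
  # A column is coercible if as.numeric() introduces no new NAs
  ok <- vapply(as.data.frame(data), function(col) {
    num <- suppressWarnings(as.numeric(as.character(col)))
    all(is.na(num) == is.na(col))
  }, logical(1))
  if (!all(ok)) {
    stop("column(s) not coercible to numeric: ",
         paste(names(ok)[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

counts <- data.frame(S1 = c(10, 0), S2 = c("5", "3"),
                     row.names = c("GeneA", "GeneB"))
check_data_sketch(counts)
```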
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Data in the Context of Annotation
Description
Ensures that the annotation file matches the data file with respect to sample IDs. Throws warnings if there are discrepancies.
Usage
validate.data.annot(data, annot, sample.id)
Arguments
data |
input data |
annot |
annotation file |
sample.id |
the name of the sample ID column in annot |
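The comparison can be sketched as set differences between the two sets of sample IDs. The helper `check_data_annot_sketch` is hypothetical, and it assumes the samples are the columns of data (genes in rows) while annot[[sample.id]] holds the matching IDs.

```r
# Hypothetical sketch of the data/annotation check; not RVA's implementation.
# Assumption: samples are the columns of data, and annot[[sample.id]]
# holds the matching sample IDs.
check_data_annot_sketch <- function(data, annot, sample.id) {
  data.ids <- colnames(data)
  annot.ids <- as.character(annot[[sample.id]])
  only.data <- setdiff(data.ids, annot.ids)
  only.annot <- setdiff(annot.ids, data.ids)
  if (length(only.data) > 0) {
    warning("Samples in data but not in annot: ", paste(only.data, collapse = ", "))
  }
  if (length(only.annot) > 0) {
    warning("Samples in annot but not in data: ", paste(only.annot, collapse = ", "))
  }
  # Return the shared sample IDs
  intersect(data.ids, annot.ids)
}

counts <- data.frame(S1 = c(10, 0), S2 = c(5, 3), row.names = c("GeneA", "GeneB"))
annot <- data.frame(sample_id = c("S1", "S2", "S3"))
# warns that S3 is in annot but not in data
shared <- check_data_annot_sketch(counts, annot, "sample_id")
```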
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Flag Value Is Valid
Description
Ensures that value is one of Options, and throws an error otherwise.
Usage
validate.flag(value, name, Options)
Arguments
value |
The user-input value for the parameter |
name |
The name of the parameter to be displayed in the error |
Options |
A vector of valid values for value. |
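The check can be sketched in a few lines of base R. `check_flag_sketch` is hypothetical, and requiring value to be a single element is an assumption. The example reuses the pathway.db options listed under validate.pathways.db.

```r
# Hypothetical sketch of the flag check; not RVA's implementation.
check_flag_sketch <- function(value, name, Options) {
  # Assumption: exactly one value, drawn from Options
  if (length(value) != 1 || !value %in% Options) {
    stop("'", name, "' must be one of: ", paste(Options, collapse = ", "))
  }
  invisible(TRUE)
}

dbs <- c("rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020")
check_flag_sketch("KEGG", "pathway.db", dbs)
```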
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate genes present
Description
Checks how many of the gene IDs in the dataset are present in the geneset.
Usage
validate.genes.present(data.genes, geneset)
Arguments
data.genes |
The gene IDs. |
geneset |
a summary statistics table containing the genes to be highlighted; the gene name format (in row names) must be consistent with the main summary statistics table. For example, this could be the summary statistics table output from a Disease vs Healthy comparison, containing only the subset of significant genes to be highlighted. |
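Counting the overlap can be sketched with %in%. The helper `count_genes_present_sketch` is hypothetical; it assumes the gene set genes are the row names of geneset and counts each data gene ID once.

```r
# Hypothetical sketch of the overlap count; not RVA's implementation.
count_genes_present_sketch <- function(data.genes, geneset) {
  # Assumption: gene set genes are the row names of geneset
  n.present <- sum(unique(data.genes) %in% rownames(geneset))
  message(n.present, " of ", length(unique(data.genes)),
          " gene IDs in the data are present in the gene set")
  n.present
}

geneset <- data.frame(logFC = c(2.1, -1.5), row.names = c("GeneB", "GeneD"))
count_genes_present_sketch(c("GeneA", "GeneB", "GeneC"), geneset)
# returns 1 (only GeneB is shared)
```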
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Geneset
Description
This function ensures that the input geneset to check.cutoff is formatted properly and in a usable form.
Usage
validate.geneset(data, geneset, highlight.1, highlight.2)
Arguments
data |
summary statistics table, or a list containing multiple summary statistics tables, from limma or DESeq2, where each row is a gene. |
geneset |
a summary statistics table containing the genes to be highlighted; the gene name format (in row names) must be consistent with the main summary statistics table. For example, this could be the summary statistics table output from a Disease vs Healthy comparison, containing only the subset of significant genes to be highlighted. |
highlight.1 |
genes to be highlighted, as a vector of gene names. The gene name format must be consistent with the main summary statistics table. |
highlight.2 |
genes to be highlighted, as a vector of gene names. The gene name format must be consistent with the main summary statistics table. |
Details
The function ensures that only a dataframe or vectors are supplied, that at least one of them is supplied, and that their formatting is correct if supplied. It also checks whether any of the genes overlap with the genes in data.
Value
A character value indicating whether the geneset was passed as a dataframe ("df") or as two vectors ("vec"). If a list is input, the number of returned values equals the length of the list.
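For a single table, the decision between "df" and "vec" can be sketched as below. `geneset_type_sketch` is hypothetical; it omits the list case and the overlap check, and the requirement that highlight vectors be character vectors is an assumption.

```r
# Hypothetical sketch of the geneset type decision; not RVA's implementation.
geneset_type_sketch <- function(geneset = NULL, highlight.1 = NULL,
                                highlight.2 = NULL) {
  has.df <- !is.null(geneset)
  has.vec <- !is.null(highlight.1) || !is.null(highlight.2)
  # Exactly one of the two input styles must be used
  if (has.df && has.vec) stop("Supply either geneset or highlight vectors, not both")
  if (!has.df && !has.vec) stop("Supply geneset or highlight.1/highlight.2")
  if (has.df) {
    if (!is.data.frame(geneset)) stop("geneset must be a data frame")
    return("df")
  }
  # Assumption: highlight vectors are character vectors of gene names
  for (h in list(highlight.1, highlight.2)) {
    if (!is.null(h) && !is.character(h)) stop("highlight vectors must be character")
  }
  "vec"
}

geneset_type_sketch(highlight.1 = c("GeneA", "GeneB"))
```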
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Numeric Column
Description
Ensures that a column in a dataframe which must be numeric is numeric and throws an error otherwise.
Usage
validate.numeric(datin, col, name = 1)
Arguments
datin |
The data in question. |
col |
The column to validate as numeric. |
name |
the position of the dataset |
Details
This specifically checks whether any of the values in the column can be coerced to numeric.
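Note that the condition is "any", not "all": a column passes if at least one value coerces to a number. A hypothetical sketch (`check_numeric_sketch`, with an assumed error message) of that rule:

```r
# Hypothetical sketch of the numeric column check; not RVA's implementation.
check_numeric_sketch <- function(datin, col, name = 1) {
  num <- suppressWarnings(as.numeric(as.character(datin[[col]])))
  # Passes if at least one value coerces to a non-NA number
  if (!any(!is.na(num))) {
    stop("Column '", col, "' in dataset ", name, " is not numeric")
  }
  invisible(TRUE)
}

sumstats <- data.frame(logFC = c("1.2", "NA", "-0.4"))
check_numeric_sketch(sumstats, "logFC")
```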
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Pathways DB
Description
Ensures that the selected database name is valid.
Usage
validate.pathways.db(pathway.db, customized.pathways)
Arguments
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
the customized pathways to be used in the analysis, as a two-column dataframe with columns named "gs_name" and "entrez_gene". |
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate P-value Range
Description
Error-handling for invalid p-value.
Usage
validate.pval.range(pval, name)
Arguments
pval |
The p-value. |
name |
The name of the value to include in the error. |
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate pval flag
Description
Ensures that the p-value flags are the same across datasets.
Usage
validate.pvalflag(data, value)
Arguments
data |
A list of summary statistics tables (data.frame) from limma or DESeq2, where row names are gene IDs. |
value |
P value flag. |
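One reading of "the same across datasets" is that every table in the list has a column named by the p-value flag; the hypothetical `check_pvalflag_sketch` below implements that assumed interpretation.

```r
# Hypothetical sketch of the p-value flag check; not RVA's implementation.
# Assumption: every table in data must contain a column named value.
check_pvalflag_sketch <- function(data, value) {
  has.flag <- vapply(data, function(d) value %in% colnames(d), logical(1))
  if (!all(has.flag)) {
    stop("P value flag '", value, "' missing from dataset(s): ",
         paste(which(!has.flag), collapse = ", "))
  }
  invisible(TRUE)
}

t1 <- data.frame(logFC = 1.5, adj.P.Val = 0.01, row.names = "GeneA")
t2 <- data.frame(logFC = -0.7, P.Value = 0.03, row.names = "GeneA")
check_pvalflag_sketch(list(t1, t1), "adj.P.Val")
```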
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Pvalues
Description
This function ensures that the FDR (p-value) cutoff minimum, maximum, and step are valid.
Usage
validate.pvals(p.min, p.max, p.step)
Arguments
p.min |
The minimum starting FDR cutoff to be checked, so the minimum FDR cutoff to be evaluated will be p.min + p.step; default = 0. |
p.max |
The maximum FDR cutoff to be checked, default = 0.2. |
p.step |
The step from the minimum to the maximum FDR cutoff, increased one step at a time; default = 0.005. |
Details
Specifically, it checks that the p-values are between 0 and 1, and that p.step is positive and at least one p.step fits within the p.min and p.max bounds.
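The range and step checks can be sketched as below. The helper `check_pvals_sketch` is hypothetical; returning the grid of cutoffs, starting at p.min + p.step as described under p.min, is added purely for illustration.

```r
# Hypothetical sketch of the p-value range/step check; not RVA's implementation.
check_pvals_sketch <- function(p.min = 0, p.max = 0.2, p.step = 0.005) {
  if (any(c(p.min, p.max) < 0 | c(p.min, p.max) > 1)) {
    stop("p.min and p.max must be between 0 and 1")
  }
  if (p.step <= 0) stop("p.step must be positive")
  if (p.min + p.step > p.max) {
    stop("at least one p.step must fit between p.min and p.max")
  }
  # Illustration only: the FDR cutoffs that would be evaluated,
  # starting at p.min + p.step
  seq(p.min + p.step, p.max, by = p.step)
}

cutoffs <- check_pvals_sketch()
head(cutoffs, 3)
```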
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Single Table is not list
Description
Makes sure the summary table being input is of the right class and format.
Usage
validate.single.table.isnotlist(data)
Arguments
data |
summary statistics table (data.frame) from limma or DESeq2, where row names are gene IDs. |
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Validate Summary Statistics File
Description
Check for required column names and types.
Usage
validate.stats(datin, name = 1, ...)
Arguments
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
... |
additional arguments passed on to other functions |
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Check Summary Statistics Required Columns
Description
Required columns are FCflag and FDRflag.
Usage
validate.stats.cols(datin, name = 1, req.cols)
Arguments
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
req.cols |
required column names of datin. |
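The required-column check reduces to a set difference between req.cols and the column names of datin. A hypothetical sketch (`check_stats_cols_sketch`), assuming req.cols holds the FCflag and FDRflag values such as "logFC" and "adj.P.Val":

```r
# Hypothetical sketch of the required-column check; not RVA's implementation.
check_stats_cols_sketch <- function(datin, req.cols, name = 1) {
  missing.cols <- setdiff(req.cols, colnames(datin))
  if (length(missing.cols) > 0) {
    stop("Summary statistics table ", name, " is missing required column(s): ",
         paste(missing.cols, collapse = ", "))
  }
  invisible(TRUE)
}

sumstats <- data.frame(logFC = c(1.2, -0.8), adj.P.Val = c(0.01, 0.2),
                       row.names = c("GeneA", "GeneB"))
check_stats_cols_sketch(sumstats, req.cols = c("logFC", "adj.P.Val"))
```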
References
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This is data to be included in package
Description
This is data to be included in package
Usage
wpA2020
Format
rWikiPathways data, downloaded version 2020
- name
pathway name
- version
version
- wpid
pathway id
- org
host name
...