Help for package DysPIA

Type:

Package

Title:

Dysregulated Pathway Identification Analysis

Version:

1.3

Date:

2020-06-26

Maintainer:

Limei Wang <lemon619@gmail.com>

Description:

Identifies dysregulated pathways based on a pre-ranked gene pair list. A fast algorithm is used to keep the computation efficient. The data in package 'DysPIAData' is required.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Depends:

R (≥ 3.5.0), DysPIAData

Imports:

Rcpp (≥ 1.0.4), BiocParallel, fastmatch, data.table, stats, parmigene

LinkingTo:

Rcpp

RoxygenNote:

7.1.0

Encoding:

UTF-8

LazyData:

true

NeedsCompilation:

yes

Packaged:

2020-06-26 03:40:11 UTC; jinli

Author:

Limei Wang [aut, cre], Jin Li [aut, ctb]

Repository:

CRAN

Date/Publication:

2020-07-10 05:10:03 UTC

DysGPS: Calculates Dysregulated gene pair score (DysGPS) for each gene pair

Description

Calculates the dysregulated gene pair score (DysGPS) for each gene pair, using a two-sample Welch's t-test of gene pairs between case and control samples. The package 'DysPIAData', which includes the background data, must be loaded.

Usage

DysGPS(
  dataset,
  class.labels,
  controlcharacter,
  casecharacter,
  background = combined_background
)

Arguments

dataset

Matrix of gene expression values (row names are genes, column names are samples).

class.labels

Vector of category labels.

controlcharacter

Character of the control group in the class labels.

casecharacter

Character of the case group in the class labels.

background

Matrix of the gene pairs' background. The default is 'combined_background', which includes real pathway gene pairs and randomly produced gene pairs. The 'combined_background' is included in 'DysPIAData'.

Value

A vector of DysGPS for each gene pair.
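
The input layout and the Welch test named above can be illustrated on synthetic data with base R. This is only a sketch: how DysGPS derives a per-sample value for each gene pair is not described here, so 'pair_value' below is a hypothetical stand-in, and the gene and sample names are invented.

```r
set.seed(1)

# Expression matrix: row names are genes, column names are samples (synthetic).
dataset <- matrix(rnorm(2 * 10), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), paste0("s", 1:10)))
class.labels <- rep(c("WT", "MUT"), each = 5)  # one label per sample (column)

# Hypothetical per-sample value for the pair geneA/geneB (an assumption,
# not the package's definition): the product of the two expression values.
pair_value <- dataset["geneA", ] * dataset["geneB", ]

control <- pair_value[class.labels == "WT"]
case    <- pair_value[class.labels == "MUT"]

# Welch's two-sample t-test: var.equal = FALSE is the default in t.test().
welch <- t.test(case, control, var.equal = FALSE)

# The same t statistic computed by hand from means and variances.
t_manual <- (mean(case) - mean(control)) /
  sqrt(var(case) / length(case) + var(control) / length(control))
```

Here 'welch$statistic' and 't_manual' agree; a score of this kind, computed once per gene pair, would give a named vector suitable for ranking.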

Examples

data(gene_expression_p53, class.labels_p53, sample_background)
DysGPS_sample <- DysGPS(gene_expression_p53, class.labels_p53,
                        "WT", "MUT", sample_background)

Example vector of DysGPS in p53 data.

Description

The score vector of 164923 gene pairs from the p53 dataset. It can be loaded from the example datasets of the R package 'DysPIA', or obtained by running DysGPS(); for details, see 'DysGPS.R'.

Usage

data(DysGPS_p53)

DysPIA: Dysregulated Pathway Identification Analysis

Description

Runs Dysregulated Pathway Identification Analysis (DysPIA). The package 'DysPIAData', which includes the background data, must be loaded.

Usage

DysPIA(
  pathwayDB = "kegg",
  stats,
  nperm = 10000,
  minSize = 15,
  maxSize = 1000,
  nproc = 0,
  DyspiaParam = 1,
  BPPARAM = NULL
)

Arguments

pathwayDB

Name of the pathway database (8 databases: reactome, kegg, biocarta, panther, pathbank, nci, smpdb, pharmgkb). The default value is "kegg".

stats

Named vector of CILP scores for each gene pair. Names should be the same as in pathways.

nperm

Number of permutations to perform. The minimal possible nominal p-value is about 1/nperm. The default value is 10000.

minSize

Minimal size of a gene pair set to test. All pathways below the threshold are excluded. The default value is 15.

maxSize

Maximal size of a gene pair set to test. All pathways above the threshold are excluded. The default value is 1000.

nproc

If not equal to zero, sets BPPARAM to use nproc workers (default = 0).

DyspiaParam

DysPIA parameter value; all gene pair-level statistics are raised to the power of 'DyspiaParam' before calculation of DysPIA enrichment scores.

BPPARAM

Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used.
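
The effect of 'minSize' and 'maxSize' can be sketched in base R. The gene pair naming scheme ("g1|g2") and the toy pathways are hypothetical, and the filter below is an illustration of the rule stated in the argument descriptions, not the package's internal code.

```r
# Synthetic gene pair scores; the "A|B" naming scheme is hypothetical.
stats <- c("g1|g2" = 2.5, "g1|g3" = 1.2, "g2|g4" = -0.4, "g3|g4" = -1.8)

# Hypothetical pathways, each a character vector of gene pair names.
pathways <- list(
  small  = c("g1|g2"),
  medium = c("g1|g2", "g2|g4", "g9|g10"),  # "g9|g10" has no score
  large  = c("g1|g2", "g1|g3", "g2|g4", "g3|g4")
)

minSize <- 2
maxSize <- 3

# Size is counted after dropping gene pairs absent from names(stats).
sizes <- vapply(pathways, function(p) sum(p %in% names(stats)), integer(1))

# Keep only pathways whose size lies within [minSize, maxSize].
kept <- names(sizes)[sizes >= minSize & sizes <= maxSize]
```

Only the pathway named "medium" survives in this toy case: "small" falls below 'minSize' and "large" exceeds 'maxSize'.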

Value

A table with DysPIA results. Each row corresponds to a tested pathway. The columns are the following:

pathway – name of the pathway as in 'names(pathway)';
pval – an enrichment p-value;
padj – a BH-adjusted p-value;
DysPS – enrichment score, same as in Broad DysPIA implementation;
NDysPS – enrichment score normalized to mean enrichment of random samples of the same size;
nMoreExtreme – a number of times a random gene pair set had a more extreme enrichment score value;
size – size of the pathway after removing gene pairs not present in 'names(stats)';
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.
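
As an illustration of how the 'pval' and 'padj' columns relate to 'nperm' and 'nMoreExtreme', the following base R sketch uses a common permutation p-value estimate with a +1 correction. That formula is an assumption; the exact expression used by DysPIA is not given here. The counts and pathway names are invented.

```r
nperm <- 1000

# Hypothetical counts of permuted sets with a more extreme score.
nMoreExtreme <- c(pathwayA = 0, pathwayB = 12, pathwayC = 480)

# A common permutation p-value estimate with a +1 correction (an assumed
# convention); its smallest possible value is 1 / (nperm + 1).
pval <- (nMoreExtreme + 1) / (nperm + 1)

# Benjamini-Hochberg adjustment from the 'stats' package.
padj <- p.adjust(pval, method = "BH")
```

Under this convention the smallest attainable p-value is close to 1/nperm, which is why a larger 'nperm' is needed to resolve very small p-values.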

Examples

data(pathway_list, package = "DysPIAData")
data(DysGPS_p53)
DyspiaRes_p53 <- DysPIA("kegg", DysGPS_p53, nperm = 100, minSize = 20, maxSize = 100)

Example list of DysPIA result in p53 data.

Description

The list includes 81 pathway results from 'DysPIA.R', as an example used in 'DyspiaSig.R'.

Usage

data(DyspiaRes_p53)

DyspiaSig

Description

Returns a summary of the significant DysPIA results.

Usage

DyspiaSig(DyspiaRes, fdr)

Arguments

DyspiaRes

Table with results of running DysPIA().

fdr

Significance threshold for 'padj' (a BH-adjusted p-value).

Value

A list of significant DysPIA results, including correlation gain and correlation loss.
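
The kind of filtering described above can be sketched with a plain data frame. The table contents are invented, and the mapping of positive scores to correlation gain and negative scores to correlation loss is an assumption that the text here does not confirm.

```r
# Hypothetical result table with the columns used for filtering.
res <- data.frame(
  pathway = c("p1", "p2", "p3", "p4"),
  padj    = c(0.01, 0.20, 0.03, 0.04),
  DysPS   = c(0.6, 0.5, -0.7, 0.4)
)
fdr <- 0.05

# Keep pathways whose adjusted p-value is below the threshold.
sig <- res[res$padj < fdr, ]

# Assumption: positive scores mean correlation gain, negative mean loss.
summary_sketch <- list(
  gain = sig[sig$DysPS > 0, ],
  loss = sig[sig$DysPS < 0, ]
)
```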

Examples

data(pathway_list, package = "DysPIAData")
data(DyspiaRes_p53)
summary_p53 <- DyspiaSig(DyspiaRes_p53, 0.05)       # filter with padj<0.05

DyspiaSimpleImpl

Description

Runs dysregulated pathway identification analysis for preprocessed input data.

Usage

DyspiaSimpleImpl(
  pathwayScores,
  pathwaysSizes,
  pathwaysFiltered,
  leadingEdges,
  permPerProc,
  seeds,
  toKeepLength,
  stats,
  BPPARAM
)

Arguments

pathwayScores

Vector with enrichment scores for the pathways in the database.

pathwaysSizes

Vector of pathway sizes.

pathwaysFiltered

Filtered pathways.

leadingEdges

Leading edge gene pairs.

permPerProc

Parallelization parameter for permutations.

seeds

Seed vector.

toKeepLength

Number of 'pathways' that meet the condition for 'minSize' and 'maxSize'.

stats

Named vector of gene pair-level scores. Names should be the same as in pathways of 'pathwayDB'.

BPPARAM

Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used.

Value

A table with DysPIA results. Each row corresponds to a tested pathway. The columns are the following:

pathway – name of the pathway as in 'names(pathway)';
pval – an enrichment p-value;
padj – a BH-adjusted p-value;
DysPS – enrichment score, same as in Broad DysPIA implementation;
NDysPS – enrichment score normalized to mean enrichment of random samples of the same size;
nMoreExtreme – a number of times a random gene pair set had a more extreme enrichment score value;
size – size of the pathway after removing gene pairs not present in 'names(stats)';
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.

calEdgeCorScore_ESEA

Description

Calculates differential mutual information.

Usage

calEdgeCorScore_ESEA(
  dataset,
  class.labels,
  controlcharacter,
  casecharacter,
  background
)

Arguments

dataset

Matrix of gene expression values (row names are genes, column names are samples).

class.labels

Vector of binary labels.

controlcharacter

Character of the control group in the class labels.

casecharacter

Character of the case group in the class labels.

background

Matrix of the edges' background.

Value

A vector of the aberrant correlation in phenotype P based on mutual information (MI) for each edge.

Examples

data(gene_expression_p53, class.labels_p53, sample_background)
ESEAscore_p53 <- calEdgeCorScore_ESEA(gene_expression_p53, class.labels_p53,
                                      "WT", "MUT", sample_background)

calcDyspiaStat: Calculates DysPIA statistics

Description

Calculates DysPIA statistics for a given query gene pair set.

Usage

calcDyspiaStat(
  stats,
  selectedStats,
  DyspiaParam = 1,
  returnAllExtremes = FALSE,
  returnLeadingEdge = FALSE
)

Arguments

stats

Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked).

selectedStats

Indexes of selected gene pairs in the 'stats' array.

DyspiaParam

DysPIA weight parameter (0 is unweighted, suggested value is 1).

returnAllExtremes

If TRUE, return not only the most extreme point but all of them. Can be used for an enrichment plot.

returnLeadingEdge

If TRUE, also return the leading edge gene pairs.

Value

Value of the DysPIA statistic if both returnAllExtremes and returnLeadingEdge are FALSE. Otherwise, returns a list with the following elements:

res – value of DysPIA statistic;
tops – vector of top peak values of cumulative enrichment statistic for each gene pair;
bottoms – vector of bottom peak values of cumulative enrichment statistic for each gene pair;
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.
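
The roles of 'stats', 'selectedStats', and 'DyspiaParam' can be illustrated with a GSEA-style weighted running sum in base R. This is a sketch under the assumption that the DysPIA statistic follows that general scheme; the function name 'running_sum_sketch' is hypothetical and is not part of the package.

```r
running_sum_sketch <- function(stats, selectedStats, param = 1) {
  n <- length(stats)
  k <- length(selectedStats)
  selectedStats <- sort(selectedStats)

  # Weight of each selected gene pair: |score| raised to the power 'param'.
  w <- abs(stats[selectedStats])^param
  hit_step  <- w / sum(w)   # increments at selected positions
  miss_step <- 1 / (n - k)  # decrement at every other position

  # Running sum just after (tops) and just before (bottoms) each hit.
  misses_before <- selectedStats - seq_len(k)
  tops    <- cumsum(hit_step) - misses_before * miss_step
  bottoms <- tops - hit_step

  # Report the deviation from zero with the largest magnitude
  # (ties resolve to the negative side in this sketch).
  res <- if (max(tops) > -min(bottoms)) max(tops) else min(bottoms)
  list(res = res, tops = unname(tops), bottoms = unname(bottoms))
}

stats <- c(a = 5, b = 4, c = 3, d = 2, e = 1)  # sorted in decreasing order
top_set    <- running_sum_sketch(stats, c(1, 2))  # set at the top of the ranking
bottom_set <- running_sum_sketch(stats, c(4, 5))  # set at the bottom
```

A set concentrated at the top of the ranking yields a score near 1 and a set concentrated at the bottom a score near -1. With 'param = 0' every selected gene pair contributes equally, which matches the "unweighted" case described for 'DyspiaParam'.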

Calculates DysPIA statistic values for all the prefixes of a gene pair set

Description

Calculates DysPIA statistic values for all the prefixes of a gene pair set.

Usage

calcDyspiaStatCumulative(stats, selectedStats, DyspiaParam)

Arguments

stats

Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked).

selectedStats

Indexes of selected gene pairs in the 'stats' array.

DyspiaParam

DysPIA weight parameter (0 is unweighted, suggested value is 1).

Value

Numeric vector of DysPIA statistics for all prefixes of selectedStats.

Calculates DysPIA statistic values for the gene pair sets

Description

Calculates DysPIA statistic values for the gene pair sets.

Usage

calcDyspiaStatCumulativeBatch(
  stats,
  DyspiaParam,
  pathwayScores,
  pathwaysSizes,
  iterations,
  seed
)

Arguments

stats

Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked).

DyspiaParam

DysPIA weight parameter (0 is unweighted, suggested value is 1).

pathwayScores

Vector with enrichment scores for the pathways in the database.

pathwaysSizes

Vector of pathway sizes.

iterations

Number of iterations.

seed

Seed vector.

Value

List of DysPIA statistics for gene pair sets.

Example vector of category labels.

Description

The labels for the 50 cell lines in p53 data. Control group's label is 'WT', case group's label is 'MUT'.

Usage

data(class.labels_p53)

Example matrix of gene expression value.

Description

A dataset of transcriptional profiles from p53+ and p53 mutant cancer cell lines. It includes the normalized gene expression for 6385 genes in 50 samples. Row names are genes, column names are samples.

Usage

data(gene_expression_p53)

Example list of gene pair background.

Description

This background list is used in 'DysGPS.R' and 'calEdgeCorScore_ESEA.R'. It is a part of the 'combined_background' in 'DysPIAData'.

Usage

data(sample_background)

setUpBPPARAM

Description

Sets up parameter BPPARAM value.

Usage

setUpBPPARAM(nproc = 0, BPPARAM = NULL)

Arguments

nproc

If not equal to zero, sets BPPARAM to use nproc workers (default = 0).

BPPARAM

Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used.

Value

The BPPARAM parameter value.
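
For readers unfamiliar with BiocParallel, the following sketch shows how a BPPARAM object is passed to bplapply. It assumes BiocParallel is installed and uses SerialParam() purely for illustration; which parameter object setUpBPPARAM actually returns for a given 'nproc' is not stated here.

```r
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # SerialParam() runs the work in the current process (no workers).
  param <- BiocParallel::SerialParam()

  # bplapply() applies a function to each element using the given BPPARAM.
  squares <- BiocParallel::bplapply(1:3, function(i) i^2, BPPARAM = param)
}
```

Swapping 'param' for a parallel back end leaves the bplapply call unchanged, which is why a single BPPARAM argument is enough to control how the permutations are distributed.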