Help for package GMSimpute

Title:

Generalized Mass Spectrum Missing Peaks Abundance Imputation

Version:

0.0.1.0

Description:

Two-Step Lasso (TS-Lasso) and compound minimum methods to recover the abundance of missing peaks in mass spectrum analysis. TS-Lasso is an imputation method that handles various types of missing peaks simultaneously. This package provides the procedure to generate missing peaks (or data) for simulation study, as well as a tool to estimate and visualize the proportion of missing at random.

Depends:

R (≥ 3.5.0)

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

6.1.1

Imports:

utils, glmnet, ggplot2, reshape2

NeedsCompilation:

Packaged:

2019-01-06 14:16:04 UTC; liq

Author:

Qian Li [aut, cre]

Maintainer:

Qian Li <qian.li10000@gmail.com>

Repository:

CRAN

Date/Publication:

2019-01-11 17:00:25 UTC

Generalized Mass Spectrum missing peaks imputation with Two-Step Lasso as default algorithm

Description

GMS.Lasso recovers the abundance of missing peaks via either TS.Lasso or the minimum abundance per compound.

Usage

GMS.Lasso(input_data, alpha = 1, nfolds = 10, log.scale = TRUE,
  TS.Lasso = TRUE)

Arguments

input_data

Raw abundance matrix with missing value, with features in rows and samples in columns.

alpha

Weights for L1 penalty in Elastic Net. The default and suggested value is alpha=1, which is for Lasso.

nfolds

The number of folds used in parameter (lambda) tuning.

log.scale

Whether the input_data needs log scale transform.The default is log.scale=T, assuming input_data is the raw abundance matrix. If input_data is log abundance matrix, log.scale=F.

TS.Lasso

Whether to use TS.Lasso or the minimum per compound for imputation.

Value

imputed.final

The imputed abundance matrix at the scale of input_data.

Examples

data('tcga.bc')
# tcga.bc contains mass specturm abundance of 150 metabolites for 30 breast cancer 
# tumor and normal tissue samples with missing values.

imputed.compound.min=GMS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=FALSE)
# Impute raw abundance matrix tcga.bc with compound minimum

imputed.tslasso=GMS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=TRUE)
# Impute raw abundance matrix tcga.bc with TS.Lasso

Missing At Random (MAR) proportion estimation based on technical replicates.

Description

MAR.est estimates the proportion of missing peaks at random (MAR) caused by preprocessing tools with exactly two technical replicates per sample.

Usage

MAR.est(abundance, sample, log.scale = TRUE, violin.plot = FALSE)

Arguments

abundance

The full abundance matrix without missing value, with features in rows and samples in columns.

sample

A vector of characters or integers. It is the sample name for each pair of replicates.

log.scale

A scalar or vector of proportions. It is the total percentage of missing peaks throughout the full matrix.

violin.plot

Logical, whether to generate violin and box plots to visualize abundance distribution of missing and nonmissing peaks.

Value

MAR.Proportion

Estimated MAR proportion

plot

Violin and box plots generated by ggplot2

Examples

data('replicates')
# replicates contains mass specturm log abundance of 85 peptides
# with missing values for 4 pairs of technical replicates.


MAR=MAR.est(replicates,sample=rep(1:4,each=2),log.scale=FALSE,violin.plot=TRUE)
# Estimates the MAR proportion in the 4 pairs of replicates and output violin/box plots object.

print(MAR$plot)
# Print violin/box plots

Two-Step Lasso for missing peaks imputation

Description

TS.Lasso recovers the abundance of various types of missing peaks.

Usage

TS.Lasso(input_data, alpha = 1, nfolds = 10, log.scale = TRUE)

Arguments

input_data

Raw abundance matrix with missing value, with features in rows and samples in columns.

alpha

Weights for L1 penalty in Elastic Net. The default and suggested value is alpha=1, which is for Lasso.

nfolds

The number of folds used in parameter (lambda) tuning.

log.scale

Whether the input_data needs log scale transform.The default is log.scale=T, assuming input_data is the raw abundance matrix. If input_data is log abundance matrix, set log.scale=F.

Value

imputed.final

The imputed abundance matrix at the scale of input_data.

Examples

data('tcga.bc')
# tcga.bc contains mass specturm abundance of 150 metabolites for 30 breast cancer
# tumor and normal tissue samples with missing values.

imputed=TS.Lasso(tcga.bc,log.scale=TRUE)
# Impute raw abundance matrix tcga.bc

Missing peaks generating procedure for simulation study

Description

missing.sim generates various types of missing peaks based on specified missing proportion.

Usage

missing.sim(complete.data, total.missing, random, pct.full,
  seednum = 365)

Arguments

complete.data

The full abundance matrix without missing value, with features in rows and samples in columns.

total.missing

A scalar or vector of proportions. It is the total percentage of missing peaks throughout the full matrix.

random

A scalar or vector of proportions. It is the percentage of random missing in all the missing peaks.

pct.full

A scalar for the percentage of alighned features (metabolites or peptides) without missing peaks.

seednum

The seed set for generating missing peaks index. Default seed is seednum=365.

Value

simulated.data

The list of all simulated scenarios

Labels

The description for each simulated scenario

Examples

data('tcga.bc.full')
# tcga.bc.full contains mass specturm abundance of 100 metabolites for 30 breast cancer 
# tumor and normal tissue samples without missing values.


simulated.data=missing.sim(tcga.bc.full,total.missing=c(0.2,0.4),random=c(0.3,0.5,0.7),pct.full=0.4)
# Generate missing (NA) values in full abundance matrix tcga.bc.full permuting all scenarios

Raw mass spectrum proteomics log abundance for 4 pairs of technical replicates.

Description

Raw mass spectrum proteomics log abundance for 4 pairs of technical replicates.

Usage

replicates

Format

A data frame of 85 rows and 8 columns with missing peaks' abundance as NA.

Raw mass spectrum metabolomics data for TCGA breast cancer study.

Description

Raw mass spectrum metabolomics data for TCGA breast cancer study.

Usage

tcga.bc

Format

A data frame of 40 rows and 30 columns with missing peaks' abundance as NA.

A subset of mass spectrum metabolomics data for TCGA breast cancer study without missing peaks.

Description

A subset of mass spectrum metabolomics data for TCGA breast cancer study without missing peaks.

Usage

tcga.bc.full

Format

A data frame of 100 rows and 30 columns without missing value (NA).