| Type: | Package | 
| Title: | Statistical Methods for Microbiome Compositional Data | 
| Version: | 1.2 | 
| Date: | 2024-03-13 | 
| Author: | Xianyang Zhang [aut], Jun Chen [aut, cre], Huijuan Zhou [ctb] | 
| Maintainer: | Jun Chen <chen.jun2@mayo.edu> | 
| Description: | A suite of methods for powerful and robust microbiome data analysis addressing zero-inflation, phylogenetic structure and compositional effects (Zhou et al. (2022)<doi:10.1186/s13059-022-02655-5>). The methods can be applied to the analysis of other (high-dimensional) compositional data arising from sequencing experiments. | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | ggplot2, matrixStats, parallel, stats, utils, Matrix, statmod, MASS, ggrepel, lmerTest, foreach, modeest | 
| NeedsCompilation: | yes | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| Packaged: | 2024-04-01 22:00:40 UTC; m123485 | 
| Repository: | CRAN | 
| Date/Publication: | 2024-04-01 22:30:02 UTC | 
Linear (Lin) Model for Differential Abundance (DA) Analysis of High-dimensional Compositional Data
Description
The function implements a simple, robust and highly scalable approach to tackle the compositional effects in differential abundance analysis of high-dimensional compositional data. It fits linear regression models on the centered log2-ratio transformed data, identifies a bias term due to the transformation and compositional effect, and corrects the bias using the mode of the regression coefficients. It could fit mixed-effect models for analysis of correlated data.
Usage
linda(
  feature.dat,
  meta.dat,
  formula,
  feature.dat.type = c('count', 'proportion'),
  prev.filter = 0,
  mean.abund.filter = 0, 
  max.abund.filter = 0,
  is.winsor = TRUE,
  outlier.pct = 0.03,
  adaptive = TRUE,
  zero.handling = c('pseudo-count', 'imputation'),
  pseudo.cnt = 0.5,
  corr.cut = 0.1,
  p.adj.method = "BH",
  alpha = 0.05,
  n.cores = 1, 
  verbose = TRUE
)
Arguments
feature.dat | 
 a matrix of counts/proportions, row - features (OTUs, genes, etc) , column - samples.  | 
meta.dat | 
 a data frame containing the sample meta data. If there are NAs, the corresponding samples will be removed in the analysis.  | 
formula | 
 a character string for the formula. The formula should conform to that used by   | 
feature.dat.type | 
 the type of the feature data. It could be "count" or "proportion".  | 
prev.filter | 
 the prevalence (percentage of non-zeros) cutoff, under which the features will be filtered. The default is 0.  | 
mean.abund.filter | 
 the mean relative abundance cutoff, under which the features will be filtered. The default is 0.  | 
max.abund.filter | 
 the max relative abundance cutoff, under which the features will be filtered. The default is 0.  | 
is.winsor | 
 a logical value indicating whether winsorization should be performed to replace outliers (high values). The default is TRUE.  | 
outlier.pct | 
 the expected percentage of outliers. These outliers will be winsorized. The default is 0.03.  | 
adaptive | 
 a logical value indicating whether the approach to handle zeros (pseudo-count or imputation)
will be determined based on the correlations between the log(sequencing depth) and the explanatory variables
in   | 
zero.handling | 
 a character string of 'pseudo-count' or 'imputation' indicating the zero handling method
used when   | 
pseudo.cnt | 
 a positive numeric value for the pseudo-count to be added if   | 
corr.cut | 
 a numerical value between 0 and 1, indicating the significance level used for determining
the zero-handling approach when   | 
p.adj.method | 
 a character string indicating the p-value adjustment approach for 
addressing multiple testing. See R function   | 
alpha | 
 a numerical value between 0 and 1 indicating the significance level for declaring differential features. Default is 0.05.  | 
n.cores | 
 a positive integer. If   | 
verbose | 
 a logical value indicating whether the trace information should be printed out.  | 
Value
A list with the elements
variables | 
 A vector of variable names of all fixed effects in   | 
bias | 
 numeric vector; each element corresponds to one variable in   | 
output | 
 a list of data frames with columns 'baseMean', 'log2FoldChange', 'lfcSE', 'stat', 'pvalue', 'padj', 'reject',
'df';  
  | 
covariance | 
 a list of data frames; the data frame records the covariances between a regression coefficient with other coefficients;
  | 
otu.tab.use | 
 the OTU table used in the abundance analysis (the   | 
meta.use | 
 the meta data used in the abundance analysis (only variables in   | 
wald | 
 a list for use in Wald test. If the fitting model is a linear model, then it includes 
 If the fitting model is a linear mixed-effect model, then it includes 
  | 
Author(s)
Huijuan Zhou, Jun Chen, Xianyang Zhang
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
# Differential abundance analysis using the left throat data                       
ind1 <- meta$Site == 'Left' & depth >= 1000
linda.obj  <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex', 
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
           
           
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'), 
           alpha = 0.1, lfc.cut = 1, legend = TRUE, directory = NULL,
            width = 11, height = 8)
            
rownames(linda.obj $output[[1]])[which(linda.obj $output[[1]]$reject)]
# Differential abundance analysis pooling both the left and right throat data 
# Mixed effects model is used 
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex+(1|SubjectID)',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
       
    
# For proportion data   
otu.tab.p <- t(t(otu.tab) / colSums(otu.tab))
ind1 <- meta$Site == 'Left' & depth >= 1000
lind.obj  <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex', 
           feature.dat.type = 'proportion', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
Plot LinDA Results
Description
The function produces the effect size plot of the differential features and volcano plot based on the output from linda.
Usage
linda.plot(
  linda.obj,
  variables.plot,
  titles = NULL,
  alpha = 0.05,
  lfc.cut = 1,
  legend = FALSE,
  directory = NULL,
  width = 11,
  height = 8
)
Arguments
linda.obj | 
 return from function   | 
variables.plot | 
 vector; variables whose results are to be plotted. For example, suppose the return
value   | 
titles | 
 vector; titles of the effect size plot and volcano plot for each variable in   | 
alpha | 
 a numerical value between 0 and 1; cutoff for   | 
lfc.cut | 
 a positive numerical value; cutoff for   | 
legend | 
 TRUE or FALSE; whether to show the legends of the effect size plot and volcano plot.  | 
directory | 
 character; the directory to save the figures, e.g.,   | 
width | 
 the width of the graphics region in inches. See R function   | 
height | 
 the height of the graphics region in inches. See R function   | 
Value
A list of ggplot2 objects.
plot.lfc | 
 a list of effect size plots. Each plot corresponds to one variable in   | 
plot.volcano | 
 a list of volcano plots. Each plot corresponds to one variable in   | 
Author(s)
Huijuan Zhou, Jun Chen, Xianyang Zhang
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat' & smokers$meta$SIDEOFBODY == 'Left'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]))
                         
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
           
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'), 
           alpha = 0.1, lfc.cut = 1, legend = TRUE, directory = NULL,
            width = 11, height = 8)
Wald test for bias-corrected regression coefficients
Description
The function implements Wald test for bias-corrected regression coefficients generated from the linda function.
One can utilize the function to perform ANOVA-type analyses. For example, if you have a cateogrical variable with three levels, you can test whether all levels have the same effect.
Usage
linda.wald.test(
  linda.obj,
  L,
  model = c("LM", "LMM"),
  alpha = 0.05,
  p.adj.method = "BH"
)
Arguments
linda.obj | 
 return from the   | 
L | 
 A matrix for testing   | 
model | 
 
  | 
alpha | 
 significance level for testing   | 
p.adj.method | 
 P-value adjustment approach. See R function   | 
Value
A data frame with columns
Fstat | 
 Wald statistics for each taxon  | 
df1 | 
 The numerator degrees of freedom  | 
df2 | 
 The denominator degrees of freedom  | 
pvalue | 
  
  | 
padj | 
  
  | 
reject | 
  
  | 
Author(s)
Huijuan Zhou huijuanzhou2019@gmail.com Jun Chen Chen.Jun2@mayo.edu Xianyang Zhang zhangxiany@stat.tamu.edu
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex+(1|SubjectID)',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
#  L matrix (2x3) is designed to test the second (Smoke) and the third (Sex) coefficient to be 0.
# For a categorical variable > two levels, similar L can be designed to do ANOVA-type test. 
L <- matrix(c(0, 1, 0, 0, 0, 1), nrow = 2, byrow = TRUE)
result <- linda.wald.test(linda.obj, L, 'LMM', alpha = 0.1)
Microbiome data from the human upper respiratory tract (left and right throat)
Description
A dataset containing "otu", "tax", meta", "genus", family"
Usage
data(smokers)
Format
A list with elements
- otu
 otu table, 2156 taxa by 290 samples
- tax
 taxonomy table, 2156 taxa by 7 taxonomic ranks
- meta
 meta table, 290 samples by 53 sample variables
- genus
 304 by 290
- family
 113 by 290
Source
https://qiita.ucsd.edu/ study ID:524 Reference: Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.