Title: | Compositional Data Analysis for Microbiome Studies |
Version: | 0.2.4 |
Description: | Functions for microbiome data analysis that take into account its compositional nature. Performs variable selection through penalized regression for both, cross-sectional and longitudinal studies, and for binary and continuous outcomes. |
License: | MIT + file LICENSE |
URL: | https://malucalle.github.io/coda4microbiome/ |
BugReports: | https://github.com/malucalle/coda4microbiome/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
LazyData: | false |
Imports: | corrplot, glmnet, plyr, pROC, ggpubr, ggplot2, ComplexHeatmap, circlize, survival, survminer |
Suggests: | rmarkdown |
NeedsCompilation: | no |
Packaged: | 2024-07-17 15:25:42 UTC; Toni |
Author: | Malu Calle |
Maintainer: | Toni Susin <toni.susin@upc.edu> |
Depends: | R (≥ 3.5.0) |
Repository: | CRAN |
Date/Publication: | 2024-07-17 15:50:02 UTC |
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
- y_Crohn
a
factor
, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
- Event
a
numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.- Event_time
a
numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
- Event
a
numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.- Event_time
a
numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
- y_HIV
a factor, specifying if the individual is HIV positive or (
Pos
) or negative (Neg
).- MSM_HIV
a factor, indicating sexual preferences:
MSM
(Men who have Sex with Men) or not (nonMSM
).
References
doi:10.1016/j.ebiom.2016.01.032
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
- y_HIV
a factor, specifying if the individual is HIV positive or (
Pos
) or negative (Neg
).- MSM_HIV
a factor, indicating sexual preferences:
MSM
(Men who have Sex with Men) or not (nonMSM
).
References
doi:10.1016/j.ebiom.2016.01.032
coda4microbiome: Compositional Data Analysis for Microbiome Studies
Description
This package provides a set of functions to explore and study microbiome data within the CoDA framework, with a special focus on identification of microbial signatures (variable selection).
coda_coxnet
Description
Microbial signatures in survival studies The algorithm performs variable selection through an elastic-net penalized Cox regression conveniently adapted to CoDA. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
Usage
coda_coxnet(
x,
time,
status,
covar = NULL,
lambda = "lambda.1se",
nvar = NULL,
alpha = 0.9,
nfolds = 10,
showPlots = TRUE,
coef_threshold = 0
)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
time |
time to event or follow up time for right censored data. Must be a numericvector. |
status |
event occurrence. Vector (type: numeric or logical) specifying 0, or FALSE, for no event occurrence, and 1, or TRUE, for event occurrence. |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
showPlots |
if TRUE, shows the plots (default = TRUE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
Value
list with "taxa.num","taxa.name","log-contrast coefficients","risk.score","apparent Cindex","mean cv-Cindex","sd cv-Cindex","risk score plot","signature plot".
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
set.seed(12345)
coda_coxnet(x = x,
time = Event_time,
status = Event,
covar = NULL,
lambda = "lambda.1se", nvar = NULL,
alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
coda_glmnet
Description
Microbial signatures in cross-sectional studies. The algorithm performs variable selection through penalized regression on the set of all pairwise log-ratios. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
Usage
coda_glmnet(
x,
y,
covar = NULL,
lambda = "lambda.1se",
nvar = NULL,
alpha = 0.9,
nfolds = 10,
showPlots = TRUE,
coef_threshold = 0
)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
showPlots |
if TRUE, shows the plots (default = TRUE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
Value
if y is binary: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot" if not:list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent Rsq","mean cv-MSE","sd cv-MSE","predictions plot","signature plot"
Author(s)
M. Calle - T. Susin
Examples
data(Crohn, package = "coda4microbiome")
set.seed(123)
coda_glmnet(x_Crohn[,(1:10)],y_Crohn,showPlots= FALSE)
coda_glmnet0
Description
Internal function for the permutational test
Usage
coda_glmnet0(
x,
lrX,
idlrX,
nameslrX,
y,
covar = NULL,
lambda = "lambda.1se",
alpha = 0.9
)
Arguments
x |
. |
lrX |
. |
idlrX |
. |
nameslrX |
. |
y |
. |
covar |
. |
lambda |
. |
alpha |
. |
Value
.
Author(s)
M. Calle - T. Susin
coda_glmnet_longitudinal
Description
Microbial signatures in longitudinal studies. Identification of a set of microbial taxa whose joint dynamics is associated with the phenotype of interest. The algorithm performs variable selection through penalized regression over the summary of the log-ratio trajectories (AUC). The result is expressed as the (weighted) balance between two groups of taxa.
Usage
coda_glmnet_longitudinal(
x,
y,
x_time,
subject_id,
ini_time,
end_time,
covar = NULL,
lambda = "lambda.1se",
nvar = NULL,
alpha = 0.9,
nfolds = 10,
showPlots = TRUE,
coef_threshold = 0
)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds (default = 10) |
showPlots |
if TRUE, shows the plots (default = FALSE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
Value
in case of binary outcome: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot","trajectories plot"
Author(s)
M. Calle - T. Susin
Examples
data(ecam_filtered, package = "coda4microbiome") # load the data
ecam_results<-coda_glmnet_longitudinal (x=x_ecam[,(1:4)],y= metadata$diet,
x_time= metadata$day_of_life, subject_id = metadata$studyid, ini_time=0,
end_time=60,lambda="lambda.min",nfolds=4, showPlots=FALSE)
ecam_results$taxa.num
coda_glmnet_longitudinal0
Description
internal function
Usage
coda_glmnet_longitudinal0(
x,
lrX,
idlrX,
nameslrX,
y,
x_time,
subject_id,
ini_time,
end_time,
covar = NULL,
ktop = NULL,
lambda = "lambda.1se",
alpha = 0.9,
nfolds = 10
)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
lrX |
log-ratio matrix |
idlrX |
indices table in the log-ratio matrix |
nameslrX |
colnames of the log-ratio matrix |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
covar |
data frame with covariates (default = NULL) |
ktop |
given number of selected taxa or compute the best number in case it is NULL (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
Value
.
Author(s)
M. Calle - T. Susin
coda_glmnet_longitudinal_null
Description
Performs a permutational test for the coda_glmnet_longitudinal() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet_longitudinal() on different rearrangements of the response variable.
Usage
coda_glmnet_longitudinal_null(
x,
y,
x_time,
subject_id,
ini_time,
end_time,
niter = 100,
covar = NULL,
alpha = 0.9,
lambda = "lambda.1se",
nfolds = 10,
sig = 0.05
)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
niter |
number of sample iterations |
covar |
data frame with covariates (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
lambda |
penalization parameter (default = "lambda.1se") |
nfolds |
number of folds |
sig |
significance value (default = 0.05) |
Value
list with "accuracy" and "confidence interval"
Author(s)
M. Calle - T. Susin
Examples
set.seed(123) # to reproduce the results
data(ecam_filtered, package = "coda4microbiome") # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life # observation times
subject_id = metadata$studyid # subject id
y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet)
ini_time = 0
end_time = 90
coda_glmnet_longitudinal_null (x,y, x_time, subject_id, ini_time, end_time,
lambda="lambda.min",nfolds=4, niter=3)
coda_glmnet_null
Description
Performs a permutational test for the coda_glmnet() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet() on different rearrangements of the response variable.
Usage
coda_glmnet_null(
x,
y,
niter = 100,
covar = NULL,
lambda = "lambda.1se",
alpha = 0.9,
sig = 0.05
)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
niter |
number of iterations (default = 100) |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
alpha |
elastic net parameter (default = 0.9) |
sig |
significance level for the confidence interval (default = 0.05) |
Value
a list with "accuracy" and "confidence interval"
Author(s)
M. Calle - T. Susin
Examples
data(Crohn, package = "coda4microbiome")
coda_glmnet_null(x=x_Crohn[,(1:10)], y=y_Crohn, niter=2,covar=NULL,lambda="lambda.1se",
alpha=0.9,sig=0.05)
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
- taxanames
vector containing the taxonomy of the 37 taxa
- metadata
matrix with information on the individuals at the observation time
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
explore_logratios
Description
Explores the association of each log-ratio with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
Usage
explore_logratios(
x,
y,
decreasing = TRUE,
measure = "AUC",
covar = NULL,
shownames = FALSE,
maxrow = 15,
maxcol = 15,
showtitle = TRUE,
mar = c(0, 0, 1, 0)
)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
decreasing |
order of importance (default = TRUE) |
measure |
association measures "AUC", "Pearson","Spearman", "glm" (default = "AUC") |
covar |
data frame with covariates (default = NULL) |
shownames |
logical, if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) |
maxrow |
maximum number of rows to display in the plot (default = 15) |
maxcol |
maximum number of columns to display in the plot (default = 15) |
showtitle |
logical, if TRUE, shows the title of the plot (default = TRUE) |
mar |
mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) |
Value
list with "max log-ratio","names max log-ratio", "order of importance", "name of most important variables", "association log-ratio with y" and "top log-ratios plot"
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
explore_logratios(x_HIV,y_HIV)
explore_lr_longitudinal
Description
Explores the association of summary (integral) of each log-ratio trajectory with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
Usage
explore_lr_longitudinal(
x,
y,
x_time,
subject_id,
ini_time,
end_time,
showPlots = FALSE,
decreasing = TRUE,
covar = NULL,
shownames = FALSE,
maxrow = 15,
maxcol = 15,
showtitle = TRUE,
mar = c(0, 0, 1, 0)
)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
showPlots |
if TRUE, shows the plot (default = FALSE) |
decreasing |
order of importance (default = TRUE) |
covar |
data frame with covariates (default = NULL) |
shownames |
if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) |
maxrow |
maximum number of rows to display in the plot (default = 15) |
maxcol |
maximum number of columns to display in the plot (default = 15) |
showtitle |
logical, if TRUE, shows the title of the plot (default = TRUE) |
mar |
mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) |
Value
list with "max log-ratio","names max log-ratio","order of importance","name of most important variables","association log-ratio with y","top log-ratios plot"
Author(s)
M. Calle - T. Susin
Examples
set.seed(123) # to reproduce the results
data(ecam_filtered, package = "coda4microbiome") # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life # observation times
subject_id = metadata$studyid # subject id
y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet)
ini_time = 0
end_time = 90
ecam_logratios<-explore_lr_longitudinal(x,y,x_time,subject_id,ini_time,end_time)
explore_zeros
Description
Provides the proportion of zeros for a pair of variables (taxa) in table x and the proportion of samples with zero in both variables. A bar plot with this information is also provided. Results can be stratified by a categorical variable.
Usage
explore_zeros(x, id1, id2, strata = NULL)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
id1 |
column number in x for the first taxa |
id2 |
column number in x for the second taxa |
strata |
stratification variable (default = NULL) |
Value
a list with the frequency table and the associated bar plot
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
explore_zeros(x_HIV,5,6)
explore_zeros(x_HIV,5,6, strata=y_HIV)
filter_longitudinal
Description
Filters those individuals and taxa with enough longitudinal information
Usage
filter_longitudinal(
x,
taxanames = NULL,
x_time,
subject_id,
metadata,
ini_time = min(x_time),
end_time = max(x_time),
percent_indv = 0.7,
min_obs = 3
)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
taxanames |
names of different taxa |
x_time |
observation times |
subject_id |
subject id |
metadata |
matrix sample data |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
percent_indv |
percentage of individuals with more than min_obs observations |
min_obs |
required minimum number of observations per individual |
Value
list with filtered abundance table, taxanames and metadata
Author(s)
M. Calle - T. Susin
Examples
data(ecam_filtered, package = "coda4microbiome") # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life # observation times
subject_id = metadata$studyid # subject id
ini_time = 0
end_time = 360
data_filtered<-filter_longitudinal(x,taxanames,x_time, subject_id, metadata,
ini_time, end_time, min_obs=4)
impute_zeros
Description
Simple imputation: When the abundance table contains zeros, a positive value is added to all the values in the table. It adds 1 when the minimum of table is larger than 1 (i.e. tables of counts) or it adds half of the minimum value of the table, otherwise.
Usage
impute_zeros(x)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
Value
x abundance matrix or data frame with zeros substituted by imputed positive values
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
x<-impute_zeros(x_HIV)
integralFun
Description
Integral of the curve trajectory of subject_id in the interval a,b
Usage
integralFun(x, y, id, a, b)
Arguments
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
id |
subjects-ids |
a |
interval initial time |
b |
interval final time |
Value
matrix with integrals for each individual (rows) and each taxa (columns)
Author(s)
M. Calle - T. Susin
logratios_matrix
Description
Computes a large matrix with all the log-ratios between pairs of taxa (columns) in the abundance table
Usage
logratios_matrix(x)
Arguments
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
Value
list with matrix of log-ratios, matrix indicating the pairs of variables involved in each log-ratio, and a matrix indicating the names of the variables involved in each log-ratio.
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
lrHIV<-logratios_matrix(x_HIV[,(1:4)])
# matrix of log-ratios (first 6 rows and 6 columns):
lrHIV[[1]][1:6,1:6]
# variables involved in each log-ratio
head(lrHIV[[2]])
# names of the variables involved in each log-ratio
head(lrHIV[[3]])
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
- taxanames
vector containing the taxonomy of the 37 taxa
- metadata
matrix with information on the individuals at the observation time
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
plotMedianCurve
Description
Internal plot function
Usage
plotMedianCurve(iNum, iDen, X, Y, x_time, subject_id, ini_time, end_time)
Arguments
iNum |
. |
iDen |
. |
X |
. |
Y |
. |
x_time |
. |
subject_id |
. |
ini_time |
. |
end_time |
. |
Value
.
Author(s)
M. Calle - T. Susin
plot_prediction
Description
Plot of the predictions of a fitted model: Multiple box-plot and density plots for binary outcomes and Regression plot for continuous outcome
Usage
plot_prediction(prediction, y, strata = NULL, showPlots = TRUE)
Arguments
prediction |
the fitted values of predictions for the model |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
strata |
stratification variable (default = NULL) |
showPlots |
if TRUE, shows the plots (default = TRUE) |
Value
prediction plot
Author(s)
M. Calle - T. Susin
Examples
# prediction plot for the log-ratio between columns 3 and 32 on HIV status
data(HIV, package = "coda4microbiome")
x<-impute_zeros(x_HIV)
lr<-log(x[,3])-log(x[,32])
plot_prediction(lr, y_HIV)
plot_riskscore
Description
Plots samples ordered by microbial risk score values along time to event.
Usage
plot_riskscore(risk.score, x, time, status, showPlots = TRUE)
Arguments
risk.score |
microbial risk score values resulting from the coda_coxnet model |
x |
original survival data |
time |
time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data. |
status |
event occurrence. Vector (numeric or logical) specifying 0 (or FALSE) for no event occurrence, and 1 (or TRUE) for event occurrence. |
showPlots |
(default: TRUE) |
Value
returns an object of class HeatmapList.
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
set.seed(12345)
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
crohn_cox <- coda_coxnet(x = x,
time = Event_time,
status = Event,
covar = NULL,
lambda = "lambda.1se", nvar = NULL,
alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
plot_riskscore(risk.score = crohn_cox$risk.score,
x = x,
time = Event_time,
status = Event,
showPlots = TRUE)
#-------------------------------------------------------------------------------
plot_signature
Description
Graphical representation of the variables selected and their coefficients
Usage
plot_signature(vars, coeff, showPlots = TRUE, varnames = NULL)
Arguments
vars |
variables selected |
coeff |
associated coefficients |
showPlots |
if TRUE, shows the plots (default = TRUE) |
varnames |
if TRUE, shows the names of the variables |
Value
bar plot
Author(s)
M. Calle - T. Susin
Examples
plot_signature(c(2,10, 3, 15, 4), c(0.8, -0.1, 0.2, -0.6, -0.3))
plot_signature_curves
Description
Graphical representation of the signature trajectories
Usage
plot_signature_curves(
varNum,
coeff,
x,
y,
x_time,
subject_id,
ini_time,
end_time,
color = c("chocolate1", "slateblue2"),
showLabel = TRUE,
location = "bottomright",
inset = c(0.01, 0.02),
cex = 0.8,
y.intersp = 0.7,
main_title = NULL
)
Arguments
varNum |
column number of the variables (taxa) |
coeff |
coefficients (coefficients must sum-up zero) |
x |
microbiome abundance matrix in long format |
y |
binary outcome; data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
color |
color graphical parameter |
showLabel |
graphical parameter (see help(label)) |
location |
graphical parameter (see help(label)) |
inset |
graphical parameter (see help(label)) |
cex |
graphical parameter (see help(label)) |
y.intersp |
graphical parameter (see help(label)) |
main_title |
title plot |
Value
trajectories plot
Author(s)
M. Calle - T. Susin
Examples
x=matrix(c(2, 3, 4, 1, 2, 5, 10, 20, 15, 30, 40, 12), ncol=2)
x_time = c(0,10,20,1,15, 25)
subject_id = c(1,1,1,2,2,2)
y=c(0,0,0,1,1,1)
plot_signature_curves(varNum=c(1,2), coeff=c(1,-1), x, y,x_time, subject_id,
ini_time=0, end_time=25)
plot_survcurves
Description
Plots survival curves stratifying samples based on their signature value. Signature value for stratification can be set by the user.
Usage
plot_survcurves(risk.score, time, status, strata.quantile = 0.5)
Arguments
risk.score |
microbial risk score values resulting from the coda_coxnet model |
time |
time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data (what about interval data?). |
status |
event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence. |
strata.quantile |
cut-off quantile (values 0, 1) for sample stratification based on signature value. Default is set to 0.5 quantile of the signature. |
Value
return an object of class ggsurvplot which is list containing the following: plot: the survival plot (ggplot object). table: the number of subjects at risk table per time (ggplot object). data.survplot: data used to plot the survival curves (data.frame). data.survtable: data used to plot the tables under the main survival curves (data.frame).
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
set.seed(12345)
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
crohn_cox <- coda_coxnet(x = x,
time = Event_time,
status = Event,
covar = NULL,
lambda = "lambda.1se", nvar = NULL,
alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
plot_survcurves(risk.score = crohn_cox$risk.score,
time,
status,
strata.quantile = 0.5)
#-------------------------------------------------------------------------------
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
- y_sCD14
a
numeric
variable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
shannon
Description
Shannon information
Usage
shannon(x)
Arguments
x |
abundance composition (vector) |
Value
shannon information
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon(x_HIV[1,])
shannon_effnum
Description
Shannon effective number of variables in a composition
Usage
shannon_effnum(x)
Arguments
x |
abundance composition (vector) |
Value
shannon information
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon_effnum(x_HIV[1,])
shannon_sim
Description
Shannon similarity between two compositions
Usage
shannon_sim(x, y)
Arguments
x |
abundance composition (vector) |
y |
abundance composition (vector) |
Value
shannon similarity (value between 0 and 1)
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon_sim(x_HIV[1,],x_HIV[2,])
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
- taxanames
vector containing the taxonomy of the 37 taxa
- metadata
matrix with information on the individuals at the observation time
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
- Event
a
numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.- Event_time
a
numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
- y_Crohn
a
factor
, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
- y_HIV
a factor, specifying if the individual is HIV positive or (
Pos
) or negative (Neg
).- MSM_HIV
a factor, indicating sexual preferences:
MSM
(Men who have Sex with Men) or not (nonMSM
).
References
doi:10.1016/j.ebiom.2016.01.032
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
- taxanames
vector containing the taxonomy of the 37 taxa
- metadata
matrix with information on the individuals at the observation time
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
- y_sCD14
a
numeric
variable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
- y_Crohn
a
factor
, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
- y_HIV
a factor, specifying if the individual is HIV positive or (
Pos
) or negative (Neg
).- MSM_HIV
a factor, indicating sexual preferences:
MSM
(Men who have Sex with Men) or not (nonMSM
).
References
doi:10.1016/j.ebiom.2016.01.032
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
- y_sCD14
a
numeric
variable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)