Help for package MetChem

Version: 0.5

Date: 2025-06-04

Maintainer: Stefano Cacciatore <tkcaccia@gmail.com>

Title: Chemical Structural Similarity Analysis

Description: A new pipeline to explore chemical structural similarity across metabolites. It allows the metabolite classification in structurally-related modules and identifies common shared functional groups. The KODAMA algorithm is used to highlight structural similarity between metabolites. See Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705>, Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>, and Abdel-Shafy EA, Melak T, MacIntyre DA, Zadra G, Zerbini LF, Piazza S, Cacciatore S. (2023) Bioinformatics Advances <doi:10.1093/bioadv/vbad053>.

Depends: R (≥ 3.5.0), stats, KODAMA (≥ 3.0), httr, XML, fingerprint, rcdk (≥ 3.4.3)

Suggests: knitr, rmarkdown, readxl, impute, pheatmap, RColorBrewer, clinical

VignetteBuilder: knitr

SuggestsNote: No suggestions

License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Packaged: 2025-06-04 15:53:53 UTC; user

NeedsCompilation:

Repository: CRAN

Author: Ebtesam Abdel-Shafy [aut], Tadele Melak [aut], David A. MacIntyre [aut], Giorgia Zadra [aut], Luiz F. Zerbini [aut], Silvano Piazza [aut], Stefano Cacciatore [aut, cre]

Date/Publication: 2025-06-04 16:30:02 UTC

ChemRICH Dataset

Description

This dataset consists of a list of metabolite names downloaded from https://chemrich.fiehnlab.ucdavis.edu/. HMDB IDs were retrieved from the PubChem Identifier Exchange Service (https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi) and manually curated.

Usage

data(ChemRICH)

Value

A list with the following elements in the variable ChemRICH:

name

A vector of metabolite names.

SMILES

A vector containing the SMILES representation of each metabolite.

HMDB

A vector containing HMDB IDs of each metabolite.

Examples

 data(ChemRICH)

HFD Dataset

Description

This dataset is a data frame of metabolites that contains only chemical information.

Usage

data(HFD)

Value

A list with the following elements in the variable HFD:

SMILES

A vector containing the SMILES representation of each metabolite.

CHEMICAL_ID

A vector containing the chemical ID number of each metabolite.

PUBCHEM

A vector containing the PubChem identifier of each metabolite. PubChem is a database of chemical molecules and their activities in biological assays.

CHEMSPIDER

A vector containing the unique ChemSpider identifier of each molecule.

HMDB

A vector containing HMDB IDs of each metabolite.

Examples

 data(HFD)

KODAMA chemical similarity.

Description

This function calculates the structural similarity between different metabolites and performs hierarchical clustering using the KODAMA algorithm.

Usage




KODAMA.chem.sim (smiles,
                 d=NULL,
                 k=50,
                 dissimilarity.parameters=list(),
                 kodama.matrix.parameters=list(),
                 kodama.visualization.parameters=list(),
                 hclust.parameters=list(method="ward.D"))

Arguments

smiles

A list of SMILES notations for the metabolites in the study dataset.

d

A distance structure such as that returned by dist, or a full symmetric matrix containing the dissimilarities. If NULL (default), the dissimilarity matrix is generated by the chemical.dissimilarity function; otherwise, d is used as the dissimilarity matrix.

k

The number of components of multidimensional scaling.

dissimilarity.parameters

Optional parameters for the chemical.dissimilarity function.

kodama.matrix.parameters

Optional parameters for the KODAMA.matrix function.

kodama.visualization.parameters

Optional parameters for the KODAMA.visualization function.

hclust.parameters

Optional parameters for the hclust function.

Value

A list containing all results of the KODAMA chemical similarity analysis and of the hierarchical clustering performed on the KODAMA dimensions.

Examples


data(Metabolites)

res=KODAMA.chem.sim(Metabolites$SMILES)  
plot(res$kodama$visualization)
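
The *.parameters arguments are plain lists of named arguments. As an illustration only, the sketch below shows one common way in R to forward such a list to its target function with do.call. The helper run_hclust is hypothetical and is not part of MetChem; whether KODAMA.chem.sim forwards its lists in exactly this way is an assumption.

```r
# Hypothetical helper (not part of MetChem): forwards an optional
# parameter list to hclust, mirroring the hclust.parameters argument.
run_hclust <- function(d, hclust.parameters = list(method = "ward.D")) {
  # Prepend the dissimilarity object, then splice in the user's options.
  do.call(hclust, c(list(d = d), hclust.parameters))
}

# Toy data: six points on a line.
d <- dist(c(1, 2, 3, 10, 11, 12))

hh_default <- run_hclust(d)                            # Ward linkage
hh_average <- run_hclust(d, list(method = "average"))  # average linkage
```

Passing a different list replaces the default entirely, so a caller who wants another linkage only needs to supply list(method = ...).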

Metabolomic Dataset

Description

This dataset consists of a list of metabolites, as returned by the function readMet, and the concentration values of each metabolite.

Usage

data(Metabolites)

Value

A list with the following elements in the variable Metabolites:

concentration

A matrix containing the concentration of each metabolite.

name

A vector of metabolite names.

SMILES

A vector containing the SMILES representation of each metabolite.

HMDB

A vector containing HMDB IDs of each metabolite.

readMet

A list of metabolite information produced by the readMet function.

Examples

 data(Metabolites)

Weighted Metabolite Chemical Structural Analysis

Description

Summarizes the metabolite concentrations in each identified cluster using the module eigenvalue (eigen-metabolite) for calculating module membership measures.

Usage

WMCSA(data,cl)

Arguments

data

A dataset of metabolite concentrations in different samples.

cl

The output of the allbranches function containing the module memberships.

Value

This function returns a matrix representing the similarity score of metabolites within the same module across the different samples.

Examples



data(Metabolites)

SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
res=KODAMA.chem.sim(SMILES)  
cl=allbranches(res$hclust)
ww=WMCSA(Metabolites$concentration,cl)
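
To make the idea of an eigen-metabolite concrete, the sketch below computes the first principal component of the standardized concentrations of one module, in the spirit of module eigengenes. It is a base R illustration under stated assumptions: the helper module_eigen is hypothetical, samples are assumed to be in rows and metabolites in columns, and WMCSA is not claimed to use exactly this computation.

```r
# Hypothetical helper (not part of MetChem): first principal component of
# the standardized concentrations of the metabolites in one module.
# Assumes samples in rows and metabolites in columns.
module_eigen <- function(conc, members) {
  x <- scale(conc[, members, drop = FALSE])
  pc <- prcomp(x, center = FALSE)
  score <- pc$x[, 1]
  # Orient the score so that it correlates positively with the module mean.
  if (cor(score, rowMeans(x)) < 0) score <- -score
  score
}

set.seed(1)
driver <- rnorm(20)                       # signal shared by the module
conc <- cbind(m1 = driver + rnorm(20, sd = 0.1),
              m2 = driver + rnorm(20, sd = 0.1),
              m3 = driver + rnorm(20, sd = 0.1),
              m4 = rnorm(20))             # metabolite outside the module
eig <- module_eigen(conc, c("m1", "m2", "m3"))
```

The result is one score per sample, which can then be compared across samples or correlated with individual metabolites.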

Cut a Tree into Groups of Data

Description

Cuts a tree, as produced by the hclust function, into groups (a.k.a. modules).

Usage


allbranches(hh,minlen=5)

Arguments

hh

A tree as produced by the hclust function.

minlen

The minimum number of elements in each module.

Value

A list containing vectors of module memberships.

Examples


data(Metabolites)

data=Metabolites$readMet$concentration
hh=hclust(dist(data),method="ward.D")
res=allbranches(hh)
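
The following base R sketch illustrates what collecting the branches of a tree with a minimum size can look like. It walks the merge matrix of an hclust object and keeps every branch with at least minlen members. The helper branch_members is hypothetical, and it is an assumption that allbranches follows the same logic or returns its result in the same shape.

```r
# Hypothetical helper (not part of MetChem): list the members of every
# branch (internal node) of an hclust tree that has at least minlen leaves.
branch_members <- function(hh, minlen = 5) {
  n_merges <- nrow(hh$merge)
  members <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    # Negative entries are single observations; positive entries point to
    # the branch created at an earlier merge step.
    parts <- lapply(hh$merge[i, ], function(j) {
      if (j < 0) -j else members[[j]]
    })
    members[[i]] <- unlist(parts)
  }
  keep <- lengths(members) >= minlen
  labels <- if (is.null(hh$labels)) seq_len(n_merges + 1) else hh$labels
  lapply(members[keep], function(idx) labels[idx])
}

# Toy data: two well-separated groups of five observations each.
x <- matrix(c(0, 1, 2, 3, 4, 20, 21, 22, 23, 24), ncol = 1,
            dimnames = list(paste0("met", 1:10), NULL))
hh <- hclust(dist(x), method = "ward.D")
branches <- branch_members(hh, minlen = 5)
```

In this toy tree only the two groups and the root reach five members, so smaller intermediate branches are dropped.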

Chemical dissimilarity.

Description

This function calculates the structural dissimilarity between different metabolites using the simplified molecular-input line-entry system (SMILES) of each metabolite as input.

Usage


chemical.dissimilarity (smiles,method="tanimoto",type="extended")

Arguments

smiles

A vector of SMILES notations.

method

The method used to calculate the distance between molecular fingerprints ("tanimoto" by default). For more information, see the fp.sim.matrix function.

type

The type of fingerprint applied to the SMILES ("extended" by default). For more information, see the get.fingerprint function.

Value

A list containing the distances between fingerprints.

Examples


data(Metabolites)
d=chemical.dissimilarity(Metabolites$SMILES[1:50])
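
For readers unfamiliar with the Tanimoto coefficient, the sketch below computes it in base R on a small binary fingerprint matrix: the number of bits set in both fingerprints divided by the number of bits set in at least one, converted to a dissimilarity as one minus that ratio. The helper tanimoto_dissimilarity and the toy fingerprints are hypothetical; it is an assumption that chemical.dissimilarity derives its distances in this way from the similarity returned by fp.sim.matrix.

```r
# Hypothetical helper (not part of MetChem or fingerprint): Tanimoto
# dissimilarity between the rows of a binary fingerprint matrix.
tanimoto_dissimilarity <- function(fp) {
  fp <- (fp != 0) * 1                        # force 0/1 values
  shared <- fp %*% t(fp)                     # bits set in both fingerprints
  bits <- rowSums(fp)
  either <- outer(bits, bits, "+") - shared  # bits set in at least one
  sim <- shared / either
  sim[either == 0] <- 1   # convention chosen here: two empty fingerprints match
  1 - sim
}

# Toy fingerprints: one row per molecule, one column per bit.
fp <- rbind(a = c(1, 1, 0, 0),
            b = c(1, 0, 1, 0),
            c = c(1, 1, 0, 0))
d <- tanimoto_dissimilarity(fp)
```

Here a and c share all their bits and have a dissimilarity of 0, while a and b share one bit out of three and have a dissimilarity of 2/3.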

Detection of clusters.

Description

This function calculates the structural similarity between different metabolites, performs hierarchical clustering using the KODAMA algorithm, and detects the optimal number of clusters. The procedure is repeated to ensure the robustness of the detection.

Usage



clusters.detection  (smiles,
                     k=50,
                     seed=12345,
                     max_nc = 30,
                     dissimilarity.parameters=list(),
                     kodama.matrix.parameters=list(),
                     kodama.visualization.parameters=list(),
                     hclust.parameters=list(method="ward.D"),
                     verbose = TRUE)

Arguments

smiles

A list of SMILES notations for the metabolites in the study dataset.

k

The number of components of multidimensional scaling.

seed

Seed for the generation of random numbers.

max_nc

Maximum number of clusters.

dissimilarity.parameters

Optional parameters for the chemical.dissimilarity function.

kodama.matrix.parameters

Optional parameters for the KODAMA.matrix function.

kodama.visualization.parameters

Optional parameters for the KODAMA.visualization function.

hclust.parameters

Optional parameters for the hclust function.

verbose

If verbose is TRUE, it displays the progress for each iteration.

Value

A list containing all results of the KODAMA chemical similarity analysis and hierarchical clustering.

Examples


data(Metabolites)

res=clusters.detection(Metabolites$SMILES)

Metabolite-associated Diseases

Description

This function correlates metabolites to associated diseases.

Usage

diseasesMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the diseases associated with each metabolite.

Examples


data(Metabolites)
dis=diseasesMet(Metabolites$readMet)

Metabolite-associated Enzymes

Description

This function finds the enzymes related to each metabolite.

Usage

enzymesMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the enzymes associated with each metabolite.

Examples


data(Metabolites)
enz=enzymesMet(Metabolites$readMet)

Cluster features extraction

Description

This function finds features associated with each cluster.

Usage


features(doc,cla,cl,HMDB_ID)

Arguments

doc

The output of the readMet function.

cla

The output of diseasesMet, enzymesMet, pathwaysMet, propertiesMet, substituentsMet, or taxonomyMet functions.

cl

The output of the allbranches function containing the module memberships.

HMDB_ID

A vector of HMDB IDs associated with their chemical name.

Value

A list of p-values, calculated using Fisher's test, for the features associated with each cluster.

Examples


data(Metabolites)
SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
HMDB=Metabolites$HMDB
names(HMDB)=Metabolites$name
res=KODAMA.chem.sim(SMILES)
cl=allbranches(res$hclust)
cla=substituentsMet(Metabolites$readMet)
f=features(Metabolites$readMet,cla,cl,HMDB)
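
As an illustration of testing whether a feature is associated with a cluster, the sketch below applies Fisher's exact test in base R to a 2 x 2 table of cluster membership against feature presence. The helper feature_pvalue and the toy data are hypothetical, and the one-sided alternative is a choice made for this sketch; the exact contingency table and alternative used by features are not documented here.

```r
# Hypothetical helper (not part of MetChem): one-sided Fisher's exact test
# for over-representation of a feature among the members of a cluster.
feature_pvalue <- function(in_cluster, has_feature) {
  tab <- table(cluster = factor(in_cluster, levels = c(TRUE, FALSE)),
               feature = factor(has_feature, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
}

# Toy data: 20 metabolites, 6 of them in the cluster of interest.
in_cluster <- rep(c(TRUE, FALSE), times = c(6, 14))
# The feature is present in 5 cluster members and in 1 other metabolite.
has_feature <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
                 TRUE, rep(FALSE, 13))
p <- feature_pvalue(in_cluster, has_feature)
```

Fixing the factor levels keeps the table 2 x 2 even when a cluster contains no metabolite with the feature.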

Name of metabolites

Description

This function extracts the metabolite names from the list generated by the readMet function.

Usage

nameMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the name of each metabolite.

Examples


data(Metabolites)
nam=nameMet(Metabolites$readMet)

Metabolic Pathways

Description

This function finds the pathways related to each metabolite.

Usage

pathwaysMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the pathways associated with each metabolite.

Examples



data(Metabolites)
pat=pathwaysMet(Metabolites$readMet)

Physical Properties of metabolites

Description

This function finds the physical properties of metabolites.

Usage

propertiesMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the properties associated with each metabolite.

Examples


data(Metabolites)
pro=propertiesMet(Metabolites$readMet)

Metabolite Cards Reading

Description

This function extracts the MetaboCards of your metabolite dataset from the http://www.hmdb.ca/metabolites/ database and stores all of this information in a list.

Usage

readMet(ID, address =  c("http://www.hmdb.ca/metabolites/"),remove=TRUE)

Arguments

ID

A vector containing the HMDB codes (i.e., metabolite IDs) of the metabolite dataset.

address

Optional address where the MetaboCards are located. The default address is http://www.hmdb.ca/metabolites/.

remove

A logical value. If TRUE, missing and wrong HMDB IDs are removed.

Value

A list containing all the information related to the metabocards.

Examples



ID=c("HMDB0000122","HMDB0000124","HMDB0000243","HMDB0000263")
doc=readMet(ID)
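
Because missing and wrong HMDB IDs affect what readMet returns, it can help to check the IDs before any request is made. The sketch below is a base R illustration: it pads older 5-digit HMDB accessions to the current 7-digit form and marks anything that does not look like an HMDB code as NA. The helper normalize_hmdb is hypothetical and is not part of MetChem, and readMet is not claimed to perform this normalization itself.

```r
# Hypothetical helper (not part of MetChem): pad old 5-digit HMDB
# accessions to 7 digits and flag IDs that do not look like HMDB codes.
normalize_hmdb <- function(ID) {
  ID <- toupper(trimws(ID))
  old <- grepl("^HMDB[0-9]{5}$", ID)
  ID[old] <- sub("^HMDB", "HMDB00", ID[old])
  ID[!grepl("^HMDB[0-9]{7}$", ID)] <- NA_character_
  ID
}

ids <- normalize_hmdb(c("HMDB0000122", "HMDB00124", "hmdb0000243", "C00031"))
```

Entries that come back as NA can be inspected by hand before the remaining IDs are passed on.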

Metabolites selection

Description

This function selects metabolites from the list generated by the readMet function.

Usage

selectionMet(doc, sel)

Arguments

doc

A list of metabolite information produced by the readMet function.

sel

A vector of the HMDB codes of the metabolites to be selected.

Value

doc

A doc list containing only the selected metabolites.

Examples


data(Metabolites)
doc=selectionMet(Metabolites$readMet,c("HMDB0000299","HMDB0000881"))
nameMet(doc)

Metabolite substituents

Description

This function finds the substituents related to each metabolite.

Usage

substituentsMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the substituents of each metabolite.

Examples


data(Metabolites)
sub=substituentsMet(Metabolites$readMet)

Metabolite Taxonomy

Description

This function finds the taxonomy related to each metabolite.

Usage

taxonomyMet(doc)

Arguments

doc

A list of metabolite information produced by the readMet function.

Value

A data frame containing the taxonomy of each metabolite.

Examples


data(Metabolites)
tax=taxonomyMet(Metabolites$readMet)

Optimal cluster number calculation.

Description

This function helps to estimate the optimal number of clusters for the metabolite dataset. It applies different algorithms for calculating the optimal cluster number to cut the clustering tree produced by the hclust function, and returns a list containing the index corresponding to each cluster number.

Usage

tree.cutting (res,max_nc=20)

Arguments

res

A list produced by the KODAMA.chem.sim function.

max_nc

The maximum number of clusters (default = 20).

Value

A list containing Rousseeuw's Silhouette index calculated for each clustering.

Examples


data(Metabolites)

res=KODAMA.chem.sim(Metabolites$SMILES)
clu=tree.cutting(res,max_nc = 30)
plot(clu$min_nc:clu$max_nc,clu$res.S)
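
To show how a silhouette index can guide the choice of the number of clusters, the sketch below computes the mean silhouette width in base R for several cuts of an hclust tree and picks the cut with the highest value. The helper mean_silhouette and the toy data are hypothetical; this illustrates the index itself and is not the implementation used by tree.cutting.

```r
# Hypothetical helper (not part of MetChem): mean silhouette width of a
# clustering cl, given a dissimilarity object d.
mean_silhouette <- function(d, cl) {
  d <- as.matrix(d)
  groups <- unique(cl)
  s <- numeric(length(cl))          # singletons keep a silhouette of 0
  for (i in seq_along(cl)) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) next
    a <- mean(d[i, own])            # mean distance within its own cluster
    b <- min(vapply(setdiff(groups, cl[i]),
                    function(g) mean(d[i, cl == g]), numeric(1)))
    s[i] <- (b - a) / max(a, b)     # b: mean distance to the nearest cluster
  }
  mean(s)
}

# Toy data: two compact groups that lie far apart.
x <- rbind(cbind(seq(0, 0.9, by = 0.1), 0),
           cbind(seq(10, 10.9, by = 0.1), 0))
d <- dist(x)
hh <- hclust(d, method = "ward.D")

ks <- 2:5
sil <- vapply(ks, function(k) mean_silhouette(d, cutree(hh, k)), numeric(1))
best_k <- ks[which.max(sil)]
```

The helper expects at least two clusters, so the range of candidate cuts starts at 2.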

Write a CLS file

Description

This function writes a file in the CLS format defined by GenePattern.

Usage

write.cls(es, address)

Arguments

es

A matrix.

address

The address where the file should be saved.

Value

No return value. If an invalid address is inserted, the function will generate an error.

Write a GCT file

Description

This function writes a file in the GCT format defined by GenePattern.

Usage

write.gct(es, address)

Arguments

es

A matrix.

address

The address where the file should be saved.

Value

No return value. If an invalid address is inserted, the function will generate an error.
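
For orientation, the sketch below writes a small matrix in the GCT layout defined by GenePattern: a version line, a line with the matrix dimensions, a header line, and one tab-separated row per feature. The helper write_gct_sketch is hypothetical, and reusing the row name as the Description column is a choice made here; the exact output of write.gct may differ.

```r
# Hypothetical helper (not part of MetChem): write a numeric matrix with
# row and column names as a GCT (version 1.2) file.
write_gct_sketch <- function(es, path) {
  es <- as.matrix(es)
  stopifnot(!is.null(rownames(es)), !is.null(colnames(es)))
  header <- c("#1.2",
              paste(nrow(es), ncol(es), sep = "\t"),
              paste(c("Name", "Description", colnames(es)), collapse = "\t"))
  # The Description column reuses the row name (a choice made here).
  rows <- paste(rownames(es), rownames(es),
                apply(es, 1, paste, collapse = "\t"), sep = "\t")
  writeLines(c(header, rows), path)
}

es <- matrix(c(1.5, 2, 3, 4, 5, 6), nrow = 2,
             dimnames = list(c("met1", "met2"), c("s1", "s2", "s3")))
path <- tempfile(fileext = ".gct")
write_gct_sketch(es, path)
```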

Write a GMT file

Description

This function writes a file containing the Metabolite Set information in the GMT format defined by GenePattern.

Usage

write.gmt(sub,address,min_entry=2,max_entry=50)

Arguments

sub

A matrix.

address

The address where the file should be saved.

min_entry

The minimum number of metabolites for each metabolite set.

max_entry

The maximum number of metabolites for each metabolite set.

Value

No return value. If an invalid address is inserted, the function will generate an error.
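
The role of min_entry and max_entry can be illustrated with a short base R sketch that writes metabolite sets in the GMT layout: one line per set, with the set name, a description field, and the members separated by tabs. The helper write_gmt_sketch is hypothetical and takes a named list of sets, which is an assumption made for simplicity; write.gmt itself takes a matrix, and its exact output may differ.

```r
# Hypothetical helper (not part of MetChem): write a named list of
# metabolite sets as a GMT file, keeping only sets whose size lies
# between min_entry and max_entry.
write_gmt_sketch <- function(sets, path, min_entry = 2, max_entry = 50) {
  sizes <- lengths(sets)
  sets <- sets[sizes >= min_entry & sizes <= max_entry]
  lines <- vapply(names(sets), function(nm) {
    # "na" fills the description field (a choice made here).
    paste(c(nm, "na", sets[[nm]]), collapse = "\t")
  }, character(1))
  writeLines(lines, path)
}

sets <- list(set_A = c("met1", "met2", "met3"),
             set_B = "met4")              # too small, filtered out
path <- tempfile(fileext = ".gmt")
write_gmt_sketch(sets, path)
```

With the default limits, set_B is dropped because it has fewer than two members.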
