Type: Package
Title: Prediction of Amyloid Proteins
Version: 1.1
LazyData: TRUE
Date: 2017-10-11
Description: Predicts amyloid proteins using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI.
License: GPL-3
URL: https://github.com/michbur/AmyloGram
BugReports: https://github.com/michbur/AmyloGram/issues
RoxygenNote: 6.0.1
Depends: R (≥ 3.0.0)
Imports: biogram, ranger, seqinr, shiny
Repository: CRAN
NeedsCompilation: no
Packaged: 2017-10-11 14:35:25 UTC; michal
Author: Michal Burdukiewicz [cre, aut], Piotr Sobczyk [ctb], Stefan Roediger [ctb]
Maintainer: Michal Burdukiewicz <michalburdukiewicz@gmail.com>
Date/Publication: 2017-10-11 14:46:15 UTC
Prediction of amyloids
Description
Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer's, Creutzfeldt-Jakob's and Huntington's diseases). Despite their diversity, all amyloid proteins can undergo aggregation initiated by 6- to 15-residue segments called hot spots. Once aggregated, amyloids form unique, zipper-like beta-structures, which are often harmful. To find the patterns defining the hot spots, we developed AmyloGram, a novel predictor of amyloidogenicity based on random forests.
Details
AmyloGram is available as an R function (predict.ag_model) or a shiny GUI (AmyloGram_gui).
The package also includes the benchmark data set pep424.
Author(s)
Maintainer: Michal Burdukiewicz <michalburdukiewicz@gmail.com>
References
Burdukiewicz MJ, Sobczyk P, Roediger S, Duda-Madej A, Mackiewicz P, Kotulska M. (2017) Amyloidogenic motifs revealed by n-gram analysis. Scientific Reports 7 https://doi.org/10.1038/s41598-017-13210-9
AmyloGram Graphical User Interface
Description
Launches a graphical user interface that predicts the presence of amyloids.
Usage
AmyloGram_gui()
Warning
Ad-blocking software may cause the interface to malfunction.
Random forest model of amyloid proteins
Description
Random forest grown using the ranger package with additional information.
Format
A list of length three: the random forest, a vector of important n-grams, and the best-performing encoding.
Protein test
Description
Checks if an object is a protein (contains letters from one-letter amino acid code).
Usage
is_protein(object)
Arguments
object |
|
Value
TRUE or FALSE.
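The check described above can be illustrated with a short base-R sketch. This is not the package's is_protein() implementation: the helper name looks_like_protein and the restriction to the 20 standard one-letter codes are assumptions made for illustration, and the input is assumed to be a character vector with one residue per element.

```r
# Hypothetical helper, NOT the package's is_protein(): tests whether every
# element of a character vector is one of the 20 standard one-letter
# amino acid codes (an assumption; the package may accept other symbols).
aa_letters <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

looks_like_protein <- function(x) {
  is.character(x) && length(x) > 0 && all(toupper(x) %in% aa_letters)
}

peptide <- strsplit("KLVFFAE", "")[[1]]  # one residue per element
looks_like_protein(peptide)              # TRUE
looks_like_protein(c("K", "L", "1"))     # FALSE: "1" is not an amino acid code
```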
pep424 data set
Description
Benchmark dataset for PASTA 2.0. Five sequences shorter than six amino acids (1% of the original dataset) were removed.
Usage
pep424
Format
a list of 424 peptides (class SeqFastaAA).
Source
Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.
Predict amyloids
Description
Recognizes amyloids using the AmyloGram algorithm.
Usage
## S3 method for class 'ag_model'
predict(object, newdata, ...)
Arguments
object |
|
newdata |
|
... |
further arguments passed to or from other methods. |
Examples
data(AmyloGram_model)
data(pep424)
predict(AmyloGram_model, pep424[17])
Print AmyloGram object
Description
Prints ag_model objects.
Usage
## S3 method for class 'ag_model'
print(x, ...)
Arguments
x |
|
... |
further arguments passed to or from other methods. |
Examples
data(AmyloGram_model)
print(AmyloGram_model)
Read sequences from .txt file
Description
Reads sequence data saved in a text file.
Usage
read_txt(connection)
Arguments
connection |
a |
Details
The input file should contain one or more amino acid sequences separated by empty line(s).
Value
a list of sequences. Each element has class SeqFastaAA. If connection contains no characters, the function issues a warning and returns NULL.
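The input format described in Details can be illustrated with a small base-R sketch. It is not the package's read_txt(): the helper parse_blank_separated is hypothetical, it returns plain character vectors rather than SeqFastaAA objects, and joining consecutive non-empty lines into one sequence is an assumption.

```r
# Write a small example file: two sequences separated by an empty line.
path <- tempfile(fileext = ".txt")
writeLines(c("KLVFFAE", "", "GNNQQNY"), path)

# Hypothetical base-R parser, NOT the package's read_txt().
parse_blank_separated <- function(path) {
  lines <- trimws(readLines(path))
  if (!any(nzchar(lines))) {
    warning("No sequences found.")
    return(NULL)
  }
  # Every empty line increments the block id, so lines separated by
  # empty line(s) end up in different blocks.
  block_id <- cumsum(!nzchar(lines))
  keep <- nzchar(lines)
  blocks <- split(lines[keep], block_id[keep])
  # Assumption: consecutive non-empty lines belong to one sequence.
  # Each sequence is split into single residues.
  unname(lapply(blocks, function(b) strsplit(paste(b, collapse = ""), "")[[1]]))
}

seqs <- parse_blank_separated(path)
length(seqs)  # 2
```

With the package installed, the same file could presumably be passed to read_txt(path), which would return SeqFastaAA elements instead.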
Specificity/sensitivity balance
Description
Sensitivity, specificity and Matthews correlation coefficient of AmyloGram for different cutoffs, computed on the pep424 dataset.
Usage
spec_sens
Format
a data frame with four columns and 99 rows.
Source
Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.
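For readers unfamiliar with the three measures stored in spec_sens, the sketch below computes them from confusion-matrix counts using their standard definitions. The helper classification_metrics and the counts are made up for illustration. They are not taken from the package or from pep424, and returning 0 for an undefined MCC is a common convention, not something the source specifies.

```r
# Hypothetical helper: standard definitions of sensitivity, specificity and
# the Matthews correlation coefficient (MCC) from confusion-matrix counts.
classification_metrics <- function(tp, fp, tn, fn) {
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    # Convention (an assumption here): MCC is 0 when the denominator is 0.
    mcc = if (denom == 0) 0 else (tp * tn - fp * fn) / denom)
}

# Made-up counts, for illustration only.
m <- classification_metrics(tp = 40, fp = 10, tn = 80, fn = 20)
round(m, 3)
```

Raising the cutoff typically trades sensitivity for specificity, which is why a table of all three measures across cutoffs is useful when choosing a decision threshold.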