Type: | Package |
Title: | Poly-Omic Prediction of Complex TRaits |
Version: | 1.4.0 |
Date: | 2016-03-03 |
Author: | Hae Kyung Im, Heather E. Wheeler, Keston Aquino Michaels, Vassily Trubetskoy |
Maintainer: | Hae Kyung Im <haky@uchicago.edu> |
Description: | It provides functions to generate a correlation matrix from a genetic dataset and to use this matrix to predict the phenotype of an individual by using the phenotypes of the remaining individuals through kriging. Kriging is a geostatistical method for optimal prediction or best unbiased linear prediction. It consists of predicting the value of a variable at an unobserved location as a weighted sum of the variable at observed locations. Intuitively, it works as a reverse linear regression: instead of computing correlation (univariate regression coefficients are simply scaled correlation) between a dependent variable Y and independent variables X, it uses known correlation between X and Y to predict Y. |
License: | GPL (≥ 3) |
Depends: | R(≥ 2.15.1), doParallel |
Imports: | ROCR, irlba, parallel, foreach |
Collate: | 'correlation_matrices.R' 'input_pheno_GT.R' 'omic_KRIGR.R' |
RoxygenNote: | 5.0.1 |
NeedsCompilation: | no |
Packaged: | 2016-03-07 16:26:39 UTC; haky |
Repository: | CRAN |
Date/Publication: | 2016-03-08 00:12:43 |
Multithreaded cross validation routine for Omic Kriging.
Description
This is a flexible cross validation routine which wraps the Omic Kriging
calculation. The user can specify the size of the test set, all the way to
"Leave One Out" cross validation. Additionally, all relevant parameters in the
okriging
function are exposed. This function uses the doParallel
package to distribute computation over multiple cores. If the phenotype is
case/control, a ROCR AUC and GLM analysis is run and the results printed to screen.
Usage
krigr_cross_validation(cor.list, pheno.df, pheno.id = 1, h2.vec,
covar.mat = NULL, nfold = 10, ncore = "all", verbose = FALSE, ...)
Arguments
cor.list |
A list of correlation matrices used in Kriging. rownames and colnames of cor should be IID list and include idtest and idtrain. |
pheno.df |
A data frame with rownames set as sample IDs and a column containing phenotype data. |
pheno.id |
The name of the column in pheno which contains phenotype data to test. |
h2.vec |
has weights for each RM relatednes matrix |
covar.mat |
Data frame of covariates with rownames() set to sample IDs. |
nfold |
Select the number of cross validation rounds to run. The value "LOOCV" will run one round of cross validation for each sample in your dataset. The value "ncore" will set the test set size such that a single round runs on each core specified in the ncore option. Any numeric value will be set to the test size. Default runs 10 rounds of cross validation. |
ncore |
The number of cores available to distribute computaition across If a numeric value is supplied, that number of cores is registered. If the value "all" is supplied, all available cores are used. |
verbose |
Report rounds on cross validation on standard out. |
... |
Optional and unnamed arguments. |
Value
A dataframe with three columns: sample ID, observed phenotype Ytest, and predicted phenotype Ypred
Loads sample phenotype and covariate data into data frame.
Description
This function loads a file into a data frame. This file should contain one row per sample in your study, and one column for each covariate and phenotype of interest. Additionally, it requires a header with "IID" for the column of sample IDs, and a unique name for each phenotype and covariate.
Usage
load_sample_data(phenoFile, main.pheno)
Arguments
phenoFile |
File path to the phenotype/covariate file. |
main.pheno |
Column name of the main phenotype of interest. |
Value
A data frame with dimensions (# of samples) x (# of phenotypes/covar)
Compute gene expression correlation matrix.
Description
This function computes a gene expression correlation matrix given a file of transcript expression levels for each sample in the study. It returns a correlation matrix with rownames and colnames as sample IDs.
Usage
make_GXM(expFile = NULL, gxmFilePrefix = NULL, idfile = NULL)
Arguments
expFile |
Path to gene expression file. |
gxmFilePrefix |
File path prefixes for outputting GCTA style binary correlation matrices. |
idfile |
Path to file containing family IDs and sample IDs with header FID and IID. |
Value
Returns a correlation matrix of (N-samples x N-samples), with rownames and colnames as sample IDs.
Examples
## load gene expression values from vignette
expressionFile <- system.file(package = "OmicKriging",
"doc/vignette_data/ig_gene_subset.txt.gz")
## compute correlation matrix
geneCorrelationMatrix <- make_GXM(expressionFile)
Run Principal Component Analysis (PCA) using the irlba package.
Description
A simple wrapper around the irlba() function which computes a partial SVD
efficiently. This function's run time depends on the number of eigenvectors
requested but scales well. Use this function to generate covariates for use
with the okriging
or krigr_cross_validation
functions.
Usage
make_PCs_irlba(X, n.top = 2)
Arguments
X |
A correlation matrix. |
n.top |
Number of top principal compenents to return |
Value
A matrix of Principal Components of dimension (# of samples) x (n.top). As expected, eigenvectors are ordered by eigenvalue. Rownames are given as sample IDs.
References
library(irlba)
Examples
## compute PC's using the gene expression correlation matrix from vignette
## load gene expression values from vignette
expressionFile <- system.file(package = "OmicKriging",
"doc/vignette_data/ig_gene_subset.txt.gz")
## compute correlation matrix
geneCorrelationMatrix <- make_GXM(expressionFile)
## find top ten PC's of this matrix using SVD
topPcs <- make_PCs_irlba(geneCorrelationMatrix, n.top=10)
Run Principal Component Analysis (PCA) using base R svd() function.
Description
A simple wrapper around the base R svd() function which returns the top N
eigenvectors of a matrix. Use this function to generate covariates for use
with the okriging
or krigr_cross_validation
functions. This wrapper preserves the rownames of the original matrix.
Usage
make_PCs_svd(X, n.top = 2)
Arguments
X |
A correlation matrix. |
n.top |
Number of top principal compenents to return |
Value
A matrix of Principal Components of dimension (# of samples) x (n.top). As expected, eigenvectors are ordered by eigenvalue. Rownames are given as sample IDs.
Examples
## compute PC's using the gene expression correlation matrix from vignette
## load gene expression values from vignette
expressionFile <- system.file(package = "OmicKriging",
"doc/vignette_data/ig_gene_subset.txt.gz")
## compute correlation matrix
geneCorrelationMatrix <- make_GXM(expressionFile)
## find top ten PC's of this matrix using SVD
topPcs <- make_PCs_svd(geneCorrelationMatrix, n.top=10)
Run omic kriging on a set of correlation matrices and a given phenotype.
Description
Universal kriging formula: lambda' = ( c + X m )' iSig m' = ( x - X' iSig c )' ( X' iSig X )^-1 m' = ( t(x) - c' iSig X ) ( X' iSig X )^-1 lambda' = (c' + m' X) iSig x: #covariates x ntest X: ntrain x #cov c: ntrain x ntest
Usage
okriging(idtest, idtrain = NULL, corlist, H2vec, pheno, phenoname,
Xcova = NULL)
Arguments
idtest |
A vector of sample IDs which constitute the test set. |
idtrain |
A vector of sample IDs which constitute the training set. |
corlist |
A list of correlation matrices used in Kriging. rownames and colnames of cor should be IID list and include idtest and idtrain. |
H2vec |
has weights for each RM relatednes matrix |
pheno |
A data frame with rownames set as sample IDs and a column containing phenotype data. |
phenoname |
The name of the column in pheno which contains phenotype data to test. |
Xcova |
Data frame of covariates with rownames() set to sample IDs. |
Value
A dataframe with three columns: sample ID, observed phenotype Ytest, and predicted phenotype Ypred
References
Cressie 1993 Statistics for Spatial Data p.154
Read the GRM binary file.
Description
Function provided by GCTA maintainers (modified slightly) for accessing their recently introduced binary GRM format. The GRM is stored as a vector of numerics which correspond to the lower triangular elements including the diagonal. We simply read these, pull the diagonal elements, and inflate them into a full symmetric matrix. We add sample IDs to colnames and rownames for compatibility with other Kriging functions.
Usage
read_GRMBin(prefix, size = 4)
Arguments
prefix |
The file path prefix to GRM binary files (e.g., test.grm.bin, test.grm.N.bin, test.grm.id.) |
size |
The length (in bytes) of each value in the raw GRM vector. Default is 4, and matches GRM writen by GCTA 1.11. |
Details
Note that the GRM is described by three files, and this function assumes that all have a common prefix that is passed in.
Value
GRM of dim (N.samples x N.samples) with rownames and colnames as sample ID.
References
http://www.complextraitgenomics.com/software/gcta/estimate_grm.html
Examples
## read binary Genetic Relatedness Matrix (GRM) generated by GCTA
grmFile <- system.file(package = "OmicKriging",
"doc/vignette_data/ig_genotypes.grm.bin")
grmFileBase <- substr(grmFile,1, nchar(grmFile) - 4)
GRM <- read_GRMBin(grmFileBase)
Write GRM binary files.
Description
Function to write a binary GRM format recently introduced by GCTA. It takes a correlation matrix as used by other Kriging functions, and writes three files: binary file for storing the diagonal + lower triangular elements, a text file for sample IDs, and a binary file storing the number of SNPs used in the correlation matrix calculation.
Usage
write_GRMBin(X, n.snps = 0, prefix, size = 4)
Arguments
X |
Correlation matrix with rownames and colnames as sample IDs. |
n.snps |
Number of SNPs used in correlation matrix calculation. Default is 0.0. |
prefix |
Base file path and names for the three output files. |
size |
Number of bytes to write for each value. Default is 4 |
Value
None. Though side effects are writing three files as described above.
References
http://www.complextraitgenomics.com/software/gcta/estimate_grm.html
Examples
## create a random genotype matrix
nSamples <- 10
mMarkers <- 100
X <- matrix(rbinom(n=100, size=2, prob=0.5), nrow=nSamples)
## compute the Genetric Relatedness Matrix
grm <- cor(X)
## write a Genetic Relatedness Matrix (GRM)
## NOTE: to following is not run here -- not writing any files in examples
#write_GRMBin(grm, n.snps=mMarkers, prefix="grm.out")