Type: | Package |
Title: | Maximum Likelihood Estimates of Gaussian Processes |
Version: | 3.1.9 |
Date: | 2022-03-07 |
Author: | Garrett M. Dancik |
Maintainer: | Garrett M. Dancik <dancikg@easternct.edu> |
Depends: | methods |
Suggests: | snowfall |
Description: | Maximum likelihood Gaussian process modeling for univariate and multi-dimensional outputs with diagnostic plots following Santner et al (2003) <doi:10.1007/978-1-4757-3799-8>. Contact the maintainer for a package version that includes sensitivity analysis. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
RoxygenNote: | 7.1.1 |
Packaged: | 2022-03-10 13:03:59 UTC; dancikg |
Repository: | CRAN |
Date/Publication: | 2022-03-10 13:40:02 UTC |
mlegp package
Description
Maximum likelihood Gaussian process modeling for univariate and multi-dimensional outputs with diagnostic plots and sensitivity analysis.
Details
Package: | mlegp |
Type: | Package |
Version: | 2.0 |
Date: | 2007-12-05 |
License: | Gnu General Public License (Version 3) |
This package obtains maximum likelihood estimates of Gaussian processes (GPs) for univariate and multi-dimensional outputs, for Gaussian processes with product exponential correlation structure; a constant or linear regression mean function; no nugget term, constant nugget term, or a nugget matrix that can be specified up to a multiplicative constant. The latter provides some flexibility for using GPs to model heteroscedastic responses.
Multi-dimensional output can be modelled by fitting independent GPs to each output, or to the most important principle component weights following singular value decomposition of the output. Plotting of main effects for functional output is also implemented.
Contact the maintainer for a package version that implements sensitivity analysis including Functional Analysis of Variance (FANOVA) decomposition, plotting functions to obtain diagnostic plots, main effects, and two-way factor interactions.
For a complete list of functions, use 'library(help="mlegp")'.
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
Santner, T.J. Williams, B.J., Notz, W., 2003. The Design and Analysis of Computer Experiments (New York: Springer).
Schonlau, M. and Welch, W. 2006. Screening the Input Variables to a Computer Model Via Analysis of Variance and Visualization, in Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics. A Dean and S. Lewis, eds. (New York: Springer).
Heitmann, K., Higdon, D., Nakhleh, C., Habib, S., 2006. Cosmic Calibration. The Astrophysical Journal, 646, 2, L1-L4.
https://github.com/gdancik/mlegp/
Gaussian process cross-validation
Description
For a Gaussian process, calculates cross-validated predictions and the variance of cross-validated predictions for all points of the design. These are cross-validated in the sense that when predicting output at design point x, all observations at x are removed from the collection of observed outputs
Usage
CV(gp, predictObserved = TRUE, verbose = FALSE)
Arguments
gp |
an object of type |
predictObserved |
if |
verbose |
if |
Value
a matrix where the first column corresponds to the cross-validated predictions and the second column corresponds to the variance of the cross-validated predictions
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp
See Also
Examples
## fit a single Gaussian process ##
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)
cv = CV(fit1) ## note that cv is the same as fit1$cv
creates a Gaussian process object
Description
creates a Gaussian process gp
object
Usage
createGP(X, Z, beta, a, meanReg, sig2, nugget,
param.names = 1:dim(X)[2], constantMean = 1)
Arguments
X |
the design matrix |
Z |
output obtained from the design matrix |
beta |
vector of correlation coefficients |
a |
vector of smoothness parameters in the correlation function (if |
meanReg |
the constant mean if |
sig2 |
the unconditional variance of the Gaussian process |
nugget |
the constant nugget or a vector of length |
param.names |
optional vector of parameter names (with length equal to |
constantMean |
1 if the Gaussian process has a constant mean; 0 otherwise |
Value
an object of class gp
that contains the following components:
Z |
matrix of observations |
numObs |
number of observations |
X |
the design matrix |
numDim |
number of dimensions of X |
constantMean |
1 if GP has a constant mean; 0 otherwise |
mu |
the mean matrix |
Bhat |
mean function regression coefficients |
beta |
correlation parameters |
a |
smoothness parameters in correlation function |
sig2 |
unconditional variance of predicted expected output |
params |
vector of parameter names, corresponding to columns of |
invVarMatrix |
inverse var-cov matrix |
nugget |
constant nugget or vector corresponding to the diagonal nugget matrix for a single observation generated from each element in |
loglike |
the log likelihood of the observations |
cv |
results from cross-validation, where
|
Note
this function is called by mlegp
and should not be called by the user
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
Gaussian Process Plotting Functions
Description
sets the number of sections on the current device based on the number of figures to draw
Usage
createWindow(n)
Arguments
n |
the number of figures to draw |
Details
Sets the graphical device so that the number of columns (ncol) is trunc(sqrt(n)) and the number of rows is ceiling(n/ncol)
This function is called by mlegp
plotting functions that construct separate graphs for multiple parameters
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
This function may be used by plot.gp
, among others
Gaussian Process Lists
Description
Creates an object of type gp.list
, given a list of Gaussian processes fit to separate sets of observations, or a list of Gaussian processes fit to principle component weights to approximate output of high dimension
Usage
gp.list(..., param.names = NULL, UD = NULL, gp.names = NULL)
Arguments
... |
either a |
param.names |
optionally, the parameter names corresponding to the columns of the design matrix of all Gaussian processes. By default, this will be equal to the parameter names of the first Gaussian process in |
UD |
the UD matrix, if the Gaussian process is fit to principle component weights |
gp.names |
optionally, a vector of names for the Gaussian processes, defaulting to ‘gp \#1’, ‘gp \#2’, ... |
Value
A gp.list
object is a list
object, where the first k elements correspond to k Gaussian processes passed in as ...
. This makes it straightforward to access a single Gaussian process. In addition, gp.list
contains components:
params |
a vector of parameter names, corresponding to the columns of the design matrix |
numGPs |
the number of Gaussian processes in the list |
numDim |
the number of parameters in the design matrix |
numObs |
the number of observations |
names |
the names of the Gaussian processes |
Note
currently, we require that all Gaussian processes have the same dimension (number of columns in the design matrix) and the same number of observations
this function is called by mlegp
and should not be called by the user
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
mlegp
, mlegp-svd-functions
for more details about the UD matrix
Examples
x = -5:5
y1 = sin(x) + rnorm(length(x), sd=.1)
y2 = sin(x) + rnorm(length(x), sd = .5)
## create the gp.list object ##
fitMulti = mlegp(x, cbind(y1,y2))
plot(fitMulti)
fitMulti ## print summary of of the fitted Gaussian process list
fitMulti[[2]] ## print summary for the 2nd Gaussian process
Gaussian Process and Gaussian Process Lists
Description
Test for a Gaussian process object or a Gaussian process list object
Usage
is.gp(x)
is.gp.list(x)
Arguments
x |
object to be tested |
Value
is.gp
returns TRUE
or FALSE
, depending on whether its argument inherits the gp
class or not
is.gp.list
returns TRUE
or FALSE
, depending on whether its argument inherits the gp.list
class or not
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
Examples
## fit a single Gaussian process ##
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)
is.gp(fit1) ## returns TRUE
is.gp.list(fit1) ## returns FALSE
mlegp: maximum likelihood estimation of Gaussian process parameters
Description
Finds maximum likelihood estimates of Gaussian process parameters for a vector (or matrix) of one (or more) responses. For multiple responses, the user chooses between fitting independent Gaussian processes to the separate responses or fitting independent Gaussian processes to principle component weights obtained through singular value decomposition of the output. The latter is useful for functional output or data rich situations.
Usage
mlegp(X, Z, constantMean = 1, nugget = NULL, nugget.known = 0,
min.nugget = 0, param.names = NULL, gp.names = NULL,
PC.UD = NULL, PC.num = NULL, PC.percent = NULL,
simplex.ntries = 5, simplex.maxiter = 500, simplex.reltol = 1e-8,
BFGS.maxiter = 500, BFGS.tol = 0.01, BFGS.h = 1e-10, seed = 0,
verbose = 1, parallel = FALSE)
Arguments
X |
the design matrix |
Z |
vector or matrix of observations; corresponding to the rows of |
constantMean |
a value of 1 indicates that each Gaussian process will have a constant mean; otherwise the mean function will be a linear regression in |
nugget |
if nugget.known is 1, a fixed value to use for the nugget or a vector corresponding to the fixed diagonal nugget matrix; otherwise, either a positive initial value for the nugget term which will be estimated, or a vector corresponding to the diagonal nugget matrix up to a multiplicative constant. If |
nugget.known |
1 if a plug-in estimate of the nugget will be used; 0 otherwise |
min.nugget |
minimum value of the nugget term; 0 by default |
param.names |
a vector of parameter names, corresponding to the columns of X; parameter names are ‘p1’, ‘p2’, ... by default |
gp.names |
a vector of GP names, corresponding to the GPs fit to each column of |
PC.UD |
the UD matrix if |
PC.num |
the number of principle component weights to keep in the singular value decomposition of |
PC.percent |
if not |
simplex.ntries |
the number of simplexes to run |
simplex.maxiter |
maximum number of evaluations / simplex |
simplex.reltol |
relative tolerance for simplex method, defaulting to 1e-16 |
BFGS.maxiter |
maximum number of iterations for BFGS method |
BFGS.tol |
stopping condition for BFGS method is when norm(gradient) < BFGS.tol * max(1, norm(x)), where x is the parameter vector and norm is the Euclidian norm |
BFGS.h |
derivatives are approximated as [f(x+BFGS.h) - f(x)] / BFGS.h) |
seed |
the random number seed |
verbose |
a value of '1' or '2' will result in status updates being printed; a value of '2' results in more information |
parallel |
if TRUE will fit GPs in parallel to each column of Z, or each set of PC weights; See details |
Details
This function calls the C function fitGPFromR which in turn calls fitGP (both in the file fit_gp.h) to fit each Gaussian process.
Separate Gaussian processes are fit to the observations in each column of Z
. Maximum likelihood estimates for correlation and nugget parameters are found through numerical methods (i.e., the Nelder-Mead Simplex and the L-BFGS method), while maximum likelihood estimates of the mean regression parameters and overall variance are calculated in closed form (given the correlation and (scaled) nugget parameters). Multiple simplexes are run, and estimates from the best simplex are used as initial values to the gradient (L-BFGS) method.
Gaussian processes are fit to principle component weights by utilizing the singular value decomposition (SVD) of Z
, Z = UDVprime. Columns of Z
should correspond to a single k-dimensional observation (e.g., functional output of a computer model, evaluated at a particular input)
In the complete SVD, Z
is k x m, and r = min(k,m), U
is k x r, D
is r x r, containing the singular values along the diagonal, and Vprime
is r x m. The output Z
is approximated by keeping l < r singular values, keeping a UD matrix of dimension k x l, and the Vprime
matrix of dimension l x m. Each column of Vprime
now contains l principle component weights, which can be used to reconstruct the functional output.
If nugget.known = 1, nugget = NULL, and replicate observations are present, the nugget will be fixed at its best linear unbiased estimate (a weighted average of sample variances). For each column of Z, a GP will be fit to a collection of sample means rather than all observations. This is the recommended approach as it is more accurate and computationally more efficient.
Parallel support is provided through the package snowfall
which allows multiple GPs to be fit in parallel. The user must set up the cluster using sfInit
and call sfLibrary(mlegp)
to load the library onto the slave nodes. Note: GP fitting is not recommended when the number of observations are large (> 100), in which case sequential GP fitting is faster.
Value
an object of class gp.list
if Z has more than 1 column, otherwise an object of class gp
Note
The random number seed is 0 by default, but should be randomly set by the user
In some situations, especially for noiseless data, it may be desirable to force a nugget term in order to make the variance-covariance matrix of the Gaussian process more stable; this can be done by setting the argument min.nugget
.
If fitting multiple Gaussian processes, the arguments min.nugget
and nugget
apply to all Gaussian processes being fit.
In some cases, the variance-covariance matrix is stable in C but not stable in R. When this happens, this function will attempt to impose a minimum value for the nugget term, and this will be reported. However, the user is encouraged to refit the GP and manually setting the argument min.nugget
in mlegp
.
When fitting Gaussian processes to principle component weights, a minimum of two principle component weights must be used.
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
Santner, T.J. Williams, B.J., Notz, W., 2003. The Design and Analysis of Computer Experiments (New York: Springer).
Heitmann, K., Higdon, D., Nakhleh, C., Habib, S., 2006. Cosmic Calibration. The Astrophysical Journal, 646, 2, L1-L4.
Dancik, GM and Dorman, KS (2008). mlegp: statistical analysis for computer models of biological systems using R. Bioinformatics 24(17), pp. 1966-1967
https://github.com/gdancik/mlegp/
See Also
createGP
for details of the gp
object; gp.list
for details of the gp.list
object; mlegp-svd-functions
for details on fitting Gaussian processes to high-dimensional data using principle component weights; the L-BFGS method uses C code written by Naoaki Okazaki (http://www.chokkan.org/software/liblbfgs)
Examples
###### fit a single Gaussian process ######
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)
## summary and diagnostic plots ##
summary(fit1)
plot(fit1)
###### fit a single Gaussian process when replciates are present ######
x = kronecker(-5:5, rep(1,3))
y = x + rnorm(length(x))
## recommended approach: GP fit to sample means; nugget calcualted from sample variances ##
fit1 = mlegp(x,y, nugget.known = 1)
## original approach: GP fit to all observations; look for MLE of nugget ##
fit2 = mlegp(x,y)
###### fit multiple Gaussian processes to multiple observations ######
x = -5:5
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))
## summary and diagnostic plots ##
summary(fitMulti)
plot(fitMulti)
###### fit multiple Gaussian processes using principle component weights ######
## generate functional output ##
x = seq(-5,5,by=.2)
p = 1:50
y = matrix(0,length(p), length(x))
for (i in p) {
y[i,] = sin(x) + i + rnorm(length(x), sd = .01)
}
## we now have 10 functional observations (each of length 100) ##
for (i in p) {
plot(x,y[i,], type = "l", col = i, ylim = c(min(y), max(y)))
par(new=TRUE)
}
## fit GPs to the two most important principle component weights ##
numPCs = 2
fitPC = mlegp(p, t(y), PC.num = numPCs)
plot(fitPC) ## diagnostics
## reconstruct the output Y = UDV'
Vprime = matrix(0,numPCs,length(p))
Vprime[1,] = predict(fitPC[[1]])
Vprime[2,] = predict(fitPC[[2]])
predY = fitPC$UD%*%Vprime
m1 = min(y[39,], predY[,39])
m2 = max(y[39,], predY[,39])
plot(x, y[39,], type="l", lty = 1, ylim = c(m1,m2), ylab = "original y" )
par(new=TRUE)
plot(x, predY[,39], type = "p", col = "red", ylim = c(m1,m2), ylab = "predicted y" )
## Not run:
### fit GPs in parallel ###
library(snowfall)
sfInit(parallel = TRUE, cpus = 2, slaveOutfile = "slave.out")
sfLibrary(mlegp)
fitPC = mlegp(p, t(y), PC.num = 2, parallel = TRUE)
## End(Not run)
Internal Functions for Gaussian Processes
Description
Internal Functions for Gaussian Processes
Details
These functions should not be called by the user
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
mlegp naming functions
Description
Functions that set the design matrix parameter names in a Gaussian process or Gaussian process list or the names of the Gaussian processes in a Gaussian process list
Usage
setParams(x, s)
setGPNames(x, s)
Arguments
x |
an object of type |
s |
a vector of parameter names or Gaussian process names to set |
Details
setParams
sets the parameter names of a Gaussian process (gp
) object or of all Gaussian processes in a Gaussian process list (gp.list
) object. setGPNames
sets the names of the Gaussian processes in an object of type gp.list
Value
the object x
with parameter or Gaussian process names set
Note
Parameter and Gaussian process names are used in the output of various plotting functions. Both of these can also be set when the Gaussian process (list) is created by mlegp
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
mlegp
for setting parameter and Gaussian process names during object creation
Examples
## fit multiple Gaussian processes to multiple observations ##
x = -5:5
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))
## plot diagnostics with default gp names ##
plot(fitMulti)
## change names and plot again ##
fitMulti = setGPNames(fitMulti, c("y1", "y2"))
plot(fitMulti)
## plot diagnostic for the first Gaussian process, predicted vs. parameter ##
plot(fitMulti[[1]], type = 2)
## change parameter names (of all Gaussian processes) and plot again ##
fitMulti = setParams(fitMulti, "param 1")
plot(fitMulti[[1]], type = 2)
Gaussian Process Nugget Related Functions
Description
Functions for detecting replicates and for calculating sample variance at specific design points
Usage
varPerReps(X, Y)
estimateNugget(X, Y)
anyReps(X)
Arguments
X |
the design matrix |
Y |
a vector (or 1 column matrix) of observations |
Value
varPerReps
returns a 1-column matrix where element i corresponds to the sample variance in observations corresponding to design point X[i]
estimateNugget
returns a double calculated by taking the mean of the matrix returned by varPerReps
anyReps
returns TRUE
if two or more rows of X
are identical
Note
These functions are used by mlegp
to set an initial value of the nugget when a constant nugget is being estimated. The function varPerReps
may also be useful for specifying the form of the nugget matrix for use with mlegp
.
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
Examples
x = matrix(c(1,1,2,3,3)) # the design matrix
y = matrix(c(5,6,7,0,10)) # output
anyReps(x)
varPerReps(x,y)
estimateNugget(x,y)
Parameter Lookup Functions
Description
These functions are used to match the name of a parameter with its position in a parameter list
Usage
toParamIndexes(m, string)
matchIndexes(m, string)
Arguments
m |
a vector of names of the parameters of interest |
string |
a vector of parameter names |
Value
a vector where element i contains the position of m[i]
in string
if m
contains integers and toParamIndexes
is called, m
will be returned, without a check of whether
or not the indices are valid
if m[i]
is not an element of string
, toParamIndexes
will display an error, whereas matchIndexes
will return NA
Note
this function does not need to be called by the user; it exists so that the user can pass in a vector of parameter numbers or parameter names to various functions when specifying a subset of the parameters of the Gaussian process design matrix
for toParamIndexes
, m
can contain integers or characters, but cannot contain both
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
Examples
param.names = c("one", "two", "three")
toParamIndexes(c("one", "three"), param.names)
#toParamIndexes(c("four"), param.names) # will give an error
Singular Value Decomposition functions for mlegp
Description
Functions that deal with the singular value decomposition of an output Y, for use with Gaussian process lists
Usage
pcweights(Y, weights.num = NULL, cutoff = 99)
getSingularValues(Y)
singularValueImportance(Y)
numSingularValues(Y, cutoff = 99)
Arguments
Y |
the output to decompose, where each column of |
weights.num |
optionally, the number of principle component weights to keep |
cutoff |
if specified, |
Details
Utilizes the singular value decomposition (SVD) of Y
, Y = UDVprime. Columns of Y
should correspond to a single k-dimensional observation (e.g., functional output of a computer model, evaluated at a particular input).
For a k x m matrix Y
, and r = min(k,m), in the complete SVD, U
is k x r, D
is r x r, containing the singular values along the diagonal, and Vprime
is r x m. The output Y
is approximated by keeping l < r singular values, keeping a UD matrix of dimension k x l, and the Vprime
matrix of dimension l x m. Each column of Vprime
now contains l principle component weights, which can be used to reconstruct the functional output.
Value
pcweights
returns a list with components:
UD |
the UD matrix corresponding to the number of principle components kept |
Vprime |
The Vprime matrix corresponding to the number of principle components kept |
Note: the number of principle component weights kept is equal to dim(UD)[2]
getSingularValues
returns a matrix containing the singular values of Y
numSingularValues
returns the minimum number of singular values accounting for cutoff
percent of the variation in Y
singularValueImportance
returns a matrix where element i corresponds to the percentage of total variation in Y
accounted for by the first i singular values
Note
these functions are utilized by mlegp
to fit Gaussian processes to principle component weights
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
Heitmann, K., Higdon, D., Nakhleh, C., Habib, S., 2006. Cosmic Calibration. The Astrophysical Journal, 646, 2, L1-L4.
https://github.com/gdancik/mlegp/
See Also
Examples
## create functional output that varies based on parameter 'p' ##
x = seq(-5,5,by=.2)
p = 1:50
y = matrix(0,length(p), length(x))
for (i in p) {
y[i,] = sin(x) + i + rnorm(length(x), sd = .1)
}
singularValueImportance(t(y))
numSingularValues(t(y), cutoff = 99.99)
Diagnostic Plots for Gaussian processes
Description
Cross-Validated Diagnostic Plots for Gaussian Processes
Usage
## S3 method for class 'gp'
plot(x, type = 0, params = NULL, sds = 1, CI.at.point = FALSE, ...)
Arguments
x |
an object of class |
type |
the type of graph to plot, 0 by default (see Details) |
params |
for graph types 2 and 3, a vector of parameter names (or parameter indices) to plot against. By default, all parameters are looked at |
sds |
the number of standard deviations to use for confidence bands/intervals, for graph types 0-3 |
CI.at.point |
if TRUE, will plot confidence intervals at each predicted point, rather than bands, which is the default |
... |
additional arguments to plot, but cannot overwrite xlab or ylab |
Details
All plots involve cross-validated predictions and/or cross-validated standardized residuals. The cross-validation is in the sense that for predictions made at design point x, all observations at design point x are removed from the training set.
Where relevant, open circles correspond to Gaussian process predictions, black lines correspond to the observations, and red lines correspond to confidence bands. The argument type
determines the type of graph displayed, and is one of the following integers:
0 for observed vs. predicted AND observed vs. standardized residual (default), | |
1 for observed vs. predicted only, | |
2 for parameter vs. predicted for all parameters, | |
3 for parameter vs. standardized residual for all parameters, | |
4 for normal quantile plot and histogram of standardized residuals |
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
CV
for cross-validation, plot.gp.list
for plotting gp.list
objects
Examples
## fit the gp ##
x = seq(-5,5,by=.5)
y = sin(x) + rnorm(length(x), sd=.1)
fit = mlegp(x,y)
## plot diagnostics ##
plot(fit)
plot(fit, type = 2)
Diagnostics Plots for Gaussian Process Lists
Description
Cross-validated Diagnostic Plots For Gaussian Process Lists
Usage
## S3 method for class 'gp.list'
plot(x, sds = 1, CI.at.point = FALSE, ...)
Arguments
x |
an object of class |
sds |
the number of standard deviations to use for confidence bands / intervals |
CI.at.point |
if TRUE, will plot confidence intervals around each predicted point, rather than bands, which is the default |
... |
not used; for compatibility with |
Details
All plots involve cross-validated predictions and/or cross-validated standardized residuals. The cross-validation is in the sense that for predictions made at design point x
, all observations at design point x
are removed from the training set.
Where relevant, open circles correspond to Gaussian process output, black lines correspond to the observations, and red lines correspond to confidence bands.
For each Gaussian process in x
, plot.gp
is called using graph type 1, which plots cross-validated predictions vs. observed values.
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
Examples
## create data for multiple responses ##
x = seq(-5,5)
z1 = 10 - 5*x + rnorm(length(x))
z2 = 4 - 5*x + rnorm(length(x))
z3 = 7*sin(x) + rnorm(length(x))
## fit multiple Gaussian processes ##
fitMulti = mlegp(x, cbind(z1,z2,z3))
## plot diagnostics ##
plot(fitMulti)
Plot Observed Values Vs. Each Dimension of the Design Matrix
Description
Constructs multiple graphs, plotting each parameter from the design matrix on the x-axis and observations on the y-axis
Usage
plotObservedEffects(x, ...)
Arguments
x |
an object of class |
... |
if x is a design matrix, a vector of observations;
if x is of class |
Details
if x
is NOT of class gp
(i.e., x
is a design matrix), all columns of x
will be plotted separately against the vector of observations
if x
is of class gp
, the specified columns of the design matrix of x
will be plotted against the the observations
Note
It is often useful to use this function before fitting the gaussian process, to check that the observations are valid
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
Examples
## create the design and output matrices ##
x1 = kronecker(seq(0,1,by=.25), rep(1,5))
x2 = rep(seq(0,1,by=.25),5)
z = 4 * x1 - 2*x2 + x1 * x2 + rnorm(length(x1), sd = 0.001)
## look at the observed effects prior to fitting the GP ##
plotObservedEffects(cbind(x1,x2), z)
## fit the Gaussian process ##
fit = mlegp(cbind(x1,x2), z, param.names = c("x1", "x2"))
## look at the observed effects of the fitted GP (which are same as above)
plotObservedEffects(fit)
Gaussian Process Predictions
Description
Gaussian Process Predictions
Usage
## S3 method for class 'gp'
predict(object, newData = object$X, se.fit = FALSE, ...)
Arguments
object |
an object of class |
newData |
an optional data frame or matrix with rows corresponding to inputs for which to predict.
If omitted, the design matrix |
se.fit |
a switch indicating if standard errors are desired |
... |
for compatibility with generic method |
Details
The Gaussian process is used to predict output at the design points newData
; if the logical se.fit
is set to TRUE
, standard errors (standard deviations of the predicted values) are also calculated. Note that if the Gaussian process contains a nugget term, these standard deviations correspond to standard deviations of predicted expected values, and NOT standard deviations of predicted observations. However, the latter can be obtained by noting that the variance of a predicted observation equals the variance of the predicted expected value plus a nugget term.
If newData
is equal to the design matrix of object
(the default), and there is no nugget term, the Gaussian process interpolates the observations and the
predictions will be identical to component Z
of object
. For cross-validation, the function CV
should be used.
Value
predict.gp
produces a vector of predictions. If se.fit
is TRUE
, a list with the following components is returned:
fit |
vector as above |
se.fit |
standard error of the predictions |
Note
for predictions with gp.list
objects, call predict.gp
separately for each gp in the list
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
For cross-validated predictions, see CV
Examples
x = -5:5; y = sin(x) + rnorm(length(x), sd = 0.001)
fit = mlegp(x,y)
predict(fit, matrix(c(2.4, 3.2)))
## predictions at design points match the observations
## (because there is no nugget)
round(predict(fit) - fit$Z, 6)
# this is not necessarily true if there is a nugget
fit = mlegp(x,y, min.nugget = 0.01)
round(predict(fit) - fit$Z,6)
Gaussian Process Summary Information
Description
prints a summary of a Gaussian process object
Usage
## S3 method for class 'gp'
print(x, ...)
Arguments
x |
an object of class |
... |
for compatibility with generic method |
Details
prints a summary of the Gaussian process object x
, by calling summary.gp
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
summary.gp for more description of the output
Examples
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)
print(fit1)
Gaussian Process List Summary Information
Description
prints a summary of a Gaussian process list object, or (a subset) of its components
Usage
## S3 method for class 'gp.list'
print(x, nums = NULL, ...)
Arguments
x |
an object of class |
nums |
optionally, a vector of integers corresponding to Gaussian processes in the list to summarize |
... |
for compatibility with generic method |
Details
if nums
is NULL
, prints summary information for the Gaussian process list object x
, using summary.gp.list
if nums
is not NULL
, prints summary information for each Gaussian process specified by nums
, using summary.gp
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
Examples
x = -5:5
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))
print(fitMulti) ## summary of the Gaussian process list
print(fitMulti, nums = 1:2) ## summary of Gaussian processes 1 and 2
Gaussian Process Summary Information
Description
prints a summary of a Gaussian process object
Usage
## S3 method for class 'gp'
summary(object, ...)
Arguments
object |
an object of class |
... |
for compatibility with generic method |
Details
prints a summary of the Gaussian process object object
. Output should be self explanatory, except for possibly CV RMSE, the cross-validated root mean squared error (the average squared difference between the observations and cross-validated predictions); and CV RMaxSE, the maximum cross-validated root squared error. If the design in the Gaussian process object contains any replicates, the root mean pure error (RMPE), which is the square root of the average within replicate variance and the root max pure error (RMaxPE) are also reported.
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
createGP
for details of the Gaussian process object
Examples
## no replicates in the design matrix ##
x1 = -5:5; y1 = sin(x1) + rnorm(length(x1),sd=.1)
fit1 = mlegp(x1, y1)
summary(fit1)
## with replicates in the design matrix ##
x2 = kronecker(x1, rep(1,3))
y2 = sin(x2) + rnorm(length(x2), sd = .1)
fit2 = mlegp(x2,y2)
summary(fit2)
Gaussian Process List Summary Information
Description
prints a summary of a Gaussian process list object, or (a subset) of its components
Usage
## S3 method for class 'gp.list'
summary(object, nums = NULL, ...)
Arguments
object |
an object of type |
nums |
optionally, a vector of integers corresponding to Gaussian processes in the list to summarize |
... |
for compatibility with generic method |
Details
if nums
is NULL
, prints out a summary of the Gaussian process list
if nums
is not NULL
, displays a summary of the Gaussian processes specified by nums
by calling summary.gp
for each Gaussian process
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
See Also
Examples
x = -5:5
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))
summary(fitMulti) ## summary of the Gaussian process list
summary(fitMulti, nums = 1:2) ## summary of Gaussian processes 1 and 2
Summary of outputs for each unique input
Description
Finds sample means and variances for a matrix of observations when replicates are present
Usage
uniqueSummary(X,Y)
Arguments
X |
matrix of inputs |
Y |
matrix of outputs corresponding to x |
Details
uniqueSummary
calculates sample means and variances for each unique input. For input values with no replicates, the sample variance will be ‘NA’. This function is used by mlegp
to fit a GP to a collection of means observations at each design point.
Value
A list with the following components:
reps |
the number of reps for each unique design point |
uniqueX |
the input matrix with duplicate inputs removed |
uniqueMeans |
sample means corresponding to each unique input |
uniqueVar |
sample variances corresponding to each unique input |
Author(s)
Garrett M. Dancik dancikg@easternct.edu
References
https://github.com/gdancik/mlegp/
Examples
x = c(1,1,2,2,3)
y = x + rnorm(length(x))
uniqueSummary(x,y)