Package: ppls
Type: Package
Title: Penalized Partial Least Squares
Depends: R (≥ 3.5.0)
Imports: splines, MASS
Version: 2.0.0
Description: Linear and nonlinear regression methods based on Partial Least Squares and Penalization Techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing. The method is described and applied to simulated and experimental data in Kraemer et al. (2008) <doi:10.1016/j.chemolab.2008.06.009>.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-07-22 08:19:17 UTC; vguillem
Author: Nicole Kraemer [aut], Anne-Laure Boulesteix [aut], Vincent Guillemot [cre, aut]
Maintainer: Vincent Guillemot <vincent.guillemot@pasteur.fr>
Repository: CRAN
Date/Publication: 2025-07-22 12:10:06 UTC

Penalty Matrix for Higher Order Differences

Description

Computes the block-diagonal penalty matrix penalizing higher-order differences.

Usage

Penalty.matrix(m, order = 2)

Arguments

m

Numeric vector indicating sizes of blocks.

order

Integer indicating the order of differences (default is 2).

Details

For each block of size m[j], with the default order = 2, the penalty block P_j satisfies:

v^\top P_j v = \sum_{i=3}^{m[j]} (v_i - 2v_{i-1} + v_{i-2})^2.

The final penalty matrix is block-diagonal, composed of the blocks P_j.

Value

Penalty matrix (numeric matrix) of dimension sum(m) x sum(m).

References

Kraemer, N., Boulesteix, A.-L., & Tutz, G. (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94, 60-69. https://doi.org/10.1016/j.chemolab.2008.06.009

Examples

P <- Penalty.matrix(c(6, 4), order = 2)
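
# A minimal consistency check (sketch; values chosen arbitrarily): the
# quadratic form t(v) %*% P %*% v should equal the sum of squared
# second-order differences within each block, as described in Details.
v <- rnorm(10)
manual <- sum(diff(v[1:6], differences = 2)^2) + sum(diff(v[7:10], differences = 2)^2)
all.equal(as.numeric(t(v) %*% P %*% v), manual)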


Nonlinear Transformation via B-Splines

Description

Transforms each column of a numeric matrix (or vector) into a new basis defined by B-spline functions.

Usage

X2s(X, Xtest = NULL, deg = 3, nknot = NULL, reduce.knots = FALSE)

Arguments

X

Numeric matrix or vector of input data.

Xtest

Optional numeric matrix or vector of test data. Defaults to X.

deg

Degree of the B-splines (default is 3).

nknot

Vector specifying the number of knots per column. Default is rep(20, ncol(X)).

reduce.knots

Logical. If TRUE, reduces the number of knots to avoid constant columns (default is FALSE).

Value

A list containing:

Z

Design matrix for training data (B-spline coefficients).

Ztest

Design matrix for test data.

sizeZ

Vector of number of basis functions for each column.

References

Kraemer, N., Boulesteix, A.-L., & Tutz, G. (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94, 60-69. https://doi.org/10.1016/j.chemolab.2008.06.009

Examples

X <- matrix(rnorm(100), ncol = 5)
Xtest <- matrix(rnorm(300), ncol = 5)
result <- X2s(X, Xtest)
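
# Sketch: inspect the returned design matrices; with the defaults
# (deg = 3, nknot = rep(20, ncol(X))), column j of X is expanded into
# result$sizeZ[j] B-spline basis functions.
dim(result$Z)      # nrow(X) rows, sum(result$sizeZ) columns
dim(result$Ztest)  # nrow(Xtest) rows, sum(result$sizeZ) columns
result$sizeZ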


Extract Regression Coefficients from a mypls Object

Description

Returns the regression coefficients (without intercept) from an object of class mypls, typically produced by the function jack.ppls.

Usage

## S3 method for class 'mypls'
coef(object, ...)

Arguments

object

An object of class mypls, which must contain the elements coefficients and covariance.

...

Additional arguments passed to methods (currently unused).

Details

This method returns the vector of regression coefficients associated with the penalized PLS fit stored in the mypls object. These coefficients can be used together with the variance-covariance matrix returned by vcov.mypls to construct confidence intervals or hypothesis tests.

Value

A numeric vector containing the regression coefficients corresponding to the penalized PLS model.

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

vcov.mypls, jack.ppls

Examples

n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)

pls.object <- penalized.pls.cv(X, y)
my.jack <- jack.ppls(pls.object)
my.coef <- coef(my.jack)
print(my.coef)
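
# Sketch: rough confidence intervals from the jackknife coefficients and
# their covariance matrix (a plain normal approximation is assumed here;
# see ttest.ppls for t-based tests).
se <- sqrt(diag(vcov(my.jack)))
cbind(lower = my.coef - 2 * se, upper = my.coef + 2 * se)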


Cookie Dough Data: NIR Spectra and Constituents

Description

This dataset contains measurements from a quantitative NIR spectroscopy experiment designed to evaluate the feasibility of using NIR spectra to estimate the chemical composition of unbaked biscuit doughs.

Usage

data(cookie)

Format

A list of two data frames, each with 72 observations:

NIR

NIR reflectance spectra measured from 1100 to 2498 nm (700 columns).

constituents

Percentage of fat, sucrose, dry flour and water in the 72 samples.

Details

Two sets of samples were prepared with variations in a standard biscuit recipe to produce a broad range for each of the four ingredients of interest: fat, sucrose, dry flour, and water.

The first 40 samples correspond to a calibration (training) set, and the remaining 32 samples form a validation (prediction) set. Sample 23 (training) and sample 21 (test) are known outliers.

Each sample is represented by an NIR reflectance spectrum of 700 values measured between 1100 and 2498 nm at 2 nm intervals. The constituents data frame gives the percentage of each of the four ingredients.

Examples

data(cookie) # load data
X <- cookie$NIR       # NIR spectra
Y <- cookie$constituents     # constituent values
Xtrain <- X[1:40, ]; Ytrain <- Y[1:40, ]   # calibration set
Xtest <- X[41:72, ]; Ytest <- Y[41:72, ]   # validation set
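
# Sketch: plot a few calibration spectra; the wavelengths run from 1100 to
# 2498 nm in 2 nm steps (700 values), as described in Details.
wavelengths <- seq(1100, 2498, by = 2)
matplot(wavelengths, t(as.matrix(Xtrain[1:5, ])), type = "l", lty = 1,
        xlab = "Wavelength (nm)", ylab = "Reflectance")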


Plot Penalized PLS Components for Spline-Transformed Data

Description

This function applies a nonlinear regression model using penalized Partial Least Squares (PLS) on B-spline transformed variables, then visualizes each additive component.

Usage

graphic.ppls.splines(
  X,
  y,
  lambda = NULL,
  add.data = FALSE,
  select = FALSE,
  ncomp = 1,
  deg = 3,
  order = 2,
  nknot = NULL,
  reduce.knots = FALSE,
  kernel = TRUE,
  window.size = c(3, 3)
)

Arguments

X

A numeric matrix of input data.

y

A numeric response vector.

lambda

A numeric value for the penalization parameter. Default is NULL.

add.data

Logical. If TRUE, the original data points X and y are added to the plots. Default is FALSE.

select

Logical. If TRUE, the function fits only one block (variable) per iteration (block-wise selection). Default is FALSE.

ncomp

Integer. Number of PLS components to use. Default is 1.

deg

Integer. Degree of the B-spline basis. Default is 3.

order

Integer. Order of the differences to penalize. Default is 2.

nknot

A numeric vector specifying the number of knots for each variable. Default is NULL, which uses rep(20, ncol(X)).

reduce.knots

Logical. If TRUE, automatically reduces the number of knots for variables leading to constant basis functions. Default is FALSE.

kernel

Logical. If TRUE, uses the kernelized version of PPLS. Default is TRUE.

window.size

A numeric vector of length 2 indicating the number of plots per row and column. Default is c(3, 3) (3 rows and 3 columns).

Details

This function first transforms the input data X and a test grid Xtest using B-spline basis functions, then fits a penalized PLS model using these transformed variables. Each additive component (i.e., variable effect) is then plotted individually.

If add.data = TRUE, the actual observations are plotted on top of the corresponding fitted component functions. While this can help visualize the fit, note that only the sum of all fitted components approximates y, and not each component individually.

The function is intended for exploratory visualization and should be used after appropriate model selection using, e.g., ppls.splines.cv.

Value

A numeric vector of regression coefficients for the final penalized PLS model.

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

ppls.splines.cv, X2s, penalized.pls, Penalty.matrix

Examples

# Load Boston housing data
library(MASS)
data(Boston)
y <- Boston[, 14]
X <- Boston[, -14]
X <- X[, -4]  # remove categorical variable
X <- as.matrix(X)

# Plot with variable selection and original data
graphic.ppls.splines(
  X, y, lambda = 100, ncomp = 5,
  add.data = TRUE, select = TRUE, window.size = c(3, 4)
)

# Plot without variable selection and without data
graphic.ppls.splines(
  X, y, lambda = 100, ncomp = 5,
  add.data = FALSE, select = FALSE, window.size = c(3, 4)
)
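
# Sketch: in practice, lambda and ncomp would be chosen by cross-validation
# first (see ppls.splines.cv); the small lambda grid here is for
# illustration only.
cv <- ppls.splines.cv(X, y, lambda = c(10, 100), ncomp = 5)
graphic.ppls.splines(
  X, y, lambda = cv$lambda.opt, ncomp = cv$ncomp.opt,
  add.data = FALSE, select = FALSE, window.size = c(3, 4)
)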


Jackknife Estimation for Penalized PLS Coefficients

Description

This function computes jackknife estimates (mean and covariance) of the regression coefficients obtained from a cross-validated Penalized Partial Least Squares (PPLS) model.

Usage

jack.ppls(
  ppls.object,
  ncomp = ppls.object$ncomp.opt,
  index.lambda = ppls.object$index.lambda
)

Arguments

ppls.object

An object returned by penalized.pls.cv. Must contain the array coefficients.jackknife as well as fields lambda, ncomp.opt, and index.lambda.

ncomp

Integer. Number of PLS components to use. Default is ppls.object$ncomp.opt.

index.lambda

Integer. Index of the penalization parameter lambda. Default is ppls.object$index.lambda.

Details

The jackknife estimates are computed using the array of regression coefficients obtained in each cross-validation fold. The function returns both the mean coefficients and the associated variance-covariance matrix.

If the requested number of components ncomp or the lambda index index.lambda exceeds the available dimensions of the coefficients.jackknife array, they are adjusted to their maximum allowable values, with a message.

Note: This jackknife procedure is not discussed in Kraemer et al. (2008), but it is useful for statistical inference, such as confidence intervals or hypothesis tests.

Value

An object of class "mypls", which is a list containing:

coefficients

The mean regression coefficients across cross-validation splits.

covariance

The estimated covariance matrix of the coefficients.

k

Number of cross-validation folds used.

ncomp

Number of components used in estimation.

index.lambda

Index of the lambda value used.

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

penalized.pls.cv, coef.mypls, vcov.mypls, ttest.ppls

Examples

data(cookie)  # load example data
X <- as.matrix(cookie$NIR)  # NIR spectra
y <- cookie$constituents$fat    # extract one constituent

pls.object <- penalized.pls.cv(X, y, ncomp = 10, kernel = TRUE)
my.jack <- jack.ppls(pls.object)
coef(my.jack)
vcov(my.jack)
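
# Sketch: jackknife estimates can also be requested for a specific number of
# components; values exceeding the dimensions of coefficients.jackknife are
# capped, as noted in Details.
my.jack3 <- jack.ppls(pls.object, ncomp = 3)
head(coef(my.jack3))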

Normalize a Numeric Vector to Unit Length

Description

Returns the input vector rescaled to have unit Euclidean norm.

Usage

normalize.vector(v)

Arguments

v

A numeric vector.

Details

This function performs:

v_\text{normalized} = \frac{v}{\sqrt{\sum v_i^2}}

It is primarily used to normalize weight vectors or component directions in Partial Least Squares algorithms.

Note: If the input vector has zero norm, the function returns NaN due to division by zero.

Value

A numeric vector of the same length as v, with unit norm.

See Also

penalized.pls, penalized.pls.default, penalized.pls.kernel

Examples

v <- c(3, 4)
normalize.vector(v)  # returns c(0.6, 0.8)

v2 <- rnorm(10)
sqrt(sum(normalize.vector(v2)^2))  # should be 1


Predict New Data Using a Penalized PLS Model

Description

new.penalized.pls: given a fitted penalized PLS model and new test data, predicts the response for each number of components. If true response values are provided, it also returns the mean squared error (MSE) for each number of components.

penalized.pls: computes the regression coefficients of a Penalized Partial Least Squares (PPLS) model, using either the classical NIPALS algorithm or a kernel-based version. Optionally allows block-wise variable selection.

penalized.pls.cv: performs k-fold cross-validation to evaluate and select the optimal penalization parameter lambda and number of components ncomp in a PPLS model.

penalized.pls.default: computes the regression coefficients using the standard (NIPALS-based) version of Penalized PLS. Typically called internally by penalized.pls.

penalized.pls.kernel: computes the regression coefficients using the kernel-based version of Penalized PLS, especially useful when the number of predictors exceeds the number of observations (p >> n).

penalized.pls.select: computes the regression coefficients of a PPLS model using block-wise selection, where each component is restricted to use variables from only one block.

Usage

new.penalized.pls(ppls, Xtest, ytest = NULL)

penalized.pls(
  X,
  y,
  P = NULL,
  ncomp = NULL,
  kernel = FALSE,
  scale = FALSE,
  blocks = 1:ncol(X),
  select = FALSE
)

penalized.pls.cv(
  X,
  y,
  P = NULL,
  lambda = 1,
  ncomp = NULL,
  k = 5,
  kernel = FALSE,
  scale = FALSE
)

penalized.pls.default(X, y, M = NULL, ncomp)

penalized.pls.kernel(X, y, M = NULL, ncomp)

penalized.pls.select(X, y, M = NULL, ncomp, blocks)

Arguments

ppls

A fitted penalized PLS model, as returned by penalized.pls.

Xtest

A numeric matrix of new input data for prediction.

ytest

Optional. A numeric response vector corresponding to Xtest, for evaluating prediction error.

X

A numeric matrix of centered (and optionally scaled) predictor variables.

y

A centered numeric response vector.

P

Optional penalty matrix. If NULL, ordinary PLS is computed (i.e., no penalization).

ncomp

Integer. Number of PLS components to compute.

kernel

Logical. If TRUE, uses the kernel representation of PPLS. Default is FALSE.

scale

Logical. If TRUE, scales predictors in X to unit variance. Default is FALSE.

blocks

An integer vector of length ncol(X) that defines the block structure of the variables. All variables sharing the same value in blocks belong to the same block.

select

Logical. If TRUE, block-wise variable selection is applied in each iteration. Only one block contributes to the latent direction per component. Default is FALSE.

lambda

A numeric vector of candidate penalty parameters. Default is 1.

k

Integer. Number of cross-validation folds. Default is 5.

M

Optional penalty transformation matrix M = (I + P)^{-1}. If NULL, no penalization is applied.

Details

For new.penalized.pls, the fitted model ppls contains intercepts and regression coefficients for each number of components (from 1 to ncomp). The prediction is computed as:

\hat{y}^{(i)} = X_\text{test} \cdot \beta^{(i)} + \text{intercept}^{(i)},

for each number of components i = 1, \ldots, ncomp.

penalized.pls centers X and y, optionally scales X, and then computes the PPLS components using penalized.pls.default, penalized.pls.kernel, or penalized.pls.select, depending on the kernel and select arguments.

When a penalty matrix P is supplied, a transformation M = (I + P)^{-1} is computed internally. The algorithm then maximizes the penalized covariance between Xw and y:

\text{argmax}_w \; \text{Cov}(Xw, y)^2 - \lambda \cdot w^\top P w

The block-wise selection strategy (when select = TRUE) restricts the weight vector w at each iteration to be non-zero in a single block, selected greedily.

penalized.pls.cv splits the data into k cross-validation folds and, for each value of lambda and each number of components up to ncomp, computes the mean squared prediction error.

The optimal parameters are selected as those minimizing the prediction error across all folds. Internally, for each fold and lambda value, the function calls penalized.pls to fit the model and new.penalized.pls to evaluate predictions.

The returned object can be further used for statistical inference (e.g., via jackknife) or prediction.

penalized.pls.default iteratively computes latent directions that maximize the (penalized) covariance with the response y: at each step, a weight vector is obtained from the covariance of X with the current residuals, normalized, and the resulting latent component is orthogonalized against the previous ones.

The final regression coefficients are computed via a triangular system using the bidiagonal matrix R = T^\top X W, and backsolving:

\beta = W L (T^\top y),

where L = R^{-1}.

The kernel PPLS algorithm is based on representing the model in terms of the Gram matrix K = X M X^\top (or simply K = X X^\top if M = NULL). The algorithm iteratively computes orthogonal latent components t_i in sample space.

Steps:

  1. Initialize residual vector u = y, then normalize t = Ku.

  2. Orthogonalize t with respect to previous components (if needed).

  3. Repeat for ncomp components.

The regression coefficients are recovered as:

\beta = X^\top A, \quad \text{where } A = UU \, L \, (TT^\top y),

with UU and TT the matrices of latent vectors and components, and L = R^{-1} obtained by back-solving the triangular system.

penalized.pls.select implements a sparse selection strategy inspired by sparse or group PLS. At each component iteration, it computes the penalized covariance between X and y and selects the block B_k whose variables have the largest mean squared weight:

\text{score}_k = \frac{1}{|B_k|} \sum_{j \in B_k} w_j^2

Only the weights corresponding to the selected block are retained, and all others are set to zero. The rest of the algorithm follows the classical NIPALS-like PLS with orthogonal deflation.

This procedure enhances interpretability by selecting only one block per component, making it suitable for structured variable selection (e.g., grouped predictors).

Value

For new.penalized.pls, a list containing:

ypred

A numeric matrix of predicted responses. Each column corresponds to a different number of PLS components.

mse

A numeric vector of mean squared errors, if ytest is provided. Otherwise NULL.

For penalized.pls, a list with components:

intercept

A numeric vector of intercepts for 1 to ncomp components.

coefficients

A numeric matrix of size ncol(X) x ncomp, each column being the coefficient vector for the corresponding number of components.

For penalized.pls.cv, an object of class "mypls", a list with the following components:

error.cv

A matrix of mean squared errors. Rows correspond to different lambda values; columns to different numbers of components.

lambda

The vector of candidate lambda values.

lambda.opt

The lambda value giving the minimum cross-validated error.

index.lambda

The index of lambda.opt in lambda.

ncomp.opt

The optimal number of PLS components.

min.ppls

The minimum cross-validated error.

intercept

Intercept of the optimal model (fitted on the full dataset).

coefficients

Coefficient vector for the optimal model.

coefficients.jackknife

An array of shape ncol(X) x ncomp x length(lambda) x k, containing the coefficients from each CV split and parameter setting.

For penalized.pls.default, a list with:

coefficients

A matrix of size ncol(X) x ncomp, each column containing the regression coefficients for the first i components.

For penalized.pls.kernel, a list with:

coefficients

A matrix of size ncol(X) x ncomp, containing the estimated regression coefficients for each number of components.

For penalized.pls.select, a list with:

coefficients

A matrix of size ncol(X) x ncomp, containing the regression coefficients after block-wise selection.

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

For new.penalized.pls: penalized.pls, penalized.pls.cv, ppls.splines.cv

For penalized.pls: penalized.pls.cv, new.penalized.pls, ppls.splines.cv, Penalty.matrix

For penalized.pls.cv: penalized.pls, new.penalized.pls, jack.ppls, ppls.splines.cv

For penalized.pls.default: penalized.pls, penalized.pls.kernel, normalize.vector

For penalized.pls.kernel: penalized.pls, penalized.pls.default, normalize.vector

For penalized.pls.select: penalized.pls, penalized.pls.cv, normalize.vector

Examples

set.seed(123)
X <- matrix(rnorm(50 * 200), ncol = 50)
y <- rnorm(200)

Xtrain <- X[1:100, ]
ytrain <- y[1:100]
Xtest <- X[101:200, ]
ytest <- y[101:200]

pen.pls <- penalized.pls(Xtrain, ytrain, ncomp = 10)
pred <- new.penalized.pls(pen.pls, Xtest, ytest)
head(pred$ypred)
pred$mse
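
# Sketch: the classical and kernel algorithms fit the same model, so their
# coefficients should agree up to numerical precision; the kernel form is
# mainly of interest when p >> n.
pen.pls.kernel <- penalized.pls(Xtrain, ytrain, ncomp = 10, kernel = TRUE)
all.equal(pen.pls$coefficients, pen.pls.kernel$coefficients)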

## Example from Kraemer et al. (2008)
data(BOD)
X <- BOD[, 1]
y <- BOD[, 2]

Xtest <- seq(min(X), max(X), length = 200)
dummy <- X2s(X, Xtest, deg = 3, nknot = 20)  # Spline transformation
Z <- dummy$Z
Ztest <- dummy$Ztest
size <- dummy$sizeZ
P <- Penalty.matrix(size, order = 2)
lambda <- 200
number.comp <- 3

ppls <- penalized.pls(Z, y, P = lambda * P, ncomp = number.comp)
new.ppls <- new.penalized.pls(ppls, Ztest)$ypred

# Plot fitted values for 2 components
plot(X, y, lwd = 3, xlim = range(Xtest))
lines(Xtest, new.ppls[, 2], col = "blue")

set.seed(42)
X <- matrix(rnorm(20 * 100), ncol = 20)
y <- rnorm(100)

# Example with no penalty
result <- penalized.pls.cv(X, y, lambda = c(0, 1, 10), ncomp = 5)
result$lambda.opt
result$ncomp.opt
result$min.ppls

# Using jackknife estimation after CV
jack <- jack.ppls(result)
coef(jack)

set.seed(123)
X <- matrix(rnorm(20 * 50), nrow = 50)
y <- rnorm(50)
M <- diag(ncol(X))  # No penalty
coef <- penalized.pls.default(scale(X, TRUE, FALSE), scale(y, TRUE, FALSE),
  M, ncomp = 3)$coefficients
coef[, 1]  # coefficients for 1st component

set.seed(123)
X <- matrix(rnorm(100 * 10), nrow = 100)
y <- rnorm(100)
K <- X %*% t(X)  # Gram matrix (K = X X^T, computed internally when M = NULL)
coef <- penalized.pls.kernel(X, y, M = NULL, ncomp = 2)$coefficients
head(coef[, 1])  # coefficients for 1st component

set.seed(321)
X <- matrix(rnorm(40 * 30), ncol = 40)
y <- rnorm(30)

# Define 4 blocks of 10 variables each
blocks <- rep(1:4, each = 10)
result <- penalized.pls.select(X, y, M = NULL, ncomp = 2, blocks = blocks)
result$coefficients[, 1]  # Coefficients for first component
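
# Sketch: with block-wise selection, only the variables of the block chosen
# for the first component should carry non-zero weights.
which(result$coefficients[, 1] != 0)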


Cross-Validation for Penalized PLS with Spline-Transformed Predictors

Description

Performs cross-validation to select the optimal number of components and penalization parameter for a penalized partial least squares model (PPLS) fitted to spline-transformed predictors.

Usage

ppls.splines.cv(
  X,
  y,
  lambda = 1,
  ncomp = NULL,
  degree = 3,
  order = 2,
  nknot = NULL,
  k = 5,
  kernel = FALSE,
  scale = FALSE,
  reduce.knots = FALSE,
  select = FALSE
)

Arguments

X

A numeric matrix of input predictors.

y

A numeric response vector.

lambda

A numeric vector of penalty parameters. Default is 1.

ncomp

Integer. Maximum number of PLS components. Default is min(nrow(X) - 1, ncol(X)).

degree

Integer. Degree of B-splines (e.g., 3 for cubic splines). Default is 3.

order

Integer. Order of the differences used in the penalty matrix. Default is 2.

nknot

Integer or vector. Number of knots per variable (before adjustment). If NULL, defaults to rep(20, ncol(X)).

k

Number of folds for cross-validation. Default is 5.

kernel

Logical. Whether to use the kernel representation of PPLS. Default is FALSE.

scale

Logical. Whether to standardize predictors to unit variance. Default is FALSE.

reduce.knots

Logical. If TRUE, adaptively reduces the number of knots when overfitting is detected. Default is FALSE.

select

Logical. If TRUE, applies block-wise variable selection. Default is FALSE.

Details

This function performs the following steps for each cross-validation fold:

  1. Transforms predictors using B-spline basis functions via X2s.

  2. Computes the penalty matrix using Penalty.matrix.

  3. Fits a penalized PLS model using penalized.pls with the given lambda and number of components.

  4. Evaluates prediction performance on the test fold using new.penalized.pls.

The optimal parameters are those minimizing the average squared prediction error across all folds.

Value

A list with the following components:

error.cv

Matrix of prediction errors: rows = lambda values, columns = components.

min.ppls

The minimum cross-validated error.

lambda.opt

Optimal lambda value.

ncomp.opt

Optimal number of components.

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

X2s, Penalty.matrix, penalized.pls, penalized.pls.cv

Examples

# Simulated data
set.seed(123)
X <- matrix(rnorm(30 * 100), ncol = 30)
y <- rnorm(100)

# Run CV with 3 lambdas and max 4 components
result <- ppls.splines.cv(X, y, lambda = c(1, 10, 100), ncomp = 4)
result$lambda.opt
result$ncomp.opt
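
# Sketch: refit on the full data with the selected parameters, mirroring the
# steps performed inside each fold (X2s, then Penalty.matrix, then
# penalized.pls with the penalty scaled by lambda.opt).
tr <- X2s(X)
P <- Penalty.matrix(tr$sizeZ, order = 2)
fit <- penalized.pls(tr$Z, y, P = result$lambda.opt * P, ncomp = result$ncomp.opt)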


Simulate Data for Penalized Partial Least Squares (PPLS)

Description

Generates a training and test dataset with non-linear relationships between predictors and response, as used in PPLS simulation studies.

Usage

sim.data.ppls(ntrain, ntest, stnr, p, a = NULL, b = NULL)

Arguments

ntrain

Integer. Number of training observations.

ntest

Integer. Number of test observations.

stnr

Numeric. Signal-to-noise ratio (higher means less noise).

p

Integer. Number of predictors (must be >= 5).

a

Optional numeric vector of length 5. Linear coefficients for the first 5 variables. If NULL, drawn uniformly from [-1, 1].

b

Optional numeric vector of length 5. Nonlinear sine coefficients. If NULL, drawn uniformly from [-1, 1].

Details

The function simulates a response variable y as a combination of additive linear and sinusoidal effects of the first 5 predictors:

f(x) = \sum_{j=1}^{5} \left( a_j x_j + \sin(6 b_j x_j) \right)

The response y is then generated by adding Gaussian noise scaled to match the specified signal-to-noise ratio (stnr).

Remaining variables (p - 5) are included as noise variables, making the dataset suitable to evaluate selection or regularization methods.

Value

A list with the following components:

Xtrain

ntrain x p matrix of training predictors (uniform in [-1, 1]).

ytrain

Numeric vector of training responses.

Xtest

ntest x p matrix of test predictors.

ytest

Numeric vector of test responses.

sigma

Standard deviation of the added noise.

a

Linear coefficients used in the simulation.

b

Nonlinear sine coefficients used in the simulation.

See Also

ppls.splines.cv, graphic.ppls.splines

Examples

set.seed(123)
sim <- sim.data.ppls(ntrain = 100, ntest = 100, stnr = 3, p = 10)
str(sim)
plot(sim$Xtrain[, 1], sim$ytrain, main = "Effect of x1 on y")
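
# Sketch: the simulated data can be passed directly to the spline-based
# cross-validation; the small grids here keep run time low.
cv <- ppls.splines.cv(sim$Xtrain, sim$ytrain, lambda = c(1, 10), ncomp = 2)
cv$lambda.opt
cv$ncomp.opt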


t-Test for Penalized PLS Regression Coefficients

Description

Computes two-sided t-tests and p-values for the regression coefficients of a penalized PLS model based on jackknife estimation.

Usage

ttest.ppls(
  ppls.object,
  ncomp = ppls.object$ncomp.opt,
  index.lambda = ppls.object$index.lambda
)

Arguments

ppls.object

An object returned by penalized.pls.cv, containing the jackknife array coefficients.jackknife.

ncomp

Integer. Number of PLS components to use. Default is ppls.object$ncomp.opt.

index.lambda

Integer. Index of the penalty parameter lambda to use. Default is ppls.object$index.lambda.

Details

This function calls jack.ppls to estimate the mean regression coefficients across the cross-validation folds and their covariance matrix, from which the standard errors SE_j are obtained. It then performs standard two-sided t-tests:

t_j = \frac{\hat{\beta}_j}{\text{SE}_j}, \quad \text{df} = k - 1

and computes associated p-values.

These p-values can be used for variable selection or inference, although they are based on cross-validation folds and should be interpreted with caution.

Value

A list with:

tvalues

Numeric vector of t-statistics.

pvalues

Numeric vector of two-sided p-values.

See Also

jack.ppls, coef.mypls, vcov.mypls

Examples

set.seed(123)
X <- matrix(rnorm(20 * 100), ncol = 20)
y <- rnorm(100)
result <- penalized.pls.cv(X, y, lambda = c(0, 1), ncomp = 3)
tstats <- ttest.ppls(result)
print(tstats$pvalues)
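
# Sketch: flag coefficients with small p-values; as noted in Details, these
# fold-based tests are exploratory and should be interpreted with caution.
which(tstats$pvalues < 0.05)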


Variance-Covariance Matrix for Penalized PLS Coefficients

Description

Returns the estimated variance-covariance matrix of the regression coefficients from a jackknife-based PPLS model.

Usage

## S3 method for class 'mypls'
vcov(object, ...)

Arguments

object

An object of class "mypls", typically returned by jack.ppls.

...

Additional arguments (currently ignored).

Details

The function retrieves the covariance matrix stored in the object$covariance field. If this field is NULL, a warning is issued and the function returns NULL.

This method can be used in conjunction with coef.mypls and ttest.ppls to conduct inference on the coefficients of a penalized PLS model.

Value

A numeric matrix representing the variance-covariance matrix of the regression coefficients, or NULL if unavailable.

See Also

coef.mypls, jack.ppls, ttest.ppls

Examples

set.seed(42)
X <- matrix(rnorm(30 * 100), ncol = 30)
y <- rnorm(100)

ppls.cv <- penalized.pls.cv(X, y, lambda = c(1, 10), ncomp = 3)
myjack <- jack.ppls(ppls.cv)
Sigma <- vcov(myjack)
Sigma[1:5, 1:5]
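
# Sketch: standard errors are the square roots of the diagonal entries of
# the covariance matrix.
se <- sqrt(diag(Sigma))
head(se)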
