Help for package india

Type:

Package

Title:

Influence Diagnostics in Statistical Models

Version:

0.1-4

Date:

2026-04-05

Maintainer:

Felipe Osorio <faosorios.stat@gmail.com>

Description:

Set of routines for influence diagnostics by using case-deletion in ordinary least squares, nonlinear regression [Ross (1987). <doi:10.2307/3315198>], ridge estimation [Walker and Birch (1988). <doi:10.1080/00401706.1988.10488370>] and least absolute deviations (LAD) regression [Sun and Wei (2004). <doi:10.1016/j.spl.2003.08.018>].

Depends:

R (≥ 3.5.0), fastmatrix, L1pack

Imports:

stats

License:

GPL-3

URL:

https://github.com/faosorios/india

NeedsCompilation:

yes

LazyLoad:

yes

Packaged:

2026-04-05 14:37:57 UTC; root

Author:

Felipe Osorio [aut, cre]

Repository:

CRAN

Date/Publication:

2026-04-05 15:20:02 UTC

Aircraft data

Description

This dataset is presented in Rousseeuw and Leroy (1987, p. 154). The aim is to model the cost of 23 single-engine aircraft (in units of $100,000) as a function of the following explanatory variables: aspect ratio, lift-to-drag ratio, weight of the plane (in pounds) and maximal thrust.

Usage

data(aircraft)

Format

A data frame with 23 observations on the following 5 variables.

aspect: aspect ratio.
lift2drag: lift-to-drag ratio.
weight: weight of the plane (in pounds).
thrust: maximal thrust.
cost: cost of the aircraft (in units of $100,000).

Source

Rousseeuw, P.J., Leroy, A.M. (1987). Robust Regression and Outlier Detection. Wiley, New York.

Cook's distances

Description

Cook's distance measures the influence of the ith observation on the model parameter estimates. This function computes Cook's distance based on leave-one-out case deletion for ordinary least squares, nonlinear least squares, LAD and ridge regression.

Usage

## S3 method for class 'ols'
cooks.distance(model, ...)
## S3 method for class 'nls'
cooks.distance(model, ...)
## S3 method for class 'lad'
cooks.distance(model, ...)
## S3 method for class 'ridge'
cooks.distance(model, type = "cov", ...)

Arguments

model

an R object, returned by ols, nls, lad or ridge.

type

only required for 'ridge' objects. The available options are "1st", "cov" and "both", which give the Cook's distance based on Equation (2.5), Equation (2.6) or both, respectively, of Walker and Birch (1988).

...

further arguments passed to or from other methods.

Value

A vector whose ith element contains the Cook's distance,

D_i(\bold{M},c) = \frac{(\hat{\bold{\beta}}_{(i)} - \hat{\bold{\beta}})^T\bold{M} (\hat{\bold{\beta}}_{(i)} - \hat{\bold{\beta}})}{c},

for i = 1,\dots,n, with \bold{M} a positive definite matrix and c > 0. Specific choices of \bold{M} and c are made for objects of class ols, nls, lad and ridge.

The Cook's distance for nonlinear regression is based on a linear approximation, which may be inappropriate for markedly nonplanar expectation surfaces.
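As an illustration of the general form D_i(\bold{M},c), the following minimal sketch evaluates it by explicit case deletion for an ordinary least squares fit. It assumes the classical choice \bold{M} = \bold{X}^T\bold{X} and c = p s^2 and uses only base R (lm, lm.fit); the choices made internally by the methods of this package may differ.

```r
# Sketch: Cook's distance from its definition, by refitting without case i.
# Assumption: M = X'X and c = p * s^2 (the classical OLS choice).
fit <- lm(stack.loss ~ ., data = stackloss)
X <- model.matrix(fit)
y <- stackloss$stack.loss
n <- nrow(X)
p <- ncol(X)

s2 <- sum(residuals(fit)^2) / (n - p)  # unbiased estimate of sigma^2
M <- crossprod(X)                      # X'X
cval <- p * s2

D <- sapply(seq_len(n), function(i) {
  b_i <- coef(lm.fit(X[-i, , drop = FALSE], y[-i]))  # estimate without case i
  d <- b_i - coef(fit)
  drop(crossprod(d, M %*% d)) / cval
})

# Compare with the closed-form version in the stats package
range(D - cooks.distance(fit))
```

Refitting n times is only meant to make the definition concrete; closed-form expressions based on residuals and leverages avoid the loop.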

References

Cook, R.D., Weisberg, S. (1980). Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics 22, 495-508. doi:10.1080/00401706.1980.10486199

Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.

Ross, W.H. (1987). The geometry of case deletion and the assessment of influence in nonlinear regression. The Canadian Journal of Statistics 15, 91-103. doi:10.2307/3315198

Sun, R.B., Wei, B.C. (2004). On influence assessment for LAD regression. Statistics & Probability Letters 67, 97-110. doi:10.1016/j.spl.2003.08.018

Walker, E., Birch, J.B. (1988). Influence measures in ridge regression. Technometrics 30, 221-227. doi:10.1080/00401706.1988.10488370

Examples

# Cook's distances for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.8))
text(21, CD[21], label = as.character(21), pos = 3)

# Cook's distances for LAD regression
fm <- lad(stack.loss ~ ., data = stackloss)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.4))
text(17, CD[17], label = as.character(17), pos = 3)

# Cook's distances for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.5))
text(8, CD[8], label = as.character(8), pos = 3)

# Cook's distances for nonlinear regression
data(skeena)
model <- recruits ~ b1 * spawners * exp(-b2 * spawners)
fm <- nls(model, data = skeena, start = list(b1 = 3, b2 = 0))
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.35))
obs <- c(5, 6, 9, 19, 25)
text(obs, CD[obs], label = as.character(obs), pos = 3)

QQ-plot of residuals with simulated envelope

Description

Constructs a normal QQ-plot with simulated envelope of residuals from a fitted model object.

Usage

envelope(object, ...)
## S3 method for class 'lm'
envelope(object, reps = 50, conf = 0.95, 
  type = c("quantile", "standard", "student"), plot.it = TRUE, ...)
## S3 method for class 'lad'
envelope(object, reps = 50, conf = 0.95, plot.it = TRUE, ...)
## S3 method for class 'ols'
envelope(object, reps = 50, conf = 0.95, 
  type = c("quantile", "standard", "student"), plot.it = TRUE, ...)
## S3 method for class 'nls'
envelope(object, reps = 50, conf = 0.95, plot.it = TRUE, ...)
## S3 method for class 'ridge'
envelope(object, reps = 50, conf = 0.95, plot.it = TRUE, ...)

Arguments

object

an R object, returned by lm, lad, ols, nls or ridge.

reps

number of simulated point patterns to be generated when computing the envelopes. The default is 50. A larger number of replications produces a smoother band but takes more time.

conf

the confidence level required for the construction of the envelope. The default is to find 95% confidence envelopes.

type

a character string indicating the type of residuals that should be used in the construction of the envelope. The available options are randomized quantile ("quantile"), standardized ("standard") and studentized ("student") residuals. Standardized and studentized residuals are only available for objects of class "lm" and "ols"; otherwise, quantile residuals are used.

plot.it

if TRUE it will draw the corresponding plot, if FALSE it will only return the computed values.

...

further arguments passed to or from other methods.

Value

a list containing the following elements:

residuals

a vector with the selected (see type argument) residuals.

envelope

a matrix with two columns corresponding to the values of the lower and upper pointwise confidence envelope.
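The idea behind a simulated envelope can be sketched in base R as follows. This is a hedged illustration in the spirit of Atkinson (1985), not the algorithm used by envelope(): it assumes a normal linear model, standardized residuals, and pointwise empirical quantiles of the sorted simulated residuals. The object names (sim, band, theo) are introduced here for illustration only.

```r
# Sketch: pointwise simulated envelope for a normal QQ-plot of residuals.
set.seed(1)
fit <- lm(stack.loss ~ ., data = stackloss)
reps <- 200
conf <- 0.95
n <- nobs(fit)
X <- model.matrix(fit)
mu <- fitted(fit)
s <- sigma(fit)

# Each column holds the sorted standardized residuals of one simulated fit
sim <- replicate(reps, {
  ysim <- rnorm(n, mean = mu, sd = s)
  sort(rstandard(lm(ysim ~ X - 1)))
})

# Lower and upper pointwise limits for each order statistic
alpha <- 1 - conf
band <- t(apply(sim, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))

res <- sort(rstandard(fit))
theo <- qnorm(ppoints(n))
plot(theo, res, ylim = range(band, res),
     xlab = "Theoretical quantiles", ylab = "Standardized residuals")
lines(theo, band[, 1], lty = 2)
lines(theo, band[, 2], lty = 2)
```

Points falling outside the dashed band suggest a departure from the assumed model; increasing reps smooths the band at the cost of more refits.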

References

Atkinson, A.C. (1985). Plots, Transformations and Regression. Oxford University Press, Oxford.

Osorio, F. (2026). On the mean-shift outlier model for LAD regression. Working paper.

Venables, W.N., Ripley, B.D. (1999). Modern Applied Statistics with S-PLUS, 3rd Ed. Springer, New York.

Examples

# QQ-plot with envelope for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
z <- envelope(fm, reps = 500)

# QQ-plot with envelope for LAD regression
data(ereturns)
fm <- lad(m.marietta ~ CRSP, data = ereturns)
z <- envelope(fm, reps = 500)

# QQ-plot with envelope for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
z <- envelope(fm, reps = 500)

# QQ-plot with envelope for nonlinear regression
data(skeena)
model <- recruits ~ b1 * spawners * exp(-b2 * spawners)
fm <- nls(model, data = skeena, start = list(b1 = 3, b2 = 0))
z <- envelope(fm, reps = 500)

Leverages

Description

Computes leverage measures from a fitted model object.

Usage

leverages(model, ...)
## S3 method for class 'lm'
leverages(model, infl = lm.influence(model, do.coef = FALSE), ...)
## S3 method for class 'nls'
leverages(model, type = "tangent", ...)
## S3 method for class 'ols'
leverages(model, ...)
## S3 method for class 'ridge'
leverages(model, ...)

## S3 method for class 'ols'
hatvalues(model, ...)
## S3 method for class 'ridge'
hatvalues(model, ...)

Arguments

model

an R object, returned by lm, nls, ols or ridge.

infl

influence structure as returned by lm.influence.

type

only required for 'nls' objects. The available options are "tangent" and "jacobian", which give the tangent plane leverage (Ross, 1987) and the Jacobian leverage (St. Laurent and Cook, 1992, 1993), respectively.

...

further arguments passed to or from other methods.

Value

A vector containing the diagonal of the prediction (or ‘hat’) matrix.

For linear regression (i.e., for "lm" or "ols" objects) the prediction matrix assumes the form

\bold{H} = \bold{X}(\bold{X}^T\bold{X})^{-1}\bold{X}^T,

in which case h_{ii} = \bold{x}_i^T(\bold{X}^T\bold{X})^{-1}\bold{x}_i for i=1,\dots,n. For ridge regression, the prediction matrix is given by

\bold{H}(\lambda) = \bold{X}(\bold{X}^T\bold{X} + \lambda\bold{I})^{-1}\bold{X}^T,

where \lambda represents the ridge parameter. Thus, the diagonal elements of \bold{H}(\lambda) are h_{ii}(\lambda) = \bold{x}_i^T(\bold{X}^T\bold{X} + \lambda\bold{I})^{-1}\bold{x}_i, i=1,\dots,n.
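The diagonal elements h_{ii} and h_{ii}(\lambda) can be obtained without forming the n\times n prediction matrix. The sketch below uses base R only: the QR decomposition for the linear case and the singular value decomposition for the ridge case. It treats the model matrix as given, with no centering or scaling, and the value of lambda is arbitrary; both are assumptions made for illustration and may not match what ridge() does internally.

```r
# Sketch: leverages without building the hat matrix.
X <- model.matrix(stack.loss ~ ., data = stackloss)

# Linear case: with X = QR, H = QQ', so h_ii is the squared norm of row i of Q
h_ols <- rowSums(qr.Q(qr(X))^2)

# Ridge case: with X = U D V', H(lambda) = U D (D^2 + lambda I)^{-1} D U',
# so h_ii(lambda) = sum_j u_ij^2 * d_j^2 / (d_j^2 + lambda)
lambda <- 0.5                        # arbitrary value, for illustration only
sv <- svd(X)
shrink <- sv$d^2 / (sv$d^2 + lambda)
h_ridge <- drop(sv$u^2 %*% shrink)

sum(h_ols)     # trace of H, equal to the number of columns of X
sum(h_ridge)   # effective degrees of freedom, smaller than sum(h_ols)
```

Because every shrinkage factor is below one for lambda > 0, each ridge leverage is no larger than the corresponding least squares leverage.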

In nonlinear regression, the tangent plane leverage matrix is given by

\hat{\bold{H}} = \hat{\bold{F}}(\hat{\bold{F}}^T\hat{\bold{F}})^{-1}\hat{\bold{F}}^T,

where \bold{F} = \bold{F}(\bold{\beta}) is the n\times p local model matrix with ith row \partial f_i(\bold{\beta})/\partial\bold{\beta} and \hat{\bold{F}} = \bold{F}(\hat{\bold{\beta}}), whereas the Jacobian leverage matrix adopts the form:

\hat{\bold{J}} = \hat{\bold{F}}(\hat{\bold{F}}^T\hat{\bold{F}} - [\hat{\bold{r}}^T][\hat{\bold{W}}])^{-1}\hat{\bold{F}}^T,

where \hat{\bold{r}} = \bold{Y} - \bold{f}(\hat{\bold{\beta}}) is the vector of residuals, \hat{\bold{W}} is the n\times p\times p three-dimensional array formed by the n Hessian matrices, each of order p\times p, of the components of \bold{f}(\bold{\beta}) evaluated at \hat{\bold{\beta}}, and [][] denotes array multiplication as defined by Bates and Watts (1980).
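Both nonlinear leverage matrices can be written out directly for a small model. The following sketch uses base R only, with the Puromycin data from the datasets package and a Michaelis-Menten mean function (neither is part of this package). Derivatives are obtained with deriv3, and the array product [\hat{\bold{r}}^T][\hat{\bold{W}}] is taken to be the p\times p matrix \sum_i \hat{r}_i \hat{\bold{W}}_i. The full matrix \hat{\bold{J}} is formed here only for clarity.

```r
# Sketch: tangent plane and Jacobian leverages for a nonlinear fit.
treated <- subset(Puromycin, state == "treated")
fm <- nls(rate ~ Vm * conc / (K + conc), data = treated,
          start = c(Vm = 200, K = 0.05))

# Analytic gradient (n x p) and Hessian array (n x p x p) of the mean function
mm <- deriv3(~ Vm * conc / (K + conc), c("Vm", "K"),
             function(Vm, K, conc) NULL)
b <- coef(fm)
val <- mm(b[["Vm"]], b[["K"]], treated$conc)
Fhat <- attr(val, "gradient")
What <- attr(val, "hessian")
rhat <- treated$rate - as.vector(val)   # residuals at the estimate

# Tangent plane leverages: diagonal of F (F'F)^{-1} F' via the QR decomposition
lev_tangent <- rowSums(qr.Q(qr(Fhat))^2)

# Jacobian leverages: diagonal of F (F'F - [r'][W])^{-1} F'
rW <- apply(What, c(2, 3), function(w) sum(rhat * w))   # sum_i r_i W_i
Jhat <- Fhat %*% solve(crossprod(Fhat) - rW, t(Fhat))
lev_jacobian <- diag(Jhat)

cbind(tangent = lev_tangent, jacobian = lev_jacobian)
```

When the residuals or the curvature are small, the correction term is negligible and the two sets of leverages nearly coincide.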

Note

This function never creates the prediction matrix and only obtains its diagonal elements using the QR or singular value decomposition (only for ridge regression) of \bold{X} or \hat{\bold{F}}.

Function hatvalues is only a wrapper for function leverages.

Warning: Starting from version 4.6.0 of R, the function hatvalues.nls() will be included in the stats package; therefore, it has been removed from india. Please use function leverages.nls() instead.

References

Bates, D.M., Watts, D.G. (1980). Relative curvature measures of nonlinearity (with discussion). Journal of the Royal Statistical Society, Series B 42, 1-25. doi:10.1111/j.2517-6161.1980.tb01094.x

Chatterjee, S., Hadi, A.S. (1988). Sensitivity Analysis in Linear Regression. Wiley, New York.

Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.

Ross, W.H. (1987). The geometry of case deletion and the assessment of influence in nonlinear regression. The Canadian Journal of Statistics 15, 91-103. doi:10.2307/3315198

St. Laurent, R.T., Cook, R.D. (1992). Leverage and superleverage in nonlinear regression. Journal of the American Statistical Association 87, 985-990. doi:10.1080/01621459.1992.10476253

St. Laurent, R.T., Cook, R.D. (1993). Leverage, local influence and curvature in nonlinear regression. Biometrika 80, 99-106. doi:10.1093/biomet/80.1.99

Walker, E., Birch, J.B. (1988). Influence measures in ridge regression. Technometrics 30, 221-227. doi:10.1080/00401706.1988.10488370

Examples

# Leverages for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
lev <- leverages(fm)
cutoff <- 2 * mean(lev)
plot(lev, ylab = "Leverages", ylim = c(0,0.45))
abline(h = cutoff, lty = 2, lwd = 2, col = "red")
text(17, lev[17], label = as.character(17), pos = 3)

# Leverages for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
lev <- leverages(fm)
cutoff <- 2 * mean(lev)
plot(lev, ylab = "Leverages", ylim = c(0,0.7))
abline(h = cutoff, lty = 2, lwd = 2, col = "red")
text(10, lev[10], label = as.character(10), pos = 3)

# Leverages for nonlinear regression 
data(skeena)
model <- recruits ~ b1 * spawners * exp(-b2 * spawners)
fm <- nls(model, data = skeena, start = list(b1 = 3, b2 = 0))
lev0 <- leverages(fm, type = "tangent")
plot(lev0, ylab = "Tangent plane leverages", ylim = c(0,0.25))
obs <- c(1,9)
text(obs, lev0[obs], label = as.character(obs), pos = 3)
lev1 <- leverages(fm, type = "jacobian")
plot(lev1, ylab = "Jacobian leverages", ylim = c(0,0.25))
obs <- c(1,9)
text(obs, lev1[obs], label = as.character(obs), pos = 3)
# in this example both leverages (tangent and jacobian) 
# are quite similar
plot(lev0, lev1, xlab = "Tangent plane leverages", 
     ylab = "Jacobian leverages")
abline(c(0,1), lty = 2, lwd = 2, col = "red")

Likelihood Displacement

Description

Compute the likelihood displacement influence measure based on leave-one-out case deletion for linear models, nonlinear least squares, LAD and ridge regression.

Usage

logLik.displacement(model, ...)
## S3 method for class 'lm'
logLik.displacement(model, pars = "full", ...)
## S3 method for class 'ols'
logLik.displacement(model, pars = "full", ...)
## S3 method for class 'nls'
logLik.displacement(model, ...)
## S3 method for class 'lad'
logLik.displacement(model, method = "quasi", pars = "full", ...)
## S3 method for class 'ridge'
logLik.displacement(model, pars = "full", ...)

Arguments

model

an R object, returned by lm, nls, ols, lad or ridge.

pars

whether the likelihood displacement should be computed for the whole vector of parameters (pars = "full") or only for the vector of coefficients (pars = "coef"). This option is not used for nls objects.

method

only required for 'lad' objects. The available options are "quasi", "BR" and "approx", which select among the likelihood displacement approaches of Sun and Wei (2004) and Elian et al. (2000).

...

further arguments passed to or from other methods.

Value

A vector whose ith element contains the distance between the likelihood functions,

LD_i(\bold{\beta},\sigma^2) = 2\{l(\hat{\bold{\beta}},\hat{\sigma}^2) - l(\hat{\bold{\beta}}_{(i)},\hat{\sigma}^2_{(i)})\},

for pars = "full", where \hat{\bold{\beta}}_{(i)} and \hat{\sigma}^2_{(i)} denote the estimates of \bold{\beta} and \sigma^2 when the ith observation is removed from the dataset. If we are interested only in \bold{\beta} (i.e. pars = "coef") the likelihood displacement becomes

LD_i(\bold{\beta}|\sigma^2) = 2\{l(\hat{\bold{\beta}},\hat{\sigma}^2) - \max_{\sigma^2} l(\hat{\bold{\beta}}_{(i)},\sigma^2)\}.
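For the normal linear model, both versions of the likelihood displacement can be evaluated by explicit case deletion. The sketch below uses base R only and assumes maximum likelihood estimates of \sigma^2 (divisor n for the full data and n - 1 after deletion); the estimators used by the methods of this package may differ. For pars = "coef", the maximization over \sigma^2 has the closed-form solution RSS(\hat{\bold{\beta}}_{(i)})/n.

```r
# Sketch: likelihood displacement for a normal linear model by case deletion.
fit <- lm(stack.loss ~ ., data = stackloss)
X <- model.matrix(fit)
y <- stackloss$stack.loss
n <- nrow(X)
p <- ncol(X)

# Gaussian log-likelihood of the full data at (beta, sigma2)
loglik <- function(beta, sigma2) {
  -0.5 * n * log(2 * pi * sigma2) - sum((y - X %*% beta)^2) / (2 * sigma2)
}

sigma2_hat <- sum(residuals(fit)^2) / n        # ML estimate of sigma^2
l_hat <- loglik(coef(fit), sigma2_hat)

LD <- t(sapply(seq_len(n), function(i) {
  del <- lm.fit(X[-i, , drop = FALSE], y[-i])  # fit without case i
  beta_i <- coef(del)
  sigma2_i <- sum(del$residuals^2) / (n - 1)   # ML estimate without case i
  rss_i <- sum((y - X %*% beta_i)^2)           # full-data RSS at beta_i
  c(full = 2 * (l_hat - loglik(beta_i, sigma2_i)),
    coef = 2 * (l_hat - loglik(beta_i, rss_i / n)))
}))

head(LD)
```

Under these assumptions the "coef" version reduces to n log{1 + p D_i/(n - p)}, where D_i is the classical Cook's distance, so the two measures rank the observations identically in that case.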

References

Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.

Cook, R.D., Pena, D., Weisberg, S. (1988). The likelihood displacement: A unifying principle for influence measures. Communications in Statistics - Theory and Methods 17, 623-640. doi:10.1080/03610928808829645

Elian, S.N., Andre, C.D.S., Narula, S.C. (2000). Influence measure for the L1 regression. Communications in Statistics - Theory and Methods 29, 837-849. doi:10.1080/03610920008832518

Ogueda, A., Osorio, F. (2025). Influence diagnostics for ridge regression using the Kullback-Leibler divergence. Statistical Papers 66, 85. doi:10.1007/s00362-025-01701-1

Ross, W.H. (1987). The geometry of case deletion and the assessment of influence in nonlinear regression. The Canadian Journal of Statistics 15, 91-103. doi:10.2307/3315198

Sun, R.B., Wei, B.C. (2004). On influence assessment for LAD regression. Statistics & Probability Letters 67, 97-110. doi:10.1016/j.spl.2003.08.018

Examples

# Likelihood displacement for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,9))
text(21, LD[21], label = as.character(21), pos = 3)

# Likelihood displacement for LAD regression
fm <- lad(stack.loss ~ ., data = stackloss)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,1.5))
text(17, LD[17], label = as.character(17), pos = 3)

# Likelihood displacement for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,4))
text(8, LD[8], label = as.character(8), pos = 3)

# Likelihood displacement for nonlinear regression
data(skeena)
model <- recruits ~ b1 * spawners * exp(-b2 * spawners)
fm <- nls(model, data = skeena, start = list(b1 = 3, b2 = 0))
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,0.7))
obs <- c(5, 6, 9, 19, 25)
text(obs, LD[obs], label = as.character(obs), pos = 3)

Portland cement dataset

Description

This dataset comes from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition and the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced.

Usage

data(portland)

Format

A data frame with 13 observations on the following 5 variables.

y: The heat evolved after 180 days of curing, measured in calories per gram of cement.
x1: Tricalcium aluminate.
x2: Tricalcium silicate.
x3: Tetracalcium aluminoferrite.
x4: \beta-dicalcium silicate.

Source

Kaciranlar, S., Sakallioglu, S., Akdeniz, F., Styan, G.P.H., Werner, H.J. (1999). A new biased estimator in linear regression and a detailed analysis of the widely-analysed dataset on Portland cement. Sankhya, Series B 61, 443-459.

Relative change in the condition number

Description

Compute the relative condition index to identify collinearity-influential points in linear models.

Usage

relative.condition(x)

Arguments

x

the model matrix \bold{X}.

Value

To assess the influence of the ith row of \bold{X} on the condition index of \bold{X}, Hadi (1988) proposed the relative change,

\delta_i = \frac{\kappa_{(i)} - \kappa}{\kappa},

for i=1,\dots,n, where \kappa = \kappa(\bold{X}) and \kappa_{(i)} = \kappa(\bold{X}_{(i)}) denote the (scaled) condition index for \bold{X} and \bold{X}_{(i)}, respectively.
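The relative change \delta_i can be sketched in base R as follows. The scaling used here is an assumption: each column of the matrix is normalized to unit length before the condition index is taken as the ratio of the largest to the smallest singular value. The helper cond_index is hypothetical and introduced only for this illustration; relative.condition() may scale differently.

```r
# Sketch: relative change in the (scaled) condition index after deleting row i.
# cond_index() is a hypothetical helper, not a function of this package.
cond_index <- function(X) {
  Z <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # columns scaled to unit length
  d <- svd(Z, nu = 0, nv = 0)$d
  max(d) / min(d)
}

X <- model.matrix(stack.loss ~ ., data = stackloss)
kappa <- cond_index(X)

delta <- sapply(seq_len(nrow(X)), function(i) {
  (cond_index(X[-i, , drop = FALSE]) - kappa) / kappa
})

plot(delta, ylab = "Relative change in condition index")
abline(h = 0, lty = 2)
```

A large positive \delta_i indicates that removing the ith row makes the remaining rows more collinear, while a large negative value indicates that the row itself induces collinearity.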

References

Chatterjee, S., Hadi, A.S. (1988). Sensitivity Analysis in Linear Regression. Wiley, New York.

Hadi, A.S. (1988). Diagnosing collinearity-influential observations. Computational Statistics & Data Analysis 7, 143-159. doi:10.1016/0167-9473(88)90089-8

Examples

data(portland)
fm <- ridge(y ~ ., data = portland, x = TRUE)
x <- fm$x
rel <- relative.condition(x)
plot(rel, ylab = "Relative condition number", ylim = c(-0.1,0.4))
abline(h = 0, lty = 2, lwd = 2, col = "red")
text(3, rel[3], label = as.character(3), pos = 3)

Randomized quantile residuals

Description

Compute randomized quantile residuals from a fitted model object.

Usage

rquantile(model, ...)
## S3 method for class 'lm'
rquantile(model, ...)
## S3 method for class 'lad'
rquantile(model, ...)
## S3 method for class 'ols'
rquantile(model, ...)
## S3 method for class 'nls'
rquantile(model, ...)
## S3 method for class 'ridge'
rquantile(model, ...)

Arguments

model

an R object, returned by lm, lad, ols, nls or ridge.

...

further arguments passed to or from other methods.

Value

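a vector containing standard normal deviates that play the role of standardized residuals. Apart from the sampling variability in the estimated parameters, residuals of this kind are exactly normal when the model is correctly specified.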
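For a continuous response, the quantile residual of Dunn and Smyth (1996) is the standard normal quantile of the fitted distribution function evaluated at the observation, and no randomization is needed. The sketch below uses base R only. It covers a normal linear model and, as a stand-in for LAD regression, an intercept-only Laplace model, whose LAD fit is the sample median. The helper plaplace and the scale estimates are assumptions made for illustration and need not match the estimators used by rquantile().

```r
# Sketch: quantile residuals r_i = qnorm(F(y_i)) for continuous responses.
y <- stackloss$stack.loss

# Normal linear model: F is the normal distribution function
fit <- lm(stack.loss ~ ., data = stackloss)
mu <- fitted(fit)
s <- sigma(fit)
r_norm <- qnorm(pnorm(y, mean = mu, sd = s))   # reduces to (y - mu) / s

# Laplace model (hypothetical helper): distribution function of a Laplace
# variable with the given location and scale
plaplace <- function(q, location, scale) {
  z <- (q - location) / scale
  ifelse(z < 0, 0.5 * exp(z), 1 - 0.5 * exp(-z))
}

# Intercept-only LAD fit: location is the median, scale is the mean
# absolute deviation around it (the Laplace ML estimates)
loc <- median(y)
b <- mean(abs(y - loc))
r_lap <- qnorm(plaplace(y, loc, b))

plot(r_norm, ylab = "quantile residuals", ylim = c(-3, 3))
abline(h = c(-2, 2), lty = 2)
```

Whatever the assumed error distribution, the resulting residuals are read on the standard normal scale, which is why the same reference lines at -2 and 2 are used in every case.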

References

Dunn, P.K., Smyth, G.K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 236-244. doi:10.1080/10618600.1996.10474708

Osorio, F. (2026). On the mean-shift outlier model for LAD regression. Working paper.

Examples

# quantile residuals for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
res <- rquantile(fm)
plot(res, ylim = c(-3,3), ylab = "quantile residuals")
abline(h = 0, lwd = 2, col = "gray75")
abline(h = c(-2,2), lwd = 2, lty = 2, col = "red")
text(21, res[21], as.character(21), pos = 1)

# quantile residuals for LAD regression
data(ereturns)
fm <- lad(m.marietta ~ CRSP, data = ereturns)
res <- rquantile(fm)
plot(res, ylim = c(-2,4.5), ylab = "quantile residuals")
abline(h = 0, lwd = 2, col = "gray75")
abline(h = c(-2,2), lwd = 2, lty = 2, col = "red")
obs <- c(8,15,34)
text(obs, res[obs], as.character(obs), pos = 3)

# quantile residuals for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
res <- rquantile(fm)
plot(res, ylim = c(-2,2), ylab = "quantile residuals")
abline(h = 0, lwd = 2, col = "gray75")

# quantile residuals for nonlinear regression
data(skeena)
model <- recruits ~ b1 * spawners * exp(-b2 * spawners)
fm <- nls(model, data = skeena, start = list(b1 = 3, b2 = 0))
res <- rquantile(fm)
plot(res, ylim = c(-3,3), ylab = "quantile residuals")
abline(h = 0, lwd = 2, col = "gray75")
abline(h = c(-2,2), lwd = 2, lty = 2, col = "red")
text(5, res[5], as.character(5), pos = 3)

Skeena River sockeye salmon data

Description

The data consist of 28 observations of spawners and recruits (in thousands of fish) from 1940 to 1967 for the Skeena River sockeye salmon stock.

Usage

data(skeena)

Format

A data frame with 28 observations on the following 3 variables.

year: Years in which the number of spawners and recruits were recorded.
spawners: Size of the annual spawning stock.
recruits: Production of new catchable-sized fish.

Source

Carroll, R.J., Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall, London.
