Type: | Package |
Title: | ROC Curve Inference with and without Covariates |
Version: | 1.0-9 |
Date: | 2024-05-09 |
Imports: | stats, grDevices, graphics, splines, np, Matrix, moments, nor1mix, spatstat.univar, lattice, MASS, pbivnorm, parallel |
Description: | Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. Also, it provides functions to obtain ROC-based optimal cutpoints utilizing several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) http://www.jstor.org/stable/2534001?seq=1; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
NeedsCompilation: | no |
Packaged: | 2024-05-31 10:31:41 UTC; mrodriguez |
Author: | Maria Xose Rodriguez-Alvarez
|
Maintainer: | Maria Xose Rodriguez-Alvarez <mxrodriguez@uvigo.es> |
Repository: | CRAN |
Date/Publication: | 2024-05-31 13:30:30 UTC |
ROC Curve Inference with and without Covariates
Description
Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. Also, it provides functions to obtain ROC-based optimal cutpoints utilizing several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) <http://www.jstor.org/stable/2534001?seq=1>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details.
Details
Package: | ROCnReg |
Type: | Package |
Version: | 1.0-9 |
Date: | 2024-05-30 |
License: | GPL |
Author(s)
Maria Xose Rodriguez-Alvarez and Vanda Inacio Maintainer: Maria Xose Rodriguez-Alvarez <mxrodriguez@uvigo.es>
References
Erkanli, A., Sung M., Jane Costello, E., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine, 25, 3905–3928.
Faraggi, D. (2003). Adjusting receiver operating characteristic curves and related indices for covariates. The Statistician 52, 179–192.
Gu, J., Ghosal, S., and Roy, A. (2008). Bayesian bootstrap estimation of ROC curve. Statistics in Medicine, 27, 5407–5420.
Inacio de Carvalho, V., Jara, A., Hanson, T. E., and de Carvalho, M. (2013). Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623–646.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96, 371–382.
Pepe, M.S. (1998). Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics 54, 124–135.
Rodriguez-Alvarez, M. X. and Inacio, V., and (2021). ROCnReg: An R Package for Receiver Operating Characteristic Curve Inference with and without Covariate Information. The R Journal
Rodriguez-Alvarez, M.X., Tahoces, P. G., Cadarso-Suarez, C., and Lado, M.J. (2011). Comparative study of ROC regression techniques–Applications for the computer-aided diagnostic system in breast cancer detection. Computational Statistics and Data Analysis, 55, 888–902.
Rodriguez-Alvarez, M.X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
Nonparametric Bayesian inference of the covariate-adjusted ROC curve (AROC).
Description
This function estimates the covariate-adjusted ROC curve (AROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho and Rodriguez-Alvarez (2018).
Usage
AROC.bnp(formula.h, group, tag.h, data, standardise = TRUE,
p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE,
compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol.aroc(),
prior.h = priorcontrol.bnp(), mcmc = mcmccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
formula.h |
A |
group |
A character string with the name of the variable that distinguishes healthy/nondiseased from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE both the test outcomes and the continuous covariates assumed to have a linear effect are standardised (i.e., the resulting variables have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion (DIC)(Celeux et al., 2006, Spiegelhalter et al., 2002) is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-adjusted ROC curve (AROC) defined as
AROC\left(p\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq p\},
where F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})
denotes the distribution function of Y_{\bar{D}}
conditional on the vector of covariates \mathbf{X}_{\bar{D}}
.
The method implemented in this function combines a single-weights linear dependent Dirichlet process mixture model (De Iorio et al., 2009) to estimate F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})
and the Bayesian bootstrap (Rubin, 1981) to estimate the outside probability. More precisely, and letting \{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}}
be a random sample from the nondiseased population, our postulated model for the conditional distribution function takes the following form
F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \sum_{l=1}^{L}\omega_l\Phi(y_{\bar{D}i}\mid\mu_{l}(\mathbf{x}_{\bar{D}i}),\sigma_l^2),
where \Phi(y|\mu, \sigma^2)
denotes the cumulative distribution function of the normal distribution, evaluated at y
, with mean \mu
and variance \sigma^2
. The regression function \mu_{l}(\mathbf{x}_{\bar{D}i})
can incorportate both linear and nonlinear (through B-splines) effects of continuous covariates, categorical covariates (factors) as well as interactions. Interactions between categorical and (nonlinear) continuous covariates are also allowed (factor-by curve interactions). For the sake of simplicity we write \mu_{l}(\mathbf{x}_{\bar{D}i}) = \mathbf{z}_{\bar{D}i}^{T}\mathbf{\beta}_l
(l=1,...,L
), where \mathbf{z}_{\bar{D}i}
is the i
th column of the design matrix (possibly containing a basis representation of some/all continuous covariates). Here L
is a pre-specified upper bound on the number of mixture components. The \omega_l
's result from a truncated version of the stick-breaking construction (\omega_1=v_1
; \omega_l=v_l\prod_{r<l}(1-v_r)
, l=2,\ldots,L
; v_1,\ldots,v_{L-1}\sim
Beta (1,\alpha)
; v_L=1
, \alpha \sim \Gamma(a_{\alpha},b_{\alpha})
), \mathbf{\beta}_l\sim N_{Q}(\mathbf{m},\mathbf{S})
, and \sigma_l^{-2}\sim\Gamma(a,b)
. It is further assumed that \mathbf{m} \sim N_{Q}(\mathbf{m}_0,\mathbf{S}_0)
and \mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1})
. Here \Gamma(a,b)
denotes a Gamma distribution with shape parameter a
and rate parameter b
, W(\nu,(\nu\Psi)^{-1})
denotes a Wishart distribution with \nu
degrees of freedom and expectation \Psi^{-1}
, and Q
denotes the dimension of the vector \mathbf{z}_{\bar{D}i}
. It is worth mentioning that when L=1
, the model for the conditional distribution of the test outcomes (in the healthy population) reduces to a normal regression model (where continuous covariates effects are modelled either parametrically or nonparametrically). For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
Regarding the area under the curve, we note that
AAUC = \int_{0}^{1}AROC(p)dp = 1 - E\{U_D\},
where U_D = 1 - F_{\bar{D}}(Y_D |\mathbf{X}_D)
. In our implementation, the expectation is computed using the Bayesian bootstrap (using the same weights as those used to estimate the AROC, see Inacio de Carvalho and Rodriguez-Alvarez (2018) for details). As far as the partial area under the curve is concerned, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAAUC_{FPF}(u_1)=\int_0^{u_1} AROC(p)dp = u_1 - E\{U_{D,u_1}\},
where U_{D,u_1} = min\{u_1, 1 - F_{\bar{D}}(Y_D |\mathbf{X}_D)\}
. Again, the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAAUC, pAAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAAUC_{TPF}(u_2)=\int_{AROC^{-1}(u_2)}^{1}AROC(p)dp-\{1-AROC^{-1}(u_2)\}\times u_2.
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, pAAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Finally, it is important referring that with respect to the computation of the DIC, when L=1
, it is computed as in Spiegelhalter et al. (2002), and when L>1
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve (AROC) has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values. |
ROC |
Estimated covariate-adjusted ROC curve (AROC) (posterior mean) and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean) and |
newdata |
If |
reg.fun.h |
If |
dens |
If |
lpml |
If computed, a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Results of the fitting process. A list with the following components: (1) |
data_model |
List with the data used in the fit: observed diagnostic test outcome and design matrices, separately for the healthy and diseased groups. |
Note
The input argument formula.h
is similar to that used for the glm
function, except that flexible specifications can be added by means of the function f()
. For instance, specification y \sim x1 + f(x2, K = 3)
would assume a linear effect of x1
(if x1
continuous) and the effect of x2
would be modeled using B-splines basis functions. The argument K = 3
indicates that 3
internal knots will be used, with the quantiles of x2
used for their location. Categorical variables (factors) can be also incorporated, as well as interaction terms. For example, to include the factor-by-curve interaction between age
and gender
we need to specify, e.g., y \sim gender + f(age, by = gender, K = c(3, 5))
. Note that, in this case, the number of knots can be different for each level of the factor. The order of the vector K
of knots should match the levels of the factor.
References
Celeux, G., Forbes, F., Robert C. P., and Titerrington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–674.
De Iorio, M., Johnson, W. O., Muller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762–775.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection. Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9, 130–134.
Speigelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model comparison and fit. Journal of the Royal Statistical Society, Ser. B, 64, 583–639.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0,1,l=101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5),
density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = NA),
prior.h = priorcontrol.bnp(m0 = rep(0, 4), S0 = 10*diag(4), nu = 6, Psi = diag(4),
a = 2, b = 0.5, alpha = 1, L =10),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
summary(AROC_bnp)
plot(AROC_bnp)
Nonparametric kernel-based estimation of the covariate-adjusted ROC curve (AROC).
Description
This function estimates the covariate-adjusted ROC curve (AROC) using the nonparametric kernel-based method proposed by Rodriguez-Alvarez et al. (2011). The method, as it stands now, can only deal with one continuous covariate.
Usage
AROC.kernel(marker, covariate, group, tag.h,
bw = c("LS", "AIC"),
regtype = c("LC", "LL"),
pauc = pauccontrol(),
data, p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
covariate |
A character string with the name of the continuous covariate. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
bw |
A character string specifying which method to use to select the bandwidths. AIC specifies expected Kullback-Leibler cross-validation, and LS specifies least-squares cross-validation. Defaults to LS. For details see |
regtype |
A character string specifying which type of kernel estimator to use for the regression function (see Details). LC specifies a local-constant estimator (Nadaraya-Watson) and LL specifies a local-linear estimator. Defaults to LC. For details see |
pauc |
A list of control values to replace the default values returned by the function |
data |
A data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. This set is also used to compute the area under the covariate-adjusted ROC curve (AAUC) using Simpson's rule. Thus, the length of the set should be an odd number and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-adjusted ROC curve (AROC) defined as
AROC\left(p\right) = Pr\{1 - F_{\bar{D}}(Y_D | X_{D}) \leq p\},
where F_{\bar{D}}(y|x) = Pr\{Y_{\bar{D}} \leq y | X_{\bar{D}} = x\}
. In particular, the method implemented in this function estimates the outer probability empirically (see Janes and Pepe, 2009) and F_{\bar{D}}(y|x)
is estimated assuming a nonparametric location-scale regression model for Y_{\bar{D}}
, i.e.,
Y_{\bar{D}} = \mu_{\bar{D}}(X_{\bar{D}}) + \sigma_{\bar{D}}(X_{\bar{D}})\varepsilon_{\bar{D}},
where \mu_{\bar{D}}(x) = E(Y_{\bar{D}} | X_{\bar{D}} = x)
is the regression funcion, \sigma^2_{\bar{D}}(x) = Var(Y_{\bar{D}} | X_{\bar{D}} = x)
is the variance function, and \varepsilon_{\bar{D}}
has zero mean, variance one, and distribution function G_{\bar{D}}
. As a consequence,
F_{\bar{D}}(y | x) = G_{\bar{D}}\left(\frac{y - \mu_{\bar{D}}(x)}{\sigma_{\bar{D}}(x)}\right).
By default, both the regression and variance functions are estimated using the Nadaraya-Watson estimator (LC), and the bandwidths are selected using least-squares cross-validation (LS). Implementation relies on the R
-package np
. No assumption is made about G_{\bar{D}}
, which is empirically estimated on the basis of the standardised residuals.
The area under the AROC curve is
AAUC=\int_0^1 AROC(p)dp,
and there exists a closed-form estimator. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAAUC_{FPF}(u_1)=\int_0^{u_1} AROC(p)dp,
where again there exists a closed-form estimator. The returned value is the normalised pAAUC, pAAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAAUC_{TPF}(u_2)=\int_{AROC^{-1}(u_2)}^{1}AROC(p)dp-\{1-AROC^{-1}(u_2)\}\times u_2.
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, pAAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
covariate |
The value of the argument |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-adjusted ROC curve (AROC), and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) and |
fit |
List with the following components: (1) |
References
Hayfield, T., and Racine, J. S. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5). URL http://www.jstatsoft.org/v27/i05/.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96, 371–382.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m2 <- AROC.kernel(marker = "l_marker1",
covariate = "age",
group = "status",
tag.h = 0,
data = newpsa,
bw = "LS",
regtype = "LC",
pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5),
B = 500)
summary(m2)
plot(m2)
Semiparametric frequentist inference for the covariate-adjusted ROC curve (AROC).
Description
This function estimates the covariate-adjusted ROC curve (AROC) using the semiparametric approach proposed by Janes and Pepe (2009).
Usage
AROC.sp(formula.h, group, tag.h, data,
est.cdf.h = c("normal", "empirical"), pauc = pauccontrol(),
p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
formula.h |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
est.cdf.h |
A character string. It indicates how the conditional distribution function of the diagnostic test in the healthy population is estimated. Options are |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. This set is also used to compute the area under the covariate-adjusted ROC curve (AAUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-adjusted ROC curve (AROC) defined as
AROC\left(p\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq p\},
F_{\bar{D}}(y|\mathbf{x}) = Pr\{Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}\}.
The method implemented in this function estimates the outer probability empirically (see Janes and Pepe, 2009) and F_{\bar{D}}(\cdot|\mathbf{x})
is estimated assuming a semiparametric location regression model for Y_{\bar{D}}
, i.e.,
Y_{\bar{D}} = \mathbf{X}_{\bar{D}}^{T}\mathbf{\beta}_{\bar{D}} + \sigma_{\bar{D}}\varepsilon_{\bar{D}},
where \varepsilon_{\bar{D}}
has zero mean, variance one, and distribution function G_{\bar{D}}
. As a consequence, we have
F_{\bar{D}}(y | \mathbf{x}) = G_{\bar{D}}\left(\frac{y-\mathbf{x}^{T}\mathbf{\beta}_{\bar{D}}}{\sigma_{\bar{D}}}\right).
In line with the assumptions made about the distribution of \varepsilon_{\bar{D}}
, estimators will be referred to as: (a) "normal", where a standard Gaussian error is assumed, i.e., G_{\bar{D}}(y) = \Phi(y)
; and, (b) "empirical", where no assumption is made about the distribution (in this case, G_{\bar{D}}
is empirically estimated on the basis of standardised residuals).
The area under the AROC curve is
AAUC=\int_0^1 AROC(p)dp,
and there exists a closed-form estimator. With regard to the partial area under the AROC curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAAUC_{FPF}(u_1)=\int_0^{u_1} AROC(p)dp,
where again there exists a closed-form estimator. The returned value is the normalised pAAUC, pAAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAAUC_{TPF}(u_2)=\int_{AROC^{-1}(u_2)}^{1}AROC(p)dp-\{1-AROC^{-1}(u_2)\}\times u_2.
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, pAAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
formula |
The value of the argument |
est.cdf.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC (AROC) curve has been estimated |
ci.level |
The value of the argument |
ROC |
Estimated covariate-adjusted ROC curve (AROC), and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) and |
fit |
Object of class |
coeff |
Estimated regression coefficients (and |
References
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96(2), 371 - 382.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m3 <- AROC.sp(formula.h = l_marker1 ~ age,
group = "status",
tag.h = 0,
data = newpsa,
est.cdf.h = "normal",
pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5),
p = seq(0,1,l=101),
B = 500)
summary(m3)
plot(m3)
Nonparametric Bayesian inference for the covariate-specific ROC curve (cROC).
Description
This function estimates the covariate-specific ROC curve (cROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho et al. (2013).
Usage
cROC.bnp(formula.h, formula.d, group, tag.h, data,
newdata, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95,
compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE,
pauc = pauccontrol(), density = densitycontrol(),
prior.h = priorcontrol.bnp(), prior.d = priorcontrol.bnp(),
mcmc = mcmccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
formula.h |
A |
formula.d |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if computed) will be computed. If not supplied, the function |
standardise |
A logical value. If TRUE both the test outcomes and the continuous covariates assumed to have a linear effect are standardised (i.e., the resulting variables have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion (DIC)(Celeux et al., 2006, Spiegelhalter et al., 2002) is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
Hyparameter specification for the healthy population. A list of control values to replace the default values returned by the function |
prior.d |
Hyparameter specification for the diseased population. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-specific ROC curve (cROC) defined as
ROC(p|\mathbf{x}) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p|\mathbf{x})|\mathbf{x}\},
where
F_{D}(y|\mathbf{x}) = Pr(Y_{D} \leq y | \mathbf{X}_{D} = \mathbf{x}),
F_{\bar{D}}(y|\mathbf{x}) = Pr(Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}).
Note that, for the sake of clarity, we assume that the covariates of interest are the same in both the healthy and diseased populations. The method implemented in this function estimates F_{D}(\cdot|\mathbf{x})
and F_{\bar{D}}(\cdot|\mathbf{x})
by means of a single-weights linear dependent Dirichlet process mixture of normals model (De Iorio et al., 2009). More precisely, and letting \{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}}
and \{(\mathbf{x}_{Dj},y_{Dj})\}_{j=1}^{n_{D}}
be two independent random samples from the nondiseased and diseased populations, respectively, our postulated model for the conditional distribution in each group function takes the following form
F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \sum_{l=1}^{L_{\bar{D}}}\omega_{l\bar{D}}\Phi(y_{\bar{D}i}\mid\mu_{l\bar{D}}(\mathbf{x}_{\bar{D}i}),\sigma_{l\bar{D}}^2),
F_{D}(y_{Dj}|\mathbf{X}_{D} = \mathbf{x}_{Dj}) = \sum_{l=1}^{L_{D}}\omega_{lD}\Phi(y_{Dj}\mid\mu_{lD}(\mathbf{x}_{\bar{D}i}),\sigma_{lD}^2),
where \Phi(y|\mu, \sigma^2)
denotes the cumulative distribution function of the normal distribution, evaluated at y
, with mean mu
and variance \sigma^2
. The regression function \mu_{ld}(\mathbf{x}_{di})
can incorportate both linear and nonlinear (through B-splines) effects of continuous covariates, categorical covariates (factors) as well as interactions. Interactions between categorical and (nonlinear) continuous covariates are also allowed (factor-by curve interactions). For the sake of simplicity we write \mu_{ld}(\mathbf{x}_{di}) = \mathbf{z}_{di}^{T}\mathbf{\beta}_{ld}
, where \mathbf{z}_{di}
is the i
th column of the design matrix (possibly containing a basis representation of some/all continuous covariates), d \in \{D, \bar{D}\}
. Here L_d
is a pre-specified upper bound on the number of mixture components. The \omega_{ld}
's result from a truncated version of the stick-breaking construction (\omega_{1d} = v_{1d}
; \omega_{ld} = v_{ld}\prod_{r<l}(1-v_{dr})
, l=2,\ldots,L_{d}
; v_{d1},\ldots,v_{L_{d}-1}\sim
Beta (1,\alpha_{d})
; v_{Ld} = 1
, \alpha_d \sim \Gamma(a_{\alpha_d},b_{\alpha_d})
), \mathbf{\beta}_{ld}\sim N_{Q_d}(\mathbf{m}_{d},\mathbf{S}_{d})
, and \sigma_{ld}^{-2}\sim\Gamma(a_{d},b_{d})
. It is further assumed that \mathbf{m}_{d} \sim N_{Q_k}(\mathbf{m}_{0d},\mathbf{S}_{0d})
and \mathbf{S}_{d}^{-1}\sim W(\nu,(\nu_k\Psi_d)^{-1})
. Here W(\nu,(\nu\Psi)^{-1})
denotes a Wishart distribution with \nu
degrees of freedom and expectation \Psi^{-1}
, Here \Gamma(a,b)
denotes a Gamma distribution with shape parameter a
and rate parameter b
, and Q_d
denotes the dimension of the vector \mathbf{z}_{di}
. It is worth mentioning that when L_d=1
, the model for the conditional distribution of the test outcomes reduces to a normal regression model (where continuous covariates effects are modelled either parametrically or nonparametrically). For a detailed description, we refer to Inacio de Carvalho et al. (2013).
The covariate-specific area under the curve is
AUC(\mathbf{x})=\int_{0}^{1}ROC(p|\mathbf{x})dp.
When the upper bound on the number of mixture components is 1, i.e., L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the covariate-specific AUC (binormal model), which is used in the package. In contrast, when L_D > 1
or L_{\bar{D}} > 1
, the integral is computed numerically using Simpson's rule. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1|\mathbf{x})=\int_0^{u_1} ROC(p|\mathbf{x})dp.
As for the AUC, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{FPF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{FPF}(u_1|\mathbf{x})/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2|\mathbf{x})=\int_{u_2}^{1}ROC_{TNF}(p|\mathbf{x})dp,
where ROC_{TNF}(p|\mathbf{x})
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p|\mathbf{x}) = F_{\bar{D}}\{F_{D}^{-1}(1-p|\mathbf{x})|\mathbf{x}\}.
Again, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{TNF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2|\mathbf{x})/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
It is worth referring that with respect to the computation of the DIC, when L=1
, it is computed as in Spiegelhalter et al. (2002), and when L>1
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (as well as the AUC, pAUC, dens and reg.fun, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curve has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values in the healthy and diseased populations. |
ROC |
A list returning the |
AUC |
Estimated area under the covariate-specific ROC curve (posterior mean) and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (posterior mean) and |
dens |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: |
reg.fun |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a data frame containing the predicted regression function (posterior mean) and |
lpml |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Named list of length two, with components |
data_model |
A list with the data used in the fit: observed diagnostic test outcome and design matrices, separately for the healthy and diseased groups. |
Note
The input arguments formula.h
and formula.d
are similar to that used for the glm
function, except that flexible specifications can be added by means of the function f()
. For instance, specification y \sim x1 + f(x2, K = 3)
would assume a linear effect of x1
(if x1
continuous) and the effect of x2
would be modeled using B-splines basis functions. The argument K = 3
indicates that 3
internal knots will be used, with the quantiles of x2
used for their location. Categorical variables (factors) can be also incorporated, as well as interaction terms. For example, to include the factor-by-curve interaction between age
and gender
we need to specify, e.g., y \sim gender + f(age, by = gender, K = c(3, 5))
. Note that, in this case, the number of knots can be different for each level of the factor. The order of the vector K
of knots should match the levels of the factor.
References
Celeux, G., Forbes, F., Robert C. P., and Titerrington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–674.
De Iorio, M., Johnson, W. O., Muller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762–775.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Inacio de Carvalho, V., Jara, A., Hanson, T. E., and de Carvalho, M. (2013). Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623–646.
Speigelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model comparison and fit. Journal of the Royal Statistical Society, Ser. B, 64, 583–639.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
summary(cROC_bnp)
plot(cROC_bnp)
Nonparametric kernel-based estimation of the covariate-specific ROC curve (cROC).
Description
This function estimates the covariate-specific ROC curve (cROC) using the nonparametric kernel-based method proposed by Rodriguez-Alvarez et al. (2011). The method, as it stands now, can only deal with one continuous covariate.
Usage
cROC.kernel(marker, covariate, group, tag.h,
bw = c("LS", "AIC"), regtype = c("LC", "LL"),
data, newdata, pauc = pauccontrol(),
p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
covariate |
A character string with the name of the continuous covariate. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
bw |
A character string specifying which method to use to select the bandwidths. AIC specifies expected Kullback-Leibler cross-validation, and LS specifies least-squares cross-validation. Defaults to LS. For details see |
regtype |
A character string specifying which type of kernel estimator to use for the regression function (see Details). LC specifies a local-constant estimator (Nadaraya-Watson) and LL specifies a local-linear estimator. Defaults to LC. For details see |
data |
Data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if computed) will be computed. If not supplied, the function |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-specific ROC curve (cROC) defined as
ROC(p|x) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p|x)|x\},
where
F_{D}(y|x) = Pr(Y_{D} \leq y | X_{D} = x ),
F_{\bar{D}}(y|x) = Pr(Y_{\bar{D}} \leq y | X_{\bar{D}} = x).
Note that, for the sake of clarity, we assume that the covariate of interest is the same in both healthy and diseased populations. In particular, the method implemented in this function estimates F_{D}(\cdot|x)
and F_{\bar{D}}(\cdot|x)
assuming a nonparametric location-scale regression model for Y
in each population separately, i.e.,
Y_{D} = \mu_{D}(X_{D}) + \sigma_{D}(X_{D})\varepsilon_{D},
Y_{\bar{D}} = \mu_{\bar{D}}(X_{\bar{D}}) + \sigma_{\bar{D}}(X_{\bar{D}})\varepsilon_{\bar{D}},
where \mu_{D}(x) = E(Y_D | X_D = x)
, \mu_{\bar{D}}(x) = E(Y_{\bar{D}} | X_{\bar{D}} = x)
(regression function), \sigma^2_{D}(x) = Var(Y_D | X_D = x)
, \sigma^2_{\bar{D}}(x) = Var(Y_{\bar{D}} | X_{\bar{D}} = x)
(variance functions), and \varepsilon_{D}
and \varepsilon_{\bar{D}}
have zero mean, variance one, and distribution functions G_{D}
and G_{\bar{D}}
, respectively. In this case, the covariate-specific ROC curve can be expressed as
ROC(p|x) = 1 - G_{D}\{a(\mathbf{x}) + b(\mathbf{x})G_{\bar{D}}^{-1}(1-p)\},
where a(x) = \frac{\mu_{\bar{D}}(x) - \mu_{D}(x)}{\sigma_{D}(x)}
, b(x) = \frac{\sigma_{\bar{D}}(x)}{\sigma_{D}(x)}
, and G_{D}
and G_{\bar{D}}
are the distribution functions of \varepsilon_{D}
and \varepsilon_{\bar{D}}
, respectively.
By default, for both the healthy and diseased population, both the regression and variance functions are estimated using the Nadaraya-Watson estimator (LC), and the bandwidth are selected using least-squares cross-validation (LS). Implementation relies on the R
-package np
. No assumptions are made about G_{D}
and G_{\bar{D}}
, which are empirically estimated on the basis of standardised residuals.
The covariate-specific area under the curve is
AUC(\mathbf{x})=\int_{0}^{1}ROC(p|\mathbf{x})dp,
and is computed numerically (using Simpson's rule). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1|\mathbf{x})=\int_0^{u_1} ROC(p|\mathbf{x})dp,
where again the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{FPF}(u_1|\mathbf{x})/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2|\mathbf{x})=\int_{u_2}^{1}ROC_{TNF}(p|\mathbf{x})dp,
where ROC_{TNF}(p|\mathbf{x})
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p|\mathbf{x}) = F_{\bar{D}}\{F_{D}^{-1}(1-p|\mathbf{x})|\mathbf{x}\}=G_{\bar{D}}\{\frac{\mu_{D}(x)-\mu_{\bar{D}}(x)}{\sigma_{\bar{D}}(x)}+G_{D}^{-1}(1-p)\frac{\sigma_{D}(x)}{\sigma_{\bar{D}}(x)}\}.
Again, the computation of the integral is done via Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2|\mathbf{x})/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
covariate |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-specific ROC curve (AROC), and |
AUC |
Estimated area under the covariate-specific ROC curve, and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve and |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component of the list contains the following information: (1) |
References
Hayfield, T., and Racine, J. S.(2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software 27(5). URL http://www.jstatsoft.org/v27/i05/.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.kernel
or cROC.sp
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_kernel <- cROC.kernel(marker = "l_marker1",
covariate = "age",
group = "status",
tag.h = 0,
data = newpsa,
bw = "LS",
regtype = "LC",
p = seq(0, 1, len = 101),
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
B = 500)
plot(cROC_kernel)
summary(cROC_kernel )
Parametric and semiparametric frequentist inference of the covariate-specific ROC curve (cROC).
Description
This function estimates the covariate-specific ROC curve (cROC) using the parametric approach proposed by Faraggi (2003) and the semiparametric approach proposed by Pepe (1998).
Usage
cROC.sp(formula.h, formula.d, group, tag.h, data,
newdata, est.cdf = c("normal", "empirical"),
pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
formula.h |
A |
formula.d |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying the healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) will be computed. If not supplied, the function |
est.cdf |
A character string. It indicates how the conditional distribution functions of the diagnostic test in healthy and diseased populations are estimated. Options are "normal" and "empirical" (see Details). The default is "normal". |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. By default 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the covariate-specific ROC curve (cROC) defined as
ROC(p|\mathbf{x}) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p|\mathbf{x})|\mathbf{x}\},
where
F_{D}(y|\mathbf{x}) = Pr(Y_{D} \leq y | \mathbf{X}_{D} = \mathbf{x}),
F_{\bar{D}}(y|\mathbf{x}) = Pr(Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}).
Note that, for the sake of clarity, we assume that the covariates of interest are the same in both healthy and diseased populations. In particular, the method implemented in this function estimates F_{D}(\cdot|\mathbf{x})
and F_{\bar{D}}(\cdot|\mathbf{x})
assuming a (semiparametric) location regression model for Y
in each population separately, i.e.,
Y_{D} = \mathbf{X}_{D}^{T}\mathbf{\beta}_{D} + \sigma_{D}\varepsilon_{D},
Y_{\bar{D}} = \mathbf{X}_{\bar{D}}^{T}\mathbf{\beta}_{\bar{D}} + \sigma_{\bar{D}}\varepsilon_{\bar{D}},
such that the covariate-specific ROC curve can be expressed as
ROC(p|\mathbf{x}) = 1 - G_{D}\{a(\mathbf{x}) + b G_{\bar{D}}^{-1}(1-p)\},
where a(\mathbf{x}) = \mathbf{x}^{T}\frac{\mathbf{\beta}_{\bar{D}} - \mathbf{\beta}_{D}}{\sigma_{D}}
, b = \frac{\sigma_{\bar{D}}}{\sigma_{D}}
, and G_{D}
and G_{\bar{D}}
are the distribution functions of \varepsilon_{D}
and \varepsilon_{\bar{D}}
, respectively. In line with the assumptions made about the distributions of \varepsilon_{D}
and \varepsilon_{\bar{D}}
, estimators will be referred to as: (a) "normal", where Gaussian errors are assumed, i.e., G_{D}(y) = G_{\bar{D}}(y) = \Phi(y)
(Faraggi, 2003); and, (b) "empirical", where no assumptios are made about the distribution (in this case, G_{D}
and G_{\bar{D}}
are empirically estimated on the basis of standardised residuals (Pepe, 1998)).
The covariate-specific area under the curve is
AUC(\mathbf{x})=\int_{0}^{1}ROC(p|\mathbf{x})dp.
When Gaussian errors are assumed, there is a closed-form expression for the covariate-specific AUC, which is used in the package. In contrast, when no assumptios are made about the distributionis of the errors, the integral is computed numerically using Simpson's rule. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1|\mathbf{x})=\int_0^{u_1} ROC(p|\mathbf{x})dp.
Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{FPF}(u_1|\mathbf{x})/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2|\mathbf{x})=\int_{u_2}^{1}ROC_{TNF}(p|\mathbf{x})dp,
where ROC_{TNF}(p|\mathbf{x})
is a 270^\circ
rotation of the ROC curve, and it can be expressed asROC_{TNF}(p|\mathbf{x}) = F_{\bar{D}}\{F_{D}^{-1}(1-p|\mathbf{x})|\mathbf{x}\}=G_{\bar{D}}\{\frac{\mu_{D}(\mathbf{x})-\mu_{\bar{D}}(\mathbf{x})}{\sigma_{\bar{D}}(\mathbf{x})}+G_{D}^{-1}(1-p)\frac{\sigma_{D}(\mathbf{x})}{\sigma_{\bar{D}}(\mathbf{x})}\}.
Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{TPF}(u_2|\mathbf{x})/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
formula |
Named list of length two with the value of the arguments |
est.cdf |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curves have been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-specific ROC curve, and |
AUC |
Estimated area under the covariate-specific ROC curve, and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve and |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component contains an object of class |
coeff |
Estimated regression coefficients (and |
References
Faraggi, D. (2003). Adjusting receiver operating characteristic curves and related indices for covariates. The Statistician 52, 179–192.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Pepe, M.S. (1998). Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics 54, 124–135.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
# Covariate for prediction
agep <- seq(min(newpsa$age), max(newpsa$age), length = 50)
df.pred <- data.frame(age = agep)
cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age,
formula.d = l_marker1 ~ age,
group = "status",
tag.h = 0,
data = newpsa,
newdata = df.pred,
est.cdf = "normal",
pauc = list(compute = TRUE, value = 0.5, focus = "FPF"),
p = seq(0, 1, l = 101),
B = 500)
summary(cROC_sp_normal)
plot(cROC_sp_normal)
Selects an adequate set of points from a data set for obtaining predictions.
Description
Selects an adequate set of points from a data set for obtaining predictions
Usage
cROCData(data, names.cov, group)
Arguments
data |
Data set from which the new set of covariate values is obtained. |
names.cov |
Character vector with the names of the covariates to be included in the new data set. |
group |
A character string with the name of the variable in the original data set that distinguishes healthy from diseased individuals. |
Value
A data frame containing selected values of all needed covariates. For those that are continuous, 30 different values are selected.
See Also
AROC.bnp
, cROC.bnp
, cROC.sp
, cROC.kernel
, compute.threshold.cROC
or compute.threshold.AROC
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
newdf <- cROCData(newpsa, "age", "status")
summary(newdf)
AROC based threshold values.
Description
This function implements methods for estimating AROC-based threshold values.
Usage
compute.threshold.AROC(object, criterion = c("FPF", "YI"), FPF, newdata,
ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
object |
An object of class |
criterion |
A character string indicating whether the covariate-adjusted threshold values should be computed based on the Youden index (“YI”) or for a fixed set of false positive fractions (“FPF”). |
FPF |
For criterion = FPF, a numeric vector with the FPF at which to calculate the AROC-based threshold values. Atomic values are also valid. |
newdata |
Optional data frame containing the values of the covariates at which the AROC-based threshold values will be computed. If not supplied, the function |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates AROC-based threshold values based on two different criteria, namely, the Youden index (YI) and the one that gives rise to a pre-specified FPF. Before proceeding, we would like to mention that when the accuracy of a test is not affected by covariates, this does not necessarily imply that the covariate-specific ROC curve (which in this case is the same for all covariate values) coincides with the pooled ROC curve. It does coincide, however, with the AROC curve. Consequently, in all cases where covariates affect the test, even though they might not affect its discriminatory capacity, inferences based on the pooled ROC curve might be misleading. In such cases the AROC curve should be used instead. This also applies to the selection of (optimal) threshold values, which, as will be seen, might be covariate-specific (i.e., possibly different for different covariate values).
For the AROC curve, the Youden Index is defined as
YI = \max_{p}\{AROC(p) - p\},
The value p^{*}
(FPF) that achieves the maximum is then used to calculate the optimal (covariate-specific) YI threshold as follows
c^{*}_{\mathbf{x}} = F_{\bar{D}}^{-1}(1-p^{*}|\mathbf{x}),
where
F_{\bar{D}}(y|\mathbf{x}) = Pr(Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}).
In a similar way, when using the criterion for a fixed FPF, the covariate-specific threshold values are obtained as follows
c^{*}_{\mathbf{x}} = F_{\bar{D}}^{-1}(1-FPF|\mathbf{x}).
In both cases, we use the notation c^{*}_{\mathbf{x}}
to emphasise that this value depends on covariate \mathbf{x}
.
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
Data frame containing the values of the covariates at which the AROC-based thresholds were computed. |
thresholds |
If method = "YI", the estimated AROC-based threshold corresponding to the Youden index, and if method = "FPF", AROC-based threshold corresponding to the specified FPF. For the Bayesian approach ( |
YI |
If criterion = "YI", the AROC-based Youden index. For the Bayesian approach ( |
FPF |
If criterion = "YI", the FPF where the Youden index is attained, and if criterion = "FPF", the supplied FPF argument. For the Bayesian approach ( |
References
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.
See Also
AROC.bnp
, AROC.kernel
or AROC.sp
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0,1,l=101),
prior = priorcontrol.bnp(m0 = rep(0, 4),
S0 = 10*diag(4), nu = 6, Psi = diag(4),
a = 2, b = 0.5, alpha = 1, L =10),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
### Threshold values based on the YI
th_AROC_bnp_yi <- compute.threshold.AROC(AROC_bnp, criterion = "YI")
# Plot results
plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"est"],
type = "l", xlab = "Age",
ylab = "log(PSA)", ylim = c(0,3),
main = "Threshold values based on the Youden Index")
lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"qh"], lty = 2)
lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"ql"], lty = 2)
### Threshold values for a fixed FPF
th_AROC_bnp_fpf <- compute.threshold.AROC(AROC_bnp, criterion = "FPF", FPF = 0.1)
# Plot results
plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"est"],
type = "l", xlab = "Age",
ylab = "log(PSA)", ylim = c(0,3),
main = "Threshold values for a FPF = 0.1")
lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2)
lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
Covariate-specific ROC based threshold values.
Description
This function implements methods for estimating covariate-specific ROC-based threshold values.
Usage
compute.threshold.cROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF, newdata,
ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
object |
An object of class |
criterion |
A character string indicating whether the covariate-specific threshold values should be computed based on the Youden index (“YI”) or for fixed false positive fractions (“FPF”) or true positive fractions (“TPF”). |
FPF |
For |
TPF |
For |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific threshold values will be computed. If not supplied, the function |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates covariate-specific ROC-based threshold values based on three different criteria, namely, the Youden index (YI), one that gives rise to a pre-specified FPF, and one that gives rise to a pre-specified TPF.
In the conditional case, the Youden index is defined as
YI(\mathbf{x}) = \max_{c}|TPF(c|\mathbf{x}) - FPF(c|\mathbf{x})| = \max_{c}|F_{\bar{D}}(c|\mathbf{x}) - F_{D}(c|\mathbf{x})|,
where
F_{D}(y|\mathbf{x}) = Pr(Y_{D} \leq y | \mathbf{X}_{D} = \mathbf{x}),
F_{\bar{D}}(y|\mathbf{x}) = Pr(Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}).
The value c^{*}_{\mathbf{x}}
that achieves the maximum is called the optimal covariate-specific YI threshold. Regarding the criterion for a fixed FPF, the covariate-specific threshold values are obtained as follows
c^{*}_{\mathbf{x}} = F_{\bar{D}}^{-1}(1-FPF|\mathbf{x}),
and for a fixed TPF we have
c^{*}_{\mathbf{x}} = F_{D}^{-1}(1-TPF|\mathbf{x}),
In all cases, we use the notation c^{*}_{\mathbf{x}}
to emphasise that this value depends on covariate \mathbf{x}
.
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
Data frame containing the values of the covariates at which the covariate-specific thresholds were computed. |
thresholds |
If method = "YI", the estimated covariate-specific (optimal) threshold corresponding to the covariate-specific Youden index (the one that maximises TPF/sensitivity + TNF/specificity). If method = "FPF", the covariate-specific threshold corresponding to the specified FPF, and if method = "TPF", the covariate-specific threshold corresponding to the specified TPF. For the Bayesian approach ( |
YI |
If method = "YI", the estimated covariate-specific Youden index. For the Bayesian approach ( |
FPF |
If method = "YI" or method = "TPF", the FPF corresponding to the estimated (optimal) covariate-specific thresholds (for the Bayesian approach ( |
TPF |
If method = "YI" or method = "FPF", the covariate-specific TPF/sensitivity corresponding to the estimated covariate-specific (optimal) threshold. For the Bayesian approach ( |
References
Inacio de Carvalho, V., de Carvalho, M. and Branscum, A. J. (2017). Nonparametric Bayesian Covariate-Adjusted Estimation of the Youden Index. Biometrics, 73, 1279-1288.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.
See Also
cROC.bnp
, cROC.kernel
or cROC.sp
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0,1,l=101),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
### Threshold values based on the YI
th_cROC_bnp_yi <- compute.threshold.cROC(cROC_bnp, criterion = "YI")
# Plot results
# Threshold values
plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"est"],
type = "l", xlab = "Age",
ylab = "log(PSA)", ylim = c(0,3),
main = "Threshold values based on the Youden Index")
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"qh"], lty = 2)
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"ql"], lty = 2)
# Youden Index
plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"est"],
type = "l", xlab = "Age",
ylab = "log(PSA)", ylim = c(0,1),
main = "Threshold values based on the Youden Index")
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"qh"], lty = 2)
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"ql"], lty = 2)
### Threshold values for a fixed FPF
th_cROC_bnp_fpf <- compute.threshold.cROC(cROC_bnp, criterion = "FPF", FPF = 0.1)
# Plot results
# Threshold values
plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"est"],
type = "l", xlab = "Age",
ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values for a FPF = 0.1")
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2)
lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
Pooled ROC based threshold values.
Description
This function implements methods for estimating pooled ROC-based threshold values.
Usage
compute.threshold.pooledROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF,
ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
object |
An object of class |
criterion |
A character string indicating if the threshold value should be computed based on the Youden index (“YI”), or for fixed false positive fractions (“FPF”) or true positive fractions (“TPF”). |
FPF |
For |
TPF |
For |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates pooled ROC-based threshold values based on three different criteria, namely, the Youden index (YI), one that gives rise to a pre-specified FPF, and one that gives rise to a pre-specified TPF.
The Youden Index is defined as
YI = \max_{c}\{TPF(c) - FPF(c)\} = \max_{c}\{F_{\bar{D}}(c) - F_{D}(c)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The value c^{*}
that achieves the maximum is called the optimal YI threshold. Regarding the criterion for a fixed FPF, the threshold value is obtained as follows
c = F_{\bar{D}}^{-1}(1-FPF).
and for a fixed TPF we have
c = F_{D}^{-1}(1-TPF).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
threshold |
If method = "YI", the estimated (optimal) threshold corresponding to the Youden index (the one that maximises TPF/sensitivity + TNF/specificity). If method = "FPF", the estimated threshold corresponding to the specified FPF, and if method = "TPF", the estimated threshold corresponding to the specified TPF. For the Bayesian approaches ( |
YI |
If method = "YI", the estimated Youden index. For the Bayesian approaches ( |
FPF |
If method = "YI" or method = "TPF", the FPF corresponding to the estimated (optimal) threshold (For the Bayesian approaches ( |
TPF |
If method = "YI" or method = "FPF", the TPF/sensitivity corresponding to the estimated (optimal) threshold. For the Bayesian approaches ( |
References
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1ci.level
*1000). Index for rating diagnostic tests. Cancer, 3, 32–35.
See Also
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE,
compute.DIC = TRUE,
prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1))
## Threshold values based on the YI
th_m0_dpm_yi <- compute.threshold.pooledROC(m0_dpm, criterion = "YI")
th_m0_dpm_yi$threshold
th_m0_dpm_yi$YI
### Threshold values for a fixed FPF
th_m0_dpm_fpf <- compute.threshold.pooledROC(m0_dpm, criterion = "FPF", FPF = 0.1)
th_m0_dpm_fpf$threshold
(Conditional) density estimates of test outcomes
Description
This function is used to set various parameters controlling the estimation of the (conditional) density (densities) of test outcomes in both the healthy and diseased groups.
Usage
densitycontrol(compute = FALSE, grid.h = NA, grid.d = NA)
Arguments
compute |
Logical value. If TRUE the (conditional) density (densities) of test outcomes in each group, healthy and diseased, are estimated. |
grid.h |
Grid of test outcomes in the healthy group where the (conditional) density (densities) estimates are to be evaluated. Value |
grid.d |
Grid of test outcomes in the diseased group where the (conditional) density (densities) estimates are to be evaluated. Value |
Details
The value returned by this function is used as a control argument of the cROC.bnp
and pooledROC.dpm
functions.
Value
A list with components for each of the possible arguments.
See Also
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
Conditional density estimates of test outcomes in the healthy population
Description
This function is used to set various parameters controlling the estimation of the conditional densities of test outcomes in the healthy group.
Usage
densitycontrol.aroc(compute = FALSE, grid.h = NA, newdata = NA)
Arguments
compute |
Logical value. If TRUE the conditional densities of test outcomes in the healthy group are estimated. |
grid.h |
Grid of test outcomes in the healthy group where the conditional density estimates are to be evaluated. Value |
newdata |
Data frame containing the values of the covariates at which the conditional density estimates are computed. |
Details
The value returned by this function is used as a control argument of the AROC.bnp
function.
Value
A list with components for each of the possible arguments.
See Also
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
# Covariate for prediction
agep <- seq(min(newpsa$age), max(newpsa$age), length = 5)
df.pred <- data.frame(age = agep)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = df.pred),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)
)
Simulated endocrine data.
Description
The endosyn
data set was simulated based on the data analysed in Rodriguez-Alvarez et al. (2011a,b) and Inacio de Carvalho and Rodriguez-Alvarez (2018); and presented in Botana et al. (2007) and Tome et al. (2008). The aim of these studies was to use the body mass index (BMI) to detect patients having a higher risk of cardiovascular problems, ascertaining the possible effect of age and gender on the accuracy of this measure.
Usage
data("endosyn")
Format
A data frame with 2840 observations on the following 4 variables.
gender
patient's gender. Factor with
Men
andWomen
levels.age
patient's age.
cvd_idf
true disease status (presence/absence of two of more cardiovascular risk factors according to the International Diabetes Federation). Numerical vector (0 = absence, 1 = presence).
bmi
patient's body mass index.
Source
Botana, M.A., Mato, J.A., Cadarso-Suarez, C., Tome, M.A., Perez-Fernandez, R., Fernandez-Mario, A., Rego-Iraeta, A., Solache, I. (2007). Overweight, obesity and central obesity prevalences in the region of Galicia in Northwest Spain. Obesity and Metabolism, 3, 106–115.
Tome, M.A., Botana, M.A., Cadarso-Suarez, C., Rego-Iraeta, A., Fernandez-Mario, A., Mato, J.A, Solache, I., Perez-Fernandez, R. (2008). Prevalence of metabolic syndrome in Galicia (NW Spain) on four alternative definitions and association with insulin resistance. Journal of Endocrinological Investigation, 32, 505–511.
References
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rodriguez-Alvarez, M.X., Roca-Pardinas, J. and Cadarso-Suarez, C. (2011a). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21(4), 483–49.
Rodriguez- Alvarez, M.X., Roca-Pardinas, J. and Cadarso-Suarez, C. (2011b). A new flexible direct ROC regression model - Application to the detection of cardiovascular risk factors by anthropometric measures. Computational Statistics and Data Analysis, 55(12), 3257–3270.
Examples
data(endosyn)
summary(endosyn)
Markov chain Monte Carlo (MCMC) parameters
Description
This function is used to set various parameters controlling the Markov chain Monte Carlo (MCMC) parameters.
Usage
mcmccontrol(nsave = 8000, nburn = 2000, nskip = 1)
Arguments
nsave |
An integer giving the total number of scans to be saved (does not include the burn-in and thinning iterations). |
nburn |
An integer giving the number of burn-in scans. |
nskip |
An integer giving the thinning interval, |
Details
The value returned by this function is used as a control argument of the AROC.bnp
, cROC.bnp
, and pooledROC.dpm
functions.
Value
A list with components for each of the possible arguments.
See Also
pooledROC.dpm
, AROC.bnp
and cROC.bnp
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
Partial area under the covariate-adjusted/covariate-specific/pooled ROC curve
Description
Used to set various parameters controlling the estimation of the partial area under the covariate-adjusted/covariate-specific/pooled ROC curve .
Usage
pauccontrol(compute = FALSE, focus = c("FPF", "TPF"), value = 1)
Arguments
compute |
Logical value. If TRUE the partial area under the covariate-adjusted/covariate-specific/pooled ROC curve is estimated. |
focus |
Whether computation should be done over a restricted range of false positive fractions (FPF) or a restricted range of true positive fractions (TPF). |
value |
Numeric value. Pre-specified upper bound for the FPF (if focus = "FPF") or lower bound for the TPF (if focus = "TPF"). |
Details
The value returned by this function is used as a control argument of the AROC.bnp
, AROC.sp
, AROC.kernel
, cROC.bnp
, cROC.sp
, cROC.kernel
functions.
Value
A list with components for each of the possible arguments.
See Also
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
# Covariate for prediction
agep <- seq(min(newpsa$age), max(newpsa$age), length = 5)
df.pred <- data.frame(age = agep)
cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age,
formula.d = l_marker1 ~ age,
group = "status",
tag.h = 0,
data = newpsa,
newdata = df.pred,
est.cdf = "normal",
pauc = list(compute = TRUE, value = 0.5, focus = "FPF"),
p = seq(0, 1, l = 101),
B = 10)
Default AROC plotting
Description
Takes a fitted AROC
object produced by AROC.bnp()
, AROC.sp()
, or AROC.kernel()
and plots the covariate-adjusted ROC curve (AROC) and associated area under the AROC (AAUC).
Usage
## S3 method for class 'AROC'
plot(x, main = NULL, ...)
Arguments
x |
An object of class |
main |
Character string with the overall title for the plot. If NULL, the default, the method used to estimate the AROC curve is depicted. |
... |
Further arguments passed to or from other methods. |
See Also
AROC.bnp
, AROC.sp
or AROC.kernel
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status", tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE,
compute.DIC = TRUE)
plot(AROC_bnp)
Default cROC plotting
Description
Takes a fitted cROC
object produced by cROC.bnp()
, cROC.sp()
, or cROC.kernel()
and plots the covariate-specific ROC curve (cROC) and associated area under the cROC curve. The suitable type of graphic is chosen according to the number and nature of the covariates.
Usage
## S3 method for class 'cROC'
plot(x, ask = TRUE, ...)
Arguments
x |
An object of class |
ask |
A logical value. If TRUE, the default, the user is asked for confirmation, before a new figure is drawn. |
... |
Further arguments passed to or from other methods. |
See Also
cROC.bnp
, cROC.sp
or cROC.kernel
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
plot(cROC_bnp)
Default pooledROC plotting
Description
Takes a fitted pooledROC
object produced by pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, or pooledROC.dpm
and plots the pooled ROC curve and associated area under the ROC curve (AUC).
Usage
## S3 method for class 'pooledROC'
plot(x, main = NULL, ...)
Arguments
x |
An object of class |
main |
Character string with the overall title for the plot. If NULL, the default, the method used to estimate the pooled ROC curve is depicted. |
... |
Further arguments passed to or from other methods. |
See Also
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500)
summary(m0_emp)
plot(m0_emp)
Bayesian bootstrap estimation of the pooled ROC curve.
Description
This function estimates the pooled ROC curve using the Bayesian bootstrap estimator proposed by Gu et al. (2008).
Usage
pooledROC.BB(marker, group, tag.h, data,
p = seq(0, 1, l = 101), B = 5000, ci.level = 0.95, pauc = pauccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. |
B |
An integer value specifying the number of Bayesian bootstrap resamples. By default 5000. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the pooled ROC curve (ROC) defined as
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The method implemented in this function makes use of the equivalence (see Gu et al., 2008)
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\} = 1 - Pr(1 - F_{\bar{D}}(Y_D) \leq p),
and estimates both F_{\bar{D}}
and the outer probability using the Bayesian bootstrap resampling distribution.
Regarding the area under the curve, we note that
AUC = \int_{0}^{1}ROC(p)dp = 1 - E\{U_D\},
where U_D = 1 - F_{\bar{D}}(Y_D)
. In our implementation, the expectation is computed using the Bayesian bootstrap (using the same weights as those used to estimate the pooled ROC). As far as the partial area under the curve is concerned, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1)=\int_0^{u_1} ROC(p)dp = u_1 - E\{U_{D,u_1}\},
where U_{D,u_1} = \min\{u_1, 1 - F_{\bar{D}}(Y_D)\}
. Again, the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAUC, pAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp,
where ROC_{TNF}(p)
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p) = F_{\bar{D}}\{F_{D}^{-1}(1-p)\}
. Thus
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp = E\{U_{\bar{D}, u_2} - u_2\},
where U_{\bar{D}, u_2} = \max\{u_2, 1 - F_{D}(Y_{\bar{D}})\}
, and the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAUC, pAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve (posterior mean) and |
weights |
list with the Dirichlet weights (involved in the estimation) in the healthy (h) and diseased (d) groups. These are matrices of dimension n0 x B and n1 x B, where n0 is the number of healthy individuals and n1 is the number of diseased individuals. |
References
Gu, J., Ghosal, S., and Roy, A. (2008). Bayesian bootstrap estimation of ROC curve. Statistics in Medicine, 27, 5407–5420.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_BB <- pooledROC.BB(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 5000)
summary(m0_BB)
plot(m0_BB)
Nonparametric Bayesian inference of the pooled ROC curve
Description
This function estimates the pooled ROC curve using a Dirichlet process mixture of normals model as proposed by Erkanli et al. (2006).
Usage
pooledROC.dpm(marker, group, tag.h, data,
standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95,
compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE,
pauc = pauccontrol(), density = densitycontrol(),
prior.h = priorcontrol.dpm(), prior.d = priorcontrol.dpm(),
mcmc = mcmccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE the test outcomes are standardised (so that the resulting test outcomes have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. This set is also used to compute the area under the ROC curve (AUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
Hyparameter specification for the healthy population. A list of control values to replace the default values returned by the function |
prior.d |
Hyparameter specification for the diseased population. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the pooled ROC curve (ROC) defined as
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The method implemented in this function estimates F_{D}(\cdot)
and F_{\bar{D}}(\cdot)
by means of a Dirichlet process mixture of normals model. More precisely, and letting \{y_{\bar{D}i}\}_{i=1}^{n_{\bar{D}}}
and \{y_{Dj}\}_{j=1}^{n_{D}}
be two independent random samples from the nondiseased and diseased populations, respectively, the model postulated for the distribution function is as follows
F_{\bar{D}}(y_{\bar{D}i}) = \sum_{l=1}^{L_{\bar{D}}}\omega_{l\bar{D}}\Phi(y_{\bar{D}i}\mid\mu_{l\bar{D}},\sigma_{l\bar{D}}^2),
F_{D}(y_{Dj}) = \sum_{l=1}^{L_{D}}\omega_{lD}\Phi(y_{Dj}\mid\mu_{lD},\sigma_{lD}^2),
where L_{d}
is pre-specified is a pre-specified upper bound on the number of mixture components (d \in \{D, \bar{D}\}
). The \omega_{ld}
's result from a truncated version of the stick-breaking construction (\omega_{1d} = v_{1d}
; \omega_{ld} = v_{ld}\prod_{r<l}(1-v_{dr})
, l=2,\ldots,L_{d}
; v_{d1},\ldots,v_{L_{d}-1}\sim
Beta (1,\alpha_{d})
; v_{Ld} = 1
, \alpha_d \sim \Gamma(a_{\alpha_d},b_{\alpha_d})
), \beta_{ld}\sim N(m_{0d},S_{0d})
, and \sigma_{ld}^{-2}\sim\Gamma(a_{d},b_{d})
.
The area under the curve is
AUC=\int_{0}^{1}ROC(p)dp.
When the upper bound on the number of mixture components is 1, i.e., L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the AUC (binormal model), which is used in the package. In contrast, when L_D > 1
or L_{\bar{D}} > 1
, the AUC is computed using results presented in Erkanli et al. (2006). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1)=\int_0^{u_1} ROC(p)dp.
As for the AUC, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{FPF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp,
where ROC_{TNF}(p)
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p) = F_{\bar{D}}\{F_{D}^{-1}(1-p)\}.
Again, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{TNF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
It is worth referring that with respect to the computation of the DIC, when L=1
, it is computed as in Spiegelhalter et al. (2002), and when L>1
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values in the healthy and diseased populations. |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve (posterior mean) and |
dens |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: |
lpml |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with the following information: (1) |
References
Erkanli, A., Sung M., Jane Costello, E., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine, 25, 3905–3928.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE,
compute.DIC = TRUE,
prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1))
summary(m0_dpm)
plot(m0_dpm)
Empirical estimation of the pooled ROC curve.
Description
This function estimates the pooled ROC curve using the empirical estimator proposed by Hsieh and Turnbull (1996).
Usage
pooledROC.emp(marker, group, tag.h, data,
p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
method = c("ncoutcome", "coutcome"), pauc = pauccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
method |
A character string specifying if bootstrap resampling (for the confidence intervals) should be done with or without regard to the disease status (“coutcome” or “noutcome”). In both cases, a naive bootstrap is used. By default, the resampling is done conditionally on the disease status. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the pooled ROC curve (ROC) defined as
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The method implemented in this function estimates F_{D}(\cdot)
and F_{\bar{D}}(\cdot)
by means of the empirical dsitributions. More precisely, and letting \{y_{\bar{D}i}\}_{i=1}^{n_{\bar{D}}}
and \{y_{Dj}\}_{j=1}^{n_{D}}
be two independent random samples from the nondiseased and diseased populations, respectively, the distribution functions in each group take the form
\widehat{F}_{D}(y)=\frac{1}{n_D}\sum_{j=1}^{n_D}I(y_{Dj}\leq y),
\widehat{F}_{\bar{D}}(y)=\frac{1}{n_{\bar{D}}}\sum_{i=1}^{n_{\bar{D}}}I(y_{\bar{D}i}\leq y).
The area under the curve is
AUC=\int_{0}^{1}ROC(p)dp
and is estimated empirically by means of the Mann-Whitney U-statistic. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1)=\int_0^{u_1} ROC(p)dp,
where again is estimated empirically. The returned value is the normalised pAUC, pAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp,
where ROC_{TNF}(p)
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p) = F_{\bar{D}}\{F_{D}^{-1}(1-p)\}.
Again, ROC_{TNF}(p)
is estimated empirically. The returned value is the normalised pAUC, pAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve along with its |
References
Hsieh, F., and Turnbull, B.W. (1996). Nonparametric and semiparametric estimation of the receiver operating characteristic curve, The Annals of Statistics, 24, 25–40.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 10,
method = "coutcome", pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"))
summary(m0_emp)
plot(m0_emp)
Kernel-based estimation of the pooled ROC curve.
Description
This function estimates the pooled ROC curve using the kernel-based density estimator proposed by Zhou et al. (1997).
Usage
pooledROC.kernel(marker, group, tag.h, data,
p = seq(0, 1, l = 101),
bw = c("SRT", "UCV"), B = 1000, ci.level = 0.95,
method = c("ncoutcome", "coutcome"), pauc = pauccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. This set is also used to compute the area under the ROC curve (AUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
bw |
A character string specifying the density bandwidth selection method. “SRT”: Silverman's rule-of-thumb; “UCV”: unbiased cross-validation. The default is “SRT”. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
method |
A character string specifying if bootstrap resampling (for the confidence intervals) should be done with or without regard to the disease status (“coutcome” or “noutcome”). In both cases, a naive bootstrap is used. By default, the resampling is done conditionally on the disease status. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the pooled ROC curve (ROC) defined as
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The method implemented in this function estimates F_{D}(\cdot)
and F_{\bar{D}}(\cdot)
by means of kernel methods. More precisely, and letting \{y_{\bar{D}i}\}_{i=1}^{n_{\bar{D}}}
and \{y_{Dj}\}_{j=1}^{n_{D}}
be two independent random samples from the nondiseased and diseased populations, respectively, the distribution functions in each group take the form
F_{D}(y)=\frac{1}{n_{D}}\sum_{j=1}^{n_D}\Phi\left(\frac{y-y_{Dj}}{h_D}\right).
F_{\bar{D}}(y)=\frac{1}{n_{\bar{D}}}\sum_{i=1}^{n_D}\Phi\left(\frac{y-y_{\bar{D}i}}{h_{\bar{D}}}\right).
where \Phi(y)
stands for the standard normal distribution function evaluated at y
. For the bandwidth h_d
, d \in \{D,\bar{D}\}
which controls the amount of smoothing, two options are available. When bw = "SRT"
,
h_{d}=0.9\min\{SD(\mathbf{y}_d),IQR(\mathbf{y}_d)/1.34\}n_{d}^{-0.2},
where SD(\mathbf{y}_d)
and IQR(\mathbf{y}_d)
are the standard deviation and interquantile range, respectively, of \mathbf{y}_d=(y_{d1},\ldots,y_{dn_{d}})
. In turn, when bw = "UCV"
, the bandwidth is selected via unbiased cross-validation, for further details we refer to bw.ucv
from the base package stats
.
The area under the curve is
AUC=\int_{0}^{1}ROC(p)dp
and is computed numerically (using Simpson's rule). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1)=\int_0^{u_1} ROC(p)dp,
where again the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp,
where ROC_{TNF}(p)
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p) = F_{\bar{D}}\{F_{D}^{-1}(1-p)\}.
Again, the computation of the integral is done via Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
bws |
Named list of length two with components 'h' (healthy) and 'd' (diseased). Each component is a numeric value with the selected bandwidth. |
bw |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve along with its |
References
Zou, K.H., Hall, W.J., Shapiro, D.E. (1997) Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine, 16, 2143–2156.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_kernel <- pooledROC.kernel(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), bw = "SRT",
B = 500, method = "coutcome",
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"))
summary(m0_kernel)
plot(m0_kernel)
Posterior predictive checks.
Description
Implements posterior predictive checks for objects of AROC
or cROC
as produced by AROC.bnp
or cROC.bnp
.
Usage
predictive.checks(object,
statistics = c("min", "max", "kurtosis", "skewness"),
ndensity = 512, devnew = TRUE)
Arguments
object |
An object of class |
statistics |
Character vector. Statistics to be used for the posterior predictive checking. By default, "min", "max", "kurtosis" and "skewness" |
ndensity |
An integer giving the number of equally spaced points at which the density of the test outcomes and of the simulated datasets from the posterior predictive distribution (see more on Details) is to be estimated (for more details see the help of the function |
devnew |
A logical value. If TRUE, each plot is depicted in a new graphic device. |
Details
Compares a selected test statistic computed based on the observed diagnostic test outcome (either in the nondiseased group, AROC
object, or in both the nondiseased and diseased groups, cROC
object) against the same test statistics computed based on simulated data from the posterior predictive distribution of the diagnostic test outcome obtained using a single-weights linear dependent Dirichlet process mixture of normals model.
The following graphics are depicted: (1) histograms of the desired statistics computed from simulated datasets (nsave of them) from the posterior predictive distribution of the diagnostic test outcome. In these plots, the estimated statistics from the observed diagnostic test outcome are also depicted. (2) Kernel density estimates computed computed from simulated datasets (nsave of them) from the posterior predictive distribution of the diagnostic test outcome. In these plots, the kernel density estimate of the observed diagnostic test outcome is also depicted. In the case of an object of class AROC
, the abovementioned graphics are depicted only for the diagnostic test outcome in the nondiseased group. However, for an object of class cROC
, the graphics are depicted, separately, for both the nondiseased and diseased groups. For a detailed discussion about predictive checks, see Gabry et al. (2019).
Value
As a result, the function provides a list with the following components:
yrep |
List of matrices associated with the diseased (d) and nondiseased (h) groups. Each column of the matrix (there are nsave of them) corresponds to a dataset generated from the posterior predictive distribution of the diagnostic test outcomes. For |
y |
List of numeric vectors associated with the diseased (d) and nondiseased (h) groups. The vector contains the observed diagnostic test outcomes. For |
References
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society, Series A, 182, 1–14.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
See Also
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status", tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE)
predictive.checks(AROC_bnp, statistics = "skewness")
Print method for AROC
objects
Description
Default print method for objects fitted with AROC.bnp()
, AROC.sp()
, and AROC.kernel()
functions.
Usage
## S3 method for class 'AROC'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
A short summary is printed including the area under the covariate-adjusted ROC curve (AAUC), and if required, the partial area under the covariate-adjusted ROC curve (pAAUC).
See Also
AROC.bnp
, AROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status", tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE)
AROC_bnp
Print method for cROC
objects
Description
Default print method for objects fitted with cROC.bnp()
, cROC.sp()
and cROC.kernel()
functions.
Usage
## S3 method for class 'cROC'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
A short summary is printed.
See Also
cROC.bnp
, cROC.sp
or cROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
cROC_bnp
Print method for pooledROC
objects
Description
Default print method for objects fitted with pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
functions.
Usage
## S3 method for class 'pooledROC'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
A short summary is printed including the area under the pooled ROC curve (AUC).
See Also
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500)
summary(m0_emp)
plot(m0_emp)
Prior information for the AROC.bnp
and cROC.bnp
Description
This function is used to set various parameters controlling the prior information to be used in the AROC.bnp
and cROC.bnp
functions.
Usage
priorcontrol.bnp(m0 = NA, S0 = NA, nu = NA, Psi = NA, a = 2, b = NA,
alpha = 1, L = 10)
Arguments
m0 |
A numeric vector. Hyperparameter; mean vector of the (multivariate) normal prior distribution for the mean of the normal component of the centring distribution. |
S0 |
A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal prior distribution for the mean of the normal component of the centring distribution. |
nu |
A numeric value. Hyperparameter; degrees of freedom of the Wishart prior distribution for the precision matrix of the the normal component of the centring distribution. |
Psi |
A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the the normal component of the centring distribution. |
a |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. The default is 2. |
b |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. |
alpha |
A numeric value. Precision parameter of the Dirichlet Process. The default is 1. |
L |
A numeric value. Upper bound on the number of mixture components. Setting L = 1 corresponds to a normal model. The default is 10. |
Value
A list with components for each of the possible arguments.
See Also
Prior information for the pooledROC.dpm
Description
This function is used to set various parameters controlling the prior information to be used in the pooledROC.dpm
function.
Usage
priorcontrol.dpm(m0 = NA, S0 = NA, a = 2, b = NA, alpha = 1, L = 10)
Arguments
m0 |
A numeric value. Hyperparameter; mean of the normal prior distribution for the means of each component. |
S0 |
A numeric value. Hyperparameter; variance of the normal prior distribution for the means of each component. |
a |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. The default is 2. |
b |
A numeric value. Hyperparameter; rate parameter of the gamma prior distribution for the precisions (inverse variances) of each component. |
alpha |
A numeric value. Precision parameter of the Dirichlet Process. The default is 1. |
L |
A numeric value. Upper bound on the number of mixture components. Setting L=1 corresponds to a normal model. The default is 10. |
Value
A list with components for each of the possible arguments.
See Also
Prostate specific antigen (PSA) biomarker study.
Description
The dataset contains 71 prostate cases and 71 controls who participated in a lung cancer prevention trial (CARET, Beta-carotene and retinol trial). For details, see Etzioni et al. (1999) and Pepe (2003).
Usage
data("psa")
Format
A data frame with 683 observations on the following 6 variables.
id
Patient identifier.
marker1
Total prostate specific antigen (PSA).
marker2
Free prostate specific antigen (PSA)
status
Presence/absence of prostate cancer. The non-cancer patients are controls matched to cases on age and number of serum samples available for analysis (see Details).
age
Patient age at blood draw (serum sample).
t
Time (years) relative to prostate cancer diagnosis.
Details
The CARET enrolled 12000 men, aged between 50 and 65 years, at high risk of lung cancer. For each subject on the study, serum samples were drawn at baseline and at two-year intervals after that. The data presented here represent a subsample of the original sample, and it was reported by Etzioni et al. (1999). It contains 71 cases of prostate cancer that occurred during the study. All these cases had, at least, three and up to eight serum samples. As far as controls are concerned, they were selected from the participants of the CARET study verifying that they had not been diagnosed with prostate cancer by the time of the original study, and the selection was done by matching to cases on date of birth and number of serum samples available for analysis.
Source
The dataset can be downloaded from https://research.fredhutch.org/diagnostic-biomarkers-center/en/datasets.html.
References
Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series. Oxford University Press, New York.
Etzioni, R., Pepe, M. S., Longton, G., Hu. C., and Goodman, G. (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making, 19(3), 242-251.
Examples
data(psa)
summary(psa)
Summary method for AROC
objects
Description
Default summary method for objects fitted with AROC.bnp()
, AROC.sp()
, or AROC.kernel()
functions.
Usage
## S3 method for class 'AROC'
summary(object, ...)
Arguments
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
The information printed depends on the method. In all cases, the call to the function, the method, the area under the covariate-adjusted ROC curve (AAUC), the partial area under the covariate-adjusted ROC curve (if required) (AAUC), and the sample sizes are printed. For the semiparametric approach (AROC.sp()
), the estimated coefficients (and 95% confidence intervals, if required) of the model for the healthy population are printed. In addition, the function provides the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For the nonparametric Bayesian approach (AROC.bnp()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC). For the kernel-based approach (AROC.kernel()
), information regarding the selected bandwidth and the type of kernel estimator (for both regression and variance functions) is printed.
See Also
AROC.bnp
, AROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0 <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
group = "status", tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE)
summary(m0)
Summary method for cROC
objects
Description
Default summary method for objects fitted with cROC.bnp()
, cROC.sp()
, or cROC.kernel()
functions.
Usage
## S3 method for class 'cROC'
summary(object, ...)
Arguments
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
The information printed depends on the method. In all cases, the call to the function, the method, and the sample sizes are printed. For the semiparametric approach (cROC.sp()
), the estimated coefficients (and 95% confidence intervals, if required) of the model for the healthy population, the diseased population and the conditional ROC curve, are printed. In addition, the function provides the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For the nonparametric Bayesian approach (cROC.bnp()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC) (for both healthy and diseased populations). For the kernel-based approach (cROC.kernel()
), information regarding the selected bandwidths and the type of kernel estimator(for both healthy and diseased populations and for both regression and variance functions) is printed.
References
Rodriguez-Alvarez, M.X., Tahoces, P.G., Cadarso-Suarez, C. and Lado, M.J. (2011). Comparative study of ROC regression techniques. Applications for the computer-aided diagnostic system in breast cancer detection. Computational Statistics and Data Analysis, 55, 888–902.
See Also
cROC.bnp
, cROC.sp
or cROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0),
formula.d = l_marker1 ~ f(age, K = 0),
group = "status",
tag.h = 0,
data = newpsa,
standardise = TRUE,
p = seq(0, 1, len = 101),
compute.lpml = TRUE,
compute.WAIC = TRUE,
compute.DIC = TRUE,
pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA),
mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
summary(cROC_bnp)
Summary method for pooledROC
objects
Description
Default summary method for objects fitted with pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, or pooledROC.dpm
functions.
Usage
## S3 method for class 'pooledROC'
summary(object, ...)
Arguments
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
Details
A short summary is printed including the call to the function, the method, samples sizes, the area under the pooled ROC curve (AUC), and if required, the partial area under the pooled ROC curve. For the nonparametric Bayesian approach (pooledROC.dpm()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC). For the kernel-based approach (pooledROC.dpm()
), information regarding the selected bandwidths and the density bandwidth selection method is presented.
See Also
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500)
summary(m0_emp)
plot(m0_emp)