Type: | Package |
Title: | Estimating Time-Dependent ROC Curve and AUC for Censored Data |
Version: | 2.0.0 |
Description: | Contains functions to estimate a smoothed and a non-smoothed (empirical) time-dependent receiver operating characteristic curve and the corresponding area under the receiver operating characteristic curve and the optimal cutoff point for the right and interval censored survival data. See Beyene and El Ghouch (2020)<doi:10.1002/sim.8671> and Beyene and El Ghouch (2022) <doi:10.1002/bimj.202000382>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R (≥ 3.5.0) |
Imports: | Rcpp (≥ 1.0.0), icenReg, condSURV, survival, stats, graphics, methods |
LinkingTo: | Rcpp, RcppEigen |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | yes |
Packaged: | 2023-03-27 03:57:11 UTC; m2kas |
Author: | Kassu Mehari Beyene [aut, cre], Anouar El Ghouch [aut, ths] |
Maintainer: | Kassu Mehari Beyene <kassu.mehari@wu.edu.et> |
Repository: | CRAN |
Date/Publication: | 2023-03-27 08:10:05 UTC |
The cross-validation bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the CV method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) cross-validation bandwidth selection method to the case of weighted data.
Usage
CV(X, wt, ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type of kernel to be used: "
Details
Bowman et al. (1998) proposed the cross-validation bandwidth selection method for the unweighted kernel-smoothed distribution function. This method is implemented in the R package kerdiest.
We adapted this for the case of weighted data by incorporating the weight variable into the cross-validation function of Bowman's method. See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Bowman A., Hall P. and Prvan T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika 85: 799–808.
Quintela-del-Rio, A. and Estevez-Perez, G. (2015). kerdiest: Nonparametric kernel estimation of the distribution function, bandwidth selection and estimation of related functions. R package version 1.2.
Examples
library(cenROC)
X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector
## Cross-validation bandwidth selection
CV(X = X, wt = wt)$bw
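To see what the selected bandwidth controls, the weighted kernel-smoothed distribution function can be sketched in base R. This is only an illustration under stated assumptions: `wkde_cdf` is a hypothetical helper, not a function of this package, it uses a normal kernel, and the two bandwidth values are arbitrary.

```r
# Hypothetical helper (not part of cenROC): weighted kernel-smoothed
# distribution function with a normal kernel.
wkde_cdf <- function(x0, X, wt, bw) {
  wt <- wt / sum(wt)                    # normalise the weights to sum to one
  difmat <- outer(x0, X, "-") / bw      # (x0 - X_i) / bw, one row per grid point
  as.vector(pnorm(difmat) %*% wt)       # sum_i wt_i * Phi((x0 - X_i) / bw)
}

set.seed(1)
X    <- rnorm(100)                      # random data vector
wt   <- runif(100)                      # weight vector
grid <- seq(-3, 3, length.out = 61)

F_small <- wkde_cdf(grid, X, wt, bw = 0.05)  # close to the weighted empirical CDF
F_large <- wkde_cdf(grid, X, wt, bw = 1.00)  # heavily smoothed
```

A small bandwidth follows the weighted empirical distribution function closely, while a large one oversmooths it; a data-driven selector such as the one documented here aims to balance the two.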
Estimation of the survival probability conditional on the observed data for right censored data
Description
Estimates the survival probability conditional on the observed data for right censored data.
Usage
Csurv(Y, M, censor, t, h = NULL, kernel = "normal")
Arguments
Y |
The numeric vector of event-times or observed times. |
M |
The numeric vector of marker values for which we want to compute the time-dependent ROC curves. |
censor |
The censoring indicator, |
t |
A scalar time point at which we want to compute the time-dependent ROC curve. |
h |
A scalar for the bandwidth of Beran's weight calculations. By default, the method of Sheather and Jones (1991) is used. |
kernel |
A character string giving the type of kernel to be used: "
Value
Returns the following vectors:
positive
P(T<t|Y,censor,M).
negative
P(T>t|Y,censor,M).
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Li, L., Hu, B. and Greene, T. (2018). A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data. Statistical Methods in Medical Research, 27(8): 2264–2278.
Martínez-Camblor, P., Bayón, G. F. and Pérez-Fernández, S. (2016). Cumulative/dynamic ROC curve estimation. Journal of Statistical Computation and Simulation, 86(17): 3582–3594.
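The logic behind the two returned vectors can be sketched as follows. This is a minimal illustration, not the package's implementation: `cond_status` and `S_toy` are hypothetical names, the conditional survival function is a toy exponential model rather than Beran's estimator, and the coding `delta = 1` for an observed event and `0` for a censored time is an assumption of the sketch.

```r
# Toy conditional survival function S(s | m); an assumption for illustration
# only (exponential model whose hazard grows with the marker).
S_toy <- function(s, m) exp(-0.1 * exp(0.5 * m) * s)

# Hypothetical helper (not the package's Csurv).
# Assumes delta = 1 for an observed event and 0 for a censored time.
cond_status <- function(Y, M, delta, t, S = S_toy) {
  positive <- ifelse(Y > t, 0,                   # still event-free at time t
              ifelse(delta == 1, 1,              # event observed by time t
                     1 - S(t, M) / S(Y, M)))     # censored before t: P(T <= t | T > Y, M)
  list(positive = positive, negative = 1 - positive)
}

Y     <- c(2, 5, 8, 3)            # observed times
M     <- c(0.3, -1.2, 0.8, 1.5)   # marker values
delta <- c(1, 0, 1, 0)            # illustrative censoring indicator
out   <- cond_status(Y, M, delta, t = 6)
out$positive
```

Only subjects censored before `t` receive a fractional status; everyone else is classified with certainty.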
Estimation of the survival probability conditional on the observed data for interval censored data
Description
Estimates the survival probability conditional on the observed data for interval censored data.
Usage
ICsur(L, R, M, t, method, dist)
Arguments
L |
The numeric vector of left limits of the observed time. For left censored observations
R |
The numeric vector of right limits of the observed time. For right censored observations
M |
The numeric vector of marker values. |
t |
A scalar time point used to calculate the ROC curve
method |
A character string indicating the type of modeling. This includes nonparametric
dist |
A character string indicating the type of distribution for the parametric model. The options are
Value
Returns the following vectors:
positive
P(T<t|L,R,M).
negative
P(T>t|L,R,M).
References
Beyene, K. M. and El Ghouch A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.
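For interval censored data the same idea can be sketched by conditioning on the observed interval. The code below is an illustration under assumptions, not the package's method: `ic_status` and `S_toy` are hypothetical names, the survival function is a toy Weibull-type model, and the sketch assumes L < R for every subject, with R possibly `Inf` when the event time has no finite upper limit.

```r
# Toy conditional survival function (Weibull-type); an assumption for illustration.
S_toy <- function(s, m) exp(-(0.2 * s)^1.5 * exp(0.5 * m))

# Hypothetical helper (not the package's ICsur).
# Assumes L < R for every subject; R may be Inf.
ic_status <- function(L, R, M, t, S = S_toy) {
  t_clip <- pmin(pmax(t, L), R)        # clip t to the observed interval [L, R]
  num <- S(L, M) - S(t_clip, M)        # P(L < T <= t | M), restricted to the interval
  den <- S(L, M) - S(R, M)             # P(L < T <= R | M)
  positive <- num / den
  list(positive = positive, negative = 1 - positive)
}

L <- c(0, 1, 3, 1)
R <- c(1.5, 4, 5, Inf)
M <- c(1.0, 0.2, -0.5, 0.7)
out <- ic_status(L, R, M, t = 2)
out$positive
```

A subject whose interval ends before `t` gets status 1, one whose interval starts after `t` gets 0, and an interval that contains `t` yields a value strictly between 0 and 1.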
Time-dependent ROC curve estimation for interval-censored survival data
Description
This function computes the time-dependent ROC curve for interval censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curves can be either empirical (non-smoothed) or smoothed with/without boundary correction. It also calculates the time-dependent AUC.
Usage
IntROC(L, R, M, t, U = NULL, method = "emp", method2 = "pa", dist = "weibull",
bw = NULL, ktype = "normal", len = 151, B = 0, alpha = 0.05, plot = "TRUE")
Arguments
L |
The numeric vector of left limits of the observed time. For left censored observations
R |
The numeric vector of right limits of the observed time. For right censored observations
M |
The numeric vector of marker values. |
t |
A scalar time point used to calculate the ROC curve. |
U |
The numeric vector of cutoff values. |
method |
The method of ROC curve estimation. The possible options are " |
method2 |
A character string indicating the type of modeling. This includes nonparametric
dist |
A character string indicating the type of distribution for the parametric model. The options are
bw |
A character string specifying the bandwidth estimation method. The possible options are "
ktype |
A character string giving the type of kernel distribution to be used for smoothing the ROC curve: "
len |
The length of the grid points for ROC estimation. Default is |
B |
The number of bootstrap samples to be used for variance estimation. The default is |
alpha |
The significance level. The default is |
plot |
The logical parameter to see the ROC curve plot. Default is
Details
This function implements the time-dependent ROC curve and the corresponding AUC using model-based and nonparametric methods for the estimation of the conditional survival function. The empirical (non-smoothed) ROC estimate and the smoothed ROC estimate with/without boundary correction can be obtained using this function.
The smoothed ROC curve estimators require selecting a bandwidth parameter for smoothing the ROC curve. To this end, three data-driven methods are implemented: the normal reference "NR", the plug-in "PI" and the cross-validation "CV".
See Beyene and El Ghouch (2020) for details.
Value
Returns the following items:
ROC
The vector of estimated ROC values. These are numbers between zero and one.
U
The vector of grid points used.
AUC
A data frame of dimension 1 × 4. The columns are: AUC, standard error of AUC, and the lower and upper limits of the bootstrap CI.
bw
The computed value of the bandwidth. For the empirical method this is always NA.
Dt
The vector of estimated event status.
M
The vector of marker values.
References
Beyene, K. M. and El Ghouch A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Examples
library(cenROC)
data(hds)
est = IntROC(L=hds$L, R=hds$R, M=hds$M, t=2)
est$AUC
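How an empirical (non-smoothed) ROC curve and its AUC can be built from marker values and an estimated event status is sketched below. This is an illustration only, not the package's code: `weighted_roc` is a hypothetical helper, the status values are made up, and the AUC is obtained by the trapezoidal rule.

```r
# Hypothetical helper (not the package's implementation): weighted empirical
# ROC points and AUC from marker values M and an estimated event status `pos`
# (values in [0, 1], interpreted as the probability of being a case).
weighted_roc <- function(M, pos) {
  neg  <- 1 - pos
  cuts <- c(Inf, sort(unique(M), decreasing = TRUE))            # cutoffs, high to low
  tpr  <- sapply(cuts, function(k) sum(pos[M >= k]) / sum(pos)) # sensitivity
  fpr  <- sapply(cuts, function(k) sum(neg[M >= k]) / sum(neg)) # 1 - specificity
  # Trapezoidal area under the (fpr, tpr) points
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(fpr = fpr, tpr = tpr, AUC = auc)
}

M   <- c(0.2, 1.4, 0.9, 2.1, 0.5, 1.8)
pos <- c(0,   1,   0.3, 1,   0,   0.6)   # illustrative values only
roc <- weighted_roc(M, pos)
roc$AUC
```

With a 0/1 status this reduces to the usual empirical ROC curve; fractional values let partially informative (censored) subjects contribute to both the case and the control group.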
The normal reference bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the NR method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) normal reference bandwidth selection method to the case of weighted data.
Usage
NR(X, wt, ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type of kernel to be used: "
Details
See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Examples
library(cenROC)
X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector
## Normal reference bandwidth selection
NR(X = X, wt = wt)$bw
The plug-in bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the PI method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) direct plug-in bandwidth selection method to the case of weighted data.
Usage
PI(X, wt, ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type of kernel to be used: "
Details
See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Examples
library(cenROC)
X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector
## Plug-in bandwidth selection
PI(X = X, wt = wt)$bw
ROC estimation function
Description
ROC estimation function
Usage
RocFun(U, D, M, bw = "NR", method, ktype)
Arguments
U |
The vector of grid points where the ROC curve is estimated. |
D |
The event indicator. |
M |
The numeric vector of marker values for which the time-dependent ROC curve is computed. |
bw |
The bandwidth parameter for smoothing the ROC function. The possible options are
method |
The method of ROC curve estimation. The possible options are
ktype |
A character string giving the type of kernel to be used: "
Author(s)
Beyene K. Mehari and El Ghouch Anouar
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Estimation of the time-dependent ROC curve for right censored survival data
Description
This function computes the time-dependent ROC curve for right censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curves can be either empirical (non-smoothed) or smoothed with/without boundary correction. It also calculates the time-dependent area under the ROC curve (AUC).
Usage
cenROC(Y, M, censor, t, U = NULL, h = NULL, bw = "NR", method = "tra",
ktype = "normal", ktype1 = "normal", B = 0, alpha = 0.05, plot = "TRUE")
Arguments
Y |
The numeric vector of event-times or observed times. |
M |
The numeric vector of marker values for which the time-dependent ROC curve is computed. |
censor |
The censoring indicator,
t |
A scalar time point at which the time-dependent ROC curve is computed. |
U |
The vector of grid points where the ROC curve is estimated. The default is a sequence of
h |
A scalar for the bandwidth of Beran's weight calculations. The default is the value obtained by using the method of Sheather and Jones (1991). |
bw |
A character string specifying the bandwidth estimation method for the ROC itself. The possible options are " |
method |
The method of ROC curve estimation. The possible options are " |
ktype |
A character string giving the type of kernel distribution to be used for smoothing the ROC curve: "
ktype1 |
A character string specifying the desired kernel needed for Beran weight calculation. The possible options are " |
B |
The number of bootstrap samples to be used for variance estimation. The default is |
alpha |
The significance level. The default is |
plot |
The logical parameter to see the ROC curve plot. The default is |
Details
The empirical (non-smoothed) ROC estimate and the smoothed ROC estimate with/without boundary correction can be obtained using this function.
The smoothed ROC curve estimators require selecting two bandwidth parameters: one for Beran’s weight calculation and one for smoothing the ROC curve.
For the latter, three data-driven methods are implemented: the normal reference "NR", the plug-in "PI" and the cross-validation "CV".
To select the bandwidth parameter needed for Beran’s weight calculation, the plug-in method of Sheather and Jones (1991) is used by default, but it is also possible to supply a numeric value.
See Beyene and El Ghouch (2020) for details.
Value
Returns the following items:
ROC
The vector of estimated ROC values. These are numbers between zero and one.
U
The vector of grid points used.
AUC
A data frame of dimension 1 × 4. The columns are: AUC, standard error of AUC, and the lower and upper limits of the bootstrap CI.
bw
The computed value of the bandwidth. For the empirical method this is always NA.
Dt
The vector of estimated event status.
M
The vector of marker values.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological) 53(3): 683–690.
Examples
library(cenROC)
data(mayo)
est = cenROC(Y=mayo$time, M=mayo$mayoscore5, censor=mayo$censor, t=365*6)
est$AUC
Compute the conditional survival function for Interval Censored Survival Data
Description
A method to compute the survival function for interval censored survival data, based on a spline-based constrained maximum likelihood estimator. The likelihood is maximized using the generalized gradient projection method.
Usage
condS(L, R, M, Delta, t, m)
Arguments
L |
The numeric vector of left limits of the observed time. For left censored observations
R |
The numeric vector of right limits of the observed time. For right censored observations
M |
An array containing the marker levels for the samples. |
Delta |
An array of indicators for the censoring type: use 1, 2 and 3 for an event that happened before the left bound time, within the defined time range, and after it, respectively. |
t |
A scalar indicating the prediction time. |
m |
A scalar for the cutoff of the marker variable. |
References
Wu, Yuan; Zhang, Ying. Partially monotone tensor spline estimation of the joint distribution function with bivariate current status data. Ann. Statist. 40, 2012, 1609-1636 <doi:10.1214/12-AOS1016>
Derivative of normal distribution
Description
Derivative of normal distribution
Usage
dnorkernel(ord, X)
Arguments
ord |
The order of derivative. |
X |
The numeric data vector. |
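As an illustration, and assuming the function refers to derivatives of the standard normal density, these derivatives can be written with Hermite polynomials: the derivative of order n equals (-1)^n He_n(x) φ(x). The helper `dnorm_deriv` below is hypothetical, covers orders 0 to 3 only, and is not the package's implementation.

```r
# Hypothetical helper (not the package's dnorkernel): derivatives of the
# standard normal density up to order 3, via Hermite polynomials.
dnorm_deriv <- function(ord, x) {
  He <- switch(as.character(ord),
               "0" = rep(1, length(x)),
               "1" = x,
               "2" = x^2 - 1,
               "3" = x^3 - 3 * x,
               stop("only orders 0 to 3 are covered in this sketch"))
  (-1)^ord * He * dnorm(x)
}

x <- c(-1, 0, 0.7)
dnorm_deriv(1, x)   # first derivative:  -x * dnorm(x)
dnorm_deriv(2, x)   # second derivative: (x^2 - 1) * dnorm(x)
```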
NASA Hypobaric Decompression Sickness Marker Data
Description
This data set contains the marker values with the left and right limits of the observed time for the subjects in the NASA Hypobaric Decompression Sickness Data.
Usage
data(hds)
Format
This is a data frame with 238 observations and 3 variables: L (left limit of the observed time), R (right limit of the observed time) and M (marker). The marker is a score derived by combining the covariates Age, Sex, TR360, and Noadyn.
References
Beyene, K. M. and El Ghouch A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.
Numerical integration function using Simpson's rule
Description
Numerical integration function using Simpson's rule
Usage
integ(x, fx, method, n.pts = 256)
Arguments
x |
The numeric data vector. |
fx |
The function. |
method |
The character string specifying method of numerical integration. The possible options are |
n.pts |
Number of points. |
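The composite Simpson's rule itself can be sketched in a few lines of base R. The helper `simpson` is hypothetical and takes a function and an interval, which may differ from the interface of `integ`; it is meant only to show the 1, 4, 2, ..., 4, 1 weighting.

```r
# Hypothetical helper (not the package's integ): composite Simpson's rule
# for a function f on [a, b] using n.pts equally spaced points.
simpson <- function(f, a, b, n.pts = 257) {
  if (n.pts %% 2 == 0) n.pts <- n.pts + 1             # Simpson needs an odd number of points
  x <- seq(a, b, length.out = n.pts)
  h <- (b - a) / (n.pts - 1)                          # spacing between grid points
  w <- c(1, rep(c(4, 2), length.out = n.pts - 2), 1)  # weights 1, 4, 2, ..., 4, 1
  sum(w * f(x)) * h / 3
}

simpson(function(x) x^2, 0, 1)   # the exact value is 1/3
simpson(dnorm, -6, 6)            # close to 1
```

Simpson's rule is exact for polynomials up to degree three, which is why the first call recovers 1/3 up to rounding error.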
Distribution function without the ith observation
Description
Distribution function without the ith observation
Usage
ker_dis_i(X, y, wt, ktype, bw)
Arguments
X |
The numeric data vector. |
y |
The vector where the kernel estimation is computed. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type of kernel to be used: "
bw |
A numeric bandwidth value. |
Value
Returns the estimated value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.
Description
Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.
Usage
kfunc(ktype = "normal", difmat)
Arguments
ktype |
A character string giving the type of kernel to be used: "
difmat |
A numeric matrix of sample data (X) minus evaluation points (x0) divided by bandwidth value (bw). |
Value
Returns the matrix resulting from evaluating difmat.
Kernel distribution function
Description
Kernel distribution function
Usage
kfunction(ktype, X)
Arguments
ktype |
A character string giving the type of kernel to be used: "
X |
A numeric vector of sample data. |
Value
Returns a vector resulting from evaluating X.
Mayo Marker Data
Description
Two marker values with event time and censoring status for the subjects in Mayo PBC data.
Usage
data(mayo)
Format
A data frame with 312 observations and 4 variables: time (event time/censoring time), censor (censoring indicator), mayoscore4, mayoscore5. The two scores are derived from 4 and 5 covariates respectively.
References
Heagerty, P. J., and Zheng, Y. (2005). Survival model predictive accuracy and ROC curves. Biometrics, 61(1), 92-105.
The value of squared integral x^2 k(x) dx and integral x k(x) K(x) dx
Description
The value of squared integral x^2 k(x) dx and integral x k(x) K(x) dx
Usage
muro(ktype)
Arguments
ktype |
A character string giving the type of kernel to be used: "
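For the normal kernel, the two quantities named in the title can be checked numerically with `integrate()`. The variable names `mu2` and `rho` are chosen here for illustration and are not taken from the package; k is the kernel density and K the kernel distribution function.

```r
# Normal kernel: density k(x) = dnorm(x), distribution function K(x) = pnorm(x).
mu2 <- integrate(function(x) x^2 * dnorm(x), -Inf, Inf)$value
rho <- integrate(function(x) x * dnorm(x) * pnorm(x), -Inf, Inf)$value

mu2^2   # squared integral of x^2 k(x); equals 1 for the normal kernel
rho     # integral of x k(x) K(x); equals 1 / (2 * sqrt(pi)) for the normal kernel
```

Constants of this kind typically enter the asymptotic bias and variance of a kernel distribution function estimator, which is where bandwidth formulas come from.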
Weighted inter-quartile range estimation
Description
Weighted inter-quartile range estimation
Usage
wIQR(X, wt)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
Function to select the bandwidth parameter needed for smoothing the time-dependent ROC curve.
Description
This function computes the data-driven bandwidth value for smoothing the ROC curve. It contains three methods: the normal reference, the plug-in and the cross-validation methods.
Usage
wbw(X, wt, bw = "NR", ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
bw |
A character string specifying the bandwidth selection method. The possible options are " |
ktype |
A character string indicating the type of kernel function: " |
Value
Returns the estimated value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene and Anouar El Ghouch
References
Beyene, K. M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.
Weighted quantile estimation
Description
Weighted quantile estimation
Usage
wquantile(X, wt, p = 0.5)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
p |
The percentile value. The default is 0.5. |
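One common way to define a weighted quantile is to invert the weighted empirical distribution function. The sketch below follows that definition; `weighted_quantile` is a hypothetical helper and the package may use a different convention (for example, interpolation between data points).

```r
# Hypothetical helper (not the package's wquantile): weighted quantile
# obtained by inverting the weighted empirical distribution function.
weighted_quantile <- function(X, wt, p = 0.5) {
  ord <- order(X)
  cw  <- cumsum(wt[ord]) / sum(wt)   # weighted ECDF evaluated at the sorted data
  X[ord][which(cw >= p)[1]]          # smallest value whose ECDF reaches p
}

X  <- c(10, 20, 30, 40)
wt <- c(1, 1, 1, 5)
weighted_quantile(X, wt, p = 0.5)          # 40: the last point carries 5/8 of the weight
weighted_quantile(X, rep(1, 4), p = 0.5)   # 20 with equal weights

# A weighted inter-quartile range follows directly from two quantiles
weighted_quantile(X, wt, 0.75) - weighted_quantile(X, wt, 0.25)
```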
Weighted variance estimation
Description
Weighted variance estimation
Usage
wvar(X, wt, na.rm = FALSE)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
na.rm |
The indicator of whether missing value(s) should be removed. The default is FALSE. |
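A weighted variance can be sketched as the weighted mean of squared deviations from the weighted mean. The helper `weighted_var` is hypothetical; it uses normalised weights without a small-sample correction and treats `na.rm = TRUE` as dropping incomplete pairs, both of which are assumptions rather than documented behaviour of `wvar`.

```r
# Hypothetical helper (not the package's wvar): weighted variance with
# weights normalised to sum to one and no small-sample correction.
weighted_var <- function(X, wt, na.rm = FALSE) {
  if (na.rm) {
    keep <- !is.na(X) & !is.na(wt)   # drop pairs with a missing value
    X  <- X[keep]
    wt <- wt[keep]
  }
  wt <- wt / sum(wt)
  mu <- sum(wt * X)                  # weighted mean
  sum(wt * (X - mu)^2)               # weighted mean of squared deviations
}

weighted_var(c(1, 2, 3), c(1, 1, 1))                   # 2/3 with equal weights
weighted_var(c(1, NA, 3), c(1, 1, 1), na.rm = TRUE)    # 1 after dropping the NA
```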
Computes the optimal cutoff point using the Youden index criterion
Description
This function computes the optimal cutoff point using the Youden index criterion for both right and interval censored time-to-event data. The Youden index estimator can be either empirical (non-smoothed) or smoothed with/without boundary correction.
Usage
youden(est, plot = "FALSE")
Arguments
est |
The object returned either by |
plot |
The logical parameter to see the ROC curve plot along with the Youden index. The default is
Details
In medical decision-making, obtaining the optimal cutoff value is crucial for identifying subjects at high risk of experiencing the event of interest. Therefore, it is necessary to select a marker value that classifies subjects into healthy and diseased groups. Several methods for selecting the optimal cutoff point have been proposed in the literature. This package includes only the Youden index criterion.
Value
Returns the following items:
Youden.index
The maximum Youden index value.
cutopt
The optimal cutoff value.
sens
The sensitivity corresponding to the optimal cutoff value.
spec
The specificity corresponding to the optimal cutoff value.
References
Beyene, K. M. and El Ghouch A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.
Youden, W.J. (1950). Index for rating diagnostic tests. Cancer 3, 32–35.
Examples
library(cenROC)
# Right censored data
data(mayo)
resu <- cenROC(Y=mayo$time, M=mayo$mayoscore5, censor=mayo$censor, t=365*6, plot="FALSE")
youden(resu, plot="TRUE")
# Interval censored data
data(hds)
resu1 = IntROC(L=hds$L, R=hds$R, M=hds$M, t=2)
youden(resu1, plot="TRUE")
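The criterion itself is simple to state: the Youden index is J = sensitivity + specificity - 1, and the optimal cutoff is the one that maximises J. The sketch below applies this to made-up sensitivity and specificity values; `youden_cutoff` is a hypothetical helper that mirrors the names of the returned items but is not the package's implementation.

```r
# Hypothetical helper (not the package's youden): pick the cutoff that
# maximises J = sensitivity + specificity - 1.
youden_cutoff <- function(cutoffs, sens, spec) {
  J <- sens + spec - 1
  k <- which.max(J)                  # index of the largest Youden index
  list(Youden.index = J[k], cutopt = cutoffs[k], sens = sens[k], spec = spec[k])
}

cutoffs <- c(1, 2, 3, 4)
sens    <- c(0.95, 0.80, 0.60, 0.20)   # illustrative values only
spec    <- c(0.30, 0.70, 0.85, 0.98)
res <- youden_cutoff(cutoffs, sens, spec)
res$cutopt         # 2, the cutoff with the largest J
res$Youden.index   # about 0.5
```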