Type: | Package |
Title: | Assumption-Lean and Data-Adaptive Post-Prediction Inference |
Version: | 1.0.0 |
Maintainer: | Jiacheng Miao <jiacheng.miao@wisc.edu> |
Description: | Implementation of assumption-lean and data-adaptive post-prediction inference (POPInf), for valid and efficient statistical inference based on data predicted by machine learning. See Miao, Miao, Wu, Zhao, and Lu (2023) <doi:10.48550/arXiv.2311.14220>. |
URL: | https://arxiv.org/abs/2311.14220, https://github.com/qlu-lab/POPInf |
Depends: | R (≥ 3.5.0) |
Imports: | randomForest, MASS |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2024-02-19 18:38:56 UTC; jiacheng |
Author: | Jiacheng Miao |
Repository: | CRAN |
Date/Publication: | 2024-02-20 20:40:12 UTC |
Calculation of the matrix A based on a single dataset
Description
A
function for calculating the matrix A based on a single dataset
Usage
A(X, Y, quant = NA, theta, method)
Arguments
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
quant |
quantile for quantile estimation |
theta |
parameter theta |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
matrix A based on a single dataset
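For intuition only: in M-estimation, a matrix of this kind is commonly the sample average of the derivative of the estimating function with respect to theta. The sketch below assumes that reading and a GLM-type estimating function. `glm_bread` and its `hess` argument are hypothetical names used for illustration and are not part of the package.

```r
# Hypothetical helper (not the package's A()): average curvature matrix
# (1/n) * sum_i hess(x_i' theta) * x_i x_i'.
# With hess == 1 this reduces to X'X / n, the OLS case.
glm_bread <- function(X, theta, hess = function(t) rep(1, length(t))) {
  X <- as.matrix(X)
  curv <- hess(as.numeric(X %*% theta))
  crossprod(X * curv, X) / nrow(X)
}

set.seed(1)
X <- cbind(1, rnorm(200))

# OLS-type matrix
A_ols <- glm_bread(X, theta = c(0, 0))

# Logistic-type matrix, evaluated at theta = 0 where the curvature is 0.25
A_logit <- glm_bread(X, theta = c(0, 0),
                     hess = function(t) plogis(t) * (1 - plogis(t)))
```

At theta = 0 the logistic curvature is constant, so `A_logit` is simply a rescaled copy of `A_ols`.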
Variance-covariance matrix of the estimating equation
Description
Sigma_cal
function for the variance-covariance matrix of the estimating equation
Usage
Sigma_cal(
X_lab,
X_unlab,
Y_lab,
Yhat_lab,
Yhat_unlab,
w,
theta,
quant = NA,
A_lab_inv,
A_unlab_inv,
method
)
Arguments
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
A_lab_inv |
Inverse of matrix A using labeled data |
A_unlab_inv |
Inverse of matrix A using unlabeled data |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
variance-covariance matrix of the estimating equation
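As a point of reference, a textbook single-sample sandwich variance for OLS combines an inverse A matrix with the empirical covariance of the per-observation estimating functions. The following is a minimal sketch under that assumption. `ols_sandwich` is a hypothetical helper, and it does not reproduce how Sigma_cal combines labeled and unlabeled data.

```r
# Hypothetical helper: sandwich variance A^{-1} M A^{-1} / n for OLS,
# where A = X'X / n and M is the average outer product of the scores.
ols_sandwich <- function(X, Y, theta) {
  X <- as.matrix(X)
  n <- nrow(X)
  scores <- X * (as.numeric(X %*% theta) - as.numeric(Y))  # n x d
  A_inv <- solve(crossprod(X) / n)
  meat <- crossprod(scores) / n
  A_inv %*% meat %*% A_inv / n
}

set.seed(11)
X <- cbind(1, rnorm(300))
Y <- as.numeric(X %*% c(1, 2)) + rnorm(300)
theta_hat <- lm.fit(X, Y)$coefficients

V <- ols_sandwich(X, Y, theta_hat)
robust_se <- sqrt(diag(V))
```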
Initial estimation
Description
est_ini
function for initial estimation
Usage
est_ini(X, Y, quant = NA, method)
Arguments
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
initial estimator
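One plausible way to form a starting value for each documented `method` is to use the corresponding classical estimator on the labeled data. The sketch below assumes that approach. `initial_estimate` is a hypothetical function built on base R and is not the package's est_ini.

```r
# Hypothetical helper: classical estimators as starting values.
# Assumes X already contains any intercept column it needs.
initial_estimate <- function(X, Y, quant = NA, method) {
  Y <- as.numeric(as.matrix(Y))
  switch(method,
    mean     = mean(Y),
    quantile = unname(quantile(Y, probs = quant)),
    ols      = unname(lm.fit(as.matrix(X), Y)$coefficients),
    logistic = unname(glm.fit(as.matrix(X), Y, family = binomial())$coefficients),
    poisson  = unname(glm.fit(as.matrix(X), Y, family = poisson())$coefficients),
    stop("unknown method: ", method)
  )
}

set.seed(2)
X <- cbind(1, rnorm(100))
Y <- X %*% c(1, 2) + rnorm(100, sd = 0.1)

est_ols <- initial_estimate(X, Y, method = "ols")
est_med <- initial_estimate(NULL, 1:5, quant = 0.5, method = "quantile")
```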
Hessians of the link function
Description
link_Hessian
function for Hessians of the link function
Usage
link_Hessian(t, method)
Arguments
t |
numeric value(s) at which the Hessian of the link function is evaluated |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
Hessians of the link function
Gradient of the link function
Description
link_grad
function for gradient of the link function
Usage
link_grad(t, method)
Arguments
t |
numeric value(s) at which the gradient of the link function is evaluated |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
gradient of the link function
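Under the usual GLM reading, the first and second derivatives in question are those of the cumulant function b(t): t^2/2 for "ols", log(1 + exp(t)) for "logistic", and exp(t) for "poisson". Whether the package uses exactly this convention is an assumption here. `cumulant_grad` and `cumulant_hess` are hypothetical names.

```r
# Hypothetical helpers: b'(t) and b''(t) for three GLM families.
cumulant_grad <- function(t, method) {
  switch(method,
    ols      = t,
    logistic = plogis(t),   # 1 / (1 + exp(-t))
    poisson  = exp(t),
    stop("unknown method: ", method)
  )
}

cumulant_hess <- function(t, method) {
  switch(method,
    ols      = rep(1, length(t)),
    logistic = plogis(t) * (1 - plogis(t)),
    poisson  = exp(t),
    stop("unknown method: ", method)
  )
}

t_grid <- c(-1, 0, 2)
grad_logit <- cumulant_grad(t_grid, "logistic")
hess_logit <- cumulant_hess(t_grid, "logistic")
```

A quick sanity check is that a central finite difference of `cumulant_grad` should agree with `cumulant_hess` at every point of `t_grid`.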
Sample expectation of psi
Description
mean_psi
function for sample expectation of psi
Usage
mean_psi(X, Y, theta, quant = NA, method)
Arguments
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
sample expectation of psi
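To illustrate what a sample expectation of an estimating function looks like, the sketch below averages the OLS estimating function x_i (x_i' theta - y_i) over observations. The sign convention and the name `avg_score_ols` are assumptions made for this example, not taken from the package.

```r
# Hypothetical helper: column means of the per-observation OLS scores.
avg_score_ols <- function(X, Y, theta) {
  X <- as.matrix(X)
  resid <- as.numeric(X %*% theta) - as.numeric(Y)
  colMeans(X * resid)  # row i of X is scaled by resid[i]
}

set.seed(3)
X <- cbind(1, rnorm(150))
Y <- as.numeric(X %*% c(0.5, -1)) + rnorm(150)

# At the least-squares fit the averaged score is numerically zero,
# which is the defining property of the M-estimator.
theta_hat <- lm.fit(X, Y)$coefficients
score_at_fit <- avg_score_ols(X, Y, theta_hat)
```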
Sample expectation of POP-Inf psi
Description
mean_psi_pop
function for sample expectation of POP-Inf psi
Usage
mean_psi_pop(
X_lab,
X_unlab,
Y_lab,
Yhat_lab,
Yhat_unlab,
w,
theta,
quant = NA,
method
)
Arguments
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
sample expectation of POP-Inf psi
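A minimal sketch of how labeled outcomes and predictions might be combined for mean estimation, assuming the combined estimating function has the form "labeled score + w * (score on unlabeled predictions - score on labeled predictions)". That form, the sign convention psi = theta - y, and both function names are assumptions for illustration.

```r
# Hypothetical helpers for method = "mean" with a scalar weight w.
combined_mean_score <- function(theta, Y_lab, Yhat_lab, Yhat_unlab, w) {
  score_lab        <- theta - mean(Y_lab)
  score_hat_unlab  <- theta - mean(Yhat_unlab)
  score_hat_lab    <- theta - mean(Yhat_lab)
  score_lab + w * (score_hat_unlab - score_hat_lab)
}

# Root of the combined score in closed form.
combined_mean_estimate <- function(Y_lab, Yhat_lab, Yhat_unlab, w) {
  mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab))
}

set.seed(4)
Y_lab      <- rnorm(100, mean = 1)
Yhat_lab   <- Y_lab + rnorm(100, sd = 0.3)
Yhat_unlab <- rnorm(1000, mean = 1) + rnorm(1000, sd = 0.3)

theta_w <- combined_mean_estimate(Y_lab, Yhat_lab, Yhat_unlab, w = 0.8)
score_w <- combined_mean_score(theta_w, Y_lab, Yhat_lab, Yhat_unlab, w = 0.8)
```

With w = 0 the estimate falls back to the labeled-data mean, so the predictions are ignored entirely.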
Gradient descent for obtaining estimator
Description
optim_est
function for gradient descent for obtaining estimator
Usage
optim_est(
X_lab,
X_unlab,
Y_lab,
Yhat_lab,
Yhat_unlab,
w,
theta,
quant = NA,
method,
step_size = 0.1,
max_iterations = 500,
convergence_threshold = 1e-06
)
Arguments
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
step_size |
step size for gradient descent |
max_iterations |
maximum number of iterations for gradient descent |
convergence_threshold |
convergence threshold for gradient descent |
Value
estimator
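The roles of `step_size`, `max_iterations`, and `convergence_threshold` can be sketched with plain gradient descent on a single-sample OLS problem. The update rule and the stopping criterion (norm of the parameter change) are assumptions, since they are not documented here. `gd_ols` is a hypothetical function.

```r
# Hypothetical helper: gradient descent on the mean squared error / 2.
gd_ols <- function(X, Y, theta, step_size = 0.1, max_iterations = 500,
                   convergence_threshold = 1e-06) {
  X <- as.matrix(X)
  Y <- as.numeric(Y)
  for (i in seq_len(max_iterations)) {
    grad <- colMeans(X * (as.numeric(X %*% theta) - Y))
    theta_new <- theta - step_size * grad
    converged <- sqrt(sum((theta_new - theta)^2)) < convergence_threshold
    theta <- theta_new
    if (converged) break
  }
  as.numeric(theta)
}

set.seed(7)
X <- cbind(1, rnorm(200))
Y <- as.numeric(X %*% c(1, -2)) + rnorm(200)

theta_gd  <- gd_ols(X, Y, theta = c(0, 0))
theta_ref <- unname(lm.fit(X, Y)$coefficients)
```

With roughly standardized covariates the default step size converges well within the iteration cap. Badly scaled covariates would need a smaller step.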
Gradient descent for obtaining the weight vector
Description
optim_weights
function for gradient descent for obtaining the weight vector
Usage
optim_weights(
j,
X_lab,
X_unlab,
Y_lab,
Yhat_lab,
Yhat_unlab,
w,
theta,
quant = NA,
method
)
Arguments
j |
j-th coordinate of weights vector |
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
weights
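For the special case of mean estimation, the variance-minimizing weight has a closed form, which gives some intuition for what the optimized weights represent. This sketch assumes the estimator mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab)) with independent labeled and unlabeled samples. It is not the package's gradient-descent procedure, and the clipping to [0, 1] is an extra assumption. `mean_weight` is a hypothetical name.

```r
# Hypothetical helper: minimizes
#   Var(Y)/n + w^2 * Var(Yhat) * (1/n + 1/N) - 2 * w * Cov(Y, Yhat)/n
# over w, giving w = Cov(Y, Yhat) / (Var(Yhat) * (1 + n/N)).
mean_weight <- function(Y_lab, Yhat_lab, Yhat_unlab) {
  n <- length(Y_lab)
  N <- length(Yhat_unlab)
  var_hat <- var(c(Yhat_lab, Yhat_unlab))   # pooled variance of predictions
  w <- cov(Y_lab, Yhat_lab) / (var_hat * (1 + n / N))
  min(max(w, 0), 1)                          # assumed clipping to [0, 1]
}

set.seed(8)
n <- 200; N <- 2000
Y_all    <- rnorm(n + N)
Yhat_all <- 0.9 * Y_all + sqrt(1 - 0.9^2) * rnorm(n + N)

w_opt <- mean_weight(Y_all[1:n], Yhat_all[1:n], Yhat_all[-(1:n)])
```

More accurate predictions and a larger unlabeled sample both push the weight toward 1.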
POP-Inf M-Estimation
Description
pop_M
function conducts post-prediction M-Estimation.
Usage
pop_M(
X_lab = NA,
X_unlab = NA,
Y_lab,
Yhat_lab,
Yhat_unlab,
alpha = 0.05,
weights = NA,
max_iterations = 100,
convergence_threshold = 0.05,
quant = NA,
intercept = FALSE,
focal_index = NA,
method
)
Arguments
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
alpha |
Specifies the confidence level as 1 - alpha for confidence intervals. |
weights |
weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
max_iterations |
Sets the maximum number of iterations for the optimization process to derive weights. |
convergence_threshold |
Sets the convergence threshold for the optimization process to derive weights. |
quant |
quantile for quantile estimation |
intercept |
Boolean indicating whether the input covariate data already contains an intercept column (TRUE if it does) |
focal_index |
Identifies the focal index for variance reduction. |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
A summary table presenting point estimates, standard error, confidence intervals (1 - alpha), P-values, and weights.
Examples
data <- sim_data()
X_lab <- data$X_lab
X_unlab <- data$X_unlab
Y_lab <- data$Y_lab
Yhat_lab <- data$Yhat_lab
Yhat_unlab <- data$Yhat_unlab
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
alpha = 0.05, method = "mean")
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
alpha = 0.05, quant = 0.75, method = "quantile")
pop_M(X_lab = X_lab, X_unlab = X_unlab,
Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
alpha = 0.05, method = "ols")
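The relationship between `alpha`, the standard error, and the reported interval and P-value can be sketched as follows, assuming Wald-type normal-approximation inference. `wald_summary` and its column names are hypothetical and may not match the layout of the table returned by pop_M.

```r
# Hypothetical helper: two-sided Wald interval and P-value.
wald_summary <- function(estimate, std_error, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  data.frame(
    Estimate  = estimate,
    Std.Error = std_error,
    Lower.CI  = estimate - z_crit * std_error,
    Upper.CI  = estimate + z_crit * std_error,
    P.value   = 2 * pnorm(-abs(estimate / std_error))
  )
}

res <- wald_summary(estimate = 1.2, std_error = 0.3, alpha = 0.05)
```

A smaller `alpha` widens the interval but leaves the P-value unchanged.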
Estimating equation
Description
psi
function for estimating equation
Usage
psi(X, Y, theta, quant = NA, method)
Arguments
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Value
estimating equation
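As an example of an estimating function that is not a simple residual, the quantile case can be written as the indicator 1{y <= theta} minus the target level. The sketch assumes this standard form and sign. `quantile_score` is a hypothetical name.

```r
# Hypothetical helper: per-observation quantile estimating function.
quantile_score <- function(Y, theta, quant) {
  as.numeric(as.numeric(Y) <= theta) - quant
}

set.seed(5)
Y <- rnorm(1000)

# The empirical 0.75 quantile (inverse-ECDF definition, type = 1)
# makes the averaged score approximately zero.
theta_hat <- unname(quantile(Y, probs = 0.75, type = 1))
avg_score <- mean(quantile_score(Y, theta_hat, quant = 0.75))
```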
Simulate the data for testing the functions
Description
sim_data
function for simulating data for testing the functions
Usage
sim_data(r = 0.9, binary = FALSE)
Arguments
r |
imputation correlation |
binary |
simulate binary outcome or not |
Value
simulated data
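The idea of an "imputation correlation" can be illustrated by generating predictions whose correlation with the true outcome is approximately `r`. This is a sketch under assumptions: the data-generating model, the sample sizes `n_lab` and `n_unlab`, and the function name `simulate_pairs` are hypothetical, and the `binary` option is not covered. Only the element names of the returned list follow the pop_M example.

```r
# Hypothetical simulator: continuous outcome with predictions Yhat
# constructed so that cor(Y, Yhat) is close to r.
simulate_pairs <- function(n_lab = 500, n_unlab = 5000, r = 0.9) {
  n <- n_lab + n_unlab
  X <- rnorm(n)
  Y <- 0.5 * X + rnorm(n)
  Y_std <- (Y - mean(Y)) / sd(Y)
  Yhat <- mean(Y) + sd(Y) * (r * Y_std + sqrt(1 - r^2) * rnorm(n))
  lab <- seq_len(n_lab)
  list(
    X_lab      = matrix(X[lab]),
    X_unlab    = matrix(X[-lab]),
    Y_lab      = matrix(Y[lab]),
    Yhat_lab   = matrix(Yhat[lab]),
    Yhat_unlab = matrix(Yhat[-lab])
  )
}

set.seed(42)
sim <- simulate_pairs(r = 0.9)
achieved_r <- cor(as.numeric(sim$Y_lab), as.numeric(sim$Yhat_lab))
```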