Type: | Package |
Title: | Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials |
Version: | 2.1.1 |
Date: | 2025-03-01 |
Description: | Supports propensity score weighting analysis of observational studies and randomized trials. Enables the estimation and inference of average causal effects with binary and multiple treatments using overlap weights (ATO), inverse probability of treatment weights (ATE), average treatment effect among the treated weights (ATT), matching weights (ATM) and entropy weights (ATEN), with and without propensity score trimming. These weights are members of the family of balancing weights introduced in Li, Morgan and Zaslavsky (2018) <doi:10.1080/01621459.2016.1260466> and Li and Li (2019) <doi:10.1214/19-AOAS1282>. |
Depends: | R (≥ 3.5.0) |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/thuizhou/PSweight |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown |
Imports: | lme4, nnet, MASS, ggplot2, numDeriv, gbm, SuperLearner, survey |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-03-02 06:03:13 UTC; yukang |
Author: | Tianhui Zhou [aut], Guangyu Tong [aut], Fan Li [aut], Laine Thomas [aut], Fan Li [aut], Yukang Zeng [cre] |
Maintainer: | Yukang Zeng <yukang.zeng@yale.edu> |
Repository: | CRAN |
Date/Publication: | 2025-03-02 06:30:02 UTC |
Illustrative dataset for PSweight
Description
This is a real observational study with binary and multiple treatment groups to illustrate the utility of PSweight functions.
Usage
data(NCDS)
Format
A data frame with 3642 rows and 16 columns.
Details
An dataset from the National Child Development Survey (NCDS) of the United Kingdom (UK). This dataset is obtained through the CC0 waiver from the work by Battistin and Sianesi (2011). For illustration, missing entries are imputed once with Multiple Imputation by Chained Equations (MICE).
This data frame contains the following columns:
white: self-identified as white.
wage: gross hourly wage in pound in log scale.
Dany: whether received any education before.
Dmult: three levels of educational attainment.
maemp: employment status of mother.
scht: school type.
qmab: math score at age 7.
qmab2: math score at age 11.
qvab: reading score at age 7.
qvab2: reading score at age 11.
paed_u: father's years of education.
maed_u: mother's years of education.
age_pa: age of father.
age_ma: age of mother.
sub_u: number of siblings.
wagebin: dichotomized wage obtained with a cutoff of average hourly wage.
References
https://cls.ucl.ac.uk/cls-studies/1958-national-child-development-study/
Battistin E, Sianesi B. (2011). Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. Review of Economics and Statistics, 93(2), 495-509.
Battistin E, Sianesi B. (2012). Replication data for: Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. URL https://doi.org.10.7910/DVN/EPCYUL.
Examples
data("NCDS")
Fitting potential outcome regression with different methods
Description
The function OUTmethod
is an internal function to estimate the potential outcomes given a specified model through formula.
It is built into function PSweight
, and is used for constructing the augmented estimators.
Usage
OUTmethod(
out.formula = out.formula,
y = y,
out.method = "glm",
family = "gaussian",
datain = datain,
dataout = dataout,
out.control = list()
)
Arguments
out.formula |
an object of class |
y |
a vector of the observed outcome in the training data ( |
out.method |
a character to specify the method for estimating the outcome regression model. |
family |
a description of the error distribution and link function to be used in the outcome model. Supported distributional families include
|
datain |
The training data for the outcome model. In the context of |
dataout |
The prediction data for the outcome model. In the context of |
out.control |
a list to specify additional options when |
Details
A typical form for out.formula
is y ~ terms
where y
is the outcome
variable and terms
is a series of terms which specifies linear predictors (on the link function scale). out.formula
by default specifies generalized
linear model given the gaussian error through the default arguments method = "glm"
and family = "gaussian"
. It fits the logistic regression when family = "binomal"
,and poisson
regression when family = "poisson"
. The argument out.method
allows user to choose
model other than glm to fit the outcome regression models for constructing the augmented estimator. We have included gbm
and SuperLearner
as alternative machine learning estimators.
Additional argument in them can be supplied through the ...
argument. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
Value
m.est
a vector of predicted outcome on the
dataout
.gamma.h
estimated coefficient of the outcome model when
method = "glm"
.
Examples
#' the outcome model
out.formula <- Y~cov1+cov2+cov3+cov4+cov5+cov6
y <- psdata$Y
#train on model with treatment group 1
datain <- psdata[psdata$trt==1, ]
outfit <- OUTmethod(out.formula = out.formula, y=y, datain = datain, dataout = psdata)
Fitting potential outcome regression with different methods in survey observational data
Description
The function OUTmethod_SW
is an internal function to estimate the potential outcomes given a specified model through formula.
It is built into function PSweight
, and is used for constructing the augmented estimators.
Usage
OUTmethod_SW(
out.formula = out.formula,
out.weights = NULL,
y = y,
out.method = "glm",
family = "gaussian",
datain = datain,
dataout = dataout,
out.control = list()
)
Arguments
out.formula |
an object of class |
out.weights |
A numeric vector of weights to be used in the weighted regression estimator. For the moment estimator(MOM) and clever covariates estimator(CVR), this parameter defaults to NULL; however, in the weighted regression estimator(WET) it should be set to the balancing weights used for the weighted outcome regression model. |
y |
a vector of the observed outcome in the training data ( |
out.method |
a character to specify the method for estimating the outcome regression model. |
family |
a description of the error distribution and link function to be used in the outcome model. Supported distributional families include
|
datain |
The training data for the outcome model. In the context of |
dataout |
The prediction data for the outcome model. In the context of |
out.control |
a list to specify additional options when |
Details
A typical form for out.formula
is y ~ terms
where y
is the outcome
variable and terms
is a series of terms which specifies linear predictors (on the link function scale). out.formula
by default specifies generalized
linear model given the gaussian error through the default arguments method = "glm"
and family = "gaussian"
. It fits the logistic regression when family = "binomal"
,and poisson
regression when family = "poisson"
. The argument out.method
allows user to choose
model other than glm to fit the outcome regression models for constructing the augmented estimator. We have included gbm
and SuperLearner
as alternative machine learning estimators.
Additional argument in them can be supplied through the ...
argument. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
When out.weights
is provided, weighted regression is applied in all supported methods. In "glm"
,
weighted least squares is used for Gaussian outcomes, and weighted likelihood for binomial or Poisson outcomes.
If family = "poisson"
and non-integer outcomes are detected (due to continuous-valued responses or weighting effects),
the function automatically switches to "gaussian"
to ensure numerical stability. In "gbm"
, outcome
weights are passed via the weights
argument, and if non-integer outcomes exist under family = "poisson"
,
the model switches to "gaussian"
. For "SuperLearner"
, weighted estimation is supported through the
obsWeights
argument. Since "SuperLearner"
does not support Poisson regression, a warning is issued,
and the model defaults to "gaussian"
. These adjustments ensure compatibility and stability across different
estimation approaches.
Value
m.est
a vector of predicted outcome on the
dataout
.gamma.h
estimated coefficient of the outcome model when
method = "glm"
.
Examples
# Load example datasets
data("psdata")
data("psdata_bin_prospective_fp")
data("psdata_bin_retrospective_fp")
# Define the outcome model formula.
out.formula <- Y ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
# Extract the outcome vector from the retrospective data.
y <- psdata_bin_retrospective_fp$Y
# Use only the observations in treatment group 1 as the training data.
datain <- psdata_bin_retrospective_fp[psdata_bin_retrospective_fp$trt == 1, ]
# Fit the outcome regression model via OUTmethod_SW.
# By default, out.method = "glm" and family = "gaussian" are used.
outfit <- OUTmethod_SW(out.formula = out.formula, y = y, datain = datain,
dataout = psdata_bin_retrospective_fp)
# Print the predicted outcome vector on dataout.
cat("Predicted outcomes (first 10 values):\n")
print(head(outfit$m.est, 10))
# Print the estimated coefficient vector from the GLM.
cat("\nEstimated coefficients (gamma.h):\n")
print(outfit$gamma.h)
Fitting propensity score models with different methods
Description
The function PSmethod
is an internal function to estimate the propensity scores given a specified model through formula.
It is built into function Sumstat
, PStrim
and PSweight
.
Usage
PSmethod(
ps.formula = ps.formula,
method = "glm",
data = data,
ncate = ncate,
ps.control = list()
)
Arguments
ps.formula |
an object of class |
method |
a character to specify the method for estimating propensity scores. |
data |
an optional data frame containing the variables in the propensity score model. |
ncate |
a numeric to specify the number of treatment groups present in the given data. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable and terms
is a series of terms which specifies a linear predictor. ps.formula
by default specifies generalized
linear models given the default argument method = "glm"
. It fits the logistic regression when ncate = 2
,and multinomial
logistic regression when ncate > 2
. The argument method
allows user to choose
model other than glm to fit the propensity score models. We have included gbm
and SuperLearner
as two alternative machine learning methods.
Additional arguments of the machine learning estimators can be supplied through the ...
argument. Note that SuperLearner does not handle multiple groups and the current version of multinomial
logistic regression is not supported by gbm. We suggest user to use them with extra caution. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
Value
e.h
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
beta.h
estimated coefficient of the propensity model when
method = "glm"
.
Examples
# the propensity model
ps.formula <- trt~cov1+cov2+cov3+cov4+cov5+cov6
psfit <- PSmethod(ps.formula = ps.formula,data = psdata,ncate=3)
Fitting propensity score models with different methods in survey observational data
Description
The function PSmethod_SW
is an internal function to estimate the propensity scores given a specified model through formula.
It is built into function Sumstat_SW
, PStrim_SW
and PSweight_SW
.
Usage
PSmethod_SW(
ps.formula = ps.formula,
method = "glm",
data = data,
survey.weight = survey.weight,
ncate = ncate,
ps.control = list()
)
Arguments
ps.formula |
an object of class |
method |
a character to specify the method for estimating propensity scores. |
data |
an optional data frame containing the variables in the propensity score model. |
survey.weight |
an numeric vector specifying survey weights for each observation. |
ncate |
a numeric to specify the number of treatment groups present in the given data. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable and terms
is a series of terms which specifies a linear predictor. ps.formula
by default specifies generalized
linear models given the default argument method = "glm"
. It fits the logistic regression when ncate = 2
,and multinomial
logistic regression when ncate > 2
. The argument method
allows user to choose
model other than glm to fit the propensity score models. We have included gbm
and SuperLearner
as two alternative machine learning methods.
Additional arguments of the machine learning estimators can be supplied through the ...
argument. Note that SuperLearner does not handle multiple groups and the current version of multinomial
logistic regression is not supported by gbm. We suggest user to use them with extra caution. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
Value
e.h
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
beta.h
estimated coefficient of the propensity model when
method = "glm"
.
Examples
# Load example datasets
data("psdata")
data("psdata_bin_prospective_fp")
data("psdata_bin_retrospective_fp")
# Define the propensity score model.
ps.formula <- trt ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
# Extract the survey weights from the retrospective data.
survey.weight <- psdata_bin_retrospective_fp$survey_weight
# Specify the number of treatment groups (for binary treatment, ncate = 2)
ncate <- 2
# Fit the propensity score model using PSmethod_SW.
psfit <- PSmethod_SW(ps.formula = ps.formula,
data = psdata_bin_retrospective_fp,
survey.weight = survey.weight,
ncate = ncate)
# Print the first 10 rows of the estimated propensity scores.
cat("Estimated propensity scores (first 10 observations):\n")
print(head(psfit$e.h, 10))
# For the 'glm' method, print the estimated coefficient vector.
cat("\nEstimated coefficients (beta.h):\n")
print(psfit$beta.h)
# Users can also inspect the full fitted model object.
cat("\nFitted propensity model object summary:\n")
print(summary(psfit$ps.fitObjects))
Trim the input data and propensity estimate
Description
Trim the original data and propensity estimate according to symmetric propensity score trimming rules.
Usage
PStrim(
data,
ps.formula = NULL,
zname = NULL,
ps.estimate = NULL,
delta = 0,
optimal = FALSE,
out.estimate = NULL,
method = "glm",
ps.control = list()
)
Arguments
data |
an optional data frame containing the variables required by |
ps.formula |
an object of class |
zname |
an optional character specifying the name of the treatment variable in |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level, if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order of the treatment levels. A vector of propensity score estimates is also allowed in |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
optimal |
an logical argument indicating if optimal trimming should be used. Default is |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies a
model for estimating the propensity scores, when ps.estimate
is NULL
.
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the "gbm"
package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to call the SuperLearner()
function in SuperLearner
package.
Currently, the SuperLearner method only support binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
With binary treatments, delta
defines the symmetric propensity score trimming rule following Crump et al. (2009).
With multiple treatments, delta
defines the symmetric multinomial trimming rule introduced in Yoshida et al. (2019).
With binary treatments and when optimal
equals TRUE
, the trimming function implements the optimal
symmetric trimming rule in Crump et al. (2009). The optimal trimming threshold delta
is then returned.
With multiple treatments and optimal
equals TRUE
, the trimming function implements the optimal trimming rule in Yang et al. (2016).
The optimal cutoff lambda
, which defines the acceptable upper bound for the sum of inverse generalized propensity scores, is
returned. See Yang et al. (2016) and Li and Li (2019) for details.
The argument zname
is required when ps.estimate
is not NULL
.
Value
PStrim returns a list of the following values:
data
a data frame of trimmed data.
trim_sum
a table summrizing the number of cases by treatment groups before and after trimming.
ps.estimate
a data frame of propensity estimate after trimming.
delta
an optional output of trimming threshold for symmetric trimming.
lambda
an optional output trimming threshold for optimal trimming with multiple treatment groups.
out.estimate
a data frame of estimated potential outcomes after trimming.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Yang, S., Imbens, G. W., Cui, Z., Faries, D. E., Kadziola, Z. (2016). Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics, 72(4), 1055-1065.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Examples
data("psdata")
# the propensity model
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
# trim the original data by setting the threshold of propensity as 0.05
PStrim(data=psdata, ps.formula=ps.formula, delta=0.05)
PStrim(data=psdata, ps.formula=ps.formula, optimal=TRUE)
Trim the input data and propensity estimate in survey observational data
Description
Trim the original data and propensity estimate according to symmetric propensity score trimming rules.
Usage
PStrim_SW(
data,
ps.formula = NULL,
zname = NULL,
ps.estimate = NULL,
svywtname = NULL,
delta = 0,
optimal = FALSE,
out.estimate = NULL,
method = "glm",
ps.control = list()
)
Arguments
data |
an optional data frame containing the variables required by |
ps.formula |
an object of class |
zname |
an optional character specifying the name of the treatment variable in |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level, if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order of the treatment levels. A vector of propensity score estimates is also allowed in |
svywtname |
an optional character specifying the name of the survey weight variable in |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
optimal |
an logical argument indicating if optimal trimming should be used. Default is |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies a
model for estimating the propensity scores, when ps.estimate
is NULL
.
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the "gbm"
package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to call the SuperLearner()
function in SuperLearner
package.
Currently, the SuperLearner method only support binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
With binary treatments, delta
defines the symmetric propensity score trimming rule following Crump et al. (2009).
With multiple treatments, delta
defines the symmetric multinomial trimming rule introduced in Yoshida et al. (2019).
With binary treatments and when optimal
equals TRUE
, the trimming function implements the optimal
symmetric trimming rule in Crump et al. (2009). The optimal trimming threshold delta
is then returned.
With multiple treatments and optimal
equals TRUE
, the trimming function implements the optimal trimming rule in Yang et al. (2016).
The optimal cutoff lambda
, which defines the acceptable upper bound for the sum of inverse generalized propensity scores, is
returned. See Yang et al. (2016) and Li and Li (2019) for details.
The argument zname
is required when ps.estimate
is not NULL
.
Value
PStrim returns a list of the following values:
data
a data frame of trimmed data.
trim_sum
a table summrizing the number of cases by treatment groups before and after trimming.
ps.estimate
a data frame of propensity estimate after trimming.
delta
an optional output of trimming threshold for symmetric trimming.
lambda
an optional output trimming threshold for optimal trimming with multiple treatment groups.
out.estimate
a data frame of estimated potential outcomes after trimming.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Yang, S., Imbens, G. W., Cui, Z., Faries, D. E., Kadziola, Z. (2016). Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics, 72(4), 1055-1065.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Examples
# Define the propensity score model
ps.formula <- trt ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
## Example 1: Apply symmetric trimming with delta = 0.05
trim_result <- PStrim_SW(data = psdata_bin_prospective_fp,
ps.formula = ps.formula,
svywtname = "survey_weight",
delta = 0.05)
# Display the trimming summary and view the trimmed data
print(trim_result)
## Example 2: Apply optimal trimming (delta is ignored when optimal = TRUE)
trim_result_opt <- PStrim_SW(data = psdata_bin_prospective_fp,
ps.formula = ps.formula,
svywtname = "survey_weight",
optimal = TRUE)
# Display the optimal trimming summary including the computed lambda
print(trim_result_opt)
Estimate average causal effects by propensity score weighting
Description
The function PSweight
is used to estimate the average potential outcomes corresponding to
each treatment group among the target population. The function currently implements
the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population),
average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment),
overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population
is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function).
Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimates either estimated
within the function, or supplied by external routines.
Usage
PSweight(
ps.formula = NULL,
ps.estimate = NULL,
trtgrp = NULL,
zname = NULL,
yname,
data,
weight = "overlap",
delta = 0,
augmentation = FALSE,
bootstrap = FALSE,
R = 50,
out.formula = NULL,
out.estimate = NULL,
family = "gaussian",
ps.method = "glm",
ps.control = list(),
out.method = "glm",
out.control = list()
)
Arguments
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for
each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the
total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level,
if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order
of the treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment
effect among the treated (ATT). Only necessary if |
zname |
an optional character specifying the name of the treatment variable in |
yname |
an optional character specifying the name of the outcome variable in |
data |
an optional data frame containing the variables in the propensity score model
and outcome model (if augmented estimator is used). If not found in data, the variables are
taken from |
weight |
a character or vector of characters including the types of weights to be used.
|
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
augmentation |
logical. Indicate whether augmented weighting estimators should be used.
Default is |
bootstrap |
logical. Indaicate whether bootstrap is used to estimate the standard error
of the point estimates. Default is |
R |
an optional integer indicating number of bootstrap replicates. Default is |
out.formula |
an object of class |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
family |
a description of the error distribution and link function to be used in the outcome model.
Only required if |
ps.method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
out.method |
a character to specify the method for estimating the outcome regression model. |
out.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. Similarly, a typical form for out.formula
is
outcome ~ terms
where outcome
is the outcome variable (identical to the variable name
used to specify yname
) and terms
is a series of terms which specifies a linear
predictor for outcome
. Both ps.formula
and out.formula
by default specify generalized
linear models when ps.estimate
and/or out.estimate
is NULL
. The argument ps.method
and out.method
allow user to choose
model other than glm to fit the propensity score and outcome regression models for augmentation. Additional argument in the gbm()
function can be supplied through the ps.control
and out.control
argument. Please refer to the user manual of the gbm
package for all the
allowed arguments. "SuperLearner"
is also allowed in the ps.method
and out.method
arguments. Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
for both propensity and outcome regression models.
Prediction algorithm and other tuning parameters can also be passed throughps.control
and out.control
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphebatic order of values in Z
, and the rightmost coulmn of ps.estimate
is assumed
to be the treatment group, when estimating ATT. trtgrp
can also be used to specify the treatment
group for estimating ATT. The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
The argument zname
and/or yname
is required when ps.estimate
and/or out.estimate
is not NULL
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE (IPW), ATT (treated) and
ATO (overlap), ATM (matching) and ATEN (entropy). These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019).
If augmentation = TRUE
, an augmented weighting estimator will be implemented. For binary treatments, the augmented
weighting estimator is presented in Mao, Li and Greene (2018). For multiple treatments, the augmented weighting estimator is
mentioned in Li and Li (2019), and additional details will appear in our ongoing work (Zhou et al. 2020+). When
weight = "IPW"
, the augmented estimator is also referred to as a doubly-robust (DR) estimator.
When bootstrap = TRUE
, the variance will be calculated by nonparametric bootstrap, with R
bootstrap
replications. The default of R
is 50. Otherwise, the variance will be calculated using the sandwich variance
formula obtained in the M-estimation framework.
Value
PSweight returns a PSweight
object containing a list of the following values:
estimated propensity scores, average potential outcomes corresponding to each treatment,
variance-covariance matrix of the point estimates, the label for each treatment group,
and estimates in each bootstrap replicate if bootstrap = TRUE
.
A summary of PSweight can be obtained with summary.PSweight
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
muhat
average potential outcomes by treatment groups, with reference to specific target populations.
covmu
variance-covariance matrix of
muhat
.muboot
an optional list of point estimates in each bootstrap replicate
bootstrap = TRUE
.group
a table of treatment group labels corresponding to the output point estimates
muhat
.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research, 29(12), 3721-3756.
Examples
data("psdata")
# the propensity and outcome models
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
out.formula<-Y~cov1+cov2+cov3+cov4+cov5+cov6
# without augmentation
ato1<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,weight = 'overlap')
summary(ato1)
# augmented weighting estimator, takes longer time to calculate sandwich variance
# ato2<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,
# augmentation = TRUE,out.formula = out.formula,family = 'gaussian',weight = 'overlap')
# summary(ato2)
Estimate average causal effects by propensity score weighting in survey observational data
Description
The function PSweight_SW
is used to estimate the average potential outcomes corresponding to
each treatment group among the target population. The function currently implements
the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population),
average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment),
overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population
is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function).
Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimates either estimated
within the function, or supplied by external routines. The function now includes support for survey designs that include specific survey weights,
mainly focusing on binary treatments for now by incorporating survey weights into propensity score estimation and both point and augmented estimators
for the outcome estimation.
Usage
PSweight_SW(
ps.formula = NULL,
ps.estimate = NULL,
trtgrp = NULL,
zname = NULL,
yname,
data,
weight = "overlap",
survey.indicator = FALSE,
survey.design = "Independent",
svywtname = NULL,
delta = 0,
augmentation = FALSE,
augmentation.type = "WET",
bootstrap = FALSE,
R = 50,
out.formula = NULL,
out.estimate = NULL,
family = "gaussian",
ps.method = "glm",
ps.control = list(),
out.method = "glm",
out.control = list()
)
Arguments
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for
each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the
total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level,
if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order
of the treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment
effect among the treated (ATT). Only necessary if |
zname |
an optional character specifying the name of the treatment variable in |
yname |
an optional character specifying the name of the outcome variable in |
data |
an optional data frame containing the variables in the propensity score model
and outcome model (if augmented estimator is used). If not found in data, the variables are
taken from |
weight |
a character or vector of characters including the types of weights to be used.
|
survey.indicator |
logical. Indicates whether survey weights are used in the estimation.
Default is |
survey.design |
character. Specifies the survey design scenario for estimation. Acceptable values are "Retrospective", "Independent", and "Prospective". "Retrospective" indicates that the sampling process depends on both treatment assignment and covariates, "Independent" (the default) means that the sampling process is independent of treatment assignment, and "Prospective" signifies that sampling is conducted prior to treatment assignment, although treatment may later be influenced by the sampling process. |
svywtname |
an optional character specifying the name of the survey weight variable in |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
augmentation |
logical. Indicate whether augmented weighting estimators should be used.
Default is |
augmentation.type |
a character specifying the type of augmentation to use when |
bootstrap |
logical. Indaicate whether bootstrap is used to estimate the standard error
of the point estimates. Default is |
R |
an optional integer indicating number of bootstrap replicates. Default is |
out.formula |
an object of class |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
family |
a description of the error distribution and link function to be used in the outcome model.
Only required if |
ps.method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
out.method |
a character to specify the method for estimating the outcome regression model. |
out.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. Similarly, a typical form for out.formula
is
outcome ~ terms
where outcome
is the outcome variable (identical to the variable name
used to specify yname
) and terms
is a series of terms which specifies a linear
predictor for outcome
. Both ps.formula
and out.formula
by default specify generalized
linear models when ps.estimate
and/or out.estimate
is NULL
. The argument ps.method
and out.method
allow users to choose
models other than glm to fit the propensity score and outcome regression models for augmentation. Additional arguments in the gbm()
function can be supplied through the ps.control
and out.control
arguments. Please refer to the user manual of the gbm
package for all the
allowed arguments. "SuperLearner"
is also allowed in the ps.method
and out.method
arguments. Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
for both propensity and outcome regression models.
Prediction algorithms and other tuning parameters can also be passed through ps.control
and out.control
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
The function now includes support for the survey setting, mainly focusing on binary treatments by incorporating survey weights into propensity score estimation and both point and augmented estimators for the outcome stage. In survey settings,
external propensity score estimates are not supported; both population-level and sample-level propensity scores are estimated using the internal routines.
When augmentation = TRUE
in the survey setting, three augmented estimators are supported: the moment estimator(MOM),
the clever covariate regression estimator(CVR), and the weighted regression estimator (WET, which is the
default). The user can select the desired estimator by setting the augmentation.type
parameter accordingly.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphabetical order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphabetical order of values in Z
, and the rightmost column of ps.estimate
is assumed
to be the treatment group when estimating ATT. trtgrp
can also be used to specify the treatment
group for estimating ATT. The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
The argument zname
and/or yname
is required when ps.estimate
and/or out.estimate
is not NULL
.
In survey settings, when survey.indicator
is TRUE
, the argument svywtname
(which specifies the survey weight variable in data
) is required;
if svywtname
is not provided, a default survey weight of 1 is applied to all observations. The argument
survey.design
must be specified to reflect the sampling mechanism: for example, "Retrospective"
indicates
that the sampling process depends on both treatment assignment and covariates, "Independent"
assumes that sampling
is independent of treatment assignment, and "Prospective"
signifies that sampling is conducted prior to treatment assignment,
although treatment may later be influenced by the sampling results.
Current version of PSweight
allows for five types of propensity score weights used to estimate population level ATE (IPW), ATT (treated) and
ATO (overlap), ATM (matching) and ATEN (entropy) under survey settings. These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020), Zeng, Li and Tong (2025).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). Zeng, Li, and Tong (2025) further specify how the survey weights can be incorporated into propensity score weighting under both retrospective and prospective scenarios.
Their approach supports both a weighting-only estimator and all three augmented estimators(MOM, CVR and WET), with corresponding sandwich variance estimators developed.
These enhancements are implemented in the current version of PSweight
.
If augmentation = TRUE
, an augmented weighting estimator will be implemented. For binary treatments, the augmented
weighting estimator is presented in Mao, Li and Greene (2018). For multiple treatments, the augmented weighting estimator is
mentioned in Li and Li (2019), and additional details will appear in our ongoing work (Zhou et al. 2020+). When
weight = "IPW"
, the augmented estimator is also referred to as a doubly-robust (DR) estimator.
In survey settings, the augmented estimator is further extended to support three variants: the moment estimator (MOM),
the clever covariate estimator (CVR), and the weighted regression estimator (WET); the default choice is WET.
Users can select the desired variant by specifying the augmentation.type
parameter.
When bootstrap = TRUE
, the variance will be calculated by nonparametric bootstrap, with R
bootstrap
replications. The default of R
is 50. Otherwise, the variance will be calculated using the sandwich variance
formula obtained in the M-estimation framework. In survey settings, however, bootstrapping is currently not supported;
we recommend that users employ the sandwich variance estimator instead.
Value
PSweight_SW returns a PSweight_SW
object containing a list of the following values:
estimated propensity scores for both population and sample levels, average potential outcomes corresponding to each treatment,
variance-covariance matrix of the point estimates, the label for each treatment group,
and estimates in each bootstrap replicate if bootstrap = TRUE
.
A summary of PSweight_SW can be obtained with summary.PSweight
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores. When
survey.indicator = TRUE
, it is population level propensity score estimated by survey-weighted regression.propensity.sample
a data frame of estimated sample level propensity scores. This element is included when
survey.indicator = TRUE
.muhat
average potential outcomes by treatment groups, with reference to specific target populations.
covmu
variance-covariance matrix of
muhat
.muboot
an optional list of point estimates in each bootstrap replicate
bootstrap = TRUE
.group
a table of treatment group labels corresponding to the output point estimates
muhat
.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research, 29(12), 3721-3756.
Zeng, Y., Li, F., & Tong, G. (2025). Moving toward best practice when using propensity score weighting in survey observational studies. arXiv preprint arXiv:2501.16156.
Examples
data("psdata")
data("psdata_bin_prospective_fp")
data("psdata_bin_retrospective_fp")
# Define the formulas
ps.formula <- trt ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
out.formula <- Y ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
# Prospective design without augmentation
pato1_sw <- PSweight_SW(ps.formula = ps.formula, yname = "Y",
data = psdata_bin_prospective_fp, weight = "overlap",
survey.indicator = TRUE, survey.design = "Prospective",
svywtname = "survey_weight",
delta = 0.1, augmentation = FALSE, bootstrap = FALSE, R = 50,
out.formula = NULL, out.estimate = NULL, family = "gaussian",
ps.method = "glm", ps.control = list(),
out.method = "glm", out.control = list())
summary(pato1_sw)
# Retrospective design with augmentation using the Weighted Regression Estimator(WET) estimator
pato2_sw <- PSweight_SW(ps.formula = ps.formula, yname = "Y",
data = psdata_bin_retrospective_fp, weight = "overlap",
survey.indicator = TRUE, survey.design = "Retrospective",
svywtname = "survey_weight",
delta = 0.1, augmentation = TRUE, augmentation.type = "WET",
bootstrap = FALSE, R = 50,
out.formula = out.formula, out.estimate = NULL, family = "gaussian",
ps.method = "glm", ps.control = list(),
out.method = "glm", out.control = list())
summary(pato2_sw)
Estimate average causal effects by propensity score weighting for a binary treatment with clustering.
Description
The function PSweight_cl
is used to estimate the average potential outcomes corresponding to
each treatment group among the target population with two-level data. The function currently implements
the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population),
average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment),
overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population
is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function).
Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimated
within the function through mixed effect model.
Usage
PSweight_cl(
ps.formula = NULL,
trtgrp = NULL,
yname,
data,
weight = "overlap",
delta = 0,
augmentation = FALSE,
bootstrap = FALSE,
bs_level = NULL,
R = 50,
out.formula = NULL,
family = "gaussian",
nAGQ = 1L
)
Arguments
ps.formula |
an object of class |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment
effect among the treated (ATT). Only necessary if |
yname |
an optional character specifying the name of the outcome variable in |
data |
an optional data frame containing the variables in the propensity score model
and outcome model (if augmented estimator is used). If not found in data, the variables are
taken from |
weight |
a character or vector of characters including the types of weights to be used.
|
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
augmentation |
logical. Indicate whether augmented weighting estimators should be used.
Default is |
bootstrap |
logical. Indaicate whether bootstrap is used to estimate the standard error
of the point estimates. Default is |
bs_level |
an optional character defining the cluster level (name of the variable) for each bootstrap resampling.
Default is |
R |
an optional integer indicating number of bootstrap replicates. Default is |
out.formula |
an object of class |
family |
a description of the error distribution and link function to be used in the outcome model.
Only required if |
nAGQ |
integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Please refer to lme4 package for more details. |
Details
A typical form for ps.formula
is treatment ~ terms+1|clusters
where treatment
is the treatment
variable and terms
is a series of terms
which specifies a linear predictor for treatment
and cluster level effects. Similarly, a typical form for out.formula
is
outcome ~ treatment+terms+1|cluster
where outcome
is the outcome variable (identical to the variable name
used to specify yname
); terms
is a series of terms which specifies a linear
predictor for outcome
; clusters
is the random effects term for clusters. Both ps.formula
and out.formula
by default specify generalized
linear mixed effect models.
Current version of PSweight_cl
allows for five types of propensity score weights used to estimate ATE (IPW), ATT (treated) and
ATO (overlap), ATM (matching) and ATEN (entropy). These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019).
If augmentation = TRUE
, an augmented weighting estimator will be implemented. For binary treatments, the augmented
weighting estimator is presented in Mao, Li and Greene (2018). When
weight = "IPW"
, the augmented estimator is also referred to as a doubly-robust (DR) estimator.
When bootstrap = TRUE
, the variance will be calculated by nonparametric bootstrap, with R
bootstrap
replications. bs_level
needs to be specified as the variable name for the cluster in order to conduct cluster
level resampling and maintaining the cluster level coorelation. The default value NULL
treat each observation independently.
The default of R
is 50. Otherwise, the variance will be calculated using the sandwich variance
formula obtained in the M-estimation framework.
Value
PSweight_cl returns a PSweight
object containing a list of the following values:
estimated propensity scores, average potential outcomes corresponding to each treatment,
variance-covariance matrix of the point estimates, the label for each treatment group,
and estimates in each bootstrap replicate if bootstrap = TRUE
.
A summary of PSweight_cl can be obtained with summary.PSweight
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
muhat
average potential outcomes by treatment groups, with reference to specific target populations.
covmu
variance-covariance matrix of
muhat
.muboot
an optional list of point estimates in each bootstrap replicate
bootstrap = TRUE
.group
a table of treatment group labels corresponding to the output point estimates
muhat
.
References
Li, F., Zaslavsky, A. M., Landrum, M. B. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373-3387.
Fuentes, A., Lüdtke, O., Robitzsch, A. (2021). Causal inference with multilevel data: A comparison of different propensit score weighting appropaches. Multivariate Behavioral Research, 1-24.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research 29(12), 3721-3756.
Examples
#data("psdata_cl")
#ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt)
#ato_cl<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata_cl)
#summary(ato_cl)
Calculate summary statistics for propensity score weighting
Description
SumStat
is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting.
Usage
SumStat(
ps.formula = NULL,
ps.estimate = NULL,
trtgrp = NULL,
Z = NULL,
covM = NULL,
zname = NULL,
xname = NULL,
data = NULL,
weight = "overlap",
delta = 0,
method = "glm",
ps.control = list()
)
Arguments
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of treatment level, if column names are missing or there is a mismatch, the column names would be assigned according to the alphabatic order of treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
Z |
an optional vector specifying the values of treatment, only necessary when the covariate matrix |
covM |
an optional covariate matrix or data frame including covariates, their interactions and higher-order terms. When the covariate matrix |
zname |
an optional character specifying the name of the treatment variable in |
xname |
an optional vector of characters including the names of covariates in |
data |
an optional data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies logistic or multinomial logistic
models for estimating the propensity scores, when ps.estimate
is NULL
.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphebatic order of treatmentlevels. The rightmost coulmn of ps.estimate
is then assumed
to be the treatment group when estimating ATT ("treated"
). trtgrp
can also be used to specify the treatment
group for estimating ATT.
To generate balance statistics, one can directly specify Z
and covM
to indicate the treatment levels and
covariate matrix. Alternatively, one can supply data
, zname
, and xname
to indicate the
same information. When both are specified, the function will prioritize inputs from Z
and covM
.
When ps.estimate
is not NULL
, argument zname
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE ("IPW"
), ATT ("treated"
), and
ATO("overlap"
), ATM ("matching"
) and ATEN ("entropy"
). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the gbm package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to pass the propensity score estimation to the SuperLearner()
function in SuperLearner package.
Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
in the SumStat()
function.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
to SumStat()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
Value
SumStat returns a SumStat
object including a list of the following value:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat
can be obtained with summary.SumStat
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
ps.weights
a data frame of propensity score weights.
ess
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat
If
"IPW"
is included inweight
, this is a list of summary statistics using inverse probability weighting.ATT.sumstat
If
"treated"
is included inweight
, this is a list of summary statistics using the ATT weights.ATO.sumstat
If
"overlap"
is included inweight
, this is a list of summary statistics using the overlap weights.ATM.sumstat
If
"matching"
is included inweight
, this is a list of summary statistics using the matching weights.ATEN.sumstat
If
"entropy"
is included inweight
, this is a list of summary statistics using the entropy weights.trim
If
delta > 0
, this is a table summarizing the number of observations before and after trimming.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Greenwell B., Boehmke B.,Cunningham J, GBM Developers (2020) gbm: Generalized Boosted Regression Models. Cran: https://cran.r-project.org/web/packages/gbm/index.html
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Polley E., LeDell E., Kennedy C., Lendle S., van der Laan M. (2019) SuperLearner: Super Learner Prediction. Cran: https://cran.r-project.org/web/packages/SuperLearner/index.html
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
Examples
data("psdata")
# the propensity model
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
# using SumStat to estimate propensity scores
msstat <- SumStat(ps.formula, trtgrp="2", data=psdata,
weight=c("IPW","overlap","treated","entropy","matching"))
#summary(msstat)
# importing user-supplied propensity scores "e.h"
# fit <- nnet::multinom(formula=ps.formula, data=psdata, maxit=500, trace=FALSE)
# e.h <- fit$fitted.values
# varname <- c("cov1","cov2","cov3","cov4","cov5","cov6")
# msstat0 <- SumStat(zname="trt", xname=varname, data=psdata, ps.estimate=e.h,
# trtgrp="2", weight=c("IPW","overlap","treated","entropy","matching"))
# summary(msstat0)
Calculate summary statistics for propensity score weighting
Description
SumStat
is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting.
Usage
SumStat_SW(
ps.formula = NULL,
ps.estimate = NULL,
trtgrp = NULL,
Z = NULL,
covM = NULL,
zname = NULL,
xname = NULL,
data = NULL,
weight = "overlap",
survey.indicator = FALSE,
survey.design = "Independent",
svywtname = NULL,
delta = 0,
method = "glm",
ps.control = list()
)
Arguments
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of treatment level, if column names are missing or there is a mismatch, the column names would be assigned according to the alphabatic order of treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
Z |
an optional vector specifying the values of treatment, only necessary when the covariate matrix |
covM |
an optional covariate matrix or data frame including covariates, their interactions and higher-order terms. When the covariate matrix |
zname |
an optional character specifying the name of the treatment variable in |
xname |
an optional vector of characters including the names of covariates in |
data |
an optional data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
survey.indicator |
logical. Indicates whether survey weights are used in the estimation.
Default is |
survey.design |
character. Specifies the survey design scenario for estimation. Acceptable values are "Retrospective", "Independent", and "Prospective". "Retrospective" indicates that the sampling process depends on both treatment assignment and covariates, "Independent" (the default) means that the sampling process is independent of treatment assignment, and "Prospective" signifies that sampling is conducted prior to treatment assignment, although treatment may later be influenced by the sampling process. |
svywtname |
an optional character specifying the name of the survey weight variable in |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies logistic or multinomial logistic
models for estimating the propensity scores, when ps.estimate
is NULL
.
The function now includes support for the survey setting, mainly focusing on binary treatments by incorporating survey weights into propensity score estimation and both point and augmented estimators for the outcome stage. In survey settings,
external propensity score estimates are not supported; both population-level and sample-level propensity scores are estimated using the internal routines.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphebatic order of treatmentlevels. The rightmost coulmn of ps.estimate
is then assumed
to be the treatment group when estimating ATT ("treated"
). trtgrp
can also be used to specify the treatment
group for estimating ATT.
To generate balance statistics, one can directly specify Z
and covM
to indicate the treatment levels and
covariate matrix. Alternatively, one can supply data
, zname
, and xname
to indicate the
same information. When both are specified, the function will prioritize inputs from Z
and covM
.
When ps.estimate
is not NULL
, argument zname
.
In survey settings, when survey.indicator
is TRUE
, the argument svywtname
(which specifies the survey weight variable in data
) is required;
if svywtname
is not provided, a default survey weight of 1 is applied to all observations. The argument
survey.design
must be specified to reflect the sampling mechanism: for example, "Retrospective"
indicates
that the sampling process depends on both treatment assignment and covariates, "Independent"
assumes that sampling
is independent of treatment assignment, and "Prospective"
signifies that sampling is conducted prior to treatment assignment,
although treatment may later be influenced by the sampling results.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE ("IPW"
), ATT ("treated"
), and
ATO("overlap"
), ATM ("matching"
) and ATEN ("entropy"
). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
Zeng, Li, and Tong (2025) further specify how the survey weights can be incorporated into propensity score weighting under both retrospective and prospective scenarios.
Their approach supports both a weighting-only estimator and all three augmented estimators(MOM, CVR and WET), with corresponding sandwich variance estimators developed.
These enhancements are implemented in the current version of PSweight
.
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the gbm package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to pass the propensity score estimation to the SuperLearner()
function in SuperLearner package.
Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
in the SumStat()
function.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
to SumStat()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
Value
SumStat returns a SumStat
object including a list of the following value:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat
can be obtained with summary.SumStat
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated population level propensity scores. When
survey.indicator = TRUE
, it is population level propensity score estimated by survey-weighted regression.ps.population.fitObjects
the fitted population level propensity model details
propensity.sample
a data frame of estimated sample level propensity scores.
ps.sample.fitObjects
the fitted sample level propensity model details
ps.weights
a data frame of propensity score weights.
ess
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat
If
"IPW"
is included inweight
, this is a list of summary statistics using inverse probability weighting.ATT.sumstat
If
"treated"
is included inweight
, this is a list of summary statistics using the ATT weights.ATO.sumstat
If
"overlap"
is included inweight
, this is a list of summary statistics using the overlap weights.ATM.sumstat
If
"matching"
is included inweight
, this is a list of summary statistics using the matching weights.ATEN.sumstat
If
"entropy"
is included inweight
, this is a list of summary statistics using the entropy weights.trim
If
delta > 0
, this is a table summarizing the number of observations before and after trimming.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Greenwell B., Boehmke B.,Cunningham J, GBM Developers (2020) gbm: Generalized Boosted Regression Models. Cran: https://cran.r-project.org/web/packages/gbm/index.html
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Polley E., LeDell E., Kennedy C., Lendle S., van der Laan M. (2019) SuperLearner: Super Learner Prediction. Cran: https://cran.r-project.org/web/packages/SuperLearner/index.html
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
Zeng, Y., Li, F., & Tong, G. (2025). Moving toward best practice when using propensity score weighting in survey observational studies. arXiv preprint arXiv:2501.16156.
Examples
data("psdata")
data("psdata_bin_prospective_fp")
data("psdata_bin_retrospective_fp")
# Define the common propensity score formula
ps.formula <- trt ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6
# The weight choices we want to test:
myWeights <- c("IPW", "overlap", "treated", "entropy", "matching", "svywt")
## Example 1: Prospective design, survey indicator TRUE, using all weight types
sumstat_pros <- SumStat_SW(ps.formula = ps.formula, trtgrp = "2",
data = psdata_bin_prospective_fp,
weight = myWeights, delta = 0.1, method = "glm", ps.control = list(),
survey.indicator = TRUE, svywtname = "survey_weight",
survey.design = "Prospective")
summary(sumstat_pros)
print(sumstat_pros$ps.weights)
print(sumstat_pros$ess)
print(sumstat_pros$unweighted.sumstat)
## Example 2: Retrospective design, survey indicator TRUE, using all weight types
sumstat_retro <- SumStat_SW(ps.formula = ps.formula, trtgrp = "2",
data = psdata_bin_retrospective_fp,
weight = myWeights, delta = 0.1, method = "glm", ps.control = list(),
survey.indicator = TRUE, svywtname = "survey_weight",
survey.design = "Retrospective")
summary(sumstat_retro)
print(sumstat_retro$ps.weights)
print(sumstat_retro$ess)
print(sumstat_retro$unweighted.sumstat)
Calculate summary statistics for propensity score weighting with clustering (for binary treatment only)
Description
SumStat_cl
is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting with two-level data.
Usage
SumStat_cl(
ps.formula = NULL,
trtgrp = NULL,
data = NULL,
weight = "overlap",
delta = 0,
nAGQ = 1L
)
Arguments
ps.formula |
an object of class |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
data |
an data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
nAGQ |
integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Please refer to lme4 package for more details. |
Details
A typical form for ps.formula
is treatment ~ terms+1|clusters
where treatment
is the treatment
variable, terms
is a series of terms
which specifies a linear predictor for treatment
, and clusters
is the cluster indicator. The current version supports two-level models and the random-effects term is required to be the last piece in the formula. ps.formula
specifies a mixed-effects logistic regression
model for estimating propensity scores. The treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE ("IPW"
), ATT ("treated"
), and
ATO("overlap"
), ATM ("matching"
) and ATEN ("entropy"
). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
Value
SumStat_cl returns a SumStat
object including a list of the following value:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat
can be obtained with summary.SumStat
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
ps.weights
a data frame of propensity score weights.
ess
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat
If
"IPW"
is included inweight
, this is a list of summary statistics using inverse probability weighting.ATT.sumstat
If
"treated"
is included inweight
, this is a list of summary statistics using the ATT weights.ATO.sumstat
If
"overlap"
is included inweight
, this is a list of summary statistics using the overlap weights.ATM.sumstat
If
"matching"
is included inweight
, this is a list of summary statistics using the matching weights.ATEN.sumstat
If
"entropy"
is included inweight
, this is a list of summary statistics using the entropy weights.trim
If
delta > 0
, this is a table summarizing the number of observations before and after trimming.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research 29(12), 3721-3756.
Li, F., Zaslavsky, A. M., & Landrum, M. B. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373-3387.
Examples
data("psdata_cl")
# the propensity model
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt)
# using SumStat to estimate propensity scores
msstat <- SumStat_cl(ps.formula, trtgrp="1", data=psdata_cl,
weight=c("IPW","overlap","treated","entropy","matching"))
summary(msstat)
Point estimates of PSweight
Description
The coef
method for class "PSweight"
Usage
## S3 method for class 'PSweight'
coef(object, ...)
Arguments
object |
an object for which the extraction of model coefficients is meaningful. |
... |
other arguments. |
Value
The output from coef
Plot the distribution of propensity scores and balance statistics
Description
Summarize the SumStat object, generate histogram or density of estimated propensity scores and plot the balance statistics under weighting versus no weighting.
Usage
## S3 method for class 'SumStat'
plot(
x,
type = "balance",
weighted.var = TRUE,
threshold = 0.1,
metric = "ASD",
breaks = 50,
...
)
Arguments
x |
a |
type |
a character indicating the type of plot to produce, including histogram of estimated propensity scores ( |
weighted.var |
logical. Indicating whether weighted variance should be used in calculating the balance statistics. Default is |
threshold |
an optional numeric value indicating the balance threshold for the balance plot. Default is 0.1. Only valid when |
metric |
a character indicating the type of metric used in balance plot. Only |
breaks |
a single number giving the number of cells for the histogram. Default is 50. |
... |
further arguments passed to or from other methods. |
Details
For the balance plot, a vertical line at threshold
is used to define balance on covariates.
The default value is threshold = 0.1
following Austin and Stuart (2015). If more than 2 treatments
are considered, only density of the estimated generalized propensity scores will be produced, regardless of
whether type = "density"
or type = "hist"
.
Value
Plot of the indicated type.
References
Austin, P.C. and Stuart, E.A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661-3679.
Examples
data("psdata")
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
msstat <- SumStat(ps.formula, trtgrp="2", data=subset(psdata,trt>1),
weight=c("IPW","overlap","treated","entropy","matching"))
plot(msstat, type="hist")
plot(msstat, type="balance", weighted.var=TRUE, threshold=0.1, metric="ASD")
Plot the distribution of propensity scores and balance statistics
Description
Summarize the SumStat object, generate histogram or density of estimated propensity scores and plot the balance statistics under weighting versus no weighting.
Usage
## S3 method for class 'SumStat_SW'
plot(
x,
type = "balance",
weighted.var = TRUE,
threshold = 0.1,
metric = "ASD",
breaks = 50,
...
)
Arguments
x |
a |
type |
a character indicating the type of plot to produce, including histogram of estimated propensity scores ( |
weighted.var |
logical. Indicating whether weighted variance should be used in calculating the balance statistics. Default is |
threshold |
an optional numeric value indicating the balance threshold for the balance plot. Default is 0.1. Only valid when |
metric |
a character indicating the type of metric used in balance plot. Only |
breaks |
a single number giving the number of cells for the histogram. Default is 50. |
... |
further arguments passed to or from other methods. |
Details
For the balance plot, a vertical line at threshold
is used to define balance on covariates.
The default value is threshold = 0.1
following Austin and Stuart (2015). If more than 2 treatments
are considered, only density of the estimated generalized propensity scores will be produced, regardless of
whether type = "density"
or type = "hist"
.
Value
Plot of the indicated type.
References
Austin, P.C. and Stuart, E.A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661-3679.
Examples
data("psdata")
data("psdata_bin_prospective_fp")
data("psdata_bin_retrospective_fp")
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
msstat <- SumStat_SW(ps.formula = ps.formula, trtgrp = "2",
data = psdata_bin_prospective_fp,
weight=c("IPW","overlap","treated","entropy","matching"),
survey.indicator = TRUE, svywtname = "survey_weight",
survey.design = "Prospective")
plot(msstat, type="hist")
plot(msstat, type="balance", weighted.var=TRUE, threshold=0.1, metric="ASD")
Print the results of PStrim
Description
The print
method for class "PStrim"
Usage
## S3 method for class 'PStrim'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from print
Print the results of PSweight
Description
The print
method for class "PSweight"
Usage
## S3 method for class 'PSweight'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from print
Print the results of Summary.PSweight
Description
The print
method for class "PSweightsum"
Usage
## S3 method for class 'PSweightsum'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from print
Print the results of SumStat
Description
The print
method for class "SumStat"
Usage
## S3 method for class 'SumStat'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from print
Print the results of Summary.SumStat
Description
The print
method for class "SumSumStat"
Usage
## S3 method for class 'SumSumStat'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from print
Simulated dataset for PSweight
Description
This is a simulated observational study with three treatment groups to illustated the ulity of PSweight.
Usage
data(psdata)
Format
A data frame with 1500 rows and 8 columns.
Details
The simulated dataset includes 1500 rows, with each row represents information recorded from each individual. There are 8 variables (columns). The treatment is the variable trt, which has three treatment arms. The outcome of interest is variable Y. cov1-cov6 are pre-treatment covariates among which cov1-cov5 are continous, and cov6 is binary.
Examples
data("psdata")
Prospective Factual Sample Dataset for PSweight Analysis
Description
A simulated dataset representing the observed (factual) sample drawn from the prospective
superpopulation. This dataset is obtained by selecting only those observations with
sample_indicator == 1
, where sampling depends solely on covariates.
Usage
data(psdata_bin_prospective_fp)
Format
A data frame with the subset of rows from psdata_bin_prospective_sp
(e.g. 600 rows)
and the same 10 variables as psdata_bin_prospective_sp
.
Details
This dataset is derived from psdata_bin_prospective_sp
by retaining only
observations that were selected (i.e. those with sample_indicator == 1
). It represents
the observed sample used for analysis under a prospective design.
Examples
data(psdata_bin_prospective_fp)
head(psdata_bin_prospective_fp)
Prospective Superpopulation Dataset for PSweight Analysis
Description
A simulated dataset representing the prospective superpopulation (i.e. the full dataset) for propensity score weighting analysis under a prospective design. In this design, the sampling probability is determined solely by covariates.
Usage
data(psdata_bin_prospective_sp)
Format
A data frame with 1500 rows and 10 variables:
- Y
Outcome variable.
- trt
Binary treatment indicator (coded as 1 and 2).
- cov1, cov2, cov3, cov4, cov5, cov6
Pre-treatment covariates.
- sampling_prob
Sampling probability computed solely from covariates (e.g., using a logistic function).
- survey_weight
Survey weight computed as the inverse of
sampling_prob
.- sample_indicator
Indicator variable (0/1) denoting whether an observation was selected into the sample.
Details
In the prospective design, sampling is conducted before treatment assignment, and the probability of an observation being selected depends only on its covariate values.
Examples
data(psdata_bin_prospective_sp)
head(psdata_bin_prospective_sp)
Retrospective Factual Sample Dataset for PSweight Analysis
Description
A simulated dataset representing the observed (factual) sample drawn from the retrospective
superpopulation. This dataset is obtained by selecting only those observations with
sample_indicator == 1
, where the sampling probability is a function of both covariates and treatment.
Usage
data(psdata_bin_retrospective_fp)
Format
A data frame with the subset of rows from psdata_bin_retrospective_sp
(e.g. 600 rows)
and the same 10 variables as psdata_bin_retrospective_sp
.
Details
This dataset is derived from psdata_bin_retrospective_sp
by retaining only
observations that were selected (i.e. those with sample_indicator == 1
). It represents
the observed sample used for analysis under a retrospective design.
Examples
data(psdata_bin_retrospective_fp)
head(psdata_bin_retrospective_fp)
Retrospective Superpopulation Dataset for PSweight Analysis
Description
A simulated dataset representing the retrospective superpopulation (i.e. the full dataset) for propensity score weighting analysis under a retrospective design. In this design, the sampling probability is influenced by both covariates and treatment assignment.
Usage
data(psdata_bin_retrospective_sp)
Format
A data frame with 1500 rows and 10 variables:
- Y
Outcome variable.
- trt
Binary treatment indicator (coded as 1 and 2).
- cov1, cov2, cov3, cov4, cov5, cov6
Pre-treatment covariates.
- sampling_prob
Sampling probability computed as a function of covariates and treatment (e.g., with an additional term depending on
trt
).- survey_weight
Survey weight computed as the inverse of
sampling_prob
.- sample_indicator
Indicator variable (0/1) denoting whether an observation was selected into the sample.
Details
In the retrospective design, the sampling probability is determined by both covariates and treatment assignment. This dataset represents the full superpopulation in which the sampling mechanism is retrospective.
Examples
data(psdata_bin_retrospective_sp)
head(psdata_bin_retrospective_sp)
Simulated dataset for PSweight
Description
This is a simulated observational study with three treatment groups to illustated the ulity of PSweight.
Usage
data(psdata_cl)
Format
A data frame with 1500 rows and 9 columns.
Details
The simulated dataset includes 1500 rows, with each row represents information recorded from each individual. There are 9 variables (columns). The treatment is the variable trt, which has two treatment arms. The clt is the cluster level. The outcome of interest is variable Y. cov1-cov6 are pre-treatment covariates among which cov1-cov5 are continous, and cov6 is binary.
Examples
# data("psdata_cl")
Summarize a PSweight object
Description
summary.PSweight
is used to summarize the results from PSweight
.
The output contains the average causal effects defined by specific contrasts, as well as their
standard error estimates.
Usage
## S3 method for class 'PSweight'
summary(object, contrast = NULL, type = "DIF", CI = TRUE, ...)
Arguments
object |
a PSweight object obtained from the |
contrast |
a vector or matrix specifying the causal contrast of interest. The average causal effects will be defined by such contrats. For multiple treatments, the contrast parameters are explained in Li and Li (2019) for estimating general causal effects. Default is all pairwise contrasts between any two treatment groups. |
type |
a character specifying the target estimand. The most commonly seen additive estimand is specified
by |
CI |
a logical argument indicates whether confidence interval should be calculated. Default is |
... |
further arguments passed to or from other methods. |
Details
For the contrast
argument, one specifies the contrast of interest and thus defines the target estimand
for comparing treatments. For example, if there are three treatment levels: A, B, and C, the contrast A-C
(i.e., E[Y(A)] - E[Y(C)]) can be specified by c(1,0,-1)
. The contrasts of A-C and B-C can be
jointly specified by rbind(c(1,0,-1), c(0,1,-1))
.
For estimating the causal relative risk (type = "RR"
), the contrast is specified at the log scale. For example,
the contrast A-C (specified by c(1,0,-1)
) implies the estimation of log{E[Y(A)]} - log{E[Y(C)]}. For estimating the causal odds
ratio, the contrast is specified at the log odds scale. For example, the contrast A-C (specified by c(1,0,-1)
)
implies the estimation of log{E[Y(A)]/E[1-Y(A)]} - log{E[Y(C)]/E[1-Y(C)]}.
The variance of the contrasts will be estimated by the delta method (if sandwich variance is used, or
bootstrap = FALSE
), or nonparametric bootstrap (if bootstrap = TRUE
). Details will be given in
Zhou et al. (2020+).
The argument type
takes one of three options: "DIF"
, "RR"
, or "RR"
, with "DIF"
as
the default option. Typically, "RR"
is relavent for binary or count outcomes, and "OR"
is relavent
only for binary outcomes. "DIF"
applies to all types of outcomes.
Value
A list of following values:
estimates
a matrix of point estimates, standard errors, test statistics, 95 for contrasts of interest.
bootestimates
a list of data frames containing estimated contrasts in each bootstrap replicate, if bootstrap is used to estimate standard errors.
contrast
a table listing the specified contrasts of interest.
group
a table of treatment group labels corresponding to the output point estimates, provided in results obtained from
PSweight
.trtgrp
a character indicating the treatment group, or target population under ATT weights.
type
a character specifying the target estimand.
CI
a logical indaicator of whether confidence interval should be reported.
References
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Examples
## For examples, run: example(PSweight).
Summarize a SumStat object.
Description
summary.SumStat
is used to summarize results obtained from function
SumStat
. The output includes effective sample sizes and tables for balance statistics.
Usage
## S3 method for class 'SumStat'
summary(object, weighted.var = TRUE, metric = "ASD", ...)
Arguments
object |
a |
weighted.var |
logical. Indicate whether the propensity score weighted variance should be used in calculating the balance metrics. Default is |
metric |
a chatacter indicating the type of balance metrics. |
... |
further arguments passed to or from other methods. |
Details
For metric
, the two options "ASD"
and "PSD"
are defined in Li and Li (2019)
for the general family of balancing weights. Similar definitions are also given in McCaffrey et al. (2013)
for inverse probability weighting. weighted.var
specifies whether weighted or unweighted variance
should be used in calculating ASD or PSD. An example of weighted variance with two treatment groups is given in
Austin and Stuart (2015). For more than two treatment groups, the maximum of ASD (across all pairs of treatments)
and maximum of PSD (across all treatments) are calcualted, as explained in Li and Li (2019).
Value
A list of tables containing effective sample sizes and balance statistics on covariates for specified propensity score weighting schemes.
effective.sample.size
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted
A table summarizing mean, variance by treatment groups, and standardized mean difference.
IPW
If
"IPW"
is specified, this is a data table summarizing mean, variance by treatment groups, and standardized mean difference under inverse probability of treatment weighting.treated
If
"treated"
is specified, this is a data table summarizing mean, variance by treatment groups, and standardized mean difference under the ATT weights.overlap
If
"overlap"
is specified, this is a data table summarizing mean, variance by treatment groups, and standardized mean difference under the (generalized) overlap weights.matching
If
"matching"
is specified, this is a data table summarizing mean, variance by treatment groups, and standardized mean difference under the (generalized) matching weights.entropy
If
"entropy"
is specified, this is a data table summarizing mean, variance by treatment groups, and standardized mean difference under the (generalized) entropy weights.
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Austin, P.C. and Stuart, E.A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661-3679.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
Examples
## For examples, run: example(SumStat).
Variance of PSweight
Description
The vcov
method for class "PSweight"
Usage
## S3 method for class 'PSweight'
vcov(object, ...)
Arguments
object |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Value
The output from vcov