Title: | Factor Model Estimation Using Proxy Variables |
Version: | 1.0 |
Description: | Functions to estimate a factor model using discrete and continuous proxy variables. The function 'dproxyme' estimates a factor model of discrete proxy variables using an EM algorithm (Dempster, Laird, Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>; Hu (2008) <doi:10.1016/j.jeconom.2007.12.001>; Hu (2017) <doi:10.1016/j.jeconom.2017.06.002>). The function 'cproxyme' estimates a linear factor model (Cunha, Heckman, and Schennach (2010) <doi:10.3982/ECTA6551>). |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.1.1 |
Imports: | dplyr, nnet, pracma, stats, utils, gtools |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2021-06-01 16:45:31 UTC; yujung |
Author: | Yujung Hwang |
Maintainer: | Yujung Hwang <yujungghwang@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-06-04 07:40:05 UTC |
cproxyme
Description
This function estimates a linear factor model using continuous proxy variables. The model has the following form:
proxy = intercept + factorloading * (latent variable) + measurement error
The measurement error is assumed to follow a Normal distribution with mean zero and a variance that is estimated.
Usage
cproxyme(dat, anchor = 1, weights = NULL)
Arguments
dat |
A proxy variable data frame list. |
anchor |
The column index of the anchoring proxy variable. Default is 1, in which case the first column of the dat data frame is used as the anchoring variable. |
weights |
An optional weight vector |
Value
Returns a list of 5 components:
- alpha0
This is a vector of intercepts in a linear factor model. The k-th entry is the intercept of k-th proxy variable factor model.
- alpha1
This is a vector of factor loadings. The k-th entry is the factor loading of k-th proxy variable. The factor loading of anchoring variable is normalized to 1.
- varnu
This is a vector of variances of measurement errors in proxy variables. The k-th entry is the variance of k-th proxy measurement error. The measurement error is assumed to follow a Normal distribution with mean 0.
- mtheta
This is the mean of the latent variable. It is equal to the mean of the anchoring proxy variable.
- vartheta
This is the variance of the latent variable.
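The components above can be related to sample moments of the proxies. The following is a minimal sketch in base R, not the package's implementation. It simulates three proxies from the model in the Description and recovers the parameters from covariances. It assumes that the measurement errors are independent of each other and of the latent variable, and that the first proxy is the anchor (intercept 0, loading 1). All parameter values and object names are hypothetical.

```r
set.seed(1)
n <- 200000

# Hypothetical true parameters (anchor = proxy 1: intercept 0, loading 1)
alpha0 <- c(0, 0.5, -1)
alpha1 <- c(1, 0.8, 1.5)
varnu  <- c(0.5, 0.3, 1.0)

theta <- rnorm(n, mean = 2, sd = sqrt(1.5))   # latent variable
Z <- sapply(1:3, function(k)
  alpha0[k] + alpha1[k] * theta + rnorm(n, sd = sqrt(varnu[k])))

S <- cov(Z)        # sample covariance matrix of the proxies
m <- colMeans(Z)   # sample means of the proxies

# Loading of proxy k relative to the anchor:
# cov(Z_k, Z_j) / cov(Z_1, Z_j), using a third proxy j
a1_hat <- c(1, S[2, 3] / S[1, 3], S[3, 2] / S[1, 2])
# Variance of the latent variable:
# cov(Z_1, Z_2) * cov(Z_1, Z_3) / cov(Z_2, Z_3)
vartheta_hat <- S[1, 2] * S[1, 3] / S[2, 3]
# Mean of the latent variable = mean of the anchoring proxy
mtheta_hat <- m[1]
# Intercepts and measurement error variances follow from the model
a0_hat <- m - a1_hat * mtheta_hat
varnu_hat <- diag(S) - a1_hat^2 * vartheta_hat

round(rbind(alpha0 = a0_hat, alpha1 = a1_hat, varnu = varnu_hat), 2)
```

With a large simulated sample the recovered values should be close to the hypothetical true parameters, which illustrates why at least three proxies are needed for these moment ratios.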
Author(s)
Yujung Hwang, yujungghwang@gmail.com
References
- Cunha, F., Heckman, J. J., & Schennach, S. M. (2010)
Estimating the technology of cognitive and noncognitive skill formation. Econometrica, 78(3), 883-931. doi: 10.3982/ECTA6551
- Hwang, Yujung (2021)
Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper.
Examples
dat1 <- data.frame(proxy1=c(1,2,3),proxy2=c(0.1,0.3,0.6),proxy3=c(2,3,5))
cproxyme(dat=dat1,anchor=1)
## you can specify weights
cproxyme(dat=dat1,anchor=1,weights=c(0.1,0.5,0.4))
dproxyme
Description
This function estimates measurement stochastic matrices of discrete proxy variables.
Usage
dproxyme(
dat,
sbar = 2,
initvar = 1,
initvec = NULL,
seed = 210313,
tol = 0.005,
maxiter = 200,
miniter = 10,
minobs = 100,
maxiter2 = 1000,
trace = FALSE,
weights = NULL
)
Arguments
dat |
A proxy variable data frame list. |
sbar |
The number of discrete latent types. Default is 2. |
initvar |
A column index of a proxy variable to initialize the EM algorithm. Default is 1. That is, the proxy variable in the first column of "dat" is used for initialization. |
initvec |
This vector defines how to group the initvar to initialize the EM algorithm. |
seed |
Seed. Default is 210313 (birthday of this package). |
tol |
A tolerance for EM algorithm. Default is 0.005. |
maxiter |
A maximum number of iterations for EM algorithm. Default is 200. |
miniter |
A minimum number of iterations for EM algorithm. Default is 10. |
minobs |
Compute likelihood of a proxy variable only if there are more than "minobs" observations. Default is 100. |
maxiter2 |
Maximum number of iterations for "multinom". Default is 1000. |
trace |
Whether to trace EM algorithm progress. Default is FALSE. |
weights |
An optional weight vector |
Value
Returns a list of 5 components :
- M_param
This is a list of estimated measurement (stochastic) matrices. The k-th matrix is the measurement matrix of the proxy variable stored in the k-th column of the dat data frame (or matrix). The ij-th element of a measurement matrix is the probability of observing the j-th (largest) proxy response value conditional on the latent type being i.
- M_param_col
This is a list of column labels of 'M_param' matrices
- M_param_row
This is a list of row labels of 'M_param' matrices. It is simply c(1:sbar).
- mparam
This is a list of multinomial logit coefficients which were used to compute 'M_param' matrices. These coefficients are useful to compute the likelihood of proxy responses.
- typeprob
This is a type probability matrix of size N-by-sbar. The ij-th entry of this matrix gives the probability that observation i has type j.
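To make the roles of the measurement matrices and the type probability matrix concrete, here is a minimal EM sketch in base R. It is not the package's implementation: per the arguments above the package fits multinomial logits with 'multinom', whereas this sketch swaps in plain weighted frequencies for the M-step. The stopping rule (largest change in type probabilities below tol, after at least miniter iterations), the simulated data, and the names p_true, M_true, tp, and M are all assumptions made for illustration. Proxies are assumed independent conditional on the latent type.

```r
set.seed(210313)
N <- 2000; sbar <- 2

# Hypothetical truth: type shares and one row-stochastic matrix per proxy
p_true <- c(0.4, 0.6)
M_true <- list(
  matrix(c(0.7, 0.2, 0.1,
           0.1, 0.3, 0.6), nrow = sbar, byrow = TRUE),
  matrix(c(0.6, 0.3, 0.1,
           0.2, 0.2, 0.6), nrow = sbar, byrow = TRUE),
  matrix(c(0.8, 0.1, 0.1,
           0.2, 0.3, 0.5), nrow = sbar, byrow = TRUE)
)
type <- sample(sbar, N, replace = TRUE, prob = p_true)
dat <- sapply(M_true, function(M)
  sapply(type, function(s) sample(3, 1, prob = M[s, ])))

# Initialise type probabilities by grouping the first proxy (cf. 'initvar')
tp <- cbind(ifelse(dat[, 1] == 1, 0.9, 0.1),
            ifelse(dat[, 1] == 1, 0.1, 0.9))

tol <- 0.005; maxiter <- 200; miniter <- 10
for (iter in seq_len(maxiter)) {
  # M-step: type shares and weighted response frequencies for each type
  p <- colMeans(tp)
  M <- lapply(1:3, function(k)
    t(sapply(1:sbar, function(s)
      sapply(1:3, function(j) sum(tp[dat[, k] == j, s])) / sum(tp[, s]))))
  # E-step: posterior type probabilities by Bayes' rule
  lik <- sapply(1:sbar, function(s)
    p[s] * M[[1]][s, dat[, 1]] * M[[2]][s, dat[, 2]] * M[[3]][s, dat[, 3]])
  tp_new <- lik / rowSums(lik)
  delta <- max(abs(tp_new - tp))
  tp <- tp_new
  if (iter >= miniter && delta < tol) break
}

lapply(M, round, 2)   # sbar-by-3 matrices; each row sums to 1
head(round(tp, 3))    # N-by-sbar matrix; each row sums to 1
```

In this sketch M plays the role of 'M_param' and tp the role of 'typeprob'. Each row of a measurement matrix is a probability distribution over proxy responses for one latent type.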
Author(s)
Yujung Hwang, yujungghwang@gmail.com
References
- Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin (1977)
"Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39.1 : 1-22. doi: 10.1111/j.2517-6161.1977.tb01600.x
- Hu, Yingyao (2008)
Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution. Journal of Econometrics, 144(1), 27-61. doi: 10.1016/j.jeconom.2007.12.001
- Hu, Yingyao (2017)
The econometrics of unobservables: Applications of measurement error models in empirical industrial organization and labor economics. Journal of Econometrics, 200(2), 154-168. doi: 10.1016/j.jeconom.2017.06.002
- Hwang, Yujung (2021)
Identification and Estimation of a Dynamic Discrete Choice Models with Endogenous Time-Varying Unobservable States Using Proxies. Working Paper.
- Hwang, Yujung (2021)
Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper.
Examples
dat1 <- data.frame(proxy1=c(1,2,3),proxy2=c(2,3,4),proxy3=c(4,3,2))
## the default minimum number of observations (minobs) is 100, so it is lowered for this small example
dproxyme(dat=dat1,sbar=2,initvar=1,minobs=3)
## you can specify weights
dproxyme(dat=dat1,sbar=2,initvar=1,minobs=3,weights=c(0.1,0.5,0.4))
makeDummy
Description
This function makes dummy variables from a discrete variable.
Usage
makeDummy(tZ)
Arguments
tZ |
An input vector |
Value
Returns dZ, a matrix of size length(tZ)-by-card(tZ) :
The ij-th element of dZ is 1 if tZ[i] is equal to the j-th largest value of tZ, and 0 otherwise. Each row of dZ sums to 1 by construction.
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
makeDummy(c(1,2,3))
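The construction described in the Value section can be sketched in base R as follows. The helper make_dummy_sketch is a hypothetical stand-in, not the package's code, and it assumes that columns are ordered by ascending distinct value, which may differ from the ordering that makeDummy uses.

```r
# Hypothetical stand-in for illustration; not the package's code
make_dummy_sketch <- function(tZ) {
  vals <- sort(unique(tZ))          # assumed column order: ascending values
  dZ <- outer(tZ, vals, "==") * 1   # length(tZ)-by-length(vals) 0/1 matrix
  colnames(dZ) <- vals
  dZ
}

dZ <- make_dummy_sketch(c(3, 1, 2, 3))
dZ
rowSums(dZ)   # every row sums to 1
```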
weighted.cov
Description
This function computes an unbiased sample weighted covariance. The function uses only pairwise complete observations.
Usage
weighted.cov(x, y, w = NULL)
Arguments
x |
An input vector to compute a covariance, cov(x,y) |
y |
An input vector to compute a covariance, cov(x,y) |
w |
A weight vector |
Value
Returns an unbiased sample weighted covariance
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
# If you do not specify weights,
# it returns the usual unweighted sample covariance
weighted.cov(x=c(1,3,5),y=c(2,3,1))
weighted.cov(x=c(1,3,5),y=c(2,3,1),w=c(0.1,0.5,0.4))
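Several bias corrections exist for weighted covariances, and this manual does not state which one weighted.cov applies. As an assumption for illustration only, the sketch below uses the common reliability-weight correction, which divides by sum(w) - sum(w^2) / sum(w) and reduces to the usual n - 1 denominator when all weights are equal. The helper name wcov_sketch is hypothetical.

```r
# Hypothetical helper for illustration; not the package's code
wcov_sketch <- function(x, y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  ok <- complete.cases(x, y, w)   # keep pairwise complete observations
  x <- x[ok]; y <- y[ok]; w <- w[ok]
  mx <- sum(w * x) / sum(w)       # weighted means
  my <- sum(w * y) / sum(w)
  # Reliability-weight correction (an assumption, see the text above)
  sum(w * (x - mx) * (y - my)) / (sum(w) - sum(w^2) / sum(w))
}

x <- c(1, 3, 5, NA); y <- c(2, 3, 1, 4)
wcov_sketch(x, y)                          # equal weights, incomplete pair dropped
cov(x, y, use = "pairwise.complete.obs")   # base R gives the same value
```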
weighted.var
Description
This function computes an unbiased sample weighted variance.
Usage
weighted.var(x, w = NULL)
Arguments
x |
A vector to compute a variance, var(x) |
w |
A weight vector |
Value
Returns an unbiased sample weighted variance
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
## If you do not specify weights,
## it returns the usual unweighted sample variance
weighted.var(x=c(1,3,5))
weighted.var(x=c(1,3,5),w=c(0.1,0.5,0.4))
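One property worth checking for any weighted variance is that only relative weights matter. The sketch below assumes the reliability-weight correction sum(w) - sum(w^2) / sum(w) in the denominator. That choice is an assumption, since this manual does not state the formula used by weighted.var, and the helper name wvar_sketch is hypothetical.

```r
# Hypothetical helper for illustration; not the package's code
wvar_sketch <- function(x, w = rep(1, length(x))) {
  m <- sum(w * x) / sum(w)   # weighted mean
  sum(w * (x - m)^2) / (sum(w) - sum(w^2) / sum(w))
}

x <- c(1, 3, 5); w <- c(0.1, 0.5, 0.4)
wvar_sketch(x)            # equal weights: matches var(x)
wvar_sketch(x, w)
wvar_sketch(x, 10 * w)    # rescaling the weights leaves the result unchanged
```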