Version: | 0.9.3 |
Date: | 2023-03-4 |
Title: | Factor, Bi-Factor, Second-Order and Factor Tree Copula Models |
Author: | Sayed H. Kadhem [aut], Aristidis K. Nikoloulopoulos [aut, cre] |
Maintainer: | Aristidis K. Nikoloulopoulos <a.nikoloulopoulos@uea.ac.uk> |
Depends: | R (≥ 3.5.0), statmod, abind, utils |
Imports: | igraph, matlab, polycor, VineCopula |
Description: | Estimation, model selection and goodness-of-fit of (1) factor copula models for mixed continuous and discrete data in Kadhem and Nikoloulopoulos (2021) <doi:10.1111/bmsp.12231>; (2) bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023) <doi:10.1007/s11336-022-09894-2>; (3) factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022) <doi:10.48550/arXiv.2201.00339>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Packaged: | 2023-03-04 11:03:27 UTC; aris |
NeedsCompilation: | yes |
Repository: | CRAN |
Date/Publication: | 2023-03-05 20:00:02 UTC |
Factor, bi-factor, second-order and factor tree copula models
Description
Estimation, model selection and goodness-of-fit of (1) factor copula models for mixed continuous and discrete data in Kadhem and Nikoloulopoulos (2021); (2) bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023); (3) factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022).
Details
This package contains R functions for:
diagnostics based on semi-correlations (Kadhem and Nikoloulopoulos, 2021, 2023; Joe, 2014) to detect tail dependence or tail asymmetry;
diagnostics to show that a dataset has a factor structure based on linear factor analysis (Kadhem and Nikoloulopoulos, 2021,2023; Joe, 2014);
estimation of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015), and Kadhem and Nikoloulopoulos (2021);
estimation of the bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023);
estimation of the factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022);
model selection of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015);
model selection of the bifactor and second-order copula models in Kadhem and Nikoloulopoulos (2023);
model selection of the factor tree copula models in Kadhem and Nikoloulopoulos (2022);
goodness-of-fit of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015), and Kadhem and Nikoloulopoulos (2021) using the
M_2
statistic (Maydeu-Olivares and Joe, 2006). Note that the continuous and count data have to be transformed to ordinal;goodness-of-fit of the bi-factor and second-order copula models in Kadhem and Nikoloulopoulos (2023) using the
M_2
statistic (Maydeu-Olivares and Joe, 2006).
Author(s)
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Krupskii, P. and Joe, H. (2013) Factor copula models for multivariate data. Journal of Multivariate Analysis, 120, 85–101. doi:10.1016/j.jmva.2013.05.001.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
The 1994 General Social Survey
Description
Hoff (2007) analysed seven demographic variables of 464 male respondents to the 1994 General Social Survey. Of these seven, two were continuous (income and age of the respondents), three were ordinal with 5 categories (highest degree of the survey respondent, income and highest degree of respondent's parents), and two were count variables (number of children of the survey respondent and respondent's parents).
Usage
data(GSS)
Format
A data frame with 464 observations on the following 7 variables:
INCOME
Income of the respondent in 1000s of dollars, binned into 21 ordered categories.
DEGREE
Highest degree ever obtained (0:None, 1:HS, 2:Associates, 3:Bachelors, 4:Graduate).
CHILDREN
Number of children of the survey respondent.
PINCOME
Financial status of respondent's parents when respondent was 16 (on a 5-point scale).
PDEGREE
Highest degree of the survey respondent's parents (0:None, 1:HS, 2:Associates, 3:Bachelors, 4:Graduate).
PCHILDREN
Number of children of the survey respondent's parents - 1.
AGE
Age of the respondents in years.
Source
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1, 265–283.
Goodness-of-fit of factor copula models for mixed data
Description
The limited information M_2
statistic (Maydeu-Olivares and Joe, 2006) of factor copula models for mixed continuous and discrete data.
Usage
M2.1F(tcontinuous, ordinal, tcount, cpar, copF1, gl)
M2.2F(tcontinuous, ordinal, tcount, cpar, copF1, copF2, gl, SpC)
Arguments
tcontinuous |
|
ordinal |
|
tcount |
|
cpar |
A list of estimated copula parameters. |
copF1 |
|
copF2 |
|
gl |
Gauss legendre quardrature nodes and weights. |
SpC |
Special case for the 2-factor copula model with BVN copulas. Select a bivariate copula at the 2nd factor to be fixed to independence. e.g. "SpC = 1" to set the first copula at the 2nd factor to independence. |
Details
The M_2
statistic has been developed for goodness-of-fit testing in multidimensional contingency tables by Maydeu-Olivares and Joe (2006).
Nikoloulopoulos and Joe (2015) have used the M_2
statistic to assess the goodness-of-fit of factor copula models for ordinal data. We build on the aforementioned papers and propose a methodology to assess the overall goodness-of-fit of factor copula models for mixed continuous and discrete responses. Since the M_2
statistic has been developed for multivariate ordinal data, we propose to first transform the continuous and count variables to ordinal and then calculate the M_2
statistic at the maximum likelihood estimate before transformation.
Value
A list containing the following components:
M2 |
The |
df |
|
p-value |
The resultant |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
#------------------------------------------------
# Estimation
#------------------ -----------------
#------------------ One-factor -----------------
# one-factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
est1factor.PE <- mle1factor(continuous.PE, categorical.PE,
count=NULL, copF1=cop1f.PE, gl, hessian = TRUE)
#------------------------------------------------
# M2
#------------------------------------------------
#Transforming the continuous to ordinal data:
ncontinuous.PE = continuous2ordinal(continuous.PE, 5)
# M2 statistic for the one-factor copula model:
m2.1f.PE <- M2.1F(ncontinuous.PE, categorical.PE, tcount=NULL,
cpar=est1factor.PE$cpar, copF1=cop1f.PE, gl)
Goodness-of-fit of bi-factor and second-order copula models for item response data
Description
The limited information M_2
statistic (Maydeu-Olivares and Joe, 2006) of bi-factor and second-order copula models for item response data.
Usage
M2Bifactor(y,cpar, copnames1, copnames2, gl, ngrp, grpsize)
M2Second_order(y,cpar, copnames1, copnames2, gl, ngrp, grpsize)
Arguments
y |
|
cpar |
A list of estimated copula parameters. |
copnames1 |
For the bi-factor copula: |
copnames2 |
For the bi-factor copula: |
gl |
Gauss legendre quardrature nodes and weights. |
ngrp |
number of non-overlapping groups. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
Details
The M_2
statistic has been developed for goodness-of-fit testing in multidimensional contingency tables by Maydeu-Olivares and Joe (2006). We use the M_2
to assess the overall fit for the bi-factor and second-order copula models for item resposne data (Kadhem & Nikoloulopoulos, 2021).
Value
A list containing the following components:
M2 |
The |
df |
|
p-value |
The resultant |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 15
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------ -----------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3,6)
#group2: 2,4,11,12,17
grp2=c(2,4,11)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8,10)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)#group size
#number of groups
ngrp=length(grpsize)
#------------------------------------------------
# M2
#------------------------------------------------
#BI-FACTOR
tauX0 = c(0.49,0.16,0.29,#0.09,0.47,0.49,0.30,
0.46,0.41,0.33,#0.29,0.24,
0.10,0.16,0.14)#,0.12,0.03,0.03,0.10,0.10)
tauXg = c(0.09,0.37,0.23,#0.53,0.24,0.32,0.27,
0.53,0.58,0.20,#0.23,0.25,0.34,0.33,
0.30,0.19,0.24)#,0.29,0.43,0.26)
copX0 = rep("bvt2", d)
copXg = c(rep("rgum", g1), rep("bvt3", g2+g3))
#converting taus to cpars
cparX0=mapply(function(x,y) tau2par(x,y),x=copX0,y=tauX0)
cparXg=mapply(function(x,y) tau2par(x,y),x=copXg,y=tauXg)
cpar=c(cparX0,cparXg)
m2_Bifactor = M2Bifactor(y=ydat, cpar, copX0, copXg, gl, ngrp, grpsize)
Political-economic risk of 62 countries for the year 1987
Description
Quinn (2004) used 5 mixed variables, namely the continuous variable black-market premium in each country (used as a proxy for illegal economic activity), the continuous variable productivity as measured by real gross domestic product per worker in 1985 international prices, the binary variable independence of the national judiciary (1 if the judiciary is judged to be independent and 0 otherwise), and the ordinal variables measuring the lack of expropriation risk and lack of corruption.
Usage
data(PE)
Format
A data frame with 62 observations (countries) on the following 5 variables:
BM
Black-market premium.
GDP
Gross domestic product.
IJ
Independent judiciary.
XPR
Lack of expropriation risk.
CPR
Lack of corruption.
Source
Quinn, K. M. (2004). Bayesian factor analysis for mixed ordinal and continuous responses. Political Analysis, 12, 338–353.
The Post-traumatic stress disorder (PTSD)
Description
The measure consists of 20 categorical items divided into four domains: (1) intrusions (e.g., repeated, dis- turbing, and unwanted memories), (2) avoidance (e.g., avoiding external reminder of the stressful experience), (3) cognition and mood alterations (e.g., trouble remembering important parts of the stressful experience) and (4) reactivity alterations (e.g., taking too many risks or doing things that could cause you harm). Each item is answered in a five-point ordinal scale: “0 = Not at all”, “1 = A little bit”, “2 = Moderately”, “3 = Quite a bit”, and “4 = Extremely”.
Usage
data(PTSD)
Format
A data frame with 221 observations and 20 items.
Source
Williams, D. and Mulder, J. (2020). BGGM: Bayesian Gaussian Graphical Models. R package version 1.0.0.
Model selection of the factor copula models for mixed data
Description
A heuristic algorithm that automatically selects the bivariate parametric copula families that link the observed to the latent variables.
Usage
select1F(continuous, ordinal, count, copnamesF1, gl)
select2F(continuous, ordinal, count, copnamesF1, copnamesF2, gl)
Arguments
continuous |
|
ordinal |
|
count |
|
copnamesF1 |
A vector with the names of possible candidates of bivariate copulas that link the each of the oberved variabels with the 1st factor. Choices are “bvn” for BVN, “bvt |
copnamesF2 |
A list with the names of possible candidates of bivariate copulas that link the each of the oberved variabels with the 1st and 2nd factors. Choices are “bvn” for BVN, “bvt |
gl |
Gauss legendre quardrature nodes and weights. |
Details
The linking copulas at each factor are selected with a sequential algorithm under the initial assumption that linking copulas are Frank, and then sequentially copulas with non-tail quadrant independence are assigned to any of pairs where necessary to account for tail asymmetry (discrete data) or tail dependence (continuous data).
Value
A list containing the following components:
``1st factor'' |
The selected bivariate linking copulas for the 1st factor. |
``2nd factor'' |
The selected bivariate linking copulas for the 2nd factor. |
AIC |
Akaike information criterion. |
taus |
The estimated copula parameters in Kendall's tau scale. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Examples
#------------------------------------------------
# Estimation
#------------------ -----------------
# Setting quadreture points
nq<-25
gl<-gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
categorical.PE <- PE[, 3:5]
#------------------ One-factor -----------------
# listing the possible copula candidates:
d <- ncol(PE)
copulasF1 <- rep(list(c("bvn", "bvt3", "bvt5", "frk", "gum",
"rgum", "rjoe","joe", "1rjoe","2rjoe", "1rgum","2rgum")), d)
out1F.PE <- select1F(continuous.PE, categorical.PE,
count=NULL, copulasF1, gl)
Model selection of the 1- and 2-factor tree copula models for item response data
Description
Heuristic algorithms that automatically select the bivariate parametric copula families and 1-truncated vine tree structure for the 1- and 2-factor tree copula models for item response data.
Usage
selectFactorTrVine(y,rmat,alg)
copselect1FactorTree(y, A, copnames, gl)
copselect2FactorTree(y, A, copnames, gl)
Arguments
y |
|
rmat |
Polychoric correlation matrix of y. |
alg |
1-truncated vine selection using partial and polychoric correlations. Choose a number from (1,2,3). 1: Partial correlation selection algorithm for 1-factor tree copula. 2: Partial correlation selection algorithm for 2-factor tree copula. 3: Polychoric correlation selection algorithm for 1-factor/2-factor tree copulas, here any other correlation matrix (rmat) can be used. |
A |
|
copnames |
A vector with the names of possible candidates of bivariate copulas to be selected for the 1-factor/2-factor copula models for item response data. Choices are “bvn” for BVN, “bvt |
gl |
Gauss legendre quardrature nodes and weights. |
Details
The model selection involves two separate steps. At the first step we select the 1-truncated vine or Markov tree structure assuming our models are constructed with BVN copulas. We find the maximum spanning tree which is a tree on all nodes that maximizes the pairwise dependencies using the well-known algorithm of Prim (1957). That is we find the tree with d-1
edges \mathcal{E}
that minimizes
\sum_{\mathcal{E}}\log(1-r_{jk}^2).
The minimum spanning tree algorithm of Prim (1957)
guarantees to find the optimal solution when edge weights between nodes are given by \log(1-r_{jk}^2)
.
We use two different measures of pairwise dependence r_{jk}
. The first measure is the estimated polychoric correlation (Olsson, 1979). The second measures of pairwise dependence is the partial correlation which is based on the factor copula models in Nikoloulopoulos and Joe (2015) with BVN copulas; for more details see Kadhem and Nikoloulopoulos (2022). We call polychoric and partial correlation selection algorithm when the pairwise dependence is the polychoric and partial correlation, respectively.
At the the second step, we sequentially select suitable bivariate copulas to account for any tail dependence/asymmetry. We start from the 1st tree that includes copulas connecting the observed variables to the first factor and then we iterate over a set of copula candidates and select suitable bivariate copulas in the subsequent trees. We select only one copula family for all the edges in each tree to ease interpretation.
Value
A list containing the following components:
``1Factor copulas'' |
The selected bivariate linking copulas for the first factor. |
``2Factor copulas'' |
The selected bivariate linking copulas for the second factor. |
``Vine tree copulas'' |
The selected bivariate linking copulas for the 1-truncated vine tree. |
``log-likelihood'' |
The maximized joint log-likelihood. |
``estimated taus'' |
The estimated copula parameters in Kendall's tau scale. |
F1treeA |
Vine array for the 1-factor tree copula. |
F2treeA |
Vine array for the 2-factor tree copula. |
pmat_f1 |
Partial correlation matrix for the 1-factor tree copula. |
pmat_f2 |
Partial correlation matrix for the 2-factor tree copula. |
Author(s)
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Fontenla, M. (2014). optrees: Optimal Trees in Weighted Graphs. R package version 1.0.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
Olsson, F. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.
Prim, R. C. (1957) Shortest connection networks and some generalizations. The Bell System Technical Journal, 36, 1389–1401.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 5
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PTSD Data
#------------------ -----------------
data(PTSD)
ydat=PTSD
#------------------------------------------------
#vine tree structure
#selecting vine tree based on polychoric corr
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
#selecting bivariate copulas for the 1-2-factor tree
# listing the possible copula candidates:
copnames=c("gum", "rgum")
f1tree_model = copselect1FactorTree(ydat, A.polychoric$VineTreeA,
copnames, gl)
f2tree_model = copselect2FactorTree(ydat, A.polychoric$VineTreeA,
copnames, gl)
Model selection of the bi-factor and second-order copula models for item response data
Description
A heuristic algorithm that automatically selects the bivariate parametric copula families for the bi-factor and second-order copula models for item response data.
Usage
selectBifactor(y, grpsize, copnames, gl)
selectSecond_order(y, grpsize, copnames, gl)
Arguments
y |
|
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
copnames |
A vector with the names of possible candidates of bivariate copulas to be selected for the bi-factor and second-order copula models for item response data. Choices are “bvn” for BVN, “bvt |
gl |
Gauss legendre quardrature nodes and weights. |
Details
The linking copulas at each factor are selected with a sequential algorithm under the initial assumption that linking copulas are BVN, and then sequentially copulas with tail dependence are assigned to any of pairs where necessary to account for tail asymmetry.
Value
A list containing the following components:
``common factor'' |
The selected bivariate linking copulas for the common factor (Bi-factor: copulas link items with the common factor. Second-order: copulas link group-specific factors with the common factor). |
``group-specific factor'' |
The selected bivariate linking copulas for the items with group-speicifc factors. |
log-likelihood |
The maximized joint log-likelihood. |
taus |
The estimated copula parameters in Kendall's tau scale. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 15
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------ -----------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3)
#group2: 2,4,11,12,17
grp2=c(2,4)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)
# listing possible copula candidates:
copnames=c("bvt2","bvt3")
Bifactor_model = selectBifactor(ydat, grpsize, copnames, gl)
Toronto Alexithymia Scale (TAS)
Description
The Toronto Alexithymia Scale is the most utilized measure of alexithymia in empirical research and is composed of d = 20
items that can be subdivided into G = 3
non-overlapping groups: d_1 = 7
items to assess difficulty identifying feelings (DIF), d_2 = 5
items to assess difficulty describing feelings (DDF) and d_3 = 8
items to assess externally oriented thinking (EOT).
Students were 17 to 25 years old and 58% of them were female and 42% were male. They were asked to respond to each item using one of K = 5 categories: “1 = completely disagree”, “2 = disagree”, “3 = neutral”, “4 = agree”, “5 = completely agree”.
Usage
data(TAS)
Format
A data frame with 1925 observations on the following 20 items:
DIF
items: 1,3,6,7,9,13,14.
DDF
items: 2,4,11,12,17.
EOT
items: 5,8,10,15,16,18,19,20.
Source
Briganti, G. and Linkowski, P. (2020). Network approach to items and domains from the toronto alexithymia scale. Psychological Reports, 123, 2038–2052.
Williams, D. and Mulder, J. (2020). BGGM: Bayesian Gaussian Graphical Models. R package version 1.0.0.
Vuong's test for the comparison of factor copula models
Description
Vuong (1989)'s test for the comparison of non-nested factor copula models for mixed data. We compute the Vuong's test between the factor copula model with BVN copulas (that is the standard factor model) and a competing factor copula model to reveal if the latter provides better fit than the standard factor model.
Usage
vuong.1f(cpar.bvn, cpar, copF1, continuous, ordinal, count, gl, param)
vuong.2f(cpar.bvn, cpar, copF1, copF2, continuous, ordinal, count, gl, param)
Arguments
cpar.bvn |
copula parameters of the factor copula model with BVN copulas. |
cpar |
copula parameters of the competing factor copula model. |
copF1 |
copula names for the first factor of the competing factor copula model. |
copF2 |
copula names for the second factor of the competing factor copula model. |
continuous |
matrix of continuous data. |
ordinal |
matrix of ordinal data. |
count |
matrix of count data. |
gl |
gauss-legendre quardature points. |
param |
parameterization of estimated copula parameters. If FALSE, then cpar are the actual copula parameters without any transformation/reparamterization. |
Value
A vector containing the following components:
z |
the test statistic. |
p.value |
the |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
d <- ncol(PE)
#------------------------------------------------
# Estimation
#------------------ -----------------
# factor copula model with BVN copulas
cop1f.PE.bvn <- rep("bvn", d)
PE.bvn1f <- mle1factor(continuous.PE, categorical.PE,
count=NULL, copF1=cop1f.PE.bvn, gl, hessian = TRUE)
# Selected factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
PE.selected1f <- mle1factor(continuous.PE, categorical.PE,
count=NULL, copF1=cop1f.PE, gl, hessian = TRUE)
#------------------------------------------------
# Vuong's test
#------------------ -----------------
v1f.PE.selected <- vuong.1f(PE.bvn1f$cpar$f1,
PE.selected1f$cpar$f1,cop1f.PE, continuous.PE,
categorical.PE, count=NULL, gl, param=FALSE)
Vuong's test for the comparison of 1- and 2-factor tree copula models for item response data
Description
The Vuong's test (Vuong,1989) is the sample version of the difference in Kullback-Leibler divergence between two models and can be used to differentiate two parametric models which could be non-nested. For the Vuong's test we provide the 95% confidence interval of the Vuong's test statistic (Joe, 2014, page 258). If the interval does not contain 0, then the best fitted model according to the AICs is better if the interval is completely above 0.
Usage
vuong_FactorTree(models, y, A.m1, tau.m1, copnames.m1,
tau.m2, copnames.m2,A.m2,nq)
Arguments
models |
choose a number from (1,2,3,4,5,6,7). 1: Model 1 is 1-factor tree copula, Model 2 is 1-factor tree copula. 2: Model 1 is 2-factor tree copula, Model 2 is 2-factor tree copula. 3: Model 1 is 1-factor tree copula, Model 2 is 2-factor tree copula. 4: Model 1 is 1-factor copula, Model 2 is 1-factor tree copula. 5: Model 1 is 1-factor copula, Model 2 is 2-factor tree copula. 6: Model 1 is 2-factor copula, Model 2 is 1-factor tree copula. 7: Model 1 is 2-factor copula, Model 2 is 2-factor tree copula. |
y |
|
A.m1 |
|
A.m2 |
|
tau.m1 |
vector of copula paramters in Kendall's |
tau.m2 |
vector of copula paramters in Kendall's |
copnames.m1 |
vector of names of copula families for model 1. |
copnames.m2 |
vector of names of copula families for model 2. |
nq |
Number of Gauss legendre quardrature points. |
Value
A vector containing the following components:
z |
the test statistic. |
p.value |
the |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Author(s)
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022b) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
#------------------------------------------------
# PTSD Data
#------------------ -----------------
data(PTSD)
ydat=PTSD
d=ncol(ydat);d
n=nrow(ydat);n
#vine tree structure
#selecting vine tree based on polychoric corr
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
A.polychoric=A.polychoric$VineTreeA
#------------------------------------------------
# M1 1-factor tree - M2 1-factor tree
tau.m1 = rep(0.4,d*2-1)
copnames.m1 = rep("bvn",d*2-1)
tau.m2 = rep(0.4,d*2-1)
copnames.m2 = rep("rgum",d*2-1)
vuong.1factortree = vuong_FactorTree(models=1, ydat,
A.m1=A.polychoric, tau.m1, copnames.m1, tau.m2,
copnames.m2,A.m2=A.polychoric,nq)
#------------------------------------------------
# M1 2-factor tree - M2 2-factor tree
tau.m1 = rep(0.4,d*3-1)
copnames.m1 = rep("bvn",d*3-1)
tau.m2 = rep(0.4,d*3-1)
copnames.m2 = rep("rgum",d*3-1)
vuong.2factortree = vuong_FactorTree(models=2, ydat,
A.m1=A.polychoric, tau.m1, copnames.m1, tau.m2,
copnames.m2,A.m2=A.polychoric,nq)
#------------------------------------------------
# M1 1-factor tree - M2 2-factor tree
tau.m1 = rep(0.4,d*2-1)
copnames.m1 = rep("bvn",d*2-1)
tau.m2 = rep(0.4,d*3-1)
copnames.m2 = rep("rgum",d*3-1)
vuong.12factortree = vuong_FactorTree(models=3, ydat,
A.m1=A.polychoric, tau.m1, copnames.m1, tau.m2,
copnames.m2,A.m2=A.polychoric,nq)
Vuong's test for the comparison of bi-factor and second-order copula models
Description
The Vuong's test (Vuong,1989) is the sample version of the difference in Kullback-Leibler divergence between two models and can be used to differentiate two parametric models which could be non-nested. For the Vuong's test we provide the 95% confidence interval of the Vuong's test statistic (Joe, 2014, page 258). If the interval does not contain 0, then the best fitted model according to the AICs is better if the interval is completely above 0.
Usage
vuong_structured(models, cpar.m1, copnames.m1, cpar.m2,
copnames.m2, y, grpsize)
Arguments
models |
choose a number from (1,2,3,4). 1: Model1 is bifactor, Model2 is bifactor. 2: Model1 is second-order, Model2 is second-order. 3: Model1 is second-order, Model2 is bifactor. 4: Model1 is bifactor, Model2 is nested. |
cpar.m1 |
vector of copula paramters for model 1, starting with copula parameters that link the items with common factor (bifactor), or group factors with common factor (second-order). |
cpar.m2 |
vector of copula paramters for model 2, starting with copula parameters that link the items with common factor (bifactor), or group factors with common factor (second-order). |
copnames.m1 |
vector of names of copula families for model 1, starting with copulas that link the items with common factor (bifactor), or group factors with common factor (second-order). |
copnames.m2 |
vector of names of copula families for model 2, starting with copulas that link the items with common factor (bifactor), or group factors with common factor (second-order). |
y |
matrix of ordinal data. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
Value
A vector containing the following components:
z |
the test statistic. |
p.value |
the |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------ -----------------
data(TAS)
grp1=c(1,3,6,7,9,13,14)
grp2=c(2,4,11,12,17)
grp3=c(5,8,10,15,16,18,19,20)
ydat=TAS[,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#Rearrange items within testlets
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)#group size
#number of groups
ngrp=length(grpsize)
#------------------------------------------------
# M1 bifactor - M2 bifactor
cpar.m1 = rep(0.6,d*2)
copnames.m1 = rep("bvn",d*2)
cpar.m2 = rep(1.6,d*2)
copnames.m2 = rep("rgum",d*2)
vuong.bifactor = vuong_structured(models=1, cpar.m1, copnames.m1,
cpar.m2, copnames.m2,
y=ydat, grpsize)
# M1 seconod-order - M2 seconod-order
cpar.m1 = rep(0.6,d+ngrp)
copnames.m1 = rep("bvn",d+ngrp)
cpar.m2 = rep(1.6,d+ngrp)
copnames.m2 = rep("rgum",d+ngrp)
vuong.second_order = vuong_structured(models=2, cpar.m1,
copnames.m1, cpar.m2, copnames.m2, y=ydat, grpsize)
# M1 seconod-order - M2 bifactor
cpar.m1 = rep(0.6,d+ngrp)
copnames.m1 = rep("bvn",d+ngrp)
cpar.m2 = rep(1.6,d*2)
copnames.m2 = rep("rgum",d*2)
vuong.2ndO_bif = vuong_structured(models=3, cpar.m1, copnames.m1,
cpar.m2, copnames.m2,
y=ydat, grpsize)
# M1 bifactor - M2 seconod-order
cpar.m1 = rep(0.6,d*2)
copnames.m1 = rep("bvn",d*2)
cpar.m2 = rep(1.6,d+ngrp)
copnames.m2 = rep("rgum",d+ngrp)
vuong.bif_2ndO = vuong_structured(models=4, cpar.m1, copnames.m1,
cpar.m2, copnames.m2,
y=ydat, grpsize)
Diagnostics to detect a factor dependence structure
Description
The diagnostic method in Joe (2014, pages 245-246) to show that each dataset has a factor structure based on linear factor analysis. The correlation matrix \R_{\mathrm{observed}}
has been obtained based on the sample correlations from the bivariate pairs of the observed variables.
These are the linear (when both variables are continuous), polychoric (when both variables are ordinal), and polyserial (when one variable is continuous and the other is ordinal) sample correlations among the observed variables.
The resulting \R_{\mathrm{observed}}
is generally positive definite if the sample size is not small enough; if not one has to convert it to positive definite. We calculate various measures of discrepancy between \R_{\mathrm{observed}}
and \R_{\mathrm{model}}
(the resulting correlation matrix of linear factor analysis), such as the maximum absolute correlation
difference D_1=\max|\R_{\mathrm{model}} - \R_{\mathrm{observed}}|
, the average absolute correlation
difference D_2=\mathrm{avg}| \R_{\mathrm{model}} - \R_{\mathrm{observed}}|
, and the correlation matrix discrepancy measure D_3=\log\bigl( \det(\R_{\mathrm{model}}) \bigr) - \log\bigl( \det(\R_{\mathrm{observed}})\bigr) + \mathrm{tr}( \R^{-1}_{\mathrm{model}} \R_{\mathrm{observed}} ) - d
.
Usage
discrepancy(cormat, n, f3)
Arguments
cormat |
|
n |
Sample size. |
f3 |
If TRUE, then the linear 3-factor analysis is fitted. |
Value
A matrix with the calculated discrepancy measures for different number of factors.
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Examples
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
#correlation
continuous.PE1 <- -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
u.PE <- apply(continuous.PE, 2, rank)/(nrow(PE)+1)
z.PE <- qnorm(u.PE)
categorical.PE <- data.frame(apply(PE[, 3:5], 2, factor))
nPE <- cbind(z.PE, categorical.PE)
#-------------------------------------------------
# Discrepancy measures----------------------------
#-------------------------------------------------
#correlation matrix for mixed data
cormat.PE <- as.matrix(polycor::hetcor(nPE, std.err=FALSE))
#discrepancy measures
out.PE = discrepancy(cormat.PE, n = nrow(nPE), f3 = FALSE)
#------------------------------------------------
#------------------------------------------------
# GSS Data
#------------------ -----------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME,AGE)
continuous.GSS <- apply(continuous.GSS, 2, rank)/(nrow(GSS)+1)
z.GSS <- qnorm(continuous.GSS)
ordinal.GSS <- cbind(DEGREE,PINCOME,PDEGREE)
count.GSS <- cbind(CHILDREN,PCHILDREN)
# Transforming the count variables to ordinal
# count1 : CHILDREN
count1 = count.GSS[,1]
count1[count1 > 3] = 3
# count2: PCHILDREN
count2 = count.GSS[,2]
count2[count2 > 7] = 7
# Combining both transformed count variables
ncount.GSS = cbind(count1, count2)
# Combining ordinal and transformed count variables
categorical.GSS <- cbind(ordinal.GSS, ncount.GSS)
categorical.GSS <- data.frame(apply(categorical.GSS, 2, factor))
# combining continuous and categorical variables
nGSS = cbind(z.GSS, categorical.GSS)
#-------------------------------------------------
# Discrepancy measures----------------------------
#-------------------------------------------------
#correlation matrix for mixed data
cormat.GSS <- as.matrix(polycor::hetcor(nGSS, std.err=FALSE))
#discrepancy measures
out.GSS = discrepancy(cormat.GSS, n = nrow(nGSS), f3 = TRUE)
Mapping of Kendall's tau and copula parameter
Description
Bivariate copulas: mapping of Kendall's tau and copula parameter.
Usage
par2tau(copulaname, cpar)
tau2par(copulaname, tau)
Arguments
copulaname |
Choices are “bvn” for BVN, “bvt |
cpar |
Copula parameter(s). |
tau |
Kendall's tau. |
Value
Kendall's tau or copula parameter.
References
Joe H (1997) Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Joe H (2014) Dependence Modeling with Copulas. Chapman & Hall, London.
Joe H (2014) CopulaModel: Dependence Modeling with Copulas. Software for book: Dependence Modeling with Copulas, Chapman & Hall, London 2014.
Examples
# 1-param copulas
#BVN copula
cpar.bvn = tau2par("bvn", 0.5)
tau.bvn = par2tau("bvn", cpar.bvn)
#Frank copula
cpar.frk = tau2par("frk", 0.5)
tau.frk = par2tau("frk", cpar.frk)
#Gumbel copula
cpar.gum = tau2par("gum", 0.5)
tau.gum = par2tau("gum", cpar.gum)
#Joe copula
cpar.joe = tau2par("joe", 0.5)
tau.joe = par2tau("joe", cpar.joe)
# 2-param copulas
#BB1 copula
tau.bb1 = par2tau("bb1", c(0.5,1.5))
#BB7 copula
tau.bb7 = par2tau("bb7", c(1.5,1))
#BB8 copula
tau.bb8 = par2tau("bb8", c(3,0.8))
#BB10 copula
tau.bb10 = par2tau("bb10", c(3,0.8))
Maximum likelhood estimation of factor copula models for mixed data
Description
We use a two-stage etimation approach toward the estimation of factor copula models for mixed continuous and discrete data.
Usage
mle1factor(continuous, ordinal, count, copF1, gl, hessian, print.level)
mle2factor(continuous, ordinal, count, copF1, copF2, gl, hessian, print.level)
mle2factor.bvn(continuous, ordinal, count, copF1, copF2, gl, SpC, print.level)
Arguments
continuous |
|
ordinal |
|
count |
|
copF1 |
|
copF2 |
|
gl |
Gauss legendre quardrature nodes and weights. |
SpC |
Special case for the 2-factor copula model with BVN copulas. Select a bivariate copula at the 2nd factor to be fixed to independence. e.g. "SpC = 1" to set the first copula at the 2nd factor to independence. |
hessian |
If TRUE, the hessian of the negative log-likelihood is calculated during the minimization process. |
print.level |
Determines the level of printing which is done during the minimization process; same as in |
Details
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate parameters/distributions fixed as estimated at the first step of the proposed two-step estimation approach.
Value
A list containing the following components:
cutpoints |
The estimated univariate cutpoints. |
negbinest |
The estimated univariate parametes for the count responses (fitting the negative binomial distribution). |
loglik |
The maximized joint log-likelihood. |
cpar |
Estimated copula parameters in a list form. |
taus |
The estimated copula parameters in Kendall's tau scale. |
SEs |
The SEs of the Kendall's tau estimates. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Krupskii, P. and Joe, H. (2013) Factor copula models for multivariate data. Journal of Multivariate Analysis, 120, 85–101. doi:10.1016/j.jmva.2013.05.001.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
#------------------------------------------------
# Estimation
#------------------ -----------------
#------------------ One-factor -----------------
# one-factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
est1factor.PE <- mle1factor(continuous.PE, categorical.PE,
count=NULL, copF1=cop1f.PE, gl, hessian = TRUE)
est1factor.PE
#------------------------------------------------
#------------------------------------------------
# GSS Data
#------------------ -----------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME, AGE)
ordinal.GSS <- cbind(DEGREE, PINCOME, PDEGREE)
count.GSS <- cbind(CHILDREN, PCHILDREN)
#------------------------------------------------
# Estimation
#------------------ -----------------
#------------------ One-factor -----------------
# one-factor copula model
cop1f.GSS <- c("joe","2rjoe","bvt3","bvt3",
"rgum","2rjoe","2rgum")
est1factor.GSS <- mle1factor(continuous.GSS, ordinal.GSS,
count.GSS, copF1 = cop1f.GSS, gl, hessian = TRUE)
Maximum likelhood estimation of factor tree copula models
Description
We use a two-stage estimation approach toward the estimation of factor tree copula models for item response data.
Usage
mle1FactorTree(y, A, cop, gl, hessian, print.level)
mle2FactorTree(y, A, cop, gl, hessian, print.level)
Arguments
y |
|
A |
|
cop |
|
gl |
Gauss legendre quardrature nodes and weights. |
hessian |
If TRUE, the hessian of the negative log-likelihood is calculated during the minimization process. |
print.level |
Determines the level of printing which is done during the minimization process; same as in |
Details
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate cutpoints fixed as estimated at the first step of the proposed two-step estimation approach.
Value
A list containing the following components:
cutpoints |
The estimated univariate cutpoints. |
loglik |
The maximized joint log-likelihood. |
taus |
The estimated copula parameters in Kendall's tau scale. |
SEs |
The SEs of the Kendall's tau estimates. |
Author(s)
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022b) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 5
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PTSD Data
#------------------ -----------------
data(PTSD)
ydat=PTSD
n=nrow(ydat)
d=ncol(ydat)
#------------------------------------------------
# Estimation
#------------------ -----------------
#selecting vine tree based on polychoric
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
#---------------- 1-factor tree ----------------
# 1-factor tree copula model
copf1 <- rep("frk",d)
coptree <- rep("frk",d-1)
cop <- c(copf1,coptree)
est1factortree <- mle1FactorTree(y=ydat, A=A.polychoric$VineTreeA, cop,
gl, hessian=FALSE, print.level=2)
#---------------- 2-factor tree ----------------
# 2-factor tree copula model
copf1 <- rep("frk",d)
copf2 <- rep("frk",d)
coptree <- rep("frk",d-1)
cop <- c(copf1,copf2,coptree)
est2factortree <- mle2FactorTree(y=ydat, A=A.polychoric$VineTreeA,
cop, gl, hessian=FALSE, print.level=2)
Maximum likelihood estimation of the bi-factor and second-order copula models for item response data
Description
We approach the estimation of the bi-factor and second-order copula models for item response data with the IFM method of Joe (2005).
Usage
mleBifactor(y, copnames1, copnames2, gl, ngrp, grpsize,
hessian, print.level)
mleSecond_order(y, copnames1, copnames2, gl, ngrp, grpsize,
hessian, print.level)
Arguments
y |
|
copnames1 |
For the bi-factor copula: |
copnames2 |
For the bi-factor copula: |
gl |
Gauss legendre quardrature nodes and weights. |
ngrp |
number of non-overlapping groups. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
hessian |
If TRUE, the hessian of the negative log-likelihood is calculated during the minimization process. |
print.level |
Determines the level of printing which is done during the minimization process; same as in |
Details
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate cutpoints fixed as estimated at the first step of the proposed two-step estimation approach.
Value
A list containing the following components:
cutpoints |
The estimated univariate cutpoints. |
taus |
The estimated copula parameters in Kendall's tau scale. |
SEs |
The SEs of the Kendall's tau estimates. |
loglik |
The maximized joint log-likelihood. |
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2005) Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94, 401–419. doi:10.1016/j.jmva.2004.06.003.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Examples
#------------------------------------------------
# Setting quadreture points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------ -----------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3,6)
#group2: 2,4,11,12,17
grp2=c(2,4,11)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8,10)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)#group size
#number of groups
ngrp=length(grpsize)
#BI-FACTOR
copX0 = rep("bvt2", d)
copXg = c(rep("rgum", g1), rep("bvt3", g2+g3))
mle_Bifactor = mleBifactor(y = ydat, copX0, copXg, gl, ngrp, grpsize, hessian=FALSE, print.level=2)
Bivariate normal and Student cdfs with vectorized inputs
Description
Bivariate normal and Student cdfs with vectorized inputs
Usage
pbnorm(z1,z2,rho,icheck=FALSE)
pbvt(z1,z2,param,icheck=FALSE)
Arguments
z1 |
scalar or vector of reals |
z2 |
scalar or vector of reals |
rho |
scalar or vector parameter in (-1,1); vectors cannot have different lengths if larger than 1, each of z1,z2,rho either has length 1 or a constant n greater than 1 |
param |
vector of length 2, or matrix with 2 columns; vectors and number of rows of matrix cannot be different if larger than 1; for param, first column is rho, second column is df. |
icheck |
T if checks are made for proper inputs, default of F |
Details
Donnelly's code can be inaccurate in the tail when the tail probability is 2.e-9 or less (it sometimes returns 0). In the case the exchmvn code is used with dimension 2. Alternatively a user can use vectorized function pbivnorm() in the library pbivnorm, and write a function pbvncop based on it.
Value
cdf value(s)
References
Donnelly TG (1973). Algorithm 462: bivariate normal distribution, Communications of the Association for Computing Machinery, 16, 638;
Joe H (2014) Dependence Modeling with Copulas. Chapman & Hall/CRC.
Joe H (2014) CopulaModel: Dependence Modeling with Copulas. Software for book: Dependence Modeling with Copulas, Chapman & Hall/CRC, 2014.
Examples
cat("\n pbnorm rho changing\n")
z1=.3; z2=.4; rho=seq(-1,1,.1)
out1=pbnorm(z1,z2,rho)
print(cbind(rho,out1))
cat("\n pbnorm matrix inputs for z1, z2\n")
rho=.4
z1=c(-.5,.5,10.)
z2=c(-.4,.6,10.)
z1=matrix(z1,3,3)
z2=matrix(z2,3,3,byrow=TRUE)
out3=pbnorm(z1,z2,rho)
print(out3)
cdf2=rbind(rep(0,3),out3)
cdf2=cbind(rep(0,4),cdf2)
pmf=apply(cdf2,2,diff)
pmf2=apply(t(pmf),2,diff)
pmf2=t(pmf2) # rectangle probabilities
print(pmf2)
cat("\n pbvt rho changing\n")
z1=.3; z2=.4; rho=seq(-.9,.9,.1); nu=2
param=cbind(rho,rep(nu,length(rho)))
out1=pbvt(z1,z2,param)
print(cbind(rho,out1))
cat("\n pbvt z1 changing\n")
z1=seq(-2,2,.4)
z2=.4; rho=.5; nu=2
out2=pbvt(z1,z2,c(rho,nu))
print(cbind(z1,out2))
Polychoric correlation
Description
Polychoric correlation
Usage
polychoric0(odat,iprint=FALSE,prlevel=0)
Arguments
odat |
nxd matrix of ordinal responses in 0,...,(ncat-1) or 1,...,ncat |
iprint |
flag for printing of intermediate results, including bivariate tables for observed versus expected assuming discretized bivariate Gaussian |
prlevel |
print.level for nlm for numerical optimization |
Details
Polychoric correlation for ordinal random variables. The number of categories can vary.
Value
$polych is a polychoric correlation matrix based on two-stage estimate; $iposdef is an indicator if the 2-stage correlation matrix estimate is positive definite.
Examples
data(PTSD)
ydat=PTSD
rmat=polychoric0(ydat)$p
Simulation of 1- and 2-factor tree copula models for item response data
Description
Simulating item response data from the 1- and 2-factor tree copula models.
Usage
r1factortree(n, d, A, copname1, copnametree, theta1, delta,K)
r2factortree(n, d, A, copname1, copname2, copnametree,theta1, theta2, delta,K)
Arguments
n |
Sample size. |
d |
Number of observed variables/items. |
A |
|
theta1 |
copula parameter vector of size |
theta2 |
copula parameter vector of size |
delta |
copula parameter vector of size |
copname1 |
A name of a bivariate copula that link each of the oberved variabels with the first factor (note only a single copula family for all items with the factor). Choices are “bvn” for BVN, “bvt |
copname2 |
A name of a bivariate copula that link each of the oberved variabels with the second factor (note only a single copula family for all items with the factor). Choices are “bvn” for BVN, “bvt |
copnametree |
A name of a bivariate copula that link each of the oberved variabels with one another given the factors in the 1-truncated vine (note only a single copula family for all tree). Choices are “bvn” for BVN, “bvt |
K |
Number of categories for the observed variables/items. |
Value
Data matrix of dimension n \times d
, where n
is the sample size, and d
is the total number of observed variables/items.
Author(s)
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022b) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Examples
# ---------------------------------------------------
# ---------------------------------------------------
#Sample size
n = 500
#Ordinal Variables ---------------------------------
d = 5
#Categories for ordinal ----------------------------
K = 5
# ---------------------------------------------------
# 1-2-factor tree copula model
# ---------------------------------------------------
#Copula parameters
theta1 = rep(3, d)
theta2 = rep(2, d)
delta = rep(1.5, d-1)
#Copula names
copulaname_1f = "gum"
copulaname_2f = "gum"
copulaname_vine = "gum"
#vine array
#Dvine
d=5
A=matrix(0,d,d)
A[1,]=c(1,c(1:(d-1)))
diag(A)=1:d
#----------------- Simulating data ------------------
#1-factor tree copula
data_1ft = r1factortree(n, d, A, copulaname_1f, copulaname_vine,
theta1, delta,K)
#2-factor tree copula
data_2ft = r2factortree(n, d, A, copulaname_1f, copulaname_2f,
copulaname_vine, theta1,theta2, delta,K)
Simulation of bi-factor and second-order copula models for item response data
Description
Simulating item response data from the bi-factor and second-order copula models for item response data.
Usage
rBifactor(n, d, grpsize, categ, copnames1,copnames2, theta1, theta2)
rSecond_order(n, d, grpsize, categ, copnames1, copnames2, theta1, theta2)
Arguments
n |
Sample size. |
d |
Number of observed variables/items. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
categ |
A vector of categories for the observed variables/items. |
theta1 |
For the bi-factor model: copula parameter vector of size |
theta2 |
For the bi-factor model: copula parameter vector of size |
copnames1 |
For the bi-factor copula: |
copnames2 |
For the bi-factor copula: |
Value
Data matrix of dimension n\times d
, where n
is the sample size, and d
is the total number of observed variables/items.
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Examples
# ---------------------------------------------------
# ---------------------------------------------------
#Sample size
n = 500
#Ordinal Variables ---------------------------------
d = 9
grpsize=c(3,3,3)
ngrp=length(grpsize)
#Categories for ordinal ----------------------------
categ = rep(3,d)
# ---------------------------------------------------
# ---------------------------------------------------
# Bi-factor copula model
# ---------------------------------------------------
# ---------------------------------------------------
#Copula parameters
theta = rep(2.5, d)
delta = rep(1.5, d)
#Copula names
copulanames1 = rep("gum", d)
copulanames2 = rep("gum", d)
#----------------- Simulating data ------------------
data_Bifactor = rBifactor(n, d, grpsize, categ, copulanames1,
copulanames2, theta, delta)
# ---------------------------------------------------
# ---------------------------------------------------
# Second-order copula model
# ---------------------------------------------------
# ---------------------------------------------------
#Copula parameters
theta= rep(1.5, ngrp)
delta = rep(2.5, d)
#Copula names
copulanames1 = rep("gum", ngrp)
copulanames2 = rep("gum", d)
data_Second_order = rSecond_order(n, d, grpsize, categ,
copulanames1, copulanames2, theta, delta)
Simulation of factor copula models for mixed continuous and discrete data
Description
Simulating standard uniform and ordinal response data from factor copula models.
Usage
r1factor(n, d1, d2, categ, theta, copF1)
r2factor(n, d1, d2, categ, theta, delta, copF1, copF2)
Arguments
n |
Sample size. |
d1 |
Number of standard uniform variables. |
d2 |
Number of ordinal variables. |
categ |
A vector of categories for the ordinal variables. |
theta |
Copula parameters for the 1st factor. |
delta |
Copula parameters for the 2nd factor. |
copF1 |
|
copF2 |
|
Value
Data matrix of dimension n\times d
, where n
is the sample size, and d=d_1+d_2
is the total number of variables.
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Examples
# ---------------------------------------------------
# ---------------------------------------------------
# One-factor copula model
# ---------------------------------------------------
# ---------------------------------------------------
#Sample size ----------------------------------------
n = 100
#Continuous Variables ------------------------------
d1 = 5
#Ordinal Variables ---------------------------------
d2 = 3
#Categories for ordinal ----------------------------
categ = c(3,4,5)
#Copula parameters ---------------------------------
theta = rep(2, d1+d2)
#Copula names --------------------------------------
copnamesF1 = rep("gum", d1+d2)
#----------------- Simulating data ------------------
datF1 = r1factor(n, d1=d1, d2=d2, categ, theta, copnamesF1)
#------------ Plotting continuous data -------------
pairs(qnorm(datF1[, 1:d1]))
# ---------------------------------------------------
# ---------------------------------------------------
# Two-factor copula model
# ---------------------------------------------------
# ---------------------------------------------------
#Sample size ----------------------------------------
n = 100
#Continuous Variables ------------------------------
d1 = 5
#Ordinal Variables ---------------------------------
d2 = 3
#Categories for ordinal ----------------------------
categ = c(3,4,5)
#Copula parameters ---------------------------------
theta = rep(2.5, d1+d2)
delta = rep(1.5, d1+d2)
#Copula names --------------------------------------
copnamesF1 = rep("gum", d1+d2)
copnamesF2 = rep("gum", d1+d2)
#----------------- Simulating data ------------------
datF2 = r2factor(n, d1=d1, d2=d2, categ, theta, delta,
copnamesF1, copnamesF2)
#----------------- Plotting data ------------------
pairs(qnorm(datF2[,1:d1]))
Diagnostics to detect tail dependence or tail asymmetry.
Description
The sample versions of the correlation \rho_N
, upper semi-correlation \rho_N^+
(correlation in the joint upper quadrant) and lower semi-correlation \rho_N^-
(correlation in the joint lower quadrant). These are sample linear (when both variables are continuous), polychoric (when both variables are ordinal), and polyserial (when one variable is continuous and the
other is ordinal) correlations.
Usage
semicorr(dat,type)
Arguments
dat |
Data frame of mixed continuous and ordinal data. |
type |
A vector with 1's for the location of continuous data and 2's for the location of ordinal data. |
Value
A matrix containing the following components for semicorr():
rho |
|
lrho |
|
urho |
|
Author(s)
Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk
References
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Examples
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
#correlation
continuous.PE1 <- -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
categorical.PE <- data.frame(apply(PE[, 3:5], 2, factor))
nPE <- cbind(continuous.PE, categorical.PE)
#-------------------------------------------------
# Semi-correlations-------------------------------
#-------------------------------------------------
# Exclude the dichotomous variable
sem.PE = nPE[,-3]
semicorr.PE = semicorr(dat = sem.PE, type = c(1,1,2,2))
#------------------------------------------------
#------------------------------------------------
# GSS Data
#------------------ -----------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME,AGE)
ordinal.GSS <- cbind(DEGREE,PINCOME,PDEGREE)
count.GSS <- cbind(CHILDREN,PCHILDREN)
# Transforming the COUNT variables to ordinal
# count1 : CHILDREN
count1 = count.GSS[,1]
count1[count1 > 3] = 3
# count2: PCHILDREN
count2 = count.GSS[,2]
count2[count2 > 7] = 7
# Combining both transformed count variables
ncount.GSS = cbind(count1, count2)
# Combining ordinal and transformed count variables
categorical.GSS <- cbind(ordinal.GSS, ncount.GSS)
categorical.GSS <- data.frame(apply(categorical.GSS, 2, factor))
# combining continuous and categorical variables
nGSS = cbind(continuous.GSS, categorical.GSS)
#-------------------------------------------------
# Semi-correlations-------------------------------
#-------------------------------------------------
semicorr.GSS = semicorr(dat = nGSS, type = c(1, 1, rep(2,5)))
Continuous/count to ordinal responses
Description
Transforming a continuous/count to ordinal variable with K
categories.
Usage
continuous2ordinal(continuous, categ)
count2ordinal(count, categ)
Arguments
continuous |
Matrix of continuous data. |
count |
Matrix of count data. |
categ |
The number of categories |
Examples
#------------------------------------------------
# PE Data
#------------------ -----------------
data(PE)
continuous.PE <- PE[, 1:2]
#Transforming the continuous to ordinal data :
tcontinuous = continuous2ordinal(continuous.PE, categ=5)
table(tcontinuous)
#Transforming the count to ordinal data:
set.seed(12345)
count.data = rpois(1000, 3)
tcount = count2ordinal(count.data, 5)
table(tcount)