Title: Theory-Driven Item Response Theory (IRT) Models
Version: 0.0.1.1
Description: IRT-M is a semi-supervised approach based on Bayesian Item Response Theory that produces theoretically identified underlying dimensions from input data and a constraints matrix. The methodology is fully described in 'Morucci et al. (2024), "Measurement That Matches Theory: Theory-Driven Identification in Item Response Theory Models"'. Details are available at https://www.cambridge.org/core/journals/american-political-science-review/article/measurement-that-matches-theory-theorydriven-identification-in-item-response-theory-models/395DA1DFE3DCD7B866DC053D7554A30B.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Depends: truncnorm, tmvtnorm, utils, RcppProgress, RcppDist, ggplot2, R (≥ 3.5.0)
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0), RColorBrewer, fastDummies, ggrepel, tidyverse, spelling
VignetteBuilder: knitr
LazyData: true
LinkingTo: Rcpp, RcppArmadillo, RcppDist, RcppProgress
Imports: coda, Rcpp, RcppArmadillo, ggridges, rlang, dplyr, reshape2
Config/testthat/edition: 3
Language: en-US
NeedsCompilation: yes
Packaged: 2025-04-16 20:29:50 UTC; Promachos
Author: Marco Morucci [aut], Margaret Foster [cre], David Siegel [aut]
Maintainer: Margaret Foster <m.jenkins.foster@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-19 12:22:01 UTC

Geweke Convergence

Description

Runs Geweke tests to assess MCMC convergence

Usage

Geweke_convergence(THETA)

Arguments

THETA

Matrix of parameter estimates from IRTM

Value

Proportion of values that fail the Geweke convergence test (p < 0.05) for each parameter
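
As a rough illustration of what the test checks, the sketch below computes a Geweke z-score for each column of a draws-by-parameters matrix in base R and reports the share of parameters with p < 0.05. The helper names (spec0, geweke_z), the 10%/50% window fractions, and the assumed layout of the input (one row per retained draw, one column per parameter) are assumptions for illustration; the package's own implementation may differ.

```r
# Spectral density at frequency zero, estimated from a fitted AR model.
spec0 <- function(x) {
  fit <- ar(x, aic = TRUE)
  fit$var.pred / (1 - sum(fit$ar))^2
}

# Geweke z-score: compare the mean of the first 10% of a chain
# with the mean of the last 50%.
geweke_z <- function(chain, frac1 = 0.1, frac2 = 0.5) {
  n <- length(chain)
  a <- chain[seq_len(floor(frac1 * n))]
  b <- chain[(n - floor(frac2 * n) + 1):n]
  (mean(a) - mean(b)) / sqrt(spec0(a) / length(a) + spec0(b) / length(b))
}

set.seed(1)
draws <- cbind(
  stable  = rnorm(1000),                             # stationary chain
  shifted = rnorm(1000) + rep(c(5, 0), c(100, 900))  # early transient
)
z <- apply(draws, 2, geweke_z)
p <- 2 * pnorm(-abs(z))
mean(p < 0.05)  # share of parameters flagged as not converged
```

A chain whose early draws sit far from its later draws yields a large absolute z-score, which is the pattern the test is designed to flag.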


Methodological Codes

Description

Factor loading matrix for the IRT-M vignette. This dataset has 793 rows and 9 columns. The rows are derived from the binary encoding of the synthetic survey, with one row for every binarized question in the survey. The first 56 rows are retained metadata and contain many NA values.

The data format is an intermediate processing step for IRT-M and is detailed in the vignette text.

Format

A data frame with the following variables:

QCode

Mapping of the dimension coding key to the underlying question in the original (synthetic) survey data. The first 56 rows are blank because they map to survey and respondent metadata that doesn't relate to dimensions.

QMap

Mapping to the question in data processed by the vignette for variable estimation.

SubstantiveNotes

Brief human readable comments on the substantive meaning of the coded questions. These are for convenience of reference.

D1-Culture threat

Loading vector for the cultural threat dimension. A 1 indicates that the question is expected to load.

D2-ReligionThreat

Loading vector for the religious threat dimension.

D3-Economic Threat

Loading vector for the economic threat dimension.

D4-HealthThreat

Loading vector for the health threat dimension.

O1-OutcomeSupportImmigration

Loading vector for the immigration support composite.

O2-OutcomeSupportEU

Loading vector for the European Union support composite.

Details

Datasets for IRT-M Package

MCodes, synth_idvs, and synth_questions are included in the vignette

Source

IRT-M vignette walkthrough.


M_constrained_irt

Description

This function runs the MCMC sampler for the constrained IRT-M model and returns posterior samples of the item and respondent parameters.

Usage

M_constrained_irt(
  Y,
  d,
  M = NULL,
  theta_fix = NULL,
  which_fix = NULL,
  nburn = 1000,
  nsamp = 1000,
  thin = 10,
  learn_Sigma = TRUE,
  learn_Omega = FALSE,
  hyperparameters = list(),
  display_progress = TRUE
)

Arguments

Y

an N x K matrix of responses given by N respondents to K items. Can contain missing values.

d

an integer specifying the number of latent dimensions.

M

a list of K d x d matrices (default=NULL).

theta_fix

a matrix with d columns containing the values of the latent dimensions for respondents that have pre-specified latent factors.

which_fix

a vector containing the indices of the respondents for which latent factors have been fixed.

nburn

an integer specifying the number of burn-in MCMC iterations.

nsamp

an integer specifying the number of sampling MCMC iterations.

thin

an integer specifying the thinning interval: every thin-th MCMC sample is retained.

learn_Sigma

a Boolean specifying whether a covariance matrix for the latent factors should be learned.

learn_Omega

a Boolean specifying whether a covariance matrix for the latent loadings should be learned.

hyperparameters

a list of hyperparameters for the model.

display_progress

a Boolean specifying whether a progress bar should be displayed.

Value

A list containing the following components:

lambda

An array of dimension (K x d x nsamp/thin) containing posterior samples of item discrimination parameters.

b

A matrix of dimension (K x nsamp/thin) containing posterior samples of item difficulty parameters.

theta

An array of dimension (N x d x nsamp/thin) containing posterior samples of respondent latent trait values.

Sigma

An array of dimension (d x d x nsamp/thin) containing posterior samples of the covariance matrix of latent traits (only if learn_Sigma=TRUE).

Omega

An array of dimension (d x d x nsamp/thin) containing posterior samples of the covariance matrix of item loadings (only if learn_Omega=TRUE).
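
To make the expected shapes concrete, the sketch below builds a list of K diagonal d x d matrices from a small K x d coding matrix and computes how many draws are retained for a given nsamp and thin. The codes matrix and the rule that each row of codes is placed on the diagonal are assumptions for illustration; how the sampler interprets each entry (including NA) is described in the vignette.

```r
d <- 2
# Hypothetical theoretical codings: one row per item, one column per dimension.
codes <- rbind(q1 = c(1, 0), q2 = c(0, 1), q3 = c(1, -1))
K <- nrow(codes)

# One d x d diagonal matrix per item, in the list format described for M.
M <- lapply(seq_len(K), function(k) diag(codes[k, ], nrow = d))

# Retained posterior draws, matching the nsamp/thin dimension under Value.
nsamp <- 1000
thin <- 10
n_kept <- nsamp / thin
```

With these settings the third dimension of lambda, theta, and Sigma would have length 100.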


learned correlations

Description

Takes as input either the Sigma covariance matrix, if the user has learned the factor covariance, or the Omega covariance matrix, if the user has learned the loading covariance, as well as a vector of dimension names. Returns a correlation matrix with correlations between the dimensions.

Usage

dim_corr(cov_array, dim_names = NULL)

Arguments

cov_array

An array of dimension (d x d x nsamp/thin) containing posterior samples of the relevant covariance matrix.

dim_names

Vector of dimension names.

Value

A data frame containing the correlation matrix derived from the input covariance array, with rows and columns labeled according to dim_names (if provided). Each cell represents the correlation between the corresponding dimensions.
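
One way to turn posterior covariance draws into a single correlation matrix is sketched below in base R. Averaging the draws before rescaling with cov2cor is an assumption made for illustration, as are the simulated cov_array and the dimension labels; dim_corr may summarize the draws differently.

```r
set.seed(2)
d <- 3
S <- 50
# Simulated posterior draws of a d x d covariance matrix.
cov_array <- array(NA_real_, dim = c(d, d, S))
for (s in seq_len(S)) {
  X <- matrix(rnorm(20 * d), ncol = d)
  cov_array[, , s] <- cov(X)
}

# Average the draws, then rescale the mean covariance to correlations.
mean_cov <- apply(cov_array, c(1, 2), mean)
corr <- as.data.frame(cov2cor(mean_cov))
dim_names <- c("culture", "religion", "economy")  # hypothetical labels
dimnames(corr) <- list(dim_names, dim_names)
corr
```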


get_lambdas

Description

Takes as input the array of lambdas from the irt list, a vector of item names (which can be taken from either Y_in or M_matrix), a vector of dimension names, and, optionally, a vector of elaborations about each item. Returns a list containing (1) a data frame with the mean lambdas for each item-dimension pair, optionally with the elaborations attached to each item's string, and (2) a data frame listing, for each dimension, the items ordered by their mean value of lambda, highest first.

Usage

get_lambdas(lambda_array, item_names, dim_names, item_elab = NULL)

Arguments

lambda_array

An array of dimension (K x d x nsamp/thin) containing posterior samples of item discrimination parameters.

item_names

Vector of item names.

dim_names

Vector of dimension names.

item_elab

A vector comprising elaborations about each item (Default = NULL).

Value

A list containing the following components:

av_lams

A data frame of dimension (K x (1+d)) containing averages of item discrimination parameters.

high_lams

A data frame of dimension (K x d) containing an ordered list of the items with the highest mean values of lambda for each dimension.
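
The two summaries can be approximated in base R as follows. This is a minimal sketch on simulated draws, assuming items are ranked by the signed posterior mean (not its absolute value); the object names mirror the documented components but are built here by hand, not by get_lambdas.

```r
set.seed(3)
K <- 4; d <- 2; S <- 25
lambda_array <- array(rnorm(K * d * S), dim = c(K, d, S))
item_names <- paste0("q", seq_len(K))
dim_names <- c("D1", "D2")

# Posterior mean of each item-dimension loading: a K x d matrix.
lam_mean <- apply(lambda_array, c(1, 2), mean)
av_lams <- data.frame(item = item_names, lam_mean)
names(av_lams)[-1] <- dim_names

# For each dimension, item names sorted from highest to lowest mean lambda.
high_lams <- as.data.frame(
  lapply(av_lams[dim_names], function(v) item_names[order(v, decreasing = TRUE)])
)
```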


irt_m

Description

This function is a wrapper to enable easier use of the IRT-M model in M_constrained_irt. It takes as input two data frames: an N x K data frame and a K x (1+d) M-matrix. The first column of the M-matrix should contain item identifiers that match the K column headers in the N x K data frame. If they do not match, the wrapper exits with an error. The wrapper computes anchors, Y_all (merged data and anchors), and a list of diagonal M-matrices. The latter two are used as inputs to M_constrained_irt, which runs the sampler. Also used as input are nburn (Default = 10^3), nsamp (Default = 10^3), thin (Default = 1), and learn_loadings (Default = FALSE). With learn_loadings = FALSE, the sampler learns factor covariances; if set to TRUE, it learns loading covariances instead. Finally, the wrapper removes the anchors and returns an irt list.

Usage

irt_m(
  Y_in,
  d,
  M_matrix = NULL,
  nburn = 1000,
  nsamp = 1000,
  thin = 1,
  learn_loadings = FALSE
)

Arguments

Y_in

an N x K matrix of responses given by N respondents to K items. Can contain missing values. Column names should match the first column in M_matrix.

d

an integer specifying the number of latent dimensions.

M_matrix

a K x (d+1) matrix of theoretical codings used to constrain IRT-M (default=NULL). First column should match column names in Y_in.

nburn

an integer specifying the number of burn-in MCMC iterations.

nsamp

an integer specifying the number of sampling MCMC iterations.

thin

an integer specifying the thinning interval: every thin-th MCMC sample is retained.

learn_loadings

a Boolean specifying whether a covariance matrix for the latent loadings should be learned, instead of the default covariance matrix for latent dimensions.

Value

A list containing the following components:

lambda

An array of dimension (K x d x nsamp/thin) containing posterior samples of item discrimination parameters.

b

A matrix of dimension (K x nsamp/thin) containing posterior samples of item difficulty parameters.

theta

An array of dimension (N x d x nsamp/thin) containing posterior samples of respondent latent trait values.

Sigma

An array of dimension (d x d x nsamp/thin) containing posterior samples of the covariance matrix of latent traits (only if learn_loadings=FALSE, the default).

Omega

An array of dimension (d x d x nsamp/thin) containing posterior samples of the covariance matrix of item loadings (only if learn_loadings=TRUE).
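
Because the wrapper stops when item identifiers do not line up, it can help to check the inputs before sampling. The helper below, check_item_ids, is hypothetical and not part of the package; it assumes both inputs are data frames and that identifiers must match in the same order, which may be stricter than what irt_m itself enforces.

```r
# Hypothetical helper: stop unless the item identifiers line up.
check_item_ids <- function(Y_in, M_matrix) {
  ids <- as.character(M_matrix[[1]])
  if (!identical(colnames(Y_in), ids)) {
    stop("Column names of Y_in do not match the first column of M_matrix.")
  }
  invisible(TRUE)
}

Y_in <- data.frame(q1 = c(1, 0, NA), q2 = c(0, 1, 1))
M_matrix <- data.frame(item = c("q1", "q2"), D1 = c(1, 0), D2 = c(0, -1))
check_item_ids(Y_in, M_matrix)  # passes silently
```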


irt_vis

Description

Takes as input the number of latent dimensions (d); an N x (d+z) data frame (T_out) with average thetas in the first d columns and, in the remaining z columns, variables not included in the calculation of the thetas; and, optionally, a variable name (sub_name) taken from T_out and an output file name (out_file). Returns either unconditional theta distributions or distributions subset by that variable.

Usage

irt_vis(d, T_out, sub_name = NULL, out_file = NULL)

Arguments

d

The number of latent dimensions

T_out

N x (d+z) data frame with average latent dimensions in first d columns

sub_name

The name of a variable in T_out used for levels in the plot (Default = NULL)

out_file

Output file name for plot (Default = NULL)

Value

A ggplot2 object containing density ridge plots of the latent dimensions. When sub_name is NULL, the plot shows the distribution of each theta dimension. When sub_name is provided, the plot shows distributions faceted by theta dimension and grouped by the specified variable.
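
The T_out layout can be assembled in base R as sketched below, using simulated values. The column names (Theta1, Theta2, country) are placeholders chosen for illustration; whether irt_vis expects particular column names is not stated here.

```r
set.seed(4)
N <- 6; d <- 2
# Hypothetical posterior-mean thetas (N x d) and one external covariate.
theta_means <- matrix(rnorm(N * d), nrow = N,
                      dimnames = list(NULL, c("Theta1", "Theta2")))
covariates <- data.frame(country = rep(c("A", "B"), each = 3))

# First d columns hold the average thetas; the remaining z columns hold
# variables that were not used to estimate them.
T_out <- cbind(as.data.frame(theta_means), covariates)
str(T_out)
```

A data frame of this shape could then be passed along with d, and with sub_name = "country" to split the distributions by that variable.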


Mean Squared Error

Description

Loss function for benchmarks. Computes the mean squared error between observed and predicted values.

Usage

mse(ytrue, ypred, aggregate = TRUE, root = FALSE)

Arguments

ytrue

observed values

ypred

predicted values

aggregate

logical for whether to take mean of estimate

root

logical for whether to return square root of MSE

Value

mean squared error
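
The effect of the two flags can be illustrated with a small stand-in function. mse_sketch is a hypothetical re-implementation, not the package's code, and it assumes that aggregate = FALSE returns the element-wise squared errors.

```r
# Hypothetical re-implementation for illustration only.
mse_sketch <- function(ytrue, ypred, aggregate = TRUE, root = FALSE) {
  err <- (ytrue - ypred)^2            # squared error per observation
  if (aggregate) err <- mean(err)     # collapse to a single mean
  if (root) err <- sqrt(err)          # report on the original scale
  err
}

ytrue <- c(1, 2, 3)
ypred <- c(1, 2, 5)
mse_sketch(ytrue, ypred)               # 4/3
mse_sketch(ytrue, ypred, root = TRUE)  # sqrt(4/3)
```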


pair_gen_anchors

Description

This function generates anchor points from the M matrices. It creates d(d-1)*4 fake respondents such that, for every pair of dimensions, the first respondent has an extremely positive value on both dimensions of the pair; the second has an extremely positive value on dim 1 and an extremely negative value on dim 2; the third has an extremely negative value on dim 1 and an extremely positive value on dim 2; and the fourth has an extremely negative value on both dimensions of the pair. These respondents' answers are imputed according to the directions of the loadings specified by the M-matrices. For example, if question k loads positively on dim 1 and positively on dim 2, the first respondent in the dim 1/dim 2 pair will have yes for question k. If question k+1 loads positively on dim 1 and negatively on dim 2, the second respondent in the dim 1/dim 2 pair will have yes for question k+1, and so on.

Usage

pair_gen_anchors(M, A)

Arguments

M

a list containing K d x d M-matrices

A

What value should be considered extreme for the latent dimensions.

Value

A list with two elements:

Yfake

A matrix of dimension (d(d-1)*4 x K) containing the imputed answers of the fake anchor respondents, where d is the number of dimensions and K is the number of questions. Values are 0 (no) or 1 (yes).

theta_fake

A list of d(d-1)*4 vectors, each of length d, representing the latent trait values for the fake anchor respondents. Each vector contains mostly zeros, with extreme values (A or -A) at the positions corresponding to the pair of dimensions being considered.
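
The latent positions of the anchors can be enumerated as in the sketch below. anchor_thetas is a hypothetical helper, and looping over ordered pairs of dimensions is an assumption inferred from the stated count of d(d-1)*4; the order in which pair_gen_anchors emits the anchors may differ.

```r
# Hypothetical helper: extreme latent positions for every ordered pair (i, j).
anchor_thetas <- function(d, A) {
  signs <- list(c(1, 1), c(1, -1), c(-1, 1), c(-1, -1))
  out <- list()
  for (i in seq_len(d)) {
    for (j in setdiff(seq_len(d), i)) {
      for (s in signs) {
        theta <- numeric(d)       # zeros everywhere ...
        theta[c(i, j)] <- A * s   # ... except the two dimensions in the pair
        out[[length(out) + 1]] <- theta
      }
    }
  }
  out
}

theta_fake <- anchor_thetas(d = 3, A = 2)
length(theta_fake)  # d * (d - 1) * 4 = 24
```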


Standardize Theta

Description

Standardizes theta estimates.

Usage

standardize_theta(theta, Sigma)

Arguments

theta

estimated object

Sigma

covariance matrix

Value

theta divided by sigma param
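
One plausible reading of this rescaling is sketched below: each latent dimension is divided by its standard deviation, taken as the square root of the corresponding diagonal entry of Sigma. That interpretation, and the N x d matrix layout of theta, are assumptions; standardize_theta may operate on posterior arrays or use a different scaling.

```r
set.seed(5)
theta <- matrix(rnorm(10, sd = 3), ncol = 2)  # N x d matrix of latent traits
Sigma <- matrix(c(9, 1, 1, 4), nrow = 2)      # d x d covariance matrix

# Divide each column (dimension) by its standard deviation, sqrt(diag(Sigma)).
theta_std <- sweep(theta, 2, sqrt(diag(Sigma)), "/")
```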


Synthetic Independent Variables

Description

A synthetic dataset of independent variables for post-estimate analysis in the vignette. Extraction of the data from the synthetic survey is described in the vignette.

Format

A 3000 row and 27 column dataset of synthetic survey responses. This closely follows the 94.3 Eurobarometer survey in structure. This dataset is a toy that is intended for the IRT-M vignette. For real analysis, see the original Eurobarometer data collection.


Questions for the Synthetic European sentiment survey in the vignette

Description

A synthetic dataset with 3000 rows and 148 questions. This data replicates the structure of questions following Eurobarometer 94.3 (2021). It is not intended to be analyzed independently from the vignette.

Format

A data frame with 3000 rows and 148 columns representing synthetic survey responses


Average Thetas

Description

Compute matrix of theta means over posterior distributions

Usage

theta_av(theta_array)

Arguments

theta_array

An array of dimension (N x d x nsamp/thin) containing posterior samples of respondent latent trait values

Value

An N x d matrix of average thetas.
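
The averaging step amounts to taking means over the third (draw) dimension of the array. A minimal base R sketch on simulated draws, not the package's own code:

```r
set.seed(6)
N <- 5; d <- 2; S <- 40
theta_array <- array(rnorm(N * d * S), dim = c(N, d, S))

# Keep the first two dimensions and average over the draws.
theta_mean <- rowMeans(theta_array, dims = 2)
dim(theta_mean)  # N rows, d columns
```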


Theta Lambda Traceplots

Description

Creates traceplots for IRT parameter convergence diagnostics

Usage

theta_lambda_traceplots(irt, i = NULL, k = NULL)

Arguments

irt

An object containing theta and lambda parameters from an IRTM model

i

Index of the respondent to plot (randomly selected if NULL)

k

Index of the item to plot (randomly selected if NULL)

Value

Plots of theta, lambda, and their product across MCMC iterations
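
The series behind such a plot can be extracted by hand, as in the sketch below on simulated draws. Plotting a single dimension (dim_id) and taking the element-wise product of the two traces are assumptions for illustration; theta_lambda_traceplots may select dimensions or combine them differently.

```r
set.seed(7)
N <- 4; K <- 3; d <- 2; S <- 100
irt <- list(
  theta  = array(rnorm(N * d * S), dim = c(N, d, S)),
  lambda = array(rnorm(K * d * S), dim = c(K, d, S))
)
i <- 1; k <- 2; dim_id <- 1  # respondent, item, and dimension to inspect

theta_trace  <- irt$theta[i, dim_id, ]
lambda_trace <- irt$lambda[k, dim_id, ]
traces <- cbind(theta = theta_trace, lambda = lambda_trace,
                product = theta_trace * lambda_trace)

# Overlay the three series against the retained-draw index.
matplot(traces, type = "l", lty = 1, xlab = "Retained draw", ylab = "Value")
```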
