Help for package regDIF

Title:

Regularized Differential Item Functioning

Version:

1.1.1

Author:

William Belzak

Maintainer:

William Belzak <wbelzak@gmail.com>

Description:

Performs regularization of differential item functioning (DIF) parameters in item response theory (IRT) models (Belzak & Bauer, 2020) https://pubmed.ncbi.nlm.nih.gov/31916799/ using a penalized expectation-maximization algorithm.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.1

VignetteBuilder:

knitr

Imports:

stats (> 3.0.0), utils, statmod, parallel, foreach

Suggests:

knitr, testthat (≥ 2.1.0), covr, rmarkdown

NeedsCompilation:

Packaged:

2024-02-23 00:10:48 UTC; wbelz

Repository:

CRAN

Date/Publication:

2024-02-23 00:30:02 UTC

Regularized differential item functioning for IRT and CFA models.

Description

Regularized Differential Item Functioning

Details

regDIF is a package that performs regularization of differential item functioning (DIF) in item response theory (IRT) and confirmatory factor analysis (CFA) models using a penalized expectation-maximization algorithm.

Author(s)

William Belzak wbelzak@gmail.com

Expectation step.

Description

Expectation step.

Usage

Estep(
  p,
  item_data,
  pred_data,
  item_type,
  mean_predictors,
  var_predictors,
  theta,
  samp_size,
  num_items,
  num_responses,
  adapt_quad,
  num_quad,
  get_eap,
  NA_cases
)

Arguments

p

List of parameters.

item_data

Matrix or dataframe of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_type

Vector of character values indicating the item type.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

theta

Vector of fixed quadrature points.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_responses

Number of responses for each item.

adapt_quad

Logical indicating whether to use adaptive quadrature.

num_quad

Number of quadrature points used for approximating the latent variable.

get_eap

Logical indicating whether to compute EAP scores.

NA_cases

Logical vector indicating missing observations.

Value

a "list" of posterior values from the expectation step
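The expectation step can be illustrated with a simplified sketch that computes posterior weights over fixed quadrature points for binary items, assuming a two-parameter logistic model and a standard normal prior. This is not regDIF's implementation. It assumes no DIF or impact predictors, no adaptive quadrature, and no missing responses, and the function name `estep_sketch` is hypothetical. The row-wise posterior means of these weights are EAP scores (compare `get_eap`).

```r
# Hypothetical E-step sketch: posterior weights over fixed quadrature points
# for binary 2PL items, ignoring DIF/impact predictors and missing data.
estep_sketch <- function(item_data, intercepts, slopes, theta) {
  # Prior weights: normalized standard normal density at each grid point
  prior <- dnorm(theta)
  prior <- prior / sum(prior)
  n <- nrow(item_data)
  # Log-likelihood of each person's responses at each quadrature point (n x Q)
  loglik <- matrix(0, n, length(theta))
  for (j in seq_len(ncol(item_data))) {
    p <- plogis(intercepts[j] + slopes[j] * theta)   # length Q
    y <- item_data[, j]
    loglik <- loglik + outer(y, log(p)) + outer(1 - y, log(1 - p))
  }
  # Add the log prior, subtract row maxima for stability, then normalize rows
  logpost <- sweep(loglik, 2, log(prior), "+")
  logpost <- logpost - apply(logpost, 1, max)
  post <- exp(logpost)
  post / rowSums(post)
}

set.seed(1)
theta <- seq(-4, 4, length.out = 21)
item_data <- matrix(rbinom(50 * 5, 1, 0.5), nrow = 50)
post <- estep_sketch(item_data, intercepts = rep(0, 5), slopes = rep(1, 5), theta = theta)
eap <- drop(post %*% theta)   # EAP score for each person
```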

Expectation step with proxy data.

Description

Expectation step with proxy data.

Usage

Estep_proxy(
  p,
  item_data,
  pred_data,
  item_type,
  mean_predictors,
  var_predictors,
  prox_data,
  samp_size,
  num_items,
  num_responses,
  get_eap,
  NA_cases
)

Arguments

p

List of parameters.

item_data

Matrix or dataframe of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_type

Vector of character values indicating the item type.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

prox_data

Vector of observed proxy scores.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_responses

Number of responses for each item.

get_eap

Logical indicating whether to compute EAP scores.

NA_cases

Logical vector indicating missing observations.

Value

a "list" of posterior values from the expectation step

Maximization step.

Description

Maximization step.

Usage

Mstep(
  p,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  eout,
  item_type,
  pen_type,
  tau_current,
  pen,
  pen.deriv,
  alpha,
  gamma,
  anchor,
  final_control,
  samp_size,
  num_responses,
  num_items,
  num_quad,
  num_predictors,
  num_tau,
  max_tau,
  method
)

Arguments

p

List of parameters.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

eout

E-step output, including matrix for item and impact equations, in addition to theta values (possibly adaptive).

item_type

Optional character value or vector indicating the type of item to be modeled.

pen_type

Character value indicating the penalty function to use.

tau_current

A single numeric value of tau that exists within tau_vec.

pen

Current penalty index.

pen.deriv

Logical value indicating whether to use the second derivative of the penalized parameter during regularization. The default is TRUE.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors.

num_tau

Logical indicating whether the minimum tau value needs to be identified during the regDIF procedure.

max_tau

Logical indicating whether to output the minimum tau value needed to remove all DIF from the model.

method

Character value indicating the type of optimization method. Options include "MNR", "UNR", and "CD".

Value

a "list" of estimates obtained from the maximization step using multivariate Newton-Raphson
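The penalty arguments (`pen_type`, `tau_current`, `alpha`, `gamma`) can be illustrated with the closed-form penalized update for a single parameter. Assume the log-likelihood is approximated by a quadratic with first derivative `g` and second derivative `h < 0` at the current value `b0`. The unpenalized Newton-Raphson target is then `z = b0 - g / h`, and the elastic net and MCP updates shrink `z` toward zero. The sketch below is the standard coordinate-descent form of these updates under that assumption. It may not match regDIF's exact implementation, and the function names are hypothetical.

```r
# Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)
soft_threshold <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

# Hypothetical penalized Newton update for one parameter.
# g, h: first and second derivatives of the log-likelihood at b0 (h < 0).
penalized_update <- function(b0, g, h, tau, pen_type = c("lasso", "net", "mcp"),
                             alpha = 1, gamma = 3) {
  pen_type <- match.arg(pen_type)
  w <- -h              # curvature of the quadratic approximation
  z <- b0 - g / h      # unpenalized Newton-Raphson target
  if (pen_type == "lasso") alpha <- 1
  if (pen_type %in% c("lasso", "net")) {
    # Elastic net: L1 weight tau * alpha, L2 weight tau * (1 - alpha)
    return(soft_threshold(w * z, tau * alpha) / (w + tau * (1 - alpha)))
  }
  # MCP: shrink when |z| <= gamma * tau, otherwise leave unpenalized
  # (assumes w > 1 / gamma so the one-dimensional problem stays convex)
  if (abs(z) <= gamma * tau) soft_threshold(w * z, tau) / (w - 1 / gamma) else z
}

# A small DIF effect is set exactly to zero by a large enough lasso penalty
penalized_update(b0 = 0.2, g = -1, h = -4, tau = 0.5, pen_type = "lasso")
```

With `tau = 0`, every branch reduces to the plain Newton-Raphson step `z`.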

Maximization step using latent variable and item response blocks.

Description

Maximization step using latent variable and item response blocks.

Usage

Mstep_block(
  p,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  eout,
  item_type,
  pen_type,
  tau_current,
  pen,
  alpha,
  gamma,
  anchor,
  final_control,
  samp_size,
  num_responses,
  num_items,
  num_quad,
  num_predictors,
  num_tau,
  max_tau
)

Arguments

p

List of parameters.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

eout

E-step output, including matrix for item and impact equations, in addition to theta values (possibly adaptive).

item_type

Optional character value or vector indicating the type of item to be modeled.

pen_type

Character value indicating the penalty function to use.

tau_current

A single numeric value of tau that exists within tau_vec.

pen

Current penalty index.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors.

num_tau

Logical indicating whether the minimum tau value needs to be identified during the regDIF procedure.

max_tau

Logical indicating whether to output the minimum tau value needed to remove all DIF from the model.

Value

a "list" of estimates obtained from the maximization step using multivariate Newton-Raphson

Maximization step using coordinate descent optimization.

Description

Maximization step using coordinate descent optimization.

Usage

Mstep_cd(
  p,
  item_data,
  pred_data,
  mean_predictors,
  var_predictors,
  eout,
  item_type,
  pen_type,
  tau_current,
  pen,
  alpha,
  gamma,
  anchor,
  final_control,
  samp_size,
  num_responses,
  num_items,
  num_quad,
  num_predictors,
  num_tau,
  max_tau
)

Arguments

p

List of parameters.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

eout

E-step output, including matrix for item and impact equations, in addition to theta values (possibly adaptive).

item_type

Optional character value or vector indicating the type of item to be modeled.

pen_type

Character value indicating the penalty function to use.

tau_current

A single numeric value of tau that exists within tau_vec.

pen

Current penalty index.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors.

num_tau

Logical indicating whether the minimum tau value needs to be identified during the regDIF procedure.

max_tau

Logical indicating whether to output the minimum tau value needed to remove all DIF from the model.

Value

a "list" of estimates obtained from the maximization step using univariate Newton-Raphson (i.e., one step of coordinate descent)
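In coordinate descent, each parameter is updated in turn with a single univariate Newton-Raphson step while the others are held fixed. The sketch below applies this to a logistic item model in which the latent trait is replaced by an observed score (similar to the proxy-data case), with a lasso penalty on one DIF intercept. The function name, parameterization, and simulated data are hypothetical and serve only as illustration.

```r
soft_threshold <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

# Hypothetical coordinate descent for P(y = 1) = plogis(c0 + a0 * score + c1 * group),
# where only the DIF intercept c1 is lasso-penalized.
cd_logistic <- function(y, score, group, tau, n_cycles = 200) {
  X <- cbind(1, score, group)   # columns for c0, a0, c1
  b <- c(0, 0, 0)
  for (cycle in seq_len(n_cycles)) {
    for (k in 1:3) {
      p <- plogis(drop(X %*% b))
      g <- sum((y - p) * X[, k])           # first derivative of log-likelihood
      h <- -sum(p * (1 - p) * X[, k]^2)    # second derivative (negative)
      z <- b[k] - g / h                    # univariate Newton-Raphson target
      b[k] <- if (k == 3) soft_threshold(-h * z, tau) / (-h) else z
    }
  }
  setNames(b, c("c0", "a0", "c1"))
}

set.seed(2)
n <- 500
score <- rnorm(n)
group <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * score + 0.8 * group))
fit_free <- cd_logistic(y, score, group, tau = 0)   # should approach the MLE
fit_big <- cd_logistic(y, score, group, tau = 1e4)  # DIF intercept removed
```

With a large enough `tau`, the penalized DIF intercept is set exactly to zero. With `tau = 0`, the result should agree with an unpenalized logistic regression fit.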

Maximization step using coordinate descent optimization.

Description

Maximization step using coordinate descent optimization.

Usage

Mstep_cd2(
  p,
  item_data,
  pred_data,
  mean_predictors,
  var_predictors,
  eout,
  item_type,
  pen_type,
  tau_current,
  pen,
  alpha,
  gamma,
  anchor,
  final_control,
  samp_size,
  num_responses,
  num_items,
  num_quad,
  num_predictors,
  num_tau,
  max_tau
)

Arguments

p

List of parameters.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

eout

E-step output, including matrix for item and impact equations, in addition to theta values (possibly adaptive).

item_type

Optional character value or vector indicating the type of item to be modeled.

pen_type

Character value indicating the penalty function to use.

tau_current

A single numeric value of tau that exists within tau_vec.

pen

Current penalty index.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors.

num_tau

Logical indicating whether the minimum tau value needs to be identified during the regDIF procedure.

max_tau

Logical indicating whether to output the minimum tau value needed to remove all DIF from the model.

Value

a "list" of estimates obtained from the maximization step using coordinate descent

Maximization step.

Description

Maximization step.

Usage

Mstep_simple(
  p,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  eout,
  item_type,
  pen_type,
  tau_current,
  pen,
  pen.deriv,
  alpha,
  gamma,
  anchor,
  final_control,
  samp_size,
  num_responses,
  num_items,
  num_quad,
  num_predictors,
  num_tau,
  max_tau,
  optim_method
)

Arguments

p

List of parameters.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

eout

E-step output, including matrix for item and impact equations, in addition to theta values (possibly adaptive).

item_type

Optional character value or vector indicating the type of item to be modeled.

pen_type

Character value indicating the penalty function to use.

tau_current

A single numeric value of tau that exists within tau_vec.

pen

Current penalty index.

pen.deriv

Logical value indicating whether to use the second derivative of the penalized parameter during regularization. The default is TRUE.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors.

num_tau

Logical indicating whether the minimum tau value needs to be identified during the regDIF procedure.

max_tau

Logical indicating whether to output the minimum tau value needed to remove all DIF from the model.

optim_method

Character value indicating the type of estimation method to use.

Value

a "list" of estimates obtained from the maximization step using univariate Newton-Raphson

Binary item tracelines.

Description

Binary item tracelines.

Usage

bernoulli_traceline_pts(p_item, theta, pred_data, samp_size)

Arguments

p_item

Vector of item parameters.

theta

Vector of theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

Value

a "matrix" of probability values for Bernoulli item likelihood
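The sketch below gives a rough picture of what a Bernoulli traceline matrix contains. It computes response probabilities for one binary item at every combination of person and quadrature point, with DIF entering both the intercept and the slope through a single predictor. The parameter layout `c0, c1, a0, a1` and the function name are assumptions for illustration and may not match how `p_item` is ordered inside regDIF.

```r
# Hypothetical 2PL traceline with linear DIF on intercept and slope.
# Returns a samp_size x num_quad matrix of P(y = 1 | theta_q, x_i).
bernoulli_traceline_sketch <- function(c0, c1, a0, a1, theta, x) {
  intercept <- c0 + c1 * x   # person-specific intercepts (length n)
  slope <- a0 + a1 * x       # person-specific slopes (length n)
  plogis(intercept + outer(slope, theta))
}

theta <- seq(-3, 3, length.out = 7)
x <- c(0, 1)                 # e.g., reference vs. focal group
tl <- bernoulli_traceline_sketch(c0 = 0, c1 = 0.5, a0 = 1, a1 = 0, theta = theta, x = x)
```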

Binary item tracelines.

Description

Binary item tracelines.

Usage

bernoulli_traceline_pts2(p_item, theta, pred_data, samp_size)

Arguments

p_item

Vector of item parameters.

theta

Vector of theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

Value

a "matrix" of probability values for Bernoulli item likelihood

Binary item tracelines for proxy scores.

Description

Binary item tracelines for proxy scores.

Usage

bernoulli_traceline_pts_proxy(p_item, prox_data, pred_data)

Arguments

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

Value

a "matrix" of probability values for Bernoulli item likelihood using observed proxy scores

Coefficient function for regDIF function

Description

Coefficient function for regDIF function

Usage

## S3 method for class 'regDIF'
coef(object, tau = NULL, method = "bic", ...)

Arguments

object

Fitted regDIF model object.

tau

Optional character or numeric value indicating the tau value(s) at which model coefficients are returned. If character, it may be "tau.min", which returns the model coefficients at the value of tau with the minimum fit statistic. If numeric, the value(s) provided correspond to the value(s) of tau at which coefficients are returned.

method

Character value indicating the model fit statistic to be used for determining "tau.min". Default is "bic". May also be "aic".

...

Additional arguments to be passed through to coef.

Value

NULL

Ordinal tracelines.

Description

Ordinal tracelines.

Usage

cumulative_traceline_pts(
  p_item,
  theta,
  pred_data,
  samp_size,
  num_responses_item,
  num_quad
)

Arguments

p_item

Vector of item parameters.

theta

Vector of theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

num_responses_item

Number of responses for item.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "matrix" of probability values for categorical (cumulative) item likelihood
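For ordinal items, a cumulative logit model gives P(Y >= k) at each category boundary, and category probabilities are differences of adjacent cumulative probabilities. The sketch below shows this for a single item without DIF predictors, assuming a graded-response parameterization with decreasing intercepts `thr`. The function name and threshold convention are hypothetical and may differ from regDIF's internal ordering.

```r
# Hypothetical graded-response traceline for one item without DIF predictors.
# thr: decreasing intercepts for P(Y >= 2), ..., P(Y >= K); slope: discrimination.
cumulative_traceline_sketch <- function(thr, slope, theta) {
  # Cumulative probabilities: rows = quadrature points, columns = boundaries
  cum <- plogis(outer(slope * theta, thr, "+"))
  # Pad with P(Y >= 1) = 1 and P(Y >= K + 1) = 0, then take differences
  cum <- cbind(1, cum, 0)
  cum[, -ncol(cum), drop = FALSE] - cum[, -1, drop = FALSE]
}

theta <- seq(-3, 3, length.out = 5)
probs <- cumulative_traceline_sketch(thr = c(1, 0, -1), slope = 1.5, theta = theta)
```

Each row of `probs` gives the K category probabilities at one quadrature point, and each row sums to one.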

Ordinal tracelines using proxy data.

Description

Ordinal tracelines using proxy data.

Usage

cumulative_traceline_pts_proxy(
  p_item,
  prox_data,
  pred_data,
  samp_size,
  num_responses_item
)

Arguments

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

num_responses_item

Number of responses for item.

Value

a "matrix" of probability values for categorical (cumulative) item likelihood

Partial derivatives for mean impact equation.

Description

Partial derivatives for mean impact equation.

Usage

d_alpha(
  p_impact,
  etable,
  theta,
  mean_predictors,
  var_predictors,
  cov,
  samp_size,
  num_items,
  num_quad
)

Arguments

p_impact

Vector of impact parameters.

etable

E-table for impact.

theta

Matrix of adaptive theta values.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

cov

Covariate being maximized.

samp_size

Sample size in data set.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for mean impact equation (to use with coordinate descent and univariate Newton-Raphson)
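The derivatives for the mean impact equation can be sketched under an assumed impact model theta_i ~ N(mu_i, sigma2_i), with mu_i = m_i' alpha and a log-linear variance sigma2_i = exp(v_i' phi). This parameterization is an assumption and is not confirmed above. With E-table posterior weights w_iq, the first derivative with respect to alpha_k is sum_i sum_q w_iq (theta_q - mu_i) m_ik / sigma2_i, and the second derivative is -sum_i sum_q w_iq m_ik^2 / sigma2_i. The function name and simulated inputs are hypothetical, and the first derivative is checked against a finite difference.

```r
# Hypothetical derivatives of the E-table-weighted log prior density with respect
# to the k-th mean impact coefficient, assuming theta_i ~ N(m_i' alpha, exp(v_i' phi)).
d_alpha_sketch <- function(alpha, phi, etable, theta, mean_pred, var_pred, k) {
  mu <- drop(mean_pred %*% alpha)          # person-specific means (length n)
  sigma2 <- exp(drop(var_pred %*% phi))    # person-specific variances (length n)
  resid <- outer(-mu, theta, "+")          # theta_q - mu_i (n x Q)
  m <- mean_pred[, k]
  list(d1 = sum(etable * resid * m / sigma2),
       d2 = -sum(etable * m^2 / sigma2))
}

# Check the first derivative against a central finite difference
set.seed(3)
n <- 40
theta <- seq(-4, 4, length.out = 15)
mean_pred <- cbind(1, rnorm(n))
var_pred <- cbind(1, rnorm(n))
etable <- matrix(runif(n * 15), n)
etable <- etable / rowSums(etable)         # rows act as posterior weights
alpha <- c(0, 0.3)
phi <- c(0, -0.2)
loglik_alpha <- function(a) {
  mu <- drop(mean_pred %*% a)
  sd <- sqrt(exp(drop(var_pred %*% phi)))
  sum(etable * (dnorm(outer(-mu, theta, "+") / sd, log = TRUE) - log(sd)))
}
d <- d_alpha_sketch(alpha, phi, etable, theta, mean_pred, var_pred, k = 2)
eps <- 1e-6
num_d1 <- (loglik_alpha(alpha + c(0, eps)) - loglik_alpha(alpha - c(0, eps))) / (2 * eps)
```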

Partial derivatives for mean impact equation using proxy data.

Description

Partial derivatives for mean impact equation using proxy data.

Usage

d_alpha_proxy(
  p_impact,
  prox_data,
  mean_predictors,
  var_predictors,
  cov,
  samp_size,
  num_items
)

Arguments

p_impact

Vector of impact parameters.

prox_data

Matrix of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

cov

Covariate being maximized.

samp_size

Sample size in data set.

num_items

Number of items in data set.

Value

a "list" of first and second partial derivatives for mean impact equation (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for binary items.

Description

Partial derivatives for binary items.

Usage

d_bernoulli(
  parm,
  p_item,
  etable_item,
  theta,
  pred_data,
  cov,
  samp_size,
  num_items,
  num_quad
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

etable_item

E-table for item.

theta

Matrix of adaptive theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for Bernoulli item likelihood (to use with coordinate descent and univariate Newton-Raphson)
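The derivatives for a binary item can be sketched under an assumed 2PL model with linear DIF, P(y_i = 1 | theta_q) = plogis(c0 + c1 x_i + (a0 + a1 x_i) theta_q). The E-table-weighted first derivative with respect to any parameter is sum w_iq (y_i - p_iq) d_iq, and the second derivative is -sum w_iq p_iq (1 - p_iq) d_iq^2. Here d_iq is the derivative of the linear predictor: the covariate for intercept-type parameters, or the covariate times theta_q for slope-type parameters. The parameter layout, argument names, and simulated data are hypothetical, and the result is checked against a finite difference.

```r
# Hypothetical derivatives of the E-table-weighted Bernoulli log-likelihood for one
# item, with p_item = c(c0, c1, a0, a1). `parm` is "c" (intercept-type) or "a"
# (slope-type); `cov` is the covariate multiplying that parameter (1s for c0 or a0).
d_bernoulli_sketch <- function(p_item, parm, cov, y, x, etable, theta) {
  p <- plogis((p_item[1] + p_item[2] * x) + outer(p_item[3] + p_item[4] * x, theta))
  deta <- if (parm == "c") matrix(cov, length(y), length(theta)) else outer(cov, theta)
  list(d1 = sum(etable * (y - p) * deta),
       d2 = -sum(etable * p * (1 - p) * deta^2))
}

# Finite-difference check for the slope DIF parameter a1 (parm = "a", cov = x)
set.seed(4)
n <- 30
theta <- seq(-3, 3, length.out = 11)
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.5)
etable <- matrix(runif(n * 11), n)
etable <- etable / rowSums(etable)
p_item <- c(0.1, 0.4, 1.1, -0.3)
loglik_item <- function(pr) {
  p <- plogis((pr[1] + pr[2] * x) + outer(pr[3] + pr[4] * x, theta))
  sum(etable * (y * log(p) + (1 - y) * log(1 - p)))
}
d <- d_bernoulli_sketch(p_item, "a", x, y, x, etable, theta)
eps <- 1e-6
num_d1 <- (loglik_item(p_item + c(0, 0, 0, eps)) - loglik_item(p_item - c(0, 0, 0, eps))) / (2 * eps)
```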

Partial derivatives for binary items by item-blocks.

Description

Partial derivatives for binary items by item-blocks.

Usage

d_bernoulli_itemblock(
  p_item,
  etable,
  theta,
  pred_data,
  item_data_current,
  samp_size,
  num_items,
  num_predictors,
  num_quad
)

Arguments

p_item

Vector of item parameters.

etable

E-table for item.

theta

Matrix of adaptive theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_data_current

Vector of current item responses.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_predictors

Number of predictors in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for Bernoulli item likelihood (to use with multivariate Newton-Raphson)

Partial derivatives for binary items by item-blocks using observed score proxy.

Description

Partial derivatives for binary items by item-blocks using observed score proxy.

Usage

d_bernoulli_itemblock_proxy(
  p_item,
  pred_data,
  item_data_current,
  prox_data,
  samp_size,
  num_items,
  num_predictors,
  num_quad
)

Arguments

p_item

Vector of item parameters.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_data_current

Vector of current item responses.

prox_data

Vector of observed proxy scores.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_predictors

Number of predictors in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for Bernoulli item likelihood (to use with multivariate Newton-Raphson and observed proxy scores)

Partial derivatives for binary items with proxy data.

Description

Partial derivatives for binary items with proxy data.

Usage

d_bernoulli_proxy(
  parm,
  p_item,
  prox_data,
  pred_data,
  item_data_current,
  cov,
  samp_size,
  num_items
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_data_current

Vector of current item responses.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

Value

a "list" of first and second partial derivatives for Bernoulli item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for ordinal items.

Description

Partial derivatives for ordinal items.

Usage

d_categorical(
  parm,
  p_item,
  etable_item,
  theta,
  pred_data,
  thr,
  cov,
  samp_size,
  num_responses_item,
  num_items,
  num_quad
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

etable_item

E-table for item.

theta

Matrix of adaptive theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

thr

Threshold value being maximized.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_responses_item

Number of responses for item.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for categorical item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for ordinal items.

Description

Partial derivatives for ordinal items.

Usage

d_categorical_itemblock(
  parm,
  p_item,
  etable,
  theta,
  pred_data,
  item_data_current,
  samp_size,
  num_responses_item,
  num_items,
  num_predictors,
  num_quad
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

etable

E-table for item.

theta

Matrix of adaptive theta values.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_data_current

Vector of current item responses.

samp_size

Sample size in dataset.

num_responses_item

Number of responses for item.

num_items

Number of items in dataset.

num_predictors

Number of predictors in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for categorical item likelihood (to use with multivariate Newton-Raphson)

Partial derivatives for ordinal items using proxy data.

Description

Partial derivatives for ordinal items using proxy data.

Usage

d_categorical_proxy(
  parm,
  p_item,
  prox_data,
  pred_data,
  item_data_current,
  thr,
  cov,
  samp_size,
  num_responses_item,
  num_items
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

item_data_current

Vector of current item responses.

thr

Threshold value being maximized.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_responses_item

Number of responses for item.

num_items

Number of items in dataset.

Value

a "list" of first and second partial derivatives for categorical item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for continuous items.

Description

Partial derivatives for continuous items.

Usage

d_gaussian_itemblock(
  p_item,
  etable,
  theta,
  responses_item,
  pred_data,
  samp_size,
  num_items,
  num_quad,
  num_predictors
)

Arguments

p_item

Vector of item parameters.

etable

E-table.

theta

Matrix of adaptive theta values.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors in dataset.

Value

a "list" of first and second partial derivatives for mean value of Gaussian item likelihood (to use with multivariate Newton-Raphson)

Partial derivatives for continuous items using proxy data.

Description

Partial derivatives for continuous items using proxy data.

Usage

d_gaussian_itemblock_proxy(
  p_item,
  prox_data,
  responses_item,
  pred_data,
  samp_size,
  num_items,
  num_quad,
  num_predictors
)

Arguments

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors in dataset.

Value

a "list" of first and second partial derivatives for mean value of Gaussian item likelihood (to use with multivariate Newton-Raphson and observed proxy scores)

Partial derivatives for mean and variance impact equation.

Description

Partial derivatives for mean and variance impact equation.

Usage

d_impact_block(
  p_mean,
  p_var,
  etable,
  theta,
  mean_predictors,
  var_predictors,
  samp_size,
  num_items,
  num_quad,
  num_predictors
)

Arguments

p_mean

Vector of mean impact parameters.

p_var

Vector of variance impact parameters.

etable

E-table for impact.

theta

Matrix of adaptive theta values.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

samp_size

Sample size in data set.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors in dataset.

Value

a "list" of first and second partial derivatives for impact equation (to use with multivariate Newton-Raphson)

Partial derivatives for mean and variance impact equation using observed score proxy.

Description

Partial derivatives for mean and variance impact equation using observed score proxy.

Usage

d_impact_block_proxy(
  p_mean,
  p_var,
  prox_data,
  mean_predictors,
  var_predictors,
  samp_size,
  num_items,
  num_quad,
  num_predictors
)

Arguments

p_mean

Vector of mean impact parameters.

p_var

Vector of variance impact parameters.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

samp_size

Sample size in data set.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

num_predictors

Number of predictors in dataset.

Value

a "list" of first and second partial derivatives for impact equation (to use with multivariate Newton-Raphson and observed proxy scores)

Partial derivatives for mean parameter of continuous items.

Description

Partial derivatives for mean parameter of continuous items.

Usage

d_mu_gaussian(
  parm,
  p_item,
  etable,
  theta,
  responses_item,
  pred_data,
  cov,
  samp_size,
  num_items,
  num_quad
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

etable

E-table.

theta

Matrix of adaptive theta values.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for mean value of Gaussian item likelihood (to use with coordinate descent and univariate Newton-Raphson)
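The derivatives for the mean parameters of a continuous item can be sketched under an assumed normal item model, y_i | theta_q ~ N(c0 + c1 x_i + (a0 + a1 x_i) theta_q, sigma2). With E-table weights w_iq, the first derivative is sum w_iq (y_i - mu_iq) d_iq / sigma2 and the second derivative is -sum w_iq d_iq^2 / sigma2. Here d_iq is the derivative of the mean with respect to the chosen parameter. The layout of `p_item`, the function name, and the simulated data are hypothetical, and the first derivative is checked numerically.

```r
# Hypothetical derivatives of the E-table-weighted Gaussian log-likelihood for one
# item with respect to a mean-equation parameter; p_item = c(c0, c1, a0, a1).
d_mu_gaussian_sketch <- function(p_item, sigma2, parm, cov, y, x, etable, theta) {
  mu <- (p_item[1] + p_item[2] * x) + outer(p_item[3] + p_item[4] * x, theta)  # n x Q
  dmu <- if (parm == "c") matrix(cov, length(y), length(theta)) else outer(cov, theta)
  list(d1 = sum(etable * (y - mu) * dmu) / sigma2,
       d2 = -sum(etable * dmu^2) / sigma2)
}

# Finite-difference check for the intercept DIF parameter c1 (parm = "c", cov = x)
set.seed(5)
n <- 25
theta <- seq(-3, 3, length.out = 9)
x <- rbinom(n, 1, 0.5)
y <- rnorm(n)
etable <- matrix(runif(n * 9), n)
etable <- etable / rowSums(etable)
p_item <- c(0.2, -0.1, 0.8, 0.3)
sigma2 <- 0.7
loglik_item <- function(pr) {
  mu <- (pr[1] + pr[2] * x) + outer(pr[3] + pr[4] * x, theta)
  sum(etable * dnorm(y, mu, sqrt(sigma2), log = TRUE))
}
d <- d_mu_gaussian_sketch(p_item, sigma2, "c", x, y, x, etable, theta)
eps <- 1e-6
num_d1 <- (loglik_item(p_item + c(0, eps, 0, 0)) - loglik_item(p_item - c(0, eps, 0, 0))) / (2 * eps)
```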

Partial derivatives for mean parameter of continuous items with proxy data.

Description

Partial derivatives for mean parameter of continuous items with proxy data.

Usage

d_mu_gaussian_proxy(
  parm,
  p_item,
  prox_data,
  responses_item,
  pred_data,
  cov,
  samp_size
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

Value

a "list" of first and second partial derivatives for mean value of Gaussian item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for variance impact equation.

Description

Partial derivatives for variance impact equation.

Usage

d_phi(
  p_impact,
  etable,
  theta,
  mean_predictors,
  var_predictors,
  cov,
  samp_size,
  num_items,
  num_quad
)

Arguments

p_impact

Vector of impact parameters.

etable

E-table for impact.

theta

Matrix of adaptive theta values.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for variance impact equation (to use with coordinate descent and univariate Newton-Raphson)
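Under the same assumed impact model theta_i ~ N(m_i' alpha, exp(v_i' phi)), which is an assumption and not stated above, the log density depends on phi through eta_i = v_i' phi. Its derivative with respect to phi_k is 0.5 v_ik ((theta_q - mu_i)^2 / sigma2_i - 1), and its second derivative is -0.5 v_ik^2 (theta_q - mu_i)^2 / sigma2_i, with each weighted by the E-table entries w_iq. The function name and simulated inputs are hypothetical, and the first derivative is checked against a finite difference.

```r
# Hypothetical derivatives with respect to the k-th variance impact coefficient,
# assuming theta_i ~ N(m_i' alpha, exp(v_i' phi)).
d_phi_sketch <- function(alpha, phi, etable, theta, mean_pred, var_pred, k) {
  mu <- drop(mean_pred %*% alpha)
  sigma2 <- exp(drop(var_pred %*% phi))
  sq <- outer(-mu, theta, "+")^2 / sigma2   # (theta_q - mu_i)^2 / sigma2_i
  v <- var_pred[, k]
  list(d1 = sum(etable * 0.5 * v * (sq - 1)),
       d2 = -sum(etable * 0.5 * v^2 * sq))
}

set.seed(6)
n <- 40
theta <- seq(-4, 4, length.out = 15)
mean_pred <- cbind(1, rnorm(n))
var_pred <- cbind(1, rnorm(n))
etable <- matrix(runif(n * 15), n)
etable <- etable / rowSums(etable)
alpha <- c(0, 0.3)
phi <- c(0, -0.2)
loglik_phi <- function(f) {
  mu <- drop(mean_pred %*% alpha)
  sd <- sqrt(exp(drop(var_pred %*% f)))
  sum(etable * (dnorm(outer(-mu, theta, "+") / sd, log = TRUE) - log(sd)))
}
d <- d_phi_sketch(alpha, phi, etable, theta, mean_pred, var_pred, k = 2)
eps <- 1e-6
num_d1 <- (loglik_phi(phi + c(0, eps)) - loglik_phi(phi - c(0, eps))) / (2 * eps)
```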

Partial derivatives for variance impact equation using proxy data.

Description

Partial derivatives for variance impact equation using proxy data.

Usage

d_phi_proxy(
  p_impact,
  prox_data,
  mean_predictors,
  var_predictors,
  cov,
  samp_size,
  num_items
)

Arguments

p_impact

Vector of impact parameters.

prox_data

Matrix of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

Value

a "list" of first and second partial derivatives for variance impact equation (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for variance parameter of continuous items.

Description

Partial derivatives for variance parameter of continuous items.

Usage

d_sigma_gaussian(
  parm,
  p_item,
  etable,
  theta,
  responses_item,
  pred_data,
  cov,
  samp_size,
  num_items,
  num_quad
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

etable

E-table for item.

theta

Matrix of adaptive theta values.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

cov

Covariate being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of first and second partial derivatives for the variance parameter of the Gaussian item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Partial derivatives for variance parameter of continuous items with proxy data.

Description

Partial derivatives for variance parameter of continuous items with proxy data.

Usage

d_sigma_gaussian_proxy(
  parm,
  p_item,
  prox_data,
  responses_item,
  pred_data,
  cov,
  samp_size,
  num_items
)

Arguments

parm

Item parameter being maximized.

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

responses_item

Vector of item responses.

pred_data

Matrix or dataframe of DIF and/or impact predictors.

cov

Covariate whose parameter is being maximized.

samp_size

Sample size in dataset.

num_items

Number of items in dataset.

Value

a "list" of first and second partial derivatives for the variance parameter of the Gaussian item likelihood (to use with coordinate descent and univariate Newton-Raphson)

Penalized expectation-maximization algorithm.

Description

Penalized expectation-maximization algorithm.

Usage

em_estimation(
  p,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  item_type,
  theta,
  pen_type,
  tau_vec,
  id_tau,
  num_tau,
  alpha,
  gamma,
  pen,
  pen.deriv,
  anchor,
  final_control,
  samp_size,
  num_items,
  num_responses,
  num_predictors,
  num_quad,
  adapt_quad,
  optim_method,
  estimator_history,
  estimator_limit,
  NA_cases,
  exit_code
)

Arguments

p

List of parameters with starting values obtained from preprocess.

item_data

Matrix or data frame of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

item_type

Character value or vector indicating the type of item to be modeled.

theta

Vector of fixed quadrature points.

pen_type

Character value indicating the penalty function to use.

tau_vec

Vector of tau values that are either automatically generated or provided by the user. The first value of tau_vec is Inf, which is used to identify the minimal value of tau at which all DIF is removed from the model.

id_tau

Logical indicating whether to identify the minimum value of tau in which all DIF parameters are removed from the model.

num_tau

Numeric value indicating the number of tau values to run regDIF on.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

gamma

Numeric value indicating the gamma parameter in the MCP function.

pen

Index for the tau vector.

pen.deriv

Logical value indicating whether to use the second derivative of the penalized parameter during regularization. The default is TRUE.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

final_control

Control parameters.

samp_size

Numeric value indicating the sample size.

num_items

Numeric value indicating the number of items.

num_responses

Vector with number of responses for each item.

num_predictors

Numeric value indicating the number of predictors.

num_quad

Numeric value indicating the number of quadrature points.

adapt_quad

Logical value indicating whether to use adaptive quadrature.

optim_method

Character value indicating the type of optimization method to use.

estimator_history

List used to store EM iterations for the supplemental EM algorithm.

estimator_limit

Logical value indicating whether the EM algorithm reached the maxit limit in the previous estimation round.

NA_cases

Logical vector indicating if observation is missing.

exit_code

Integer indicating if the model has converged properly.

Value

a "list" of matrices with unprocessed model estimates

Continuous tracelines.

Description

Continuous tracelines.

Usage

gaussian_traceline_pts(p_item, theta, responses_item, pred_data, samp_size)

Arguments

p_item

Vector of item parameters.

theta

Vector of theta values.

responses_item

Vector of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

samp_size

Sample size in data set.

Value

a "matrix" of density values for the Gaussian item likelihood
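
A Gaussian traceline evaluates each continuous response at every quadrature point. The sketch below assumes the response is normal with mean intercept + slope * theta and a residual variance, and it ignores DIF effects from pred_data; gaussian_traceline_sketch and its arguments are hypothetical, not the package's signature.

```r
# Hypothetical sketch of a Gaussian traceline for one continuous item (no DIF).
# Assumes: response ~ N(intercept + slope * theta, resid_var).
gaussian_traceline_sketch <- function(intercept, slope, resid_var, theta, responses_item) {
  mu <- intercept + slope * theta   # conditional mean at each quadrature point
  # Rows index people, columns index quadrature points
  outer(responses_item, mu, function(y, m) dnorm(y, mean = m, sd = sqrt(resid_var)))
}

theta <- seq(-6, 6, length.out = 21)
y <- c(-1, 0, 2.5)
tl <- gaussian_traceline_sketch(intercept = 0, slope = 1, resid_var = 0.5,
                                theta = theta, responses_item = y)
dim(tl)  # 3 21
```

Each entry is a density rather than a probability, so values can exceed 1 when the residual variance is small.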

Continuous tracelines using proxy data.

Description

Continuous tracelines using proxy data.

Usage

gaussian_traceline_pts_proxy(
  p_item,
  prox_data,
  responses_item,
  pred_data,
  samp_size
)

Arguments

p_item

Vector of item parameters.

prox_data

Vector of observed proxy scores.

responses_item

Vector of item responses.

pred_data

Matrix or data frame of DIF and/or impact predictors.

samp_size

Sample size in data set.

Value

a "matrix" of density values for the Gaussian item likelihood

Simulated data example with multiple DIF covariates

Description

A simulated dataset containing six binary items and three DIF covariates.

Usage

ida

Format

A data frame with 500 rows and 9 variables:

item1
item2
item3
item4
item5
item6
age
gender
study

...

Information criteria.

Description

Information criteria.

Usage

information_criteria(
  eout,
  p,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  item_type,
  gamma,
  samp_size,
  num_responses,
  num_items,
  num_quad
)

Arguments

eout

E-table output.

p

List of parameters.

item_data

Matrix or data.frame of item responses.

pred_data

Matrix or data.frame of DIF and/or impact predictors.

prox_data

Vector of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

item_type

Optional character value or vector indicating the type of item to be modeled.

gamma

Numeric value indicating the gamma parameter in the MCP function.

samp_size

Sample size in data set.

num_responses

Number of responses for each item.

num_items

Number of items in data set.

num_quad

Number of quadrature points used for approximating the latent variable.

Value

a "list" of information criteria to use for model selection
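
The usual information criteria for penalized models count only the nonzero parameters. The sketch below uses the standard AIC and BIC definitions; treating nonzero estimates as the degrees of freedom is an assumption about regDIF's counting, and info_criteria_sketch is a hypothetical name.

```r
# Sketch using the standard AIC/BIC definitions. Counting only nonzero
# parameters as degrees of freedom is an assumption, not confirmed regDIF behavior.
info_criteria_sketch <- function(loglik, estimates, samp_size) {
  df <- sum(estimates != 0)   # parameters shrunk exactly to zero are not counted
  c(aic = -2 * loglik + 2 * df,
    bic = -2 * loglik + log(samp_size) * df)
}

ic <- info_criteria_sketch(loglik = -1500, estimates = c(0.8, 1.2, 0, 0, 0.3), samp_size = 500)
ic["aic"]  # 3006
```

Across a path of tau values, the model with the smallest criterion would be selected, for example with which.min over the BIC values.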

Plot function for regDIF function

Description

Plot function for regDIF function

Usage

## S3 method for class 'regDIF'
plot(x, y = NULL, method = "bic", color.seed = 123, legend.plot = TRUE, ...)

Arguments

x

Fitted regDIF model object.

y

Unused for plotting regDIF model object.

method

Fit statistic to use for identifying DIF effects in plot.

color.seed

Random seed to sample line colors and line types for DIF effects in plot.

legend.plot

Logical indicating whether to plot a legend. Default is TRUE.

...

Additional arguments to be passed through to plot.

Value

a "plot" object for a "regDIF" fit

Post-process model results.

Description

Post-process model results.

Usage

postprocess(
  estimates,
  item.data,
  pred.data,
  prox.data,
  item_data,
  pred_data,
  prox_data,
  mean_predictors,
  var_predictors,
  item_type,
  tau_vec,
  num_tau,
  alpha,
  pen,
  anchor,
  control,
  final_control,
  final,
  samp_size,
  num_responses,
  num_predictors,
  num_items,
  num_quad,
  NA_cases
)

Arguments

estimates

List of converged parameters.

item.data

User-given matrix or data.frame of item responses.

pred.data

User-given matrix or data.frame of DIF and/or impact predictors.

prox.data

User-given matrix or data.frame of observed proxy scores.

item_data

Processed matrix or data.frame of item responses.

pred_data

Processed matrix or data.frame of DIF and/or impact predictors.

prox_data

Processed matrix or data.frame of observed proxy scores.

mean_predictors

Possibly different matrix of predictors for the mean impact equation.

var_predictors

Possibly different matrix of predictors for the variance impact equation.

item_type

Optional character value or vector indicating the type of item to be modeled.

tau_vec

Optional numeric vector of tau values.

num_tau

Numeric value indicating the number of tau values.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function.

pen

Tuning parameter index.

anchor

Anchor item(s).

control

Optional list of user-defined control parameters.

final_control

List of final control parameters.

final

List of model results.

samp_size

Sample size in dataset.

num_responses

Number of responses for each item.

num_predictors

Number of predictors.

num_items

Number of items in dataset.

num_quad

Number of quadrature points used for approximating the latent variable.

NA_cases

Logical vector indicating NA cases.

Value

a "list" object of processed "regDIF" results

Pre-process data.

Description

Pre-process data.

Usage

preprocess(
  item.data,
  pred.data,
  prox.data,
  item.type,
  pen.type,
  tau,
  num.tau,
  anchor,
  stdz,
  control,
  call
)

Arguments

item.data

Matrix or data frame of item responses.

pred.data

Matrix or data frame of DIF and/or impact predictors.

prox.data

Vector of observed proxy scores.

item.type

Character value or vector indicating the item response distributions.

pen.type

Character indicating type of penalty.

tau

Optional numeric vector of tau values.

num.tau

Numeric value indicating the number of tau values to run regDIF on.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1).

stdz

Logical value indicating whether to standardize DIF and impact predictors for regularization.

control

Optional list of additional model specification and optimization parameters.

call

The function call passed from regDIF.

Value

a "list" of default controls for "em_estimation"

Print function for regDIF function

Description

Print function for regDIF function

Usage

## S3 method for class 'regDIF'
print(x, ...)

Arguments

x

Fitted regDIF model object.

...

Additional arguments to be passed through to print.

Value

NULL

Regularized Differential Item Functioning

Description

Identify DIF in item response theory models using regularization.

Usage

regDIF(item.data,
       pred.data,
       prox.data = NULL,
       item.type = NULL,
       pen.type = NULL,
       pen.deriv = TRUE,
       tau = NULL,
       num.tau = 100,
       alpha = 1,
       gamma = 3,
       anchor = NULL,
       stdz = TRUE,
       control = list())

Arguments

item.data

Matrix or data frame of item responses. See below for supported item types.

pred.data

Matrix or data frame of predictors affecting item responses (DIF) and latent variable (impact). See control option below to specify different predictors for impact model.

prox.data

Optional vector of observed scores to serve as a proxy for the latent variable. If a vector is supplied, a multivariate regression model will be fit to the data. The default is NULL, indicating that latent scores will be estimated during model estimation.

item.type

Optional character value or vector indicating the type of item to be modeled. The default is NULL, corresponding to a 2PL or graded item type. Different item types may be specified for a single model by providing a vector equal in length to the number of items in item.data. The options include:

"rasch" - Slopes constrained to 1 and intercepts freely estimated.
"2pl" - Slopes and intercepts freely estimated.
"graded" - Slopes, intercepts, and thresholds freely estimated.
"cfa" - Continuous items modeled with a Gaussian likelihood; slopes, intercepts, and residual variances freely estimated.

pen.type

Optional character value indicating the penalty function to use. The default is NULL, corresponding to the LASSO function. The options include:

"lasso" - The least absolute shrinkage and selection operator (LASSO), which controls DIF selection through τ (tau).
"mcp" - The minimax concave penalty (MCP), which controls DIF selection through τ (tau) and estimator bias through γ (gamma). Uses the firm-thresholding penalty function.
"grp.lasso" - The group version of the LASSO penalty, which selects intercept and slope DIF effects on each background characteristic together.
"grp.mcp" - The group version of the MCP function.

pen.deriv

Logical value indicating whether to use the second derivative of the penalized parameter during regularization. The default is TRUE.
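
How the LASSO, MCP, and group LASSO penalties select DIF can be illustrated with their standard thresholding operators. This sketch assumes a standardized coordinate update with unit curvature, so it ignores the second-derivative scaling that pen.deriv controls; the function names are hypothetical and are not regDIF exports.

```r
# LASSO: soft-thresholding operator
soft_threshold <- function(z, tau) sign(z) * pmax(abs(z) - tau, 0)

# MCP: firm-thresholding operator (requires gamma > 1)
firm_threshold <- function(z, tau, gamma = 3) {
  ifelse(abs(z) <= gamma * tau,
         soft_threshold(z, tau) / (1 - 1 / gamma),  # shrink small effects
         z)                                          # leave large effects unbiased
}

# Group LASSO: shrink a whole vector of DIF effects (e.g., intercept and slope
# DIF on one covariate) toward zero together
group_soft_threshold <- function(z, tau) {
  norm_z <- sqrt(sum(z^2))
  if (norm_z == 0) return(z)
  max(0, 1 - tau / norm_z) * z
}

z <- c(-1.5, -0.2, 0.4, 2)
soft_threshold(z, tau = 0.5)               # -1.0  0.0  0.0  1.5
firm_threshold(z, tau = 0.5)               # -1.5  0.0  0.0  2.0
group_soft_threshold(c(0.3, 0.4), 0.25)    # 0.15 0.20
```

Both operators set small effects exactly to zero, but only MCP returns large effects unshrunk, which is the bias reduction described for "mcp".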

tau

Optional numeric vector of tau values ≥ 0. If tau is supplied, this overrides the automatic construction of tau values. Values must be non-negative and in descending order, from largest to smallest (e.g., seq(1, 0, -.01)).
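
A user-supplied tau vector can be checked against these requirements before calling regDIF. check_tau is a hypothetical helper written for illustration; it is not part of the package.

```r
# Hypothetical helper (not part of regDIF): TRUE if tau is numeric,
# non-negative, and ordered from largest to smallest
check_tau <- function(tau) {
  is.numeric(tau) && all(tau >= 0) && !is.unsorted(rev(tau))
}

tau <- seq(1, 0, -.01)   # 101 values from 1 down to 0
check_tau(tau)           # TRUE
check_tau(c(0, 0.5, 1))  # FALSE: ascending order
```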

num.tau

Numeric value indicating how many tau values to fit. The default is 100.

alpha

Numeric value indicating the alpha parameter in the elastic net penalty function. Alpha controls the degree to which LASSO or ridge is used during regularization. The default is 1, which is equivalent to LASSO. NOTE: If using the MCP penalty, alpha must not be exactly 0.
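
The role of alpha can be seen in the common elastic net form, tau * (alpha * |b| + (1 - alpha) / 2 * b^2), summed over DIF effects. This sketch assumes that standard form; regDIF's exact scaling is not stated here, and elastic_net_penalty is a hypothetical name.

```r
# Sketch of an elastic net penalty on a vector of DIF effects (standard form assumed)
elastic_net_penalty <- function(dif, tau, alpha = 1) {
  tau * sum(alpha * abs(dif) + (1 - alpha) / 2 * dif^2)
}

dif <- c(0.5, -0.2, 0)
elastic_net_penalty(dif, tau = 0.1, alpha = 1)    # 0.07 (pure LASSO)
elastic_net_penalty(dif, tau = 0.1, alpha = 0.5)  # 0.04225 (LASSO/ridge mix)
```

With alpha = 0 the absolute-value term vanishes, so no effect can be set exactly to zero; this is why alpha = 0 removes the selection behavior.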

gamma

Numeric value indicating the gamma parameter in the MCP function. Gamma controls how quickly the penalty tapers off as DIF effects grow larger. Smaller gamma (closer to 1) leads to faster tapering, approaching hard thresholding (less bias but possibly more unstable optimization), whereas larger gamma leads to slower tapering, approaching the LASSO (more bias but more stable optimization). Default is 3. Must be greater than 1.

anchor

Optional numeric value or vector indicating which item response(s) are anchors (e.g., anchor = 1). Default is NULL, meaning at least one DIF effect per covariate will be fixed to zero as tau approaches 0 (required to identify the model).

stdz

Logical value indicating whether to standardize DIF and impact predictors for regularization. Default is TRUE, as it is recommended that all predictors be on the same scale.

control

Optional list of different model specifications and optimization parameters. May be:

impact.mean.data: Matrix or data frame of predictors, which allows for a different set of predictors to affect the mean impact equation compared to the item response DIF equations. Default includes all predictors from pred.data.
impact.var.data: Matrix or data frame with predictors for variance impact. See above. Default includes all predictors in pred.data.
tol: Convergence threshold of EM algorithm. Default is 10^-5.
maxit: Maximum number of EM iterations. Default is 2000.
adapt.quad: Logical value indicating whether to use adaptive quadrature to approximate the latent variable. The default is FALSE. NOTE: Adaptive quadrature is not supported yet.
num.quad: Numeric value indicating the number of quadrature points to be used. For fixed-point quadrature, the default is 21 points when all item responses are binary or else 51 points if at least one item is ordered categorical.
int.limits: Vector of 2 numeric values indicating the integral limits for quadrature. Default is c(-6,6).
optim.method: Character value indicating which optimization method to use. Default is "UNR", which updates estimates one-at-a-time using univariate Newton-Raphson, or a single iteration of coordinate descent. Another option is "MNR", which updates the impact and item parameter estimates using Multivariate Newton-Raphson. A third option is "CD", or coordinate descent with complete iterations through all parameters until convergence. "MNR" will be faster in most cases, although "UNR" may achieve faster results when the number of predictors is large.
start.values: List of numbers assigned as starting values to the regDIF procedure. List must contain only the following names: impact, for mean and variance impact parameters, in the order that is given by an object of class coef.regDIF; base, for base intercept and slope parameters, in order given by a coef.regDIF object; and finally, dif, for intercept and slope DIF parameters, again in order given by a coef.regDIF object.
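
Fixed-point quadrature with num.quad points on int.limits can be sketched as an evenly spaced grid with normalized standard normal weights. This is an assumption about how such a grid is typically built, not a description of regDIF's internal code; make_quadrature is a hypothetical name.

```r
# Sketch of fixed-point quadrature (assumed: equally spaced nodes with
# normalized standard normal density weights)
make_quadrature <- function(num_quad = 21, int_limits = c(-6, 6)) {
  theta <- seq(int_limits[1], int_limits[2], length.out = num_quad)
  weights <- dnorm(theta)
  list(theta = theta, weights = weights / sum(weights))
}

q <- make_quadrature()            # 21 points, as for all-binary items
sum(q$weights)                    # 1
sum(q$weights * q$theta)          # approximately 0 (symmetric grid)
sum(q$weights * q$theta^2)        # approximately 1
q51 <- make_quadrature(51)        # 51 points, as when some items are ordinal
```

Even 21 evenly spaced points on c(-6, 6) reproduce the first two moments of the standard normal closely, which is why a fixed grid is adequate for most models.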

Value

Function returns an object of class regDIF, which is a list of results from the regularization routine.

Examples


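library(regDIF)
head(ida)
# Items are in columns 1-6; DIF/impact predictors (age, gender, study) in 7-9
item.data <- ida[,1:6]
pred.data <- ida[,7:9]
# Sum scores serve as observed proxies for the latent variable
prox.data <- rowSums(item.data)
# Fit a shorter path of 10 tau values to keep the example fast
fit <- regDIF(item.data, pred.data, prox.data, num.tau = 10)
summary(fit)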

Standard Errors for regDIF Model(s)

Description

Obtain standard errors for regDIF model(s).

Usage

se.regDIF(fit,
          se.type = "sem",
          tau = NULL,
          ...)

Arguments

fit

A regDIF fitted model object of class regDIF. By default, standard errors are obtained for the best-fitting model, i.e., the model with the minimum BIC.

se.type

Character value indicating the method of computing standard errors for a regDIF fitted model. Default is "sem", or the supplemental EM algorithm (see Cai, 2008). Other options are in development and not yet supported.

tau

Optional numeric value or vector of tau values, which must correspond to tau values already fit in fit.

...

Additional arguments to pass to regDIF function if different settings are desired.

Value

Function returns an object of class se.regDIF.

Summary function for regDIF function

Description

Summary function for regDIF function

Usage

## S3 method for class 'regDIF'
summary(object, method = "bic", ...)

Arguments

object

Fitted regDIF model object.

method

Fit statistic used to select which tau model to display; the model with the minimum value of this statistic is shown.

...

Additional arguments to be passed through to summary.

Value

NULL