Help for package clustMD

Title:

Model Based Clustering for Mixed Data

Version:

1.2.1

Description:

Model-based clustering of mixed data (i.e. data which consist of continuous, binary, ordinal or nominal variables) using a parsimonious mixture of latent Gaussian variable models.

Depends:

R (≥ 3.3.2)

Imports:

ggplot2, mclust, reshape2, MASS, msm, mvtnorm, parallel, truncnorm, viridis, stats

License:

GPL-2

LazyData:

true

RoxygenNote:

6.0.1

NeedsCompilation:

Packaged:

2017-05-08 16:35:03 UTC; damien

Author:

Damien McParland [aut, cre], Isobel Claire Gormley [aut]

Maintainer:

Damien McParland <damien.mcparland@ucd.ie>

Repository:

CRAN

Date/Publication:

2017-05-08 17:19:20 UTC

Model based clustering for mixed data: clustMD

Description

Model-based clustering of mixed data (i.e. data that consist of continuous, binary, ordinal or nominal variables) using a parsimonious mixture of latent Gaussian variable models.

Author(s)

Damien McParland

Damien McParland <damien.mcparland@ucd.ie>, Isobel Claire Gormley <claire.gormley@ucd.ie>

References

McParland, D. and Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10 (2):155-169.

Byar prostate cancer data set.

Description

A data set consisting of variables of mixed type measured on a group of prostate cancer patients. Patients have either stage 3 or stage 4 prostate cancer.

Usage

Byar

Format

A data frame with 475 observations on the following 15 variables.

Age: a numeric vector indicating the age of the patient.
Weight: a numeric vector indicating the weight of the patient.
Performance.rating: an ordinal variable indicating how active the patient is: 0 - normal activity, 1 - in bed less than 50% of daytime, 2 - in bed more than 50% of daytime, 3 - confined to bed.
Cardiovascular.disease.history: a binary variable indicating if the patient has a history of cardiovascular disease: 0 - no, 1 - yes.
Systolic.Blood.pressure: a numeric vector indicating the systolic blood pressure of the patient in units of ten.
Diastolic.blood.pressure: a numeric vector indicating the diastolic blood pressure of the patient in units of ten.
Electrocardiogram.code: a nominal variable indicating the electrocardiogram code: 0 - normal, 1 - benign, 2 - rhythmic disturbances and electrolyte changes, 3 - heart blocks or conduction defects, 4 - heart strain, 5 - old myocardial infarct, 6 - recent myocardial infarct.
Serum.haemoglobin: a numeric vector indicating the serum haemoglobin levels of the patient measured in g/100ml.
Size.of.primary.tumour: a numeric vector indicating the estimated size of the patient's primary tumour in centimeters squared.
Index.of.tumour.stage.and.histolic.grade: a numeric vector indicating the combined index of tumour stage and histologic grade of the patient.
Serum.prostatic.acid.phosphatase: a numeric vector indicating the serum prostatic acid phosphatase levels of the patient in King-Armstrong units.
Bone.metastases: a binary vector indicating the presence of bone metastasis: 0 - no, 1 - yes.
Stage: the stage of the patient's prostate cancer.
Observation: a patient ID number.
SurvStat: the post-trial survival status of the patient: 0 - alive, 1 - dead from prostatic cancer, 2 - dead from heart or vascular disease, 3 - dead from cerebrovascular accident, 4 - dead from pulmonary embolus, 5 - dead from other cancer, 6 - dead from respiratory disease, 7 - dead from other specific non-cancer cause, 8 - dead from other unspecified non-cancer cause, 9 - dead from unknown cause.

Source

Byar, D.P. and Green, S.B. (1980). The choice of treatment for cancer patients based on covariate information: applications to prostate cancer. Bulletin du Cancer 67: 477-490.

Hunt, L., Jorgensen, M. (1999). Mixture model clustering using the multimix program. Australia and New Zealand Journal of Statistics 41: 153-171.

E-step of the (MC)EM algorithm

Description

Internal function.

Usage

E.step(N, G, D, CnsIndx, OrdIndx, zlimits, mu, Sigma, Y, J, K, norms, nom.ind.Z,
  patt.indx, pi.vec, model, perc.cut)

Arguments

N

number of observations.

G

number of mixture components.

D

dimension of the latent data.

CnsIndx

the number of continuous variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

zlimits

the truncation points for the latent data.

mu

a D x G matrix of means.

Sigma

a D x D x G array of covariance parameters.

Y

an N x J data matrix.

J

the number of observed variables.

K

the number of levels for each variable.

norms

a matrix of standard normal deviates.

nom.ind.Z

the latent dimensions corresponding to each nominal variable.

patt.indx

a list of length equal to the number of observed response patterns. Each entry of the list details the observations for which that response pattern was observed.

pi.vec

mixing weights.

model

the covariance model fitted to the data.

perc.cut

threshold parameters.

Value

Output required for clustMD function.
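
As a rough illustration of what an E-step computes in the simplest setting, the sketch below evaluates posterior membership probabilities for a univariate Gaussian mixture with continuous data only. It is not the package's E.step: the name e_step_sketch and its arguments are hypothetical, and the latent-variable and Monte Carlo parts needed for ordinal and nominal data are left out.

```r
# Hypothetical helper (not part of clustMD): posterior probabilities for a
# univariate Gaussian mixture. y is a numeric vector; pi_vec, mu and sigma
# are vectors with one entry per mixture component.
e_step_sketch <- function(y, pi_vec, mu, sigma) {
  G <- length(pi_vec)
  # N x G matrix of log(pi_g) + log density of y under component g
  log_num <- sapply(seq_len(G), function(g) {
    log(pi_vec[g]) + dnorm(y, mean = mu[g], sd = sigma[g], log = TRUE)
  })
  log_num <- matrix(log_num, nrow = length(y), ncol = G)
  # Normalise each row on the log scale to avoid underflow
  row_max <- apply(log_num, 1, max)
  log_den <- row_max + log(rowSums(exp(log_num - row_max)))
  exp(log_num - log_den)
}

tau <- e_step_sketch(y = c(-2, 0, 2), pi_vec = c(0.5, 0.5),
                     mu = c(-2, 2), sigma = c(1, 1))
```

Each row of tau sums to one, and the middle observation, which sits halfway between the two component means, is split evenly between the clusters.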

M-step of the (MC)EM algorithm

Description

Internal function.

Usage

M.step(tau, N, sumTauEz, J, OrdIndx, D, G, Y, CnsIndx, sumTauS, model, a,
  nom.ind.Z)

Arguments

tau

a N x G matrix of cluster membership probabilities.

N

number of observations.

sumTauEz

the sum across all observations of observed and expected latent continuous values multiplied by the posterior probability of belonging to each cluster.

J

the number of variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

D

dimension of the latent data.

G

the number of mixture components.

Y

a N x J data matrix.

CnsIndx

the number of continuous variables.

sumTauS

the sum across all observations of the outer product of observed and expected latent continuous values multiplied by the posterior probability of belonging to each cluster.

model

which clustMD covariance model is fitted.

a

a G x D matrix of the entries of A.

nom.ind.Z

the latent dimensions corresponding to each nominal variable.

Value

Output required for clustMD function.
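
To make the role of the tau-weighted sums concrete, here is a minimal sketch of an M-step for a univariate Gaussian mixture with continuous data only. The name m_step_sketch is hypothetical and the code does not reproduce the package's M.step, which also handles latent dimensions and the constrained covariance models.

```r
# Hypothetical helper (not part of clustMD): update mixing weights, means and
# variances from an N x G matrix of membership probabilities tau.
m_step_sketch <- function(y, tau) {
  n_g <- colSums(tau)                        # effective cluster sizes
  pi_vec <- n_g / length(y)                  # mixing weights
  mu <- as.vector(crossprod(tau, y)) / n_g   # tau-weighted means
  # tau-weighted variances around each cluster mean
  sigma2 <- sapply(seq_along(mu), function(g) {
    sum(tau[, g] * (y - mu[g])^2) / n_g[g]
  })
  list(pi_vec = pi_vec, mu = mu, sigma2 = sigma2)
}

# Hard assignments are used here only to keep the arithmetic easy to follow
tau <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
fit <- m_step_sketch(y = c(1, 3, 10, 14), tau = tau)
```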

Approximates the observed log likelihood.

Description

Approximates the observed log likelihood.

Usage

ObsLogLikelihood(N, CnsIndx, G, Y, mu, Sigma, pi.vec, patt.indx, zlimits, J,
  OrdIndx, probs.nom, model, perc.cut, K)

Arguments

N

the number of observations.

CnsIndx

the number of continuous variables.

G

the number of mixture components.

Y

an N x J data matrix.

mu

a D x G matrix of means.

Sigma

a D x D x G array of covariance parameters.

pi.vec

the mixing weights.

patt.indx

a list of length equal to the number of observed response patterns. Each entry of the list details the observations for which that response pattern was observed.

zlimits

the truncation points for the latent data.

J

the number of variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

probs.nom

an array containing the response probabilities for each nominal variable for each cluster.

model

the covariance model fitted to the data.

perc.cut

threshold parameters.

K

the number of levels for each variable.

Value

Output required for clustMD function.

Model Based Clustering for Mixed Data

Description

A function that fits the clustMD model to a data set consisting of any combination of continuous, binary, ordinal and nominal variables.

Usage

clustMD(X, G, CnsIndx, OrdIndx, Nnorms, MaxIter, model, store.params = FALSE,
  scale = FALSE, startCL = "hc_mclust", autoStop = FALSE, ma.band = 50,
  stop.tol = NA)

Arguments

X

a data matrix where the variables are ordered so that the continuous variables come first, the binary (coded 1 and 2) and ordinal variables (coded 1, 2, ...) come second and the nominal variables (coded 1, 2, ...) are in last position.

G

the number of mixture components to be fitted.

CnsIndx

the number of continuous variables in the data set.

OrdIndx

the sum of the number of continuous, binary and ordinal variables in the data set.

Nnorms

the number of Monte Carlo samples to be used for the intractable E-step in the presence of nominal data. Irrelevant if there are no nominal variables.

MaxIter

the maximum number of iterations for which the (MC)EM algorithm should run.

model

a string indicating which clustMD model is to be fitted. This may be one of: EII, VII, EEI, VEI, EVI, VVI or BD.

store.params

a logical argument indicating if the parameter estimates at each iteration should be saved and returned by the clustMD function.

scale

a logical argument indicating if the continuous variables should be standardised.

startCL

a string indicating which clustering method should be used to initialise the (MC)EM algorithm. This may be one of "kmeans" (K means clustering), "hclust" (hierarchical clustering), "mclust" (finite mixture of Gaussian distributions), "hc_mclust" (model-based hierarchical clustering) or "random" (random cluster allocation).

autoStop

a logical argument indicating whether the (MC)EM algorithm should use a stopping criterion to decide if convergence has been reached. Otherwise the algorithm will run for MaxIter iterations.

If only continuous variables are present the algorithm will use Aitken's acceleration criterion with tolerance stop.tol.

If categorical variables are present, the stopping criterion is based on a moving average of the approximated log likelihood values. Let t denote the current iteration. The average of the ma.band most recent approximated log likelihood values is compared to the average of another ma.band iterations with a lag of 10 iterations. If this difference is less than the tolerance, the algorithm is said to have converged.

ma.band

the number of iterations to be included in the moving average calculation for the stopping criterion.

stop.tol

the tolerance of the (MC)EM stopping criterion.
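
The moving-average rule described under autoStop can be sketched as follows. This is an illustrative reading of the description, not the package's code: the helper name ma_converged is hypothetical, and comparing the absolute difference of the two averages against stop.tol is an assumption.

```r
# Hypothetical helper (not part of clustMD). loglik holds the approximated
# log likelihood values for iterations 1, ..., t.
ma_converged <- function(loglik, ma.band = 50, stop.tol = 1e-4, lag = 10) {
  t <- length(loglik)
  # Not enough history yet to form both windows
  if (t < ma.band + lag) return(FALSE)
  # Average of the ma.band most recent values
  recent <- mean(loglik[(t - ma.band + 1):t])
  # Average of the ma.band values ending `lag` iterations earlier
  lagged <- mean(loglik[(t - lag - ma.band + 1):(t - lag)])
  abs(recent - lagged) < stop.tol
}

flat <- ma_converged(rep(-100, 60), ma.band = 50)     # constant trace
rising <- ma_converged(-(100:1), ma.band = 5)         # still increasing
short <- ma_converged(rep(-100, 20), ma.band = 50)    # too few iterations
```

A constant trace is flagged as converged, while a trace that is still rising, or one shorter than ma.band plus the lag, is not.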

Value

An object of class clustMD is returned. The output components are as follows:

model

The covariance model fitted to the data.

G

The number of clusters fitted to the data.

Y

The observed data matrix.

cl

The cluster to which each observation belongs.

tau

An N x G matrix of the probabilities of each observation belonging to each cluster.

means

A D x G matrix of the cluster means, where D is the dimension of the combined observed and latent continuous space.

A

A G x D matrix containing the diagonal entries of the A matrix corresponding to each cluster.

Lambda

A G x D matrix of volume parameters corresponding to each observed or latent dimension for each cluster.

Sigma

A D x D x G array of the covariance matrices for each cluster.

BIChat

The estimated Bayesian information criterion for the model fitted.

ICLhat

The estimated integrated classification likelihood criterion for the model fitted.

paramlist

If store.params is TRUE then paramlist is a list of the stored parameter values in the order given above with the saved estimated likelihood values in last position.

Varnames

A character vector of names corresponding to the columns of Y.

Varnames_sht

A truncated version of Varnames. Used for plotting.

likelihood.store

A vector containing the estimated log likelihood at each iteration.

References

McParland, D. and Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10 (2):155-169.

Examples

    data(Byar)
    # Transform skewed variables
    Byar$Size.of.primary.tumour <- sqrt(Byar$Size.of.primary.tumour)
    Byar$Serum.prostatic.acid.phosphatase <- log(Byar$Serum.prostatic.acid.phosphatase)

    # Order variables (Continuous, ordinal, nominal)
    Y <- as.matrix(Byar[, c(1, 2, 5, 6, 8, 9, 10, 11, 3, 4, 12, 7)])

    # Start categorical variables at 1 rather than 0
    Y[, 9:12] <- Y[, 9:12] + 1

    # Standardise continuous variables
    Y[, 1:8] <- scale(Y[, 1:8])

    # Merge categories of EKG variable for efficiency
    Yekg <- rep(NA, nrow(Y))
    Yekg[Y[,12]==1] <- 1
    Yekg[(Y[,12]==2)|(Y[,12]==3)|(Y[,12]==4)] <- 2
    Yekg[(Y[,12]==5)|(Y[,12]==6)|(Y[,12]==7)] <- 3
    Y[, 12] <- Yekg

    ## Not run: 
    res <- clustMD(X = Y, G = 3, CnsIndx = 8, OrdIndx = 11, Nnorms = 20000,
    MaxIter = 500, model = "EVI", store.params = FALSE, scale = TRUE, 
    startCL = "kmeans", autoStop= TRUE, ma.band=30, stop.tol=0.0001)
    
## End(Not run)

Model Based Clustering for Mixed Data

Description

A function that fits the clustMD model to a data set consisting of any combination of continuous, binary, ordinal and nominal variables. This function is a wrapper for clustMD that takes arguments as a list.

Usage

clustMDlist(arglist)

Arguments

arglist

a list of input arguments for clustMD. See clustMD.

Value

A clustMD object. See clustMD.
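
A list-taking wrapper is convenient when several argument sets are to be mapped over, for example with lapply. The sketch below shows the general pattern using do.call on a stand-in function; fit_stub and fit_stub_list are hypothetical names, and whether clustMDlist is implemented this way is an assumption.

```r
# Stand-in for a fitting function with several named arguments (hypothetical).
fit_stub <- function(X, G, model = "EII") {
  list(n = nrow(X), G = G, model = model)
}

# Wrapper that accepts the same arguments bundled in a named list.
fit_stub_list <- function(arglist) do.call(fit_stub, arglist)

# Each element is one complete argument set
arg_sets <- list(
  list(X = matrix(0, 5, 2), G = 2, model = "EII"),
  list(X = matrix(0, 5, 2), G = 3, model = "VII")
)
fits <- lapply(arg_sets, fit_stub_list)
```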

References

McParland, D. and Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10 (2):155-169.

Examples

    data(Byar)

    # Transform skewed variables
    Byar$Size.of.primary.tumour <- sqrt(Byar$Size.of.primary.tumour)
    Byar$Serum.prostatic.acid.phosphatase <- 
    log(Byar$Serum.prostatic.acid.phosphatase)

    # Order variables (Continuous, ordinal, nominal)
    Y <- as.matrix(Byar[, c(1, 2, 5, 6, 8, 9, 10, 11, 3, 4, 12, 7)])

    # Start categorical variables at 1 rather than 0
    Y[, 9:12] <- Y[, 9:12] + 1

    # Standardise continuous variables
    Y[, 1:8] <- scale(Y[, 1:8])

    # Merge categories of EKG variable for efficiency
    Yekg <- rep(NA, nrow(Y))
    Yekg[Y[,12]==1] <- 1
    Yekg[(Y[,12]==2)|(Y[,12]==3)|(Y[,12]==4)] <- 2
    Yekg[(Y[,12]==5)|(Y[,12]==6)|(Y[,12]==7)] <- 3
    Y[, 12] <- Yekg

    argList <- list(X=Y, G=3, CnsIndx=8, OrdIndx=11, Nnorms=20000,
    MaxIter=500, model="EVI", store.params=FALSE, scale=TRUE, 
    startCL="kmeans", autoStop=FALSE, ma.band=50, stop.tol=NA)

    ## Not run: 
    res <- clustMDlist(argList)
    
## End(Not run)

Run multiple clustMD models in parallel

Description

This function allows the user to run multiple clustMD models in parallel. The inputs are similar to clustMD() except that G is now a vector containing the numbers of components the user would like to fit and models is a vector of strings indicating the covariance models the user would like to fit for each element of G. The user can specify the number of cores to be used or let the function detect the number available.

Usage

clustMDparallel(X, CnsIndx, OrdIndx, G, models, Nnorms, MaxIter, store.params,
  scale, startCL = "hc_mclust", Ncores = NULL, autoStop = FALSE,
  ma.band = 50, stop.tol = NA)

Arguments

X

a data matrix where the variables are ordered so that the continuous variables come first, the binary (coded 1 and 2) and ordinal variables (coded 1, 2,...) come second and the nominal variables (coded 1, 2,...) are in last position.

CnsIndx

the number of continuous variables in the data set.

OrdIndx

the sum of the number of continuous, binary and ordinal variables in the data set.

G

a vector containing the numbers of mixture components to be fitted.

models

a vector of strings indicating which clustMD models are to be fitted. Each element may be one of: EII, VII, EEI, VEI, EVI, VVI or BD.

Nnorms

the number of Monte Carlo samples to be used for the intractable E-step in the presence of nominal data.

MaxIter

the maximum number of iterations for which the (MC)EM algorithm should run.

store.params

a logical variable indicating if the parameter estimates at each iteration should be saved and returned by the clustMD function.

scale

a logical variable indicating if the continuous variables should be standardised.

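startCL

a string indicating which clustering method should be used to initialise the (MC)EM algorithm. This may be one of "kmeans" (K means clustering), "hclust" (hierarchical clustering), "mclust" (finite mixture of Gaussian distributions), "hc_mclust" (model-based hierarchical clustering) or "random" (random cluster allocation).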

Ncores

the number of cores the user would like to use. Must be less than or equal to the number of cores available.

autoStop

a logical argument indicating whether the (MC)EM algorithm should use a stopping criterion to decide if convergence has been reached. Otherwise the algorithm will run for MaxIter iterations.

If only continuous variables are present the algorithm will use Aitken's acceleration criterion with tolerance stop.tol.

If categorical variables are present, the stopping criterion is based on a moving average of the approximated log likelihood values. Let t denote the current iteration. The average of the ma.band most recent approximated log likelihood values is compared to the average of another ma.band iterations with a lag of 10 iterations. If this difference is less than the tolerance, the algorithm is said to have converged.

ma.band

the number of iterations to be included in the moving average stopping criterion.

stop.tol

the tolerance of the (MC)EM stopping criterion.
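
The column ordering required for X, and the meaning of CnsIndx and OrdIndx, can be illustrated on a small made-up data frame. All variable names and values below are invented for the illustration; only the ordering rule (continuous first, then binary and ordinal, then nominal) comes from the argument descriptions above.

```r
# Toy data frame; column names and values are made up for illustration.
dat <- data.frame(
  grade  = c(1, 3, 2, 2),            # ordinal, coded 1, 2, 3
  height = c(1.70, 1.60, 1.80, 1.75), # continuous
  colour = c(2, 1, 3, 1),            # nominal, coded 1, 2, 3
  smoker = c(1, 2, 2, 1),            # binary, coded 1 and 2
  weight = c(70, 55, 82, 77)         # continuous
)

cont_vars <- c("height", "weight")
ord_vars  <- c("smoker", "grade")    # binary and ordinal together
nom_vars  <- "colour"

# Continuous first, binary/ordinal second, nominal last
X <- as.matrix(dat[, c(cont_vars, ord_vars, nom_vars)])

CnsIndx <- length(cont_vars)             # number of continuous variables
OrdIndx <- CnsIndx + length(ord_vars)    # continuous + binary + ordinal
```

With this layout, columns 1 to CnsIndx are continuous, columns CnsIndx + 1 to OrdIndx are binary or ordinal, and any remaining columns are nominal.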

Value

An object of class clustMDparallel is returned. The output components are as follows:

BICarray

A matrix indicating the estimated BIC values for each of the models fitted.

results

A list containing the output for each of the models fitted. Each entry of this list is a clustMD object. If the algorithm failed to fit a particular model, the corresponding entry of results will be NULL.
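
Once a grid of models has been fitted, the best combination can be located from a matrix of estimated BIC values. The sketch below uses an invented matrix, not real output, and rests on two assumptions that should be checked against an actual clustMDparallel object: that rows correspond to numbers of clusters and columns to covariance models, and that larger estimated BIC values are preferred.

```r
# Invented BIC values for illustration only; NA marks a model that failed.
bic <- matrix(c(-5200, -5100, -5150,
                -5300,    NA, -5050),
              nrow = 3,
              dimnames = list(c("G=1", "G=2", "G=3"), c("EII", "VII")))

# Position of the largest non-missing entry (assumes larger is better)
best <- which(bic == max(bic, na.rm = TRUE), arr.ind = TRUE)[1, ]
best_G     <- rownames(bic)[best["row"]]
best_model <- colnames(bic)[best["col"]]
```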

References

McParland, D. and Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10 (2):155-169.

Examples

    data(Byar)

    # Transform skewed variables
    Byar$Size.of.primary.tumour <- sqrt(Byar$Size.of.primary.tumour)
    Byar$Serum.prostatic.acid.phosphatase <- 
    log(Byar$Serum.prostatic.acid.phosphatase)

    # Order variables (Continuous, ordinal, nominal)
    Y <- as.matrix(Byar[, c(1, 2, 5, 6, 8, 9, 10, 11, 3, 4, 12, 7)])

    # Start categorical variables at 1 rather than 0
    Y[, 9:12] <- Y[, 9:12] + 1

    # Standardise continuous variables
    Y[, 1:8] <- scale(Y[, 1:8])

    # Merge categories of EKG variable for efficiency
    Yekg <- rep(NA, nrow(Y))
    Yekg[Y[,12]==1] <- 1
    Yekg[(Y[,12]==2)|(Y[,12]==3)|(Y[,12]==4)] <- 2
    Yekg[(Y[,12]==5)|(Y[,12]==6)|(Y[,12]==7)] <- 3
    Y[, 12] <- Yekg

    ## Not run: 
    res <- clustMDparallel(X = Y, G = 1:3, CnsIndx = 8, OrdIndx = 11, Nnorms = 20000,
    MaxIter = 500, models = c("EVI", "EII", "VII"), store.params = FALSE, scale = TRUE, 
    startCL = "kmeans", autoStop= TRUE, ma.band=30, stop.tol=0.0001)
  
    res$BICarray

## End(Not run)

Parallel coordinates plot adapted for `clustMD` output

Description

Produces a parallel coordinates plot, as parcoord in the MASS package does, with some minor adjustments.

Usage

clustMDparcoord(x, col = 1, xlabels = NULL, lty = 1, var.label = FALSE,
  xlab = "", ylab = "", ...)

Arguments

x

a matrix or data frame whose columns represent variables. Missing values are allowed.

col

a vector of colours, recycled as necessary for each observation.

xlabels

a character vector of variable names for the x axis.

lty

a vector of line types, recycled as necessary for each observation.

var.label

if TRUE, each variable's axis is labelled with maximum and minimum values.

xlab

label for the X axis.

ylab

label for the Y axis.

...

further graphics parameters which are passed to matplot.

Value

A parallel coordinates plot is drawn with one line for each cluster.

References

Wegman, E. J. (1990) Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association 85, 664-675.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Return the mean and covariance matrix of a truncated multivariate normal distribution

Description

This function returns the mean and covariance matrix of a truncated multivariate normal distribution. It takes as inputs a vector of lower thresholds and another of upper thresholds along with the mean and covariance matrix of the untruncated distribution. This function follows the method proposed by Kan & Robotti (2016).

Usage

dtmvnom(a, b, mu, S)

Arguments

a

a vector of lower thresholds.

b

a vector of upper thresholds.

mu

the mean of the untruncated distribution.

S

the covariance matrix of the untruncated distribution.

Value

Returns a list of two elements. The first element, tmean, is the mean of the truncated multivariate normal distribution. The second element, tvar, is the covariance matrix of the truncated distribution.
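
For intuition, the one-dimensional special case has well-known closed-form moments. The sketch below implements those univariate formulas only; it is not the multivariate method of Kan & Robotti used by dtmvnom, and the name trunc_norm_moments is hypothetical. The output element names mirror tmean and tvar purely for readability.

```r
# Hypothetical helper (not part of clustMD): mean and variance of a
# univariate normal N(mu, sigma^2) truncated to the interval [a, b].
trunc_norm_moments <- function(a, b, mu, sigma) {
  alpha <- (a - mu) / sigma
  beta  <- (b - mu) / sigma
  Z <- pnorm(beta) - pnorm(alpha)   # probability mass inside [a, b]
  # x * dnorm(x) tends to 0 as x -> +/-Inf; avoid Inf * 0 = NaN
  xphi <- function(x) ifelse(is.finite(x), x * dnorm(x), 0)
  ratio <- (dnorm(alpha) - dnorm(beta)) / Z
  tmean <- mu + sigma * ratio
  tvar  <- sigma^2 * (1 + (xphi(alpha) - xphi(beta)) / Z - ratio^2)
  list(tmean = tmean, tvar = tvar)
}

# Standard normal truncated to [0, Inf), i.e. the half-normal distribution
m <- trunc_norm_moments(a = 0, b = Inf, mu = 0, sigma = 1)
```

For the half-normal case the results agree with the known values sqrt(2 / pi) for the mean and 1 - 2 / pi for the variance.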

References

Kan, R., & Robotti, C. (2016). On Moments of Folded and Truncated Multivariate Normal Distributions. Available at SSRN.

Extracts relevant output from `clustMDparallel` object

Description

This function takes a clustMDparallel object, a number of clusters and a covariance model as inputs. It then returns the output corresponding to that model. If the particular model is not contained in the clustMDparallel object then the function returns an error.

Usage

getOutput_clustMDparallel(resParallel, nClus, covModel)

Arguments

resParallel

a clustMDparallel object.

nClus

the number of clusters in the desired output.

covModel

the covariance model of the desired output.

Value

A clustMD object containing the output for the relevant model.

Calculate the mode of a sample

Description

Calculate the mode of a sample.

Usage

modal.value(x)

Arguments

x

a vector containing the sample values.

Value

The mode of the sample. In the case of a tie, the minimum is returned.
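
The tie-breaking rule can be illustrated with a short base R sketch. The helper name mode_min_tie is hypothetical and the code is one possible way to obtain the stated behaviour, not necessarily how modal.value is written.

```r
# Hypothetical helper (not part of clustMD): most frequent value of x,
# returning the smallest value when several are tied.
mode_min_tie <- function(x) {
  ux <- sort(unique(x))                  # candidate values, ascending
  counts <- tabulate(match(x, ux))       # frequency of each candidate
  ux[which.max(counts)]                  # which.max picks the first maximum
}

tied   <- mode_min_tie(c(3, 1, 3, 1, 2))   # 1 and 3 both appear twice
single <- mode_min_tie(c(2, 5, 5, 7))
```

Because the candidates are sorted before counting, the first maximum found is the smallest of the tied values.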

Calculates the number of free parameters for the `clustMD` model.

Description

Internal function.

Usage

npars_clustMD(model, D, G, J, CnsIndx, OrdIndx, K)

Arguments

model

the clustMD covariance model fitted.

D

the dimension of the latent data.

G

the number of mixture components.

J

the number of variables.

CnsIndx

the number of continuous variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

K

a vector indicating the number of levels of each categorical variable.

Value

Output required for clustMD function.

Check if response patterns are equal

Description

Checks whether response patterns are equal or not and returns TRUE or FALSE respectively.

Usage

patt.equal(x, patt)

Arguments

x

a numeric vector.

patt

a vector to compare x to.

Value

Returns TRUE if x and patt are exactly the same and FALSE otherwise.

Note

Used internally in clustMD function.
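
One plausible use of such a comparison is to group observations by their observed response pattern, in the spirit of the patt.indx argument described elsewhere in this manual. The sketch below is an assumption about that workflow; same_pattern and patt_indx are hypothetical names.

```r
# Small matrix of categorical responses (made up for illustration)
Y <- rbind(c(1, 2), c(2, 1), c(1, 2), c(2, 2))

# Hypothetical comparison helper: TRUE only if both vectors match exactly
same_pattern <- function(x, patt) {
  length(x) == length(patt) && all(x == patt)
}

# Distinct response patterns, in order of first appearance
patterns <- unique(Y)

# For each pattern, the row numbers of the observations that show it
patt_indx <- lapply(seq_len(nrow(patterns)), function(p) {
  which(apply(Y, 1, same_pattern, patt = patterns[p, ]))
})
```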

Calculates the threshold parameters for ordinal variables.

Description

Calculates the threshold parameters for ordinal variables.

Usage

perc.cutoffs(CnsIndx, OrdIndx, Y, N)

Arguments

CnsIndx

the number of continuous variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

Y

an N x J data matrix.

N

number of observations.

Value

Output required for clustMD function.
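
A common way to fix thresholds for an ordinal variable is to place them at the standard normal quantiles of the observed cumulative proportions. The sketch below shows that idea for a single variable; it is an assumption that perc.cutoffs works along these lines, and ordinal_cutoffs is a hypothetical name.

```r
# Hypothetical helper (not part of clustMD): thresholds for one ordinal
# variable y coded 1, 2, ..., K.
ordinal_cutoffs <- function(y) {
  K <- max(y)
  props <- tabulate(y, nbins = K) / length(y)   # proportion in each level
  cuts <- qnorm(c(0, cumsum(props)))            # K + 1 cut points
  cuts[K + 1] <- Inf   # guard against rounding in the final cumulative sum
  cuts
}

cuts <- ordinal_cutoffs(c(1, 1, 2, 3))
```

The first and last cut points are -Inf and Inf, so level k corresponds to a latent value falling between cut points k and k + 1.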

Plotting method for objects of class `clustMD`

Description

Plots a parallel coordinates plot and dot plot of the estimated cluster means, a barplot of the variances by cluster for diagonal covariance models or a heatmap of the covariance matrix for non-diagonal covariance structures, and a histogram of the clustering uncertainties for each observation.

Usage

## S3 method for class 'clustMD'
plot(x, ...)

Arguments

x

a clustMD object.

...

further arguments passed to or from other methods.

Value

Prints graphical summaries of the fitted model as detailed above.

References

McParland, D. and Gormley, I.C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10 (2):155-169.

Summary plots for a clustMDparallel object

Description

Produces a line plot of the estimated BIC values corresponding to each covariance model against the number of clusters fitted. For the optimal model according to this criterion, a parallel coordinates plot of the cluster means is produced along with a barchart or heatmap of the covariance matrices for each cluster and a histogram of the clustering uncertainties.

Usage

## S3 method for class 'clustMDparallel'
plot(x, ...)

Arguments

x

a clustMDparallel object.

...

further arguments passed to or from other methods.

Value

Produces a number of plots as detailed above.

Print basic details of `clustMD` object.

Description

Prints a short summary of a clustMD object to screen. Details the number of clusters fitted as well as the covariance model and the estimated BIC.

Usage

## S3 method for class 'clustMD'
print(x, ...)

Arguments

x

a clustMD object.

...

further arguments passed to or from other methods.

Value

Prints summary details, as described above, to screen.

Print basic details of `clustMDparallel` object

Description

Prints basic details of clustMDparallel object. Outputs the different numbers of clusters and the different covariance structures fitted to the data. It also states which model was optimal according to the estimated BIC criterion.

Usage

## S3 method for class 'clustMDparallel'
print(x, ...)

Arguments

x

a clustMDparallel object.

...

further arguments passed to or from other methods.

Value

Prints details described above to screen.

Helper internal function for `dtmvnom()`

Description

Internal function.

Usage

qfun(a, b, S)

Arguments

a

a vector of lower thresholds.

b

a vector of upper thresholds.

S

the covariance matrix of the untruncated distribution.

Value

Output required for dtmvnom function.

References

Kan, R., & Robotti, C. (2016). On Moments of Folded and Truncated Multivariate Normal Distributions. Available at SSRN.

Stable computation of the log of a sum

Description

Function takes a numeric vector and returns the log of the sum of the elements of that vector. Calculations are done on the log scale for stability.

Usage

stable.probs(s)

Arguments

s

a numeric vector.

Value

The log of the sum of the elements of s.
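
The usual way to compute the log of a sum stably is the log-sum-exp trick. The sketch below assumes the inputs are already on the log scale, which is an interpretation of the description above rather than something it states; log_sum_exp is a hypothetical name.

```r
# Hypothetical helper (not part of clustMD): given log_s = log(s),
# return log(sum(s)) without leaving the log scale.
log_sum_exp <- function(log_s) {
  m <- max(log_s)
  if (!is.finite(m)) return(m)        # all -Inf, or an Inf present
  m + log(sum(exp(log_s - m)))        # subtract the max before exponentiating
}

stable <- log_sum_exp(c(-1000, -1000))
naive  <- log(sum(exp(c(-1000, -1000))))   # underflows to log(0)
```

The naive calculation underflows and returns -Inf, whereas the shifted version returns a finite value.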

Summarise `clustMD` object

Description

Prints a summary of a clustMD object to screen. Details the number of clusters fitted as well as the covariance model and the estimated BIC. Also prints a table detailing the number of observations in each cluster and a matrix of the cluster means.

Usage

## S3 method for class 'clustMD'
summary(object, ...)

Arguments

object

a clustMD object.

...

further arguments passed to or from other methods.

Value

Prints summary of clustMD object to screen, as detailed above.

Prints a summary of a clustMDparallel object to screen.

Description

Prints the different numbers of clusters and covariance models fitted and indicates the optimal model according to the estimated BIC criterion. The estimated BIC for the optimal model is printed to screen along with a table of the cluster membership and the matrix of cluster means for this optimal model.

Usage

## S3 method for class 'clustMDparallel'
summary(object, ...)

Arguments

object

a clustMDparallel object.

...

further arguments passed to or from other methods.

Value

Prints a summary of the clustMDparallel object to screen, as detailed above.

Calculate the outer product of a vector with itself

Description

This function calculates the outer product of a vector with itself.

Usage

vec.outer(x)

Arguments

x

a numeric vector.

Value

Returns the outer product of the vector x with itself.
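
In base R the same quantity can be obtained with tcrossprod. The short sketch below is only an illustration of what an outer product of a vector with itself looks like; it does not call vec.outer.

```r
x <- c(1, 2)

# Outer product x x^T as a length(x) by length(x) matrix
op <- tcrossprod(x)

# Equivalent formulation with the %o% operator
op2 <- x %o% x
```

Entry (i, j) of the result is x[i] * x[j], so the matrix is symmetric with the squared elements of x on its diagonal.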

Calculates the first and second moments of the latent data

Description

Internal function.

Usage

z.moments(D, G, N, CnsIndx, OrdIndx, zlimits, mu, Sigma, Y, J, K, norms,
  nom.ind.Z, patt.indx)

Arguments

D

dimension of the latent data.

G

number of mixture components.

N

number of observations.

CnsIndx

the number of continuous variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

zlimits

the truncation points for the latent data.

mu

a D x G matrix of means.

Sigma

a D x D x G array of covariance parameters.

Y

an N x J data matrix.

J

the number of variables.

K

the number of levels for each variable.

norms

a matrix of standard normal deviates.

nom.ind.Z

the latent dimensions corresponding to each nominal variable.

patt.indx

a list of length equal to the number of observed response patterns. Each entry of the list details the observations for which that response pattern was observed.

Value

Output required for clustMD function.

Calculates the first and second moments of the latent data for diagonal models

Description

Internal function.

Usage

z.moments_diag(D, G, N, CnsIndx, OrdIndx, zlimits, mu, Sigma, Y, J, K, norms,
  nom.ind.Z)

Arguments

D

dimension of the latent data.

G

number of mixture components.

N

number of observations.

CnsIndx

the number of continuous variables.

OrdIndx

the sum of the number of continuous and ordinal (including binary) variables.

zlimits

the truncation points for the latent data.

mu

a D x G matrix of means.

Sigma

a D x D x G array of covariance parameters.

Y

an N x J data matrix.

J

the number of variables.

K

the number of levels for each variable.

norms

a matrix of standard normal deviates.

nom.ind.Z

the latent dimensions corresponding to each nominal variable.

Value

Output required for clustMD function.

Transforms Monte Carlo simulated data into categorical data. Calculates empirical moments of latent data given categorical responses.

Description

Internal function.

Usage

z.nom.diag(Z)

Arguments

Z

a matrix of Monte Carlo simulated data.

Value

Output required for clustMD function.
