Title: Generic EM Algorithm
Version: 1.0.0
Description: A generic function for running the Expectation-Maximization (EM) algorithm within a maximum likelihood framework, based on Dempster, Laird, and Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x> is implemented. It can be applied after a model fitting using R's existing functions and packages.
URL: https://github.com/wudongjie/em
License: GPL (≥ 3)
Encoding: UTF-8
SystemRequirements: C++11
BugReports: https://github.com/wudongjie/em/issues
NeedsCompilation: yes
Depends: R (≥ 3.0.0)
Imports: stats, utils, survival, plm, methods, mclust, dplyr, numDeriv, nnet, magrittr
LinkingTo: Rcpp, RcppArmadillo
Suggests: testthat (≥ 3.0.0), parallel, fitdistrplus, gnm
Config/testthat/edition: 3
RoxygenNote: 7.2.3
LazyData: true
Packaged: 2023-01-09 23:27:09 UTC; eastm
Author: Dongjie Wu
Maintainer: Dongjie Wu <dwu.jacob@gmail.com>
Repository: CRAN
Date/Publication: 2023-01-11 06:20:02 UTC
C-Step of EM algorithm
Description
Given the posterior probability, generate a matrix to assign each individual to a class. The assignment is based on which probability is the largest.
Usage
cstep(postpr)
Arguments
postpr |
('matrix()') the matrix of posterior probabilities. |
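The hard assignment described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name 'cstep_sketch' is hypothetical, 'postpr' is assumed to be a numeric matrix with one row per observation and one column per class, and ties are assumed to go to the first maximal column.

```r
# Hypothetical helper (not part of the package): one-hot assignment
# of each observation to its most probable class.
cstep_sketch <- function(postpr) {
  stopifnot(is.matrix(postpr))
  best <- max.col(postpr, ties.method = "first")  # column index of each row maximum
  out <- matrix(0, nrow = nrow(postpr), ncol = ncol(postpr))
  out[cbind(seq_len(nrow(postpr)), best)] <- 1    # mark the winning class with 1
  out
}

post <- matrix(c(0.7, 0.3,
                 0.2, 0.8,
                 0.5, 0.5), ncol = 2, byrow = TRUE)
assigned <- cstep_sketch(post)
```

Each row of 'assigned' contains a single 1, so the matrix can be used directly as a set of 0/1 weights in a later fitting step.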
A Generic EM Algorithm
Description
This is a generic EM algorithm that can work on specific models/objects. Currently, it supports 'lm', 'glm', 'gnm' in package gnm, 'clogit' in package survival and 'multinom' in package nnet. Use '?em.default' to check the manual of the default function of 'em'.
Usage
em(object, ...)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm', 'clogit', 'multinom' |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
Author(s)
Dongjie Wu
Examples
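library(em)  # provides em() and the 'simreg' data set
# Fit a single-class linear model first, then pass it to em()
fit.lm <- lm(yn ~ x, data = simreg)
results <- em(fit.lm, latent = 2, verbose = FALSE)
# Fitted values weighted by prior (default) and posterior probabilities
fmm_fit <- predict(results)
fmm_fit_post <- predict(results, prob = "posterior")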
The em function for 'survival::clogit'.
Description
The em function for 'survival::clogit'.
Usage
## S3 method for class 'clogit'
em(
object,
latent = 2,
verbose = FALSE,
init.method = c("random", "kmeans", "hc"),
init.prob = NULL,
algo = c("em", "cem", "sem"),
cluster.by = NULL,
max_iter = 500,
abs_tol = 1e-04,
concomitant = list(...),
use.optim = FALSE,
optim.start = c("random", "sample5"),
...
)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the convergence process. |
init.method |
the initialization method used in the model. The default method is 'random'. 'kmeans' is K-means clustering. 'hc' is model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in classification-based methods. |
algo |
the algorithm used in em: 'em' the default EM algorithm, the classification em 'cem', or the stochastic em 'sem'. |
cluster.by |
a variable to define the level of clustering. |
max_iter |
the maximum number of iterations for the EM algorithm. |
abs_tol |
absolute accuracy requested. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
use.optim |
maximize the complete log likelihood (MLE) by using 'optim' and 'rcpp' code. The default value is 'FALSE'. |
optim.start |
the initialization method of generating the starting value for MLE. |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The default em function
Description
The default em function
Usage
## Default S3 method:
em(
object,
latent = 2,
verbose = FALSE,
init.method = c("random", "kmeans", "hc"),
init.prob = NULL,
algo = c("em", "cem", "sem"),
cluster.by = NULL,
max_iter = 500,
abs_tol = 1e-04,
concomitant = list(...),
use.optim = FALSE,
optim.start = c("random", "sample5"),
...
)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the convergence process. |
init.method |
the initialization method used in the model. The default method is 'random'. 'kmeans' is K-means clustering. 'hc' is model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in classification-based methods. |
algo |
the algorithm used in em: 'em' the default EM algorithm, the classification em 'cem', or the stochastic em 'sem'. |
cluster.by |
a variable to define the level of clustering. |
max_iter |
the maximum number of iterations for the EM algorithm. |
abs_tol |
absolute accuracy requested. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
use.optim |
maximize the complete log likelihood (MLE) by using 'optim' and 'rcpp' code. The default value is 'FALSE'. |
optim.start |
the initialization method of generating the starting value for MLE. |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for 'fitdistrplus::fitdist'.
Description
The em function for 'fitdistrplus::fitdist'.
Usage
## S3 method for class 'fitdist'
em(
object,
latent = 2,
verbose = FALSE,
init.method = c("random", "kmeans", "hc"),
init.prob = NULL,
algo = c("em", "cem", "sem"),
max_iter = 500,
...
)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the convergence process. |
init.method |
the initialization method used in the model. The default method is 'random'. 'kmeans' is K-means clustering. 'hc' is model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in classification-based methods. |
algo |
the algorithm used in em: 'em' the default EM algorithm, the classification em 'cem', or the stochastic em 'sem'. |
max_iter |
the maximum number of iterations for the EM algorithm. |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for glmerMod
Description
The em function for glmerMod
Usage
## S3 method for class 'glmerMod'
em(
object,
latent = 2,
verbose = FALSE,
init.method = c("random", "kmeans", "hc"),
algo = c("em", "cem", "sem"),
max_iter = 500,
concomitant = list(...),
...
)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the convergence process. |
init.method |
the initialization method used in the model. The default method is 'random'. 'kmeans' is K-means clustering. 'hc' is model-based agglomerative hierarchical clustering. |
algo |
the algorithm used in em: 'em' the default EM algorithm, the classification em 'cem', or the stochastic em 'sem'. |
max_iter |
the maximum number of iterations for the EM algorithm. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for 'panelmodel' such as 'plm'.
Description
The em function for 'panelmodel' such as 'plm'.
Usage
## S3 method for class 'panelmodel'
em(
object,
latent = 2,
verbose = FALSE,
init.method = c("random", "kmeans"),
algo = c("em", "cem", "sem"),
max_iter = 500,
concomitant = list(...),
...
)
Arguments
object |
the model used, e.g. 'lm', 'glm', 'gnm', 'plm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the convergence process. |
init.method |
the initialization method used in the model. The default method is 'random'. |
algo |
the algorithm used in em: 'em' the default EM algorithm, the classification em 'cem', or the stochastic em 'sem'. |
max_iter |
the maximum number of iterations for the EM algorithm. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
... |
arguments used in the 'model'. |
Value
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitting from the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (could be either 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
This function performs an E-Step of EM Algorithm.
Description
This function performs an E-Step of EM Algorithm.
Usage
estep(models, pi_matrix)
Arguments
models |
the models used in the EM algorithm. |
pi_matrix |
the matrix of prior probabilities (pi). |
Value
the fitting result for the model.
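As a rough illustration of what an E-step computes in a finite mixture, the sketch below turns component densities and prior probabilities into posterior probabilities. It is not the package's code: the helper name 'estep_sketch' is hypothetical, the priors are assumed to be a plain vector (one value per class) rather than a matrix, and the two normal components are made up for the example.

```r
# Hypothetical helper (not part of the package): posterior probabilities
# post[i, k] = pi_k * f_k(y_i) / sum_j pi_j * f_j(y_i).
estep_sketch <- function(dens, pi_k) {
  stopifnot(is.matrix(dens), ncol(dens) == length(pi_k))
  weighted <- sweep(dens, 2, pi_k, `*`)  # multiply column k by pi_k
  weighted / rowSums(weighted)           # normalise each row to sum to 1
}

# Made-up example: densities of three points under N(0, 1) and N(4, 1)
y <- c(-1.0, 0.1, 4.2)
dens <- cbind(dnorm(y, mean = 0, sd = 1), dnorm(y, mean = 4, sd = 1))
post <- estep_sketch(dens, pi_k = c(0.5, 0.5))
```

Each row of 'post' sums to one, and points near a component mean receive almost all of their weight from that component.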
Fit the density function for a fitted model.
Description
This function generates the probability density of given models.
Usage
fit.den(object, ...)
Arguments
object |
the fitted model such as 'lm'. |
... |
other used arguments. |
Value
the density function.
Fit the density for the survival::clogit
Description
Fit the density for the survival::clogit
Usage
## S3 method for class 'coxph'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a model fitted with 'fitdistrplus::fitdist()'
Description
Fit the density function for a model fitted with 'fitdistrplus::fitdist()'
Usage
## S3 method for class 'fitdist'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a generalized linear regression model.
Description
Fit the density function for a generalized linear regression model.
Usage
## S3 method for class 'glm'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a generalized linear mixed effect model.
Description
Fit the density function for a generalized linear mixed effect model.
Usage
## S3 method for class 'glmerMod'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a generalized non-linear regression model.
Description
Fit the density function for a generalized non-linear regression model.
Usage
## S3 method for class 'gnm'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a linear regression model.
Description
Fit the density function for a linear regression model.
Usage
## S3 method for class 'lm'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a multinomial regression model.
Description
Fit the density function for a multinomial regression model.
Usage
## S3 method for class 'multinom'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a 'nnet' model.
Description
Fit the density function for a 'nnet' model.
Usage
## S3 method for class 'nnet'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Fit the density function for a panel regression model.
Description
Fit the density function for a panel regression model.
Usage
## S3 method for class 'plm'
fit.den(object, ...)
Arguments
object |
the fitted model. |
... |
other used arguments. |
Value
the density function.
Flatten a data.frame or matrix by column or row, keeping its names.
Description
Flatten a data.frame or matrix by column or row, keeping its names. Each name is transformed into the row/column number plus the column/row name, separated by '.'.
Usage
flatten(x, by = c("col", "row"))
Arguments
x |
a data.frame or matrix. |
by |
either by column or by row. |
Value
a flattened vector with names
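The naming rule above can be read in more than one way. The base-R sketch below shows one possible reading, in which flattening by column names each element "<row number>.<column name>" and flattening by row names each element "<column number>.<row name>". The helper name 'flatten_sketch' is hypothetical, and the package's actual output may differ.

```r
# Hypothetical helper (not part of the package): flatten a matrix or
# data.frame into a named vector under the reading described above.
flatten_sketch <- function(x, by = c("col", "row")) {
  by <- match.arg(by)
  m <- as.matrix(x)
  if (by == "col") {
    # Column-major order: all rows of column 1, then column 2, ...
    nms <- paste(rep(seq_len(nrow(m)), times = ncol(m)),
                 rep(colnames(m), each = nrow(m)), sep = ".")
    stats::setNames(as.vector(m), nms)
  } else {
    # Row-major order: all columns of row 1, then row 2, ...
    nms <- paste(rep(seq_len(ncol(m)), times = nrow(m)),
                 rep(rownames(m), each = ncol(m)), sep = ".")
    stats::setNames(as.vector(t(m)), nms)
  }
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
flat <- flatten_sketch(m, by = "col")
```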
Initialization of EM algorithm
Description
Given a matrix with the number of rows equal to the number of observations and the number of columns equal to the number of latent classes, the function 'init.em' generates the posterior probability using that matrix, based on the method set by the user.
Usage
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
Model-based agglomerative hierarchical clustering
Description
Model-based agglomerative hierarchical clustering
Usage
## S3 method for class 'hc'
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
K-means initialization
Description
K-means initialization
Usage
## S3 method for class 'kmeans'
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
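One way a K-means start can be turned into a posterior probability matrix is to cluster the data and one-hot encode the cluster labels. The sketch below uses 'stats::kmeans()' on made-up data; it is an assumption about the general idea, not the package's implementation.

```r
set.seed(1)
# Made-up one-dimensional data with two well-separated groups
x <- matrix(c(rnorm(20, mean = 0), rnorm(20, mean = 10)), ncol = 1)

k <- 2
km <- stats::kmeans(x, centers = k, nstart = 10)  # several starts for stability

# One-hot encode the cluster labels as an initial posterior matrix
post <- matrix(0, nrow = nrow(x), ncol = k)
post[cbind(seq_len(nrow(x)), km$cluster)] <- 1
```

The resulting matrix has one row per observation and one column per latent class, matching the shape described for 'init.em'.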
Random initialization
Description
Random initialization
Usage
## S3 method for class 'random'
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
Random initialization with weights
Description
Random initialization with weights
Usage
## S3 method for class 'random.weights'
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
Initialization using sampling 5 times.
Description
Initialization using sampling 5 times.
Usage
## S3 method for class 'sample5'
init.em(object, ...)
Arguments
object |
A matrix. |
... |
other used arguments. |
Value
The posterior probability matrix
This function computes logLik of EM Algorithm.
Description
This function computes logLik of EM Algorithm.
Usage
## S3 method for class 'em'
logLik(object, ...)
Arguments
object |
an object of 'em'. |
... |
other used arguments. |
Value
the log-likelihood value
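For reference, the observed-data log-likelihood of a finite mixture is the sum over observations of the log of the prior-weighted component densities. The sketch below computes that quantity in base R; the helper name 'mixture_loglik' is hypothetical, and whether the package uses exactly this expression internally is an assumption.

```r
# Hypothetical helper (not part of the package):
# logLik = sum_i log( sum_k pi_k * f_k(y_i) )
mixture_loglik <- function(dens, pi_k) {
  stopifnot(is.matrix(dens), ncol(dens) == length(pi_k))
  sum(log(as.vector(dens %*% pi_k)))
}

# Made-up densities for two observations under two components
dens <- matrix(c(0.2, 0.4,
                 1.0, 1.0), ncol = 2, byrow = TRUE)
ll <- mixture_loglik(dens, pi_k = c(0.5, 0.5))
```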
M-Step of EM algorithm
Description
This function performs an M-Step of EM Algorithm.
Usage
mstep(models, post_pr = NULL)
Arguments
models |
the models used in the EM algorithm |
post_pr |
the posterior probability. |
Value
the fitting result for the model.
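An M-step for a mixture of regressions is commonly implemented by refitting each component with the posterior probabilities as observation weights and updating the priors as column means of the posterior matrix. The sketch below illustrates that idea with 'lm()' on simulated data; the data, the perfectly known posterior matrix, and all object names are made up for the example and do not come from the package.

```r
set.seed(42)
n <- 200
x <- runif(n)
cls <- rep(1:2, each = n / 2)  # true class labels (known only in simulation)
y <- ifelse(cls == 1, 1 + 2 * x, 5 - 3 * x) + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, x = x)

# For simplicity, pretend the posterior is known exactly (0/1 weights)
post_pr <- cbind(as.numeric(cls == 1), as.numeric(cls == 2))

# Refit one weighted regression per latent class
fits <- lapply(seq_len(ncol(post_pr)), function(k) {
  dat$w <- post_pr[, k]          # weights for class k
  lm(y ~ x, data = dat, weights = w)
})

# Update the prior probabilities
pi_k <- colMeans(post_pr)
```

With soft (non 0/1) posteriors the same code applies unchanged; each observation then contributes to every component in proportion to its posterior probability.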
The mstep for the concomitant model.
Description
This section was inspired by Flexmix.
Usage
mstep.concomitant(formula, data, postpr)
Arguments
formula |
the formula of the concomitant model. |
data |
the data or model.frame related to the concomitant model. |
postpr |
the posterior probability matrix. |
Value
the function returns a fitted nnet object.
The refit for the concomitant model. This section was inspired by Flexmix.
Description
The refit for the concomitant model. This section was inspired by Flexmix.
Usage
mstep.concomitant.refit(formula, data, postpr)
Arguments
formula |
the formula of the concomitant model. |
data |
the data or model.frame related to the concomitant model. |
postpr |
the posterior probability matrix. |
Value
the function returns a fitted multinom object.
Multiple run of EM algorithm
Description
Multiple run of EM algorithm
Usage
multi.em(object, ...)
Arguments
object |
the model to use in em, e.g. 'lm', 'glm', 'gnm' |
... |
arguments used in em. |
Value
return the 'em' object with the maximum log-likelihood.
Default method for multi.em
Description
Default method for multi.em
Usage
## Default S3 method:
multi.em(
object,
iter = 10,
parallel = FALSE,
num.cores = 2,
random.init = TRUE,
...
)
Arguments
object |
the model to use in em, e.g. 'lm', 'glm', 'gnm' |
iter |
number of iterations for running EM algorithm. |
parallel |
whether to use the parallel computing. |
num.cores |
number of cores used in the parallel computing. |
random.init |
whether to use a random initialization. |
... |
arguments used in em. |
Value
return the 'em' object with the maximum log-likelihood.
Plot the fitted results of EM algorithm
Description
This is the generic plot function for the 'em' project. It can produce two types of graphs:
1. A graph of the predicted value distribution for each component.
2. A histogram of the posterior probability distributions.
Usage
## S3 method for class 'em'
plot(
x,
by = c("component", "prob"),
prior = FALSE,
cols = rep(1, length(x$models)),
lwds = rep(3, length(x$models)),
ltys = c(seq_len(length(x$models))),
ranges = NULL,
main = NULL,
lgd = list(),
lgd.loc = "topleft",
hist.args = list(main = "Histograms of posterior probabilities", xlab =
"Posterior Probabilities"),
...
)
Arguments
x |
the 'em' model to plot |
by |
the type of the graph to produce. The default is 'component'. |
prior |
whether to fit the model using prior probabilities. |
cols |
lines' colors. |
lwds |
Lines' widths. |
ltys |
lines' types. |
ranges |
the ranges of the x-axis and the y-axis limits of plots. It should be a vector of four numeric values. The first two represent the x-axis limits. The last two represent the y-axis limits. |
main |
the main title. |
lgd |
a list for legend related arguments. |
lgd.loc |
the location of the legend. The default is "topleft". |
hist.args |
The list of arguments for the histogram. |
... |
other arguments. |
Value
'NULL'
Predict the fitted finite mixture models
Description
Predict the fitted finite mixture models
Usage
## S3 method for class 'em'
predict(object, prob = c("prior", "posterior"), ...)
Arguments
object |
an object of class 'em'. |
prob |
the probabilities used to compute the fitted value. It can be either prior probability ('prior') or posterior probability ('posterior'). The default value is 'prior'. |
... |
other arguments. |
Value
An object of class 'predict.em' is a list containing at least the following components:
components
a list of fitted values by components with each element a matrix/vector of fitted values.
mean
a matrix of predicted values computed by weighted sum of fitted values by components. The weights used in the computation can be either prior probabilities or posterior probabilities depending on the parameter 'prob'.
prob
the value used in the parameter 'prob'.
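The weighted sum behind the 'mean' component can be sketched as follows. The fitted values and weights below are made up, the object names are hypothetical, and the sketch assumes each component contributes a plain numeric vector.

```r
# Made-up fitted values for three observations under two components
components <- list(c(1, 2, 3), c(10, 20, 30))

# Made-up weights (prior or posterior probabilities), one row per observation
w <- matrix(c(0.9, 0.1,
              0.5, 0.5,
              0.2, 0.8), ncol = 2, byrow = TRUE)

# Weighted sum of the component-wise fitted values
fitted_mat <- do.call(cbind, components)
mix_mean <- rowSums(fitted_mat * w)
```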
Print the 'em' object
Description
Print the 'em' object
Usage
## S3 method for class 'em'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
the 'em' object. |
digits |
the number of digits printed; the default is 'max(3L, getOption("digits") - 3L)'. |
... |
other arguments used. |
Value
print the 'em' object on the screen.
Print the 'summary.em' object
Description
Print the 'summary.em' object
Usage
## S3 method for class 'summary.em'
print(
x,
digits = max(3L, getOption("digits") - 3L),
signif.stars = getOption("show.signif.stars"),
...
)
Arguments
x |
the 'summary.em' object. |
digits |
the number of digits printed; the default is 'max(3L, getOption("digits") - 3L)'. |
signif.stars |
logical; if 'TRUE', P-values are additionally encoded visually as 'significance stars' in order to help scanning of long coefficient tables. It defaults to the 'show.signif.stars' slot of options. |
... |
other arguments used in 'printCoefmat'. |
Value
print the 'summary.em' object on the screen.
Simulated Data from a logistic regression
Description
A data set with simulated data from a mixture of a logistic regression.
Usage
simbinom
Format
A data frame with 10000 rows and 2 variables:
- y
A dependent variable generated from a mixture of a logistic regression with x
- x
An independent variable
Source
<https://www.github.com/wudongjie/em>
Simulated Data from a conditional logistic regression
Description
A data set with simulated data from a mixture of a conditional logistic regression.
Usage
simclogit
Format
A data frame with 10000 rows and 13 variables:
- x2
A dummy variable showing whether x is equal to level 2
- x3
A dummy variable showing whether x is equal to level 3
- a2
Whether the alternative choice 2 is chosen
- a2_x2
Interaction between a2 and x2
- a2_x3
Interaction between a2 and x3
- a3
Whether the alternative choice 3 is chosen
- a3_x2
Interaction between a3 and x2
- a3_x3
Interaction between a3 and x3
- chosen1
Whether the observation-alternative combination is chosen (Generated by a one-class regression).
- chosen2
Whether the observation-alternative combination is chosen (Generated by a two-class mixed regression).
- fid
Family ID
- id
Individual ID
- z
Other variables
Source
<https://www.github.com/wudongjie/em>
Simulated Regression Data
Description
A data set with simulated data from mixture regression models.
Usage
simreg
Format
A data frame with 1000 rows and 5 variables:
- yp
A dependent variable generated from a mixture of a Poisson regression with x
- yn
A dependent variable generated from a mixture of a linear regression with x
- yc
A dependent variable generated from a mixture of a linear regression with x and a concomitant variable of z
- x
An independent variable
- z
A concomitant variable
Source
<https://www.github.com/wudongjie/em>
S-step of EM algorithm
Description
Given the posterior probability, generate a matrix to assign each individual to a class. The assignment is randomly sampled based on the posterior probability.
Usage
sstep(postpr)
Arguments
postpr |
('matrix()') the matrix of posterior probabilities. |
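The random assignment described above can be sketched in base R by drawing one class per observation with the row of posterior probabilities as sampling weights. The helper name 'sstep_sketch' is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper (not part of the package): sample one class per
# observation using its posterior probabilities, then one-hot encode.
sstep_sketch <- function(postpr) {
  stopifnot(is.matrix(postpr))
  k <- ncol(postpr)
  draw <- apply(postpr, 1, function(p) sample.int(k, size = 1, prob = p))
  out <- matrix(0, nrow = nrow(postpr), ncol = k)
  out[cbind(seq_len(nrow(postpr)), draw)] <- 1
  out
}

set.seed(123)
post <- matrix(c(1.0, 0.0,
                 0.0, 1.0,
                 0.5, 0.5), ncol = 2, byrow = TRUE)
sampled <- sstep_sketch(post)
```

Rows with a degenerate posterior (probability 1 for one class) are always assigned to that class; the third row can land in either class depending on the random draw.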
Summaries of fitted finite mixture models using EM algorithm
Description
Summaries of fitted finite mixture models using EM algorithm
Usage
## S3 method for class 'em'
summary(object, ...)
Arguments
object |
an object of class 'em'. |
... |
other arguments used. |
Value
An object of class 'summary.em' is a list containing at least the following components:
call
the matched call.
coefficients
pi
the prior probabilities.
latent
number of the latent classes.
ll
log-likelihood value.
sum.models
summaries of models generated by 'summary()' of models from each class.
df
degrees of freedom.
obs
number of observations.
AIC
the Akaike information criterion.
BIC
the Bayesian information criterion.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
concomitant.summary
summaries of the concomitant model generated by 'summary()'.
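The relation between 'll', 'df', 'obs', 'AIC' and 'BIC' can be illustrated with the standard textbook definitions. The numbers below are made up, and it is an assumption that the package follows exactly these formulas.

```r
# Made-up values standing in for the 'll', 'df' and 'obs' components
ll  <- -120.5
df  <- 7
obs <- 100

aic <- -2 * ll + 2 * df         # Akaike information criterion
bic <- -2 * ll + log(obs) * df  # Bayesian information criterion
```

Lower values indicate a better trade-off between fit and complexity, which is how these criteria are typically used to choose the number of latent classes.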
Transform a factor variable to a matrix of dummy variables
Description
Transform a factor variable to a matrix of dummy variables
Usage
vdummy(x)
Arguments
x |
a factor vector |
Value
a matrix of dummy variables
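A factor can be expanded into a 0/1 matrix with one column per level, as sketched below. The helper name 'vdummy_sketch' is hypothetical, and keeping every level (rather than dropping a reference level) is an assumption about the intended output.

```r
# Hypothetical helper (not part of the package): one indicator column
# per factor level, with no reference level dropped.
vdummy_sketch <- function(x) {
  stopifnot(is.factor(x))
  m <- outer(as.integer(x), seq_len(nlevels(x)), `==`) * 1  # logical -> 0/1
  colnames(m) <- levels(x)
  m
}

f <- factor(c("a", "b", "a", "c"))
d <- vdummy_sketch(f)
```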