Version: | 0.3-2 |
Date: | 2024-09-17 |
Title: | Aster Models |
Depends: | R (≥ 3.6.0), Matrix |
Imports: | stats |
Suggests: | aster |
ByteCompile: | TRUE |
Description: | Aster models are exponential family regression models for life history analysis. They are like generalized linear models except that elements of the response vector can have different families (e. g., some Bernoulli, some Poisson, some zero-truncated Poisson, some normal) and can be dependent, the dependence indicated by a graphical structure. Discrete time survival analysis, zero-inflated Poisson regression, and generalized linear models that are exponential family (e. g., logistic regression and Poisson regression with log link) are special cases. Main use is for data in which there is survival over discrete time periods and there is additional data about what happens conditional on survival (e. g., number of offspring). Uses the exponential family canonical parameterization (aster transform of usual parameterization). Unlike the aster package, this package does dependence groups (nodes of the graph need not be conditionally independent given their predecessor node), including multinomial and two-parameter normal as families. Thus this package also generalizes mark-capture-recapture analysis. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://www.stat.umn.edu/geyer/aster/ |
NeedsCompilation: | yes |
Packaged: | 2024-09-17 20:47:12 UTC; geyer |
Author: | Charles J. Geyer [aut, cre] |
Maintainer: | Charles J. Geyer <geyer@umn.edu> |
Repository: | CRAN |
Date/Publication: | 2024-09-17 22:10:11 UTC |
Aster Models
Description
Aster models are exponential family graphical models that combine aspects of generalized linear models and survival analysis.
This package is still under development, only about half finished.
However, it does do maximum likelihood for unconditional aster models
with dependence groups, which the old package aster
does not.
The main differences between this package and the old package are as follows.
The old package had triple indices for model matrices. The first index ran over individuals, the second index over nodes of the graph for an individual, and the third index over regression coefficients. Consequently the model matrix was represented (sometimes, but not consistently) as a three-dimensional array rather than a matrix, which was very confusing, even to the package author. This package ignores individuals, one index runs over all nodes of the combined graph for all individuals. Thus model matrices are always matrices.
The old package did not implement dependence groups, although they were described in Geyer, Wagenius and Shaw (2007). This package does. Consequently, this package requires a data frame, a vector
pred
that indicates predecessors, a vectorgroup
that indicates individuals in the same dependence group, and a vectorfam
that indicates families to specify a saturated aster model (the old package required only the data frame,pred
, andfam
). To facilitate the old style model specification, there is a new functionasterdata
that constructs objects of class"asterdata"
given an old style data frame,pred
, andfam
. All other functions of the package take objects of class"asterdata"
as model specifications.The function
predict.aster
in the old package did some parameter transformations, but not all, and the returned value, when a list, had a componentgradient
, that was undocumented but useful in applying the delta method. The functionstransformSaturated
,transformConditional
, andtransformUnconditional
in this package transform between any of the following parameter vectors: the conditional canonical parameter\theta
, the unconditional canonical parameter\varphi
, the conditional mean value parameter\xi
, the unconditional mean value parameter\mu
, the canonical affine submodel canonical parameter\beta
, and (unconditional aster models only) the canonical affine submodel mean value parameter\tau
(this last parameter is new, not discussed in the cited papers below, it is\tau = M^T \mu
, whereM
is the model matrix). The change of parameter from\tau
to\beta
is equivalent to maximum likelihood estimation for an unconditional aster model when the value\tau = M^T y
is used, wherey
is the response vector. All of these transformation functions also compute derivatives, if requested. See examples.
Bugs
Functions analogous to aster
, anova
, and predict
in the old package are missing, thus model fitting, hypothesis tests,
and confidence intervals are more cumbersome. In fact, since there is
no function to calculate log likelihoods (like mlogl
in the old
package), there is no way to do likelihood ratio tests (but Rao or Wald
tests could be done, for unconditional aster models, since the derivative
of the log likelihood is observed minus expected
M^T (y - \mu)
.
References
Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster Models for Life History Analysis. Biometrika 94 415–426.
Shaw, R. G., Geyer, C. J., Wagenius, S., Hangelbroek, H. H. and Etterson, J. R. (2008) Unifying Life History Analyses for Inference of Fitness and Population Growth. American Naturalist, 172, E35–E47.
See Also
asterdata
, transformSaturated
,
families
Examples
## Not run: # perfectly good example but takes longer to run than CRAN allows
data(echinacea)
#### estimate MLE (simpler model than in Biometrika paper cited, not as good)
hdct <- as.numeric(grepl("hdct", as.character(echinacea$redata$varb)))
modmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop,
data = echinacea$redata)
tau.hat <- as.numeric(t(modmat) %*% echinacea$redata$resp)
beta.hat <- transformUnconditional(tau.hat, modmat, echinacea,
from = "tau", to = "beta")
inverse.fisher <- jacobian(tau.hat, echinacea, transform = "unconditional",
from = "tau", to = "beta", modmat = modmat)
#### now have MLE (beta.hat) and pseudo-inverse of Fisher information
#### (inverse.fisher), pseudo-inverse because modmat is not full rank
foo <- cbind(beta.hat, sqrt(diag(inverse.fisher)))
foo <- cbind(foo, foo[ , 1]/foo[ , 2])
foo <- cbind(foo, 2 * pnorm(- abs(foo[ , 3])))
dimnames(foo) <- list(colnames(modmat),
c("Estimate", "Std. Error", "z value", "Pr(>|z|)"))
printCoefmat(foo)
#### coefficients constrained to be zero because parameterization is not
#### identifiable have estimate zero and std. error zero (and rest NA)
#### estimate fitness in populations
#### generate new data with one individual in each pop at location (0, 0)
pop.names <- levels(echinacea$redata$pop)
pop.idx <- match(pop.names, as.character(echinacea$redata$pop))
pop.id <- echinacea$redata$id[pop.idx]
newdata <- subset(echinacea, echinacea$redata$id %in% pop.id)
newdata$redata[ , "nsloc"] <- 0
newdata$redata[ , "ewloc"] <- 0
hdct <- as.integer(grepl("hdct", as.character(newdata$redata$varb)))
#### modmat for new data
newmodmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop,
data = newdata$redata)
#### matrix that when multiplied mean value parameter vector gives fitness
#### in each pop
amat <- matrix(NA, nrow = length(pop.id), ncol = nrow(newmodmat))
for (i in 1:nrow(amat))
amat[i, ] <- as.numeric(grepl(paste("^", pop.id[i], ".hdct", sep = ""),
rownames(newmodmat)))
#### transform to expected fitness parameters
efit <- transformUnconditional(beta.hat, newmodmat, newdata,
from = "beta", to = "mu")
efit <- as.numeric(amat %*% efit)
#### jacobian matrix of this transformation
jack <- jacobian(beta.hat, newdata, transform = "unconditional",
from = "beta", to = "mu", modmat = newmodmat)
#### delta method standard errors
sefit <- sqrt(diag(amat %*% jack %*% inverse.fisher %*% t(jack) %*% t(amat)))
foo <- cbind(efit, sefit)
dimnames(foo) <- list(pop.names, c("Est. fitness", "Std. Error"))
print(foo)
## End(Not run)
Change-of-Parameter Functions for Aster Models
Description
Calculate a change-of-parameter for an aster model or the derivative of such a change-of-parameter. Validate certain parameter vectors.
Usage
transformSaturated(parm, data, from = c("theta", "phi", "xi", "mu"),
to = c("theta", "phi", "xi", "mu"), differential,
model.type = c("unconditional", "conditional"),
tolerance = 8 * .Machine$double.eps)
transformConditional(parm, modmat, data, from = "beta",
to = c("theta", "phi", "xi", "mu"), differential,
offset, tolerance = 8 * .Machine$double.eps)
transformUnconditional(parm, modmat, data, from = c("beta", "tau"),
to = c("beta", "theta", "phi", "xi", "mu", "tau"),
differential, offset, tolerance = 8 * .Machine$double.eps)
jacobian(parm, data,
transform = c("saturated", "conditional", "unconditional"),
from = c("beta", "theta", "phi", "xi", "mu", "tau"),
to = c("beta", "theta", "phi", "xi", "mu", "tau"),
modmat, offset, tolerance = 8 * .Machine$double.eps)
validtheta(data, theta, model.type = c("unconditional", "conditional"),
tolerance = 8 * .Machine$double.eps)
is.validtheta(data, theta, model.type = c("unconditional", "conditional"),
tolerance = 8 * .Machine$double.eps)
validxi(data, xi, model.type = c("unconditional", "conditional"),
tolerance = 8 * .Machine$double.eps)
is.validxi(data, xi, model.type = c("unconditional", "conditional"),
tolerance = 8 * .Machine$double.eps)
Arguments
parm |
parameter vector to transform,
a numerical vector of length |
data |
an object of class |
from |
the kind of parameter which |
to |
the kind of parameter to which |
differential |
if not missing a numeric vector of the same length
as |
modmat |
the model matrix for a canonical affine submodel, a
numerical matrix having |
offset |
the offset vector for a canonical affine submodel, a
numerical vector of length |
theta |
conditional canonical parameter vector to validate,
a numerical vector of length |
xi |
conditional canonical parameter vector to validate,
a numerical vector of length |
model.type |
which kind of model (see Details section). May be abbreviated. |
tolerance |
numeric >= 0. Relative errors smaller
than |
transform |
the “transform” function that will be called to
calculate derivatives, e. g., |
Details
If differential
is missing, the returned value is a new parameter
vector of the specified type. If differential
is not missing,
the returned value is the derivative evaluated at parm
and differential
, that is, if f
is the change-of variable
function and \psi
is the from
parameter, then
f(\psi)
is calculated when the differential is missing and
f'(\psi)(\delta)
is calculated when the
differential \delta
is not missing, where the latter is defined by
f(\psi + \delta) \approx f(\psi) + f'(\psi)(\delta)
for small \delta
.
The kinds of parameters are "theta"
the conditional canonical parameter
for the saturated model, "phi"
the unconditional canonical parameter
for the saturated model, "xi"
the conditional mean value parameter
for the saturated model, "mu"
the unconditional mean value parameter
for the saturated model,
"beta"
the regression coefficient parameter for a canonical affine
submodel (\theta = a + M \beta
for a conditional
canonical affine submodel or
\varphi = a + M \beta
for an unconditional
canonical affine submodel, where a
is the offset vector
and M
is the model matrix),
"tau"
the mean value parameter for an unconditional canonical affine
submodel (\tau = M^T \mu
,
where M
is the model matrix).
Only the conditional canonical parameter vector \theta
and
the conditional mean value parameter vector \xi
can be checked
directly. (To check the validity of another parameter, transform to one
of these and check that.) This means that in conversions to these parameters
the output vector is checked rather than the input vector, and conversions
(apparently) not involving these parameters (which do go through these
parameters inside the transformation function) a conversion to one of
these parameters is what is checked rather than the input vector.
There is a difference between conditional and unconditional aster models
in the way they treat zero predecessors. For a conditional aster model,
if the observed value of the predecessor is zero, then the successor is
zero almost surely and can have any parameter value for \theta
or \xi
. For an unconditional aster model,
if the expected value of the predecessor is zero, then the successor is
zero almost surely and can have any parameter value for \theta
or \xi
.
Since zero values are not allowed at initial nodes (not
considered valid by the function validasterdata
), the only
way predecessor data can be zero almost surely in an unconditional aster model
is if the delta vector (data$redelta
) is not zero so we have a limiting
model.
The function jacobian
turns the derivative considered as
a linear transformation calculated by the “transform” functions
into the matrix that represents the linear transformation (sometimes
called the Jacobian matrix of the transformation). The arguments
modmat
and offset
are only used if
transform == "conditional"
or transform == "unconditional"
,
and (as with the “transform” functions) the argument offset
may be missing, in which case the zero vector is used. Not all of the
candidate values for from
and to
arguments
for the jacobian
function are valid: the value must be valid for
the “transform” function that will be called.
Value
a numeric vector of the same length as parm
. The new parameter if
deriv == FALSE
or the transform of the differential
if deriv = TRUE
. See details.
See Also
Examples
data(echinacea)
theta <- rnorm(nrow(echinacea$redata), 0, 0.1)
phi <- transformSaturated(theta, echinacea, from = "theta", to = "phi")
## rarely (if ever) want jacobian for unsaturated model transform
## result here is 5130 by 5130 matrix
## Not run: jack <- jacobian(theta, echinacea, from = "theta", to = "phi")
Object Describing Saturated Aster Model
Description
Functions to construct and test conformance to the contract for objects
of class "asterdata"
. All other functions in this package take
model descriptions of this form.
Usage
asterdata(data, vars, pred, group, code, families, delta,
response.name = "resp", varb.name = "varb",
tolerance = 8 * .Machine$double.eps)
validasterdata(object, tolerance = 8 * .Machine$double.eps)
is.validasterdata(object, tolerance = 8 * .Machine$double.eps)
Arguments
data |
a data frame containing response and predictor variables for the aster model. |
vars |
a character vector containing names of variables in the data
frame |
pred |
an integer vector satisfying |
group |
an integer vector satisfying The lines indicate a transitive relation. If there is a line from
node For example, if nodes |
code |
an integer vector satisfying all(code %in% seq(along = families) Node Note that |
families |
a list of family specifications
(see |
delta |
a numeric vector satisfying |
response.name |
a character string giving the name of the response vector. |
varb.name |
a character string giving the name of the factor covariate
that says which of the variables in the data frame |
tolerance |
numeric >= 0. Relative errors smaller
than |
object |
an object of class |
Details
Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.
This function only handles the usual case where the subgraph for every
individual is isomorphic to subgraph for every other individual
and all initial nodes (formerly
called root nodes) correspond to the constant one. Each row of data
is the data for one individual. The vectors vars
, pred
,
group
, code
, and delta
(if not missing) describe
the subgraph for one individual (which is the same for all individuals).
In other cases for which this function does not have the flexibility to
construct the appropriate object of class "asterdata"
, such an
object will have to be constructed “by hand” using R statements
not involving this function or modifying an object produced by this
function. See the following section for description of such objects.
The functions validasterdata
and is.validasterdata
can be
used to check whether objects constructed “by hand” have been
constructed correctly.
Value
an object of class "asterdata"
is a list containing the
following components
redata |
a data frame having |
repred |
an integer vector satisfying
Note that
|
initial |
a numeric vector specifying constants associated with
initial nodes (formerly called root nodes) of the graphical model
for all individuals. If |
regroup |
an integer vector satisfying
It also requires that the dimension of the family specified by
The lines indicate a transitive relation. If there is a line from
node For example, if nodes Note that
|
recode |
an integer vector satisfying
all(recode %in% seq(along = families) Node Note that |
families |
a copy of the argument of the same name of this function
except that any character string abbreviations are converted to objects
of class |
redelta |
a numeric vector satisfying
Note that
|
response.name |
a character string giving the name of the response
variable in |
varb.name |
a character string giving the name of the “varb”
variable in |
In addition an object of class "asterdata"
may contain (and those
constructed by this function do contain) components
pred
, group
, and code
,
which are copies of the arguments of the same names of this function.
Objects of class "asterdata"
not constructed by this function need
not contain these additional components, since they may make no sense if
the graph for all individuals is not the repetition of isomorphic subgraphs,
one for each individual.
See Also
Examples
data(test1)
fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1),
group = c(0, 0, 2), code = c(1, 2, 2),
families = list("bernoulli", "normal.location.scale"))
is.validasterdata(fred)
Constancy Spaces for Aster Models
Description
Produce basis for constancy space of an aster model. Test whether the difference of two canonical parameter vectors is in the constancy space (so the two parameter vectors correspond to the same probability model).
Usage
constancy(data, parm.type = c("theta", "phi"))
is.same(parm1, parm2, data, parm.type = c("theta", "phi"),
tolerance = sqrt(.Machine$double.eps))
Arguments
data |
an object of class |
parm.type |
the parametrization for which the constancy space is wanted. |
parm1 |
a parameter vector of the type specified by |
parm2 |
another parameter vector of the type specified
by |
tolerance |
numeric >= 0. Relative errors smaller
than |
Details
There is no need for functions to test whether different mean value parameters
(\xi
or \mu
) correspond to the same probability
distribution because these parametrizations are identifiable (different valid
parameter vectors correspond to different probability distributions).
Value
for is.same
a logical value;
for constancy
a matrix whose rows constitute a basis for the constancy space.
This means that if \delta
is a linear combination of rows
of this matrix then for all real s
the distributions having parameter
vectors \psi
and \psi + s \delta
are the
same, where \psi = \theta
or \psi = \varphi
depending on whether
parm.type = "theta"
or parm.type = "phi"
.
Conversely, if \psi_1
and \psi_2
are valid parameter
vectors of the same type, then they correspond to the same probability
distribution only if \psi_1 - \psi_2
is a linear
combination of rows of this matrix.
See Also
Examples
data(test1)
fred <- asterdata(test1,
vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"),
pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0),
code = c(1, 1, 1, 2, 2, 3, 4, 5),
families = list(fam.multinomial(3), "normal.location.scale",
"bernoulli", "poisson", "zero.truncated.poisson"))
cmat <- constancy(fred, parm.type = "phi")
Cumulant Functions for Aster Models
Description
Calculate cumulant function and up to three derivatives for families known to the package.
Usage
cumulant(theta, fam, deriv = 0, delta)
Arguments
theta |
canonical parameter value. |
fam |
an object of class |
deriv |
the number of derivatives wanted. Must be nonnegative integer less than or equal to three. |
delta |
direction in which limit is taken. Cumulant
function is for family that is limit of family specified, limit
being for distributions with parameter
|
Value
a list containing some of the following components:
zeroth |
the value of the cumulant function at |
first |
the value of the first derivative at |
second |
the value of the second derivative at |
third |
the value of the third derivative at |
Note
Not intended for use by ordinary users. Provides R interface for testing to C code called by many other functions in the package.
See Also
Examples
cumulant(-0.5, fam.bernoulli(), deriv = 3)
cumulant(-0.5, fam.bernoulli(), deriv = 3, delta = 1)
Life History Data on Echinacea angustifolia
Description
Data on life history traits for the purple coneflower Echinacea angustifolia
Usage
data(echinacea)
Format
An object of class "asterdata"
(see asterdata
)
comprising records for 570 plants observed over three years.
Nodes of the graph for one individual are associated with the variables
(levels of the factor echinacea$redata$varb
)
- ld02
Indicator of being alive in 2002. Bernoulli, predecessor the constant one.
- ld03
Ditto for 2003. Bernoulli, predecessor
ld02
.- ld04
Ditto for 2004. Bernoulli, predecessor
ld03
.- fl02
Indicator of flowering 2002. Bernoulli, predecessor
ld02
.- fl03
Ditto for 2003. Bernoulli, predecessor
ld03
.- fl04
Ditto for 2004. Bernoulli, predecessor
ld04
.- hdct02
Count of number of flower heads in 2002. Zero-truncated Poisson, predecessor
fl02
.- hdct03
Ditto for 2003. Zero-truncated Poisson, predecessor
fl03
.- hdct04
Ditto for 2004. Zero-truncated Poisson, predecessor
fl04
.
Covariates are
- pop
the remnant population of origin of the plant (all plants were grown together,
pop
encodes ancestry).- ewloc
east-west location in plot.
- nsloc
north-south location in plot.
Details
This is the data for the example in Geyer, Wagenius, and Shaw (2007).
These data were included in the R package aster
which was the
predecessor of this package as the dataset echinacea
.
Source
Stuart Wagenius, https://www.chicagobotanic.org/research/staff/wagenius
References
Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster Models for Life History Analysis. Biometrika 94 415–426.
Examples
data(echinacea)
names(echinacea)
names(echinacea$redata)
levels(echinacea$redata$varb)
Families for Aster Models
Description
Families known to the package. These functions construct simple family specifications used in specifying aster models. Statistical properties of these families are described.
Usage
fam.bernoulli()
fam.poisson()
fam.zero.truncated.poisson()
fam.normal.location.scale()
fam.multinomial(dimension)
Arguments
dimension |
the dimension (number of categories) for the multinomial distribution. |
Details
Currently implemented families are
"bernoulli"
Bernoulli (binomial with sample size one). The distribution of any zero-or-one-valued random variable
Y
, which is the canonical statistic. The mean value parameter is\mu = E(Y) = \Pr(Y = 1).
The canonical parameter is
\theta = \log(\mu) - \log(1 - \mu)
, also called logit of\mu
. The cumulant function isc(\theta) = \log(1 + e^\theta).
This distribution has degenerate limiting distributions. The lower limit as
\theta \to - \infty
is the distribution concentrated at zero, having cumulant function which is the constant function everywhere equal to zero. The upper limit as\theta \to + \infty
is the distribution concentrated at one, having cumulant function which is the identity function satisfyingc(\theta) = \theta
for all\theta
.For predecessor (sample size)
n
, the successor is the sum ofn
independent and identically distributed (IID) Bernoulli random variables, that is, binomial with sample sizen
. The mean value parameter isn
times the mean value parameter for sample size one; the cumulant function isn
times the cumulant function for sample size one; the canonical parameter is the same for all sample sizes."poisson"
Poisson. The mean value parameter
\mu
is the mean of the Poisson distribution. The canonical parameter is\theta = \log(\mu)
. The cumulant function isc(\theta) = e^\theta.
This distribution has a degenerate limiting distribution. The lower limit as
\theta \to - \infty
is the distribution concentrated at zero, having cumulant function which is the constant function everywhere equal to zero. There is no upper limit because the canonical statistic is unbounded above.For predecessor (sample size)
n
, the successor is the sum ofn
IID Poisson random variables, that is, Poisson with meann \mu
. The mean value parameter isn
times the mean value parameter for sample size one; the cumulant function isn
times the cumulant function for sample size one; the canonical parameter is the same for all sample sizes."zero.truncated.poisson"
Poisson conditioned on being greater than zero. Let
m
be the mean of the corresponding untruncated Poisson distribution. Then the canonical parameters for both truncated and untruncated distributions are the same\theta = \log(m)
. The mean value parameter for the zero-truncated Poisson distribution is\mu = \frac{m}{1 - e^{- m}}
and the cumulant function is
c(\theta) = m + \log(1 - e^{- m}),
where
m
is as defined above, som = e^\theta
.This distribution has a degenerate limiting distribution. The lower limit as
\theta \to - \infty
is the distribution concentrated at one, having cumulant function which is the identity function satisfyingc(\theta) = \theta
for all\theta
. There is no upper limit because the canonical statistic is unbounded above.For predecessor (sample size)
n
, the successor is the sum ofn
IID zero-truncated Poisson random variables, which is not a brand-name distribution. The mean value parameter isn
times the mean value parameter for sample size one; the cumulant function isn
times the cumulant function for sample size one; the canonical parameter is the same for all sample sizes."normal.location.scale"
The distribution of a normal random variable
X
with unknown meanm
and unknown variancev
. Thought of as an exponential family, this is a two-parameter family, hence must have a two-dimensional canonical statisticY = (X, X^2)
. The canonical parameter vector\theta
has components\theta_1 = \frac{m}{v}
and
\theta_2 = - \frac{1}{2 v}.
The value of
\theta_1
is unrestricted, but\theta_2
must be strictly negative. The mean value parameter vector\mu
has components\mu_1 = m = - \frac{\theta_1}{2 \theta_2}
and
\mu_2 = v + m^2 = - \frac{1}{2 \theta_2} + \frac{\theta_1^2}{4 \theta_2^2}.
The cumulant function is
c(\theta) = - \frac{\theta_1^2}{4 \theta_2} + \frac{1}{2} \log\left(- \frac{1}{2 \theta_2}\right).
This distribution has no degenerate limiting distributions, because the canonical statistic is a continuous random vector so the boundary of its support has probability zero.
For predecessor (sample size)
n
, the successor is the sum ofn
IID random vectors(X_i, X_i^2)
, where eachX_i
is normal with meanm
and variancev
, and this is not a brand-name multivariate distribution (the first component of the sum is normal, the second component noncentral chi-square, and the components are not independent). The mean value parameter vector isn
times the mean value parameter vector for sample size one; the cumulant function isn
times the cumulant function for sample size one; the canonical parameter vector is the same for all sample sizes."multinomial"
Multinomial with sample size one. The distribution of any random vector
Y
having all components zero except for one component which is one (Y
is the canonical statistic vector). The mean value parameter is the vector\mu = E(Y)
having components\mu_i = E(Y_i) = \Pr(Y_i = 1).
The mean value parameter vector
\mu
is given as a function of the canonical parameter vector\theta
by\mu_i = \frac{e^{\theta_i}}{\sum_{j = 1}^d e^{\theta_j}},
where
d
is the dimension ofY
and\theta
and\mu
. This transformation is not one-to-one; adding the same number to each component of\theta
does not change the value of\mu
. The cumulant function isc(\theta) = \log\left(\sum_{j = 1}^d e^{\theta_j}\right).
This distribution is degenerate. The sum of the components of the canonical statistic is equal to one with probability one, which implies the nonidentifiability of the
d
-dimensional canonical parameter vector mentioned above. Hence one parameter (at least) is always constrained to to be zero in fitting an aster model with a multinomial family.This distribution has many degenerate distributions. For any vector
\delta
the limit of distributions having canonical parameter vectors\theta + s \delta
ass \to \infty
exists and is another multinomial distribution (the limit distribution in the direction\delta
). LetA
be the set ofi
such that\delta_i = \max(\delta)
, where\max(\delta)
denotes the maximum over the components of\delta
. Then the limit distribution in the direction\delta
has componentsY_i
of the canonical statistic fori \notin A
concentrated at zero. The cumulant function of this degenerate distribution isc(\theta) = \log\left(\sum_{j \in A} e^{\theta_j}\right).
The canonical parameters
\theta_j
forj \notin A
are not identifiable, and one other canonical parameter is not identifiable because of the constraint that the sum of the components of the canonical statistic is equal to one with probability one.For predecessor (sample size)
n
, the successor is the sum ofn
IID multinomial-sample-size-one random vectors, that is, multinomial with sample sizen
. The mean value parameter isn
times the mean value parameter for sample size one; the cumulant function isn
times the cumulant function for sample size one; the canonical parameter is the same for all sample sizes.
Value
a list of class "astfam"
giving name and values of any
hyperparameters.
Examples
fam.bernoulli()
fam.multinomial(4)
Life History Data on Manduca sexta
Description
Data on life history traits for the tobacco hornworm Manduca sexta
Usage
data(hornworm)
Format
An object of class "asterdata"
(see asterdata
)
comprising records for 162 insects (54 female, 68 male, and 40 for which
there was no opportunity to determine sex) observed over 40 days.
Nodes of the graph for one individual are associated with the variables
(levels of the factor hornworm$redata$varb
) in dependence groups
- P
Bernoulli. Predecessor 1 (initial node). Indicator of pupation.
- T330, T331, T332
Three-dimensional multinomial dependence group. Predecessor
P
.
- T330
Indicator of death after pupation. In these data, all deaths after pupation are considered to have happened on day 33 regardless of when they occurred (because the actual day of death was not recorded in the original data).
- T331
Indicator of survival to day 33 but still pre-eclosion.
- T332
Indicator of eclosion (emergence from pupa as adult moth on day 33.
- B33
Zero-truncated Poisson. Predecessor
T332
. Count of ovarioles on day 33. Only females have this node in their graphs.- Tx1, Tx2
For
x
= 34, ..., 40. Two-dimensional multinomial dependence group. PredecessorTw1
, wherew = x - 1
.
- Tx1
Indicator of survival to day
x
but still pre-eclosion.- Tx2
Indicator of eclosion (emergence from pupa as adult moth on day
x
.
- Bx
Zero-truncated Poisson. Predecessor
Tx2
. Count of ovarioles on dayx
. Only females have these nodes in their graph.
Covariates are
- Sex
a factor.
F
is known female,M
is known male,U
is unknown (no opportunity to observe).- Time_2nd
time (in weeks) to reach the 2nd instar stage. Larval instars are stages between molts (shedding of exoskeleton) of the larval form (caterpillar).
- Mass_2nd
mass (in grams) at the 2nd instar stage.
- Mass_Repro
mass (in grams) at eclosion.
- LarvaID
name of an individual in the original data.
Details
This is the data described by and analyzed by non-aster methods by Kingsolver et al. (2012) and re-analyzed using this package by Eck et al. (submitted).
For an illustration of the graph, see Figure 1 in Eck et al. (submitted).
In the description above, a concrete example of the x
and w
notation is that T351 and T352 form a two-dimensional multinomial dependence
group, the predecessor of which is T341, and B35 is a dependence group all
by itself, its predecessor being T352.
Every multinomial dependence group acts like a switch. If the predecessor
is one, the dependence group is multinomial with sample size one (exactly
one variable is one and the rest are zero). So this indicates which way
the life history goes. If the predecessor is zero, then all successors are
zero. This goes for all variables in any aster model. If Tx2
is zero,
then so is Bx
. The ovariole count is zero except for the day on
which the individual eclosed.
Source
Joel Kingsolver https://bio.unc.edu/people/faculty/kingsolver/
References
Kingsolver, J. G., Diamond, S. E., Seiter, S. A., and Higgins, J. K. (2012) Direct and indirect phenotypic selection on developmental trajectories in Manduca sexta. Functional Ecology 26 598–607.
Eck, D., Shaw, R. G., Geyer, C. J., and Kingsolver, J. (submitted) An integrated analysis of phenotypic selection on insect body size and development time.
Examples
data(hornworm)
names(hornworm)
names(hornworm$redata)
levels(hornworm$redata$varb)
Link Functions for Aster Models
Description
Calculate link function and up to one derivative for families known to the package.
Usage
link(xi, fam, deriv = 0, delta)
Arguments
xi |
mean value parameter value, a numeric vector. |
fam |
an object of class |
deriv |
the number of derivatives wanted. Must be either zero or one. |
delta |
direction in which limit is taken. Link
function is for family that is limit of family specified, limit
being for distributions with canonical parameter
|
Value
a list containing some of the following components:
zeroth |
the value of the link function at |
is the dimension of \xi
.
first |
the value of the first derivative at |
Note
Not intended for use by ordinary users. Provides R interface for testing to C code called by many other functions in the package.
See Also
Examples
link(0.3, fam.bernoulli(), deriv = 1)
link(0.3, fam.bernoulli(), deriv = 1, delta = 1)
Subset Object Describing Saturated Aster Model
Description
Subset an object of class "asterdata"
,
for which see asterdata
.
Usage
## S3 method for class 'asterdata'
subset(x, subset, successors = TRUE, ...)
Arguments
x |
an object of class |
subset |
a logical vector indicating nodes of the graph to keep: missing values are taken as false. |
successors |
a logical scalar indicating whether the subgraph must be a union of connected components of the original graph, that is, if all successors of nodes in the subset must also be in the subset. |
... |
further arguments, which are ignored (this argument is required
for methods of the generic function |
Details
Argument subset
is a logical vector of the same length as the number
of nodes in the graph specified by argument x
. It indicates the
subset of nodes in the subgraph wanted. The subgraph must be closed with
respect to predecessors (all predecessors of nodes in the subset are also
in the subset) and if successors = TRUE
with respect
to successors (all successors of nodes in the subset are also in the subset).
And similarly for dependence groups: each dependence
group in the original graph must have all or none of its elements
in the subgraph.
Value
an object of class "asterdata"
that represents the aster model having
subgraph with nodes specified by subset
.
See Also
Examples
data(echinacea)
#### select one individual from each level of pop
foo <- echinacea$redata$pop
bar <- match(levels(foo), as.character(foo))
baz <- is.element(echinacea$redata$id, echinacea$redata$id[bar])
out <- subset(echinacea, baz)
Test Data
Description
Test data of no biological interest. Does have all families implemented at the time the test data was created. No predictor variables.
Usage
data(test1)
Format
A data frame with 100 observations on the following 8 variables.
m1
a numeric vector, part of a multinomial dependence group (with
m2
andm3
). Predecessor of this group is the constant 1.m2
a numeric vector.
m3
a numeric vector.
n1
a numeric vector, part of a normal location-scale dependence group (with
n2
). Predecessor of this group ism1
.n2
a numeric vector (actually
n1^2
).b1
a numeric vector, Bernoulli. Predecessor is
m2
.p1
a numeric vector, Poisson. Predecessor is
m3
.z1
a numeric vector, zero-truncated Poisson. Predecessor is
b1
.
Source
created by R script test1.R
in directory makedata
of the
installation directory for this package.
Examples
data(test1)
fred <- asterdata(test1,
vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"),
pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0),
code = c(1, 1, 1, 2, 2, 3, 4, 5),
families = list(fam.multinomial(3), "normal.location.scale",
"bernoulli", "poisson", "zero.truncated.poisson"))