Title: Random-Effects Stochastic Reaction Networks
Version: 1.0.1
Description: A random-effects stochastic model that allows quick detection of clonal dominance events from clonal tracking data collected in gene therapy studies. Starting from the Ito-type equation describing the dynamics of cells duplication, death and differentiation at clonal level, we first considered its local linear approximation as the base model. The parameters of the base model, which are inferred using a maximum likelihood approach, are assumed to be shared across the clones. Although this assumption makes inference easier, in some cases it can be too restrictive and does not take into account possible scenarios of clonal dominance. Therefore we extended the base model by introducing random effects for the clones. In this extended formulation the dynamic parameters are estimated using a tailor-made expectation maximization algorithm. Further details on the methods can be found in L. Del Core et al., (2022) <doi:10.1101/2022.05.31.494100>.
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.2.3.9000
Imports: Matrix, xtable, scales, stringr, ggplot2, scatterpie, RColorBrewer
Depends: R (≥ 2.10)
LazyData: true
Suggests: R.rsp
VignetteBuilder: R.rsp
NeedsCompilation: no
Packaged: 2024-02-15 10:41:07 UTC; pmzld
Author: Luca Del Core ORCID iD [aut, cre, cph], Marco Grzegorczyk ORCID iD [aut, ths], Ernst Wit ORCID iD [aut, ths]
Maintainer: Luca Del Core <l.del.core@rug.nl>
Repository: CRAN
Date/Publication: 2024-02-15 11:00:02 UTC

Bhattacharyya distance

Description

Bhattacharyya distance between P and Q for multivariate normal distributions

Usage

BhattDist(mu1, S1, mu2, S2)

Arguments

mu1

mean vector of P.

S1

covariance matrix of P.

mu2

mean vector of Q.

S2

covariance matrix of Q.

Value

Bhattacharyya distance between P and Q.


Kullback-Leibler divergence

Description

Kullback-Leibler divergence KL(P \Vert Q) of Q from P for multivariate normal distributions

Usage

KLDiv(mu1, S1, mu2, S2)

Arguments

mu1

mean vector of P.

S1

covariance matrix of P.

mu2

mean vector of Q.

S2

covariance matrix of Q.

Value

Kullback-Leibler divergence KL(P \Vert Q) of Q from P.


E[u \vert y] and V[u \vert y]

Description

Conditional expectation E[u \vert y] and variance V[u \vert y] of the latent states u given the observed states y

Usage

VEuy(theta_curr, M, M_bdiag, y, V, VCNs, nObs)

Arguments

theta_curr

current p-dimensional vector parameter.

M

A n \times K dimensional (design) matrix.

M_bdiag

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

Value

the conditional expectation E[u \vert y] and variance V[u \vert y] of the latent states u given the observed states y.


Rhesus Macaque clonal tracking dataset

Description

A dataset containing clonal tracking cell counts from a Rhesus Macaque study.

Usage

Y_RM

Format

A list containing clonal tracking data for each animal (ZH33, ZH17, ZG66). Each clonal tracking dataset is a 3-dimensional array whose dimensions identify

1

time, in months

2

cell types: T, B, NK, Macrophages(M) and Granulocytes(G)

3

unique barcodes (clones)

Source

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3979461/bin/NIHMS567927-supplement-02.xlsx


Generate hazard function

Description

This function dynamically builds the function get.h() that will be used to get the hazard vector.

Usage

compile.h(rct.lst, envir)

Arguments

rct.lst

list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0"

Value

No return value.


Conditional AIC (cAIC)

Description

Conditional AIC (cAIC) of the conditional log-likelihood l(y \vert u) of y given the random effects u

Usage

condAIC(X, Z, y, theta, Delta, V, VCNs, nObs, verbose = TRUE)

Arguments

X

A n \times K dimensional (design) matrix.

Z

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

theta

p-dimensional vector parameter.

Delta

covariance matrix of the random effects u

V

A p \times K dimensional net-effect matrix.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console.

Value

Conditional AIC (cAIC) of the conditional log-likelihood l(y \vert u) of y given the random effects u.


Gradient of E[u \vert \theta]

Description

Gradient of the expected values of the random effects u w.r.t. the free parameters

Usage

dBu(K, p)

Arguments

K

number of clones being analyzed.

p

number of free parameters.

Value

the p-dimensional gradient of u.


Gradient of V[u \vert \theta]

Description

Gradient of the covariance matrix of the random effects u w.r.t. the free parameters

Usage

dD2(K, p)

Arguments

K

number of clones being analyzed.

p

number of free parameters.

Value

the gradient of the covariance matrix of u.


Multivariate Normal density function

Description

Proability density function of a n-variate normal distribution

Usage

dmvNorm(x, Mu, Sigma, SigmaInv)

Arguments

x

n-dimensional vector.

Mu

n-dimensional mean vector.

Sigma

n-dimensional covariance matrix.

SigmaInv

n-dimensional precision matrix.

Value

the probability density of a n-variate normal distribution with mean vector Mu and covariance Sigma evaluated at x.


Fit the base (null) model

Description

This function builds the design matrix of the null model and returns the fitted values and the corresponding statistics.

Usage

fit.null(
  Y,
  rct.lst,
  maxit = 10000,
  factr = 1e+07,
  pgtol = 1e-08,
  lmm = 100,
  trace = TRUE,
  verbose = TRUE
)

Arguments

Y

A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively.

rct.lst

list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0"

maxit

maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page.

factr

controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function.

pgtol

helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function.

lmm

is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function.

trace

Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console.

Value

A 3-length list. First element is the output returned by "optim()" function (see "optim()" documentation for details). Second element is a vector of statistics associated to the fitted null model:

nPar number of parameters of the base(null) model
cll value of the conditional log-likelihood, in this case just the log-likelihood
mll value of the marginal log-likelihood, in this case just the log-likelihood
cAIC conditional Akaike Information Criterion (cAIC), in this case simply the AIC.
mAIC marginal Akaike Information Criterion (mAIC), in this case simply the AIC.
Chi2 value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta).
p-value p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom.

The third element, called "design", is a list including:

M A n \times K dimensional (design) matrix.
V A p \times K dimensional net-effect matrix.

Examples

rcts <- c("A->1", "B->1", "C->1", "D->1",
          "A->0", "B->0", "C->0", "D->0",
          "A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
         .001, .007, .004, .002,
         .13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
           dim = c(S + 1, length(ctps), nC),
           dimnames = list(seq(from = 0, to = S*tau, by = tau),
                           ctps,
                           1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
  Y[,,cl] <- get.sim.tl(Yt = Y0,
                        theta = theta_allcls[,cl],
                        S = S,
                        s2 = s20,
                        tau = tau,
                        rct.lst = rcts,
                        verbose = TRUE)
}
null.res <- fit.null(Y = Y,
                     rct.lst = rcts,
                     maxit = 0, ## needs to be increased (>=100) for real applications
                     lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting

Fit the random-effects model

Description

This function builds the design matrix of the random-effects model and returns the fitted values and the corresponding statistics.

Usage

fit.re(
  theta_0,
  Y,
  rct.lst,
  maxit = 10000,
  factr = 1e+07,
  pgtol = 1e-08,
  lmm = 100,
  maxemit = 100,
  eps = 1e-05,
  trace = TRUE,
  verbose = TRUE
)

Arguments

theta_0

A p-dimensional vector parameter as the initial guess for the inference.

Y

A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively.

rct.lst

list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0"

maxit

maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page.

factr

controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function.

pgtol

helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function.

lmm

is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function.

maxemit

maximum number of iterations for the expectation-maximization algorithm.

eps

relative error for the value x and the objective function f(x) that has to be optimized in the expectation-maximization algorithm.

trace

Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console.

Value

A 3-length list. First element is the output returned by "optim()" function (see "optim()" documentation for details) along with the conditional expectation E[u \vert y] and variance V[u \vert y] of the latent states u given the observed states y from the last step of the expectation-maximization algorithm. Second element is a vector of statistics associated to the fitted random-effects model:

nPar number of parameters of the base(null) model
cll value of the conditional log-likelihood, in this case just the log-likelihood
mll value of the marginal log-likelihood, in this case just the log-likelihood
cAIC conditional Akaike Information Criterion (cAIC), in this case simply the AIC.
mAIC marginal Akaike Information Criterion (mAIC), in this case simply the AIC.
Chi2 value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta).
p-value p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom.
KLdiv Kullback-Leibler divergence of the random-effects model from the null model.
KLdiv/N Rescaled Kullback-Leibler divergence of the random-effects model from the null model.
BhattDist_nullCond Bhattacharyya distance between the random-effects model and the null model.
BhattDist_nullCond/N Rescaled Bhattacharyya distance between the random-effects model and the null model.

The third element, called "design", is a list including:

M A n \times K dimensional (design) matrix.
M_bdiag An \times Jp dimensional block-diagonal design matrix.
V A p \times K dimensional net-effect matrix.

Examples

rcts <- c("A->1", "B->1", "C->1", "D->1",
          "A->0", "B->0", "C->0", "D->0",
          "A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
         .001, .007, .004, .002,
         .13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
           dim = c(S + 1, length(ctps), nC),
           dimnames = list(seq(from = 0, to = S*tau, by = tau),
                           ctps,
                           1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
  Y[,,cl] <- get.sim.tl(Yt = Y0,
                        theta = theta_allcls[,cl],
                        S = S,
                        s2 = s20,
                        tau = tau,
                        rct.lst = rcts,
                        verbose = TRUE)
}
null.res <- fit.null(Y = Y,
                     rct.lst = rcts,
                     maxit = 0, ## needs to be increased (>=100) for real applications
                     lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting

re.res <- fit.re(theta_0 = null.res$fit$par,
                 Y = Y,
                 rct.lst = rcts,
                 maxit = 0, ## needs to be increased (>=100) for real applications
                 lmm = 0, ## needs to be increased (>=5) for real applications
                 maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting

Design matrix M

Description

This function generates time-adjacent increments from a cell counts matrix y.

Usage

get.M(y, V, dT, get.h)

Arguments

y

clone-specific t \times p cell count matrix, where t is the number of time-points and p the number of cell types.

V

A p \times K dimensional net-effect matrix.

dT

A (t-1)-dimensional vector of the time-increments.

Value

A tp \times K dimensional (design) matrix.


Net-effect matrix

Description

This function builds the net-effect matrix V.

Usage

get.V(rct.lst)

Arguments

rct.lst

list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0"

Value

The net-effect matrix V


Stochastic covariance matrix W

Description

This function returns the stochastic covariance matrix W associated to a design matrix M, a net-effect matrix V, and a vector parameter theta.

Usage

get.W(M, V, VCNs, theta, nObs)

Arguments

M

A n \times K dimensional (design) matrix.

V

A p \times K dimensional net-effect matrix.

theta

p-dimensional vector parameter.

Value

A n \times n dimensional stochastic covariance matrix.


Clonal boxplots

Description

Draw clonal boxplots of a random-effects reaction network.

Usage

get.boxplots(re.res)

Arguments

re.res

output list returned by fit.re().

Details

This function generates the boxplots of the conditional expectations

w_k = E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\alpha_{l}}] - E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\delta_{l}}]

, computed from the estimated parameters \hat{\psi} for the clone-specific net-duplication in each cell lineage l (different colors). The whiskers extend to the data extremes.

Value

No return value.

Examples

rcts <- c("A->1", "B->1", "C->1", "D->1",
          "A->0", "B->0", "C->0", "D->0",
          "A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
         .001, .007, .004, .002,
         .13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
           dim = c(S + 1, length(ctps), nC),
           dimnames = list(seq(from = 0, to = S*tau, by = tau),
                           ctps,
                           1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
  Y[,,cl] <- get.sim.tl(Yt = Y0,
                        theta = theta_allcls[,cl],
                        S = S,
                        s2 = s20,
                        tau = tau,
                        rct.lst = rcts,
                        verbose = TRUE)
}
null.res <- fit.null(Y = Y,
                     rct.lst = rcts,
                     maxit = 0, ## needs to be increased (>=100) for real applications
                     lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting

re.res <- fit.re(theta_0 = null.res$fit$par,
                 Y = Y,
                 rct.lst = rcts,
                 maxit = 0, ## needs to be increased (>=100) for real applications
                 lmm = 0, ## needs to be increased (>=5) for real applications
                 maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting

get.boxplots(re.res)

Time-adjacent increments of the cell counts

Description

This function generates time-adjacent increments from a cell counts matrix y.

Usage

get.dx(y)

Arguments

y

clone-specific t \times p cell count matrix, where t is the number of time-points and p the number of cell types.

Value

A (t-1) \times p dimensional vector containing the time-adjacent increments.


Gradient of the negative log-likelihood of the base (null) model

Description

Compute the gradient of the negative log-likelihood of the null model, namely the linear model without rondom effects at all.

Usage

get.gnl(theta, M, y, V, VCNs, nObs, dW)

Arguments

theta

p-dimensional vector parameter.

M

A n \times K dimensional (design) matrix.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

dW

p-dimensional list of the partial derivatives of W w.r.t. theta.

Value

p-dimensional vector of the gradient of the negative log-likelihood.


Negative log-likelihood of the base (null) model

Description

Compute the negative log-likelihood of the null model, namely the linear model without rondom effects at all.

Usage

get.nl(theta, M, y, V, VCNs, nObs, dW)

Arguments

theta

p-dimensional vector parameter.

M

A n \times K dimensional (design) matrix.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

dW

p-dimensional list of the partial derivatives of W w.r.t. theta.

Value

the value of the negative log-likelihood.


Rescaling a clonal tracking dataset

Description

Rescales a clonal tracking dataset based on the sequencing depth.

Usage

get.rescaled(Y)

Arguments

Y

A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively.

Details

This function rescales a clonal tracking dataset Y according to the formula

Y_{ijk} \leftarrow Y_{ijk} \cdot \frac{min_{ij}\sum_cY_{ijc}}{\sum_cY_{ijc}}

Value

A rescaled clonal tracking dataset.

Examples

get.rescaled(Y_RM[["ZH33"]])

Clonal pie-chart

Description

Draw a clonal pie-chart of a random-effects reaction network.

Usage

get.scatterpie(re.res, txt = FALSE, legend = FALSE)

Arguments

re.res

output list returned by fit.re().

txt

logical (defaults to FALSE). If TRUE, barcode names will be printed on the pies.

legend

logical (defaults to FALSE). If TRUE, the legend of the pie-chart will be printed.

Details

This function generates a clonal pie-chart given a previously fitted random-effects model. In this representation each clone k is identified with a pie whose slices are lineage-specific and weighted with w_k, defined as the difference between the conditional expectations of the random-effects on duplication and death parameters, that is

w_k = E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\alpha_{lin}}] - E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\delta_{lin}}]

, where \texttt{lin} is a cell lineage. The diameter of the k-th pie is proportional to the euclidean 2-norm of w_k. Therefore, the larger the diameter, the more the corresponding clone is expanding into the lineage associated to the largest slice.

Value

No return value.

Examples

rcts <- c("A->1", "B->1", "C->1", "D->1",
          "A->0", "B->0", "C->0", "D->0",
          "A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
         .001, .007, .004, .002,
         .13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
           dim = c(S + 1, length(ctps), nC),
           dimnames = list(seq(from = 0, to = S*tau, by = tau),
                           ctps,
                           1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
  Y[,,cl] <- get.sim.tl(Yt = Y0,
                        theta = theta_allcls[,cl],
                        S = S,
                        s2 = s20,
                        tau = tau,
                        rct.lst = rcts,
                        verbose = TRUE)
}
null.res <- fit.null(Y = Y,
                     rct.lst = rcts,
                     maxit = 0, ## needs to be increased (>=100) for real applications
                     lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting

re.res <- fit.re(theta_0 = null.res$fit$par,
                 Y = Y,
                 rct.lst = rcts,
                 maxit = 0, ## needs to be increased (>=100) for real applications
                 lmm = 0, ## needs to be increased (>=5) for real applications
                 maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting

get.scatterpie(re.res, txt = TRUE)

\tau-leaping simulation algorithm

Description

Simulate a trajectory of length S for a stochastic reaction network.

Usage

get.sim.tl(Yt, theta, S, s2 = 0, tau = 1, rct.lst, verbose = TRUE)

Arguments

Yt

starting point of the trajectory

theta

vector parameter for the reactions.

S

length of the simulated trajectory.

s2

noise variance (defaults to 0).

tau

time interval length (defaults to 1).

rct.lst

list of biochemical reactions.

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the simulation progress are printed to the console.

Details

This function allows to simulate a trajectory of a single clone given an initial conditions Y_0 for the cell counts, and obeying to a particular cell differentiation network defined by a net-effect (stoichiometric) matrix V and an hazard function h(). The function allows to consider only three cellular events, such as cell duplication (Y_{it} \rightarrow 1), cell death (Y_{it} \rightarrow \emptyset) and cell differentiation (Y_{it} \rightarrow Y_{jt}) for a clone-specific time counting process

Y_t = (Y_{1t},\dots,Y_{Nt})

observed in $N$ distinct cell lineages. In particular, the cellular events of duplication, death and differentiation are respectively coded with the character labels \texttt{"A->1"}, \texttt{"A->0"}, and \texttt{"A->B"}, where \texttt{A} and \texttt{B} are two distinct cell types. The output is a 3-dimensional array Y whose ijk-entry Y_{ijk} is the number of cells of clone k for cell type j collected at time i. More mathematical details can be found in the vignette of this package.

Value

A S \times p dimensional matrix of the simulated trajectory.

Examples

rcts <- c("A->1", "B->1", "C->1", "D->1",
          "A->0", "B->0", "C->0", "D->0",
          "A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
         .001, .007, .004, .002,
         .13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
         .001, .007, .004, .002,
         .13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
           dim = c(S + 1, length(ctps), nC),
           dimnames = list(seq(from = 0, to = S*tau, by = tau),
                           ctps,
                           1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
  Y[,,cl] <- get.sim.tl(Yt = Y0,
                        theta = theta_allcls[,cl],
                        S = S,
                        s2 = s20,
                        tau = tau,
                        rct.lst = rcts,
                        verbose = TRUE)
}
Y

Matrix determinant

Description

Compute the determinant of a matrix.

Usage

ldet(A, logarithm = TRUE)

Arguments

A

a matrix

logarithm

logical; if TRUE (default) return the logarithm of the modulus of the determinant.

Value

the determinant of A.


E-step function Q

Description

Negative E-step function -Q of the expectation-maximization algorithm

Usage

nQ(theta, euy_curr, vuy_curr, M, M_bdiag, y, V, VCNs, nObs, dW)

Arguments

theta

p-dimensional vector parameter.

euy_curr

current value of the conditional expectation E[u \vert y] of u given y, where u and y are the latent and observed states respectively.

vuy_curr

current value of the conditional variance V[u \vert y] of u given y, where u and y are the latent and observed states respectively.

M

A n \times K dimensional (design) matrix.

M_bdiag

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

dW

p-dimensional list of the partial derivatives of W w.r.t. theta.

Value

The current value of the negative E-step function -Q.


Gradient of the E-step function Q

Description

Gradient -\nabla Q of the negative E-step function -Q of the expectation-maximization algorithm

Usage

ngQ(theta, euy_curr, vuy_curr, M, M_bdiag, y, V, VCNs, nObs, dW)

Arguments

theta

p-dimensional vector parameter.

euy_curr

current value of the conditional expectation E[u \vert y] of u given y, where u and y are the latent and observed states respectively.

vuy_curr

current value of the conditional variance V[u \vert y] of u given y, where u and y are the latent and observed states respectively.

M

A n \times K dimensional (design) matrix.

M_bdiag

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

dW

p-dimensional list of the partial derivatives of W w.r.t. theta.

Value

p-dimensional vector of the gradient -\nabla Q of the negative E-step function -Q.


Euclidean 2-norm

Description

Compute the euclidean 2-norm of a vector x.

Usage

norm2(x)

Arguments

x

a vector.

Value

the euclidean 2-norm of x.


Fit base model

Description

Fit the base (null) model to the given data using a maximum likelihood approach.

Usage

nullModelFitting(
  theta_start,
  M,
  y,
  V,
  psiLB,
  psiUB,
  maxit,
  factr,
  pgtol,
  lmm,
  VCNs,
  nObs,
  trace = TRUE
)

Arguments

theta_start

p-dimensional vector parameter used as initial guess in the inference procedure.

M

A n \times K dimensional (design) matrix.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

psiLB

p-dimensional vector of lower bound values for theta.

psiUB

p-dimensional vector of upper bound values for theta.

maxit

maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page.

factr

controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function.

pgtol

helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function.

lmm

is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

trace

Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

Value

A 3-length list. First element is the output returned by optim() function (see optim() documentation for details). Second element is a vector of statistics associated to the fitted null model:

nPar number of parameters of the base(null) model
cll value of the conditional log-likelihood, in this case just the log-likelihood
mll value of the marginal log-likelihood, in this case just the log-likelihood
cAIC conditional Akaike Information Criterion (cAIC), in this case simply the AIC.
mAIC marginal Akaike Information Criterion (mAIC), in this case simply the AIC.
Chi2 value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta).
p-value p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom.

The third element, called design, is a list including:

M A n \times K dimensional (design) matrix.
V A p \times K dimensional net-effect matrix.

Conditional negative log-likelihood

Description

Conditional negative log-likelihood -l(y \vert u) of y given the random effects u

Usage

ny_u(theta, Z, X, y, V, VCNs, nObs)

Arguments

theta

p-dimensional vector parameter.

Z

An \times Jp dimensional block-diagonal design matrix.

X

A n \times K dimensional (design) matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

V

A p \times K dimensional net-effect matrix.

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

Value

Conditional negative log-likelihood -l(y \vert u) of y given the random effects u.


Multivariate relative error

Description

Compute the multivariate relative error between two vectors.

Usage

relErr(x, y)

Arguments

x

a vector.

y

a vector.

Value

the multivariate relative error between x and y.


Random generator from a Multivariate Normal distribution

Description

This function generate a random vector from a multivariate normal distribution.

Usage

rmvNorm(d, p, Mu, Sigma)

Arguments

d

sample d.

p

vector dimension.

Mu

mean vector.

Sigma

covariance matrix.

Value

A p \times d matrix. Each column is a p-variate random vector from a multivariate normal with mean vector Mu and covariance matrix Sigma.


Fit random-effects model

Description

Fit the random-effects model to the given data using an expectation-maximization algorithm.

Usage

rndEffModelFitting(
  theta_0,
  V,
  M,
  M_bdiag,
  y,
  VCNs,
  nObs,
  maxit,
  maxemit,
  eps = 1e-05,
  thetaLB,
  thetaUB,
  factr,
  pgtol,
  lmm,
  trace = TRUE,
  verbose = TRUE
)

Arguments

theta_0

p-dimensional vector parameter used as initial guess in the inference procedure.

V

A p \times K dimensional net-effect matrix.

M

A n \times K dimensional (design) matrix.

M_bdiag

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

maxit

maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page.

maxemit

maximum number of iterations for the expectation-maximization algorithm.

eps

relative error for the value x and the objective function f(x) that has to be optimized in the expectation-maximization algorithm.

thetaLB

p-dimensional vector of lower bound values for theta.

thetaUB

p-dimensional vector of upper bound values for theta.

factr

controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function.

pgtol

helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function.

lmm

is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function.

trace

Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console.

Value

The output returned by "optim()" function (see "optim()" documentation for details) along with the conditional expectation E[u \vert y] and variance V[u \vert y] of the latent states u given the observed states y from the last step of the expectation-maximization algorithm.


Base and random-effects model statistics

Description

Main statistics from the fitted base and random-effects models

Usage

rndEffModelStats(
  theta_null,
  theta_rndEff,
  V,
  M,
  M_bdiag,
  y,
  VCNs,
  nObs,
  verbose = TRUE
)

Arguments

theta_null

the estimated p-dimensional vector parameter for the base (null) model.

theta_rndEff

the estimated p-dimensional vector parameter for the random-effects model.

M

A n \times K dimensional (design) matrix.

M_bdiag

An \times Jp dimensional block-diagonal design matrix. Each j-th block (j = 1,\dots,J) is a n_j \times p dimensional design matrix for the j-th clone.

y

n-dimensional vector of the time-adjacent cellular increments

VCNs

A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y.

nObs

A K-dimensional vector including the frequencies of each clone k (k = 1,\dots,K).

verbose

(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console.

Value

A vector of statistics associated to the fitted base and random-effects models:

nPar number of parameters of the base(null) model
cll value of the conditional log-likelihood, in this case just the log-likelihood
mll value of the marginal log-likelihood, in this case just the log-likelihood
cAIC conditional Akaike Information Criterion (cAIC), in this case simply the AIC.
mAIC marginal Akaike Information Criterion (mAIC), in this case simply the AIC.
Chi2 value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta).
p-value p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom.
KLdiv Kullback-Leibler divergence of the random-effects model from the null model.
KLdiv/N Rescaled Kullback-Leibler divergence of the random-effects model from the null model.
BhattDist_nullCond Bhattacharyya distance between the random-effects model and the null model.
BhattDist_nullCond/N Rescaled Bhattacharyya distance between the random-effects model and the null model.

Matrix trace

Description

Trace of a matrix

Usage

tr(M)

Arguments

M

a matrix.

Value

the trace of M.

mirror server hosted at Truenetwork, Russian Federation.