EasyABC: an R package to perform efficient approximate Bayesian computation sampling schemes
The aim of this vignette is to present the features of the EasyABC package. Section Overview of the package EasyABC describes the different algorithms available in the package. Section Installation and requirements details how to install the package and the formatting requirements. Sections The toy model and The trait model present two detailed worked examples.
Overview of the package EasyABC
Four main types of ABC schemes are available in EasyABC: the standard rejection algorithm of Pritchard et al. (1999), sequential schemes first proposed by Sisson et al. (2007), coupled-to-MCMC schemes first proposed by Marjoram et al. (2003), and a Simulated Annealing algorithm (SABC) suggested in Albert et al. (2014). Four different sequential algorithms are available: the ones of Beaumont et al. (2009), Drovandi and Pettitt (2011), Del Moral et al. (2012) and Lenormand et al. (2012). Three different MCMC schemes are available: the ones of Marjoram et al. (2003) and Wegmann et al. (2009a), and a modification of Marjoram et al. (2003)'s algorithm in which the tolerance and proposal range are determined by the algorithm, following the modifications of Wegmann et al. (2009a). Details on how to implement these various algorithms with EasyABC are given in the manual pages of each function and two examples are detailed in Sections The toy model and The trait model. We provide below a short presentation of each implemented algorithm.
The standard rejection algorithm of Pritchard et al. (1999)
This sampling scheme consists of drawing the model parameters from the prior distributions, using these parameter values to launch a simulation, and repeating this two-step procedure nb_simul times. At the end of the nb_simul simulations, the simulations closest to the target (or at a distance smaller than a tolerance threshold) in the space of the summary statistics are retained to form an approximate posterior distribution of the model parameters.
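As an illustration, the rejection principle can be sketched in a few lines of plain R (a hypothetical one-parameter simulator and a single summary statistic; the package functions presented below handle all of this internally):
# illustrative rejection sampler, not part of the EasyABC API
set.seed(1)
toy_simulator<-function(theta) rnorm(1, mean=theta, sd=0.1) # hypothetical model
target_stat<-0.7                            # imaginary observed summary statistic
theta<-runif(1000, 0, 1)                    # draw parameters from the prior
stats<-sapply(theta, toy_simulator)         # one simulation per parameter draw
d<-abs(stats - target_stat)                 # distance to the observed statistic
posterior_sample<-theta[d <= quantile(d, 0.05)] # retain the closest simulations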
Sequential algorithms for ABC were first proposed by Sisson et al. (2007). These algorithms aim at reducing the number of simulations required to reach a given quality of the posterior approximation. The underlying idea is to spend more time in the areas of the parameter space where simulations are frequently close to the target. Sequential algorithms consist of a first step of standard rejection ABC, followed by a number of steps where the sampling of the parameter space is no longer performed according to the prior distributions of parameter values. Various ways to perform this biased sampling have been proposed, and four of them are implemented in the package EasyABC.
The idea of ABC-MCMC algorithms proposed by Marjoram et al. (2003) is to perform a Metropolis-Hastings algorithm to explore the parameter space, replacing the step of likelihood ratio computation by simulations of the model. The original algorithm of Marjoram et al. (2003) is implemented in the method “Marjoram_original” in EasyABC. Wegmann et al. (2009a) later proposed a number of improvements to the original scheme of Marjoram et al. (2003): they proposed to perform a calibration step so that the algorithm automatically determines the tolerance threshold, the scaling of the summary statistics and the scaling of the jumps in the parameter space during the MCMC. These improvements have been implemented in the method “Marjoram”. Wegmann et al. (2009a) also proposed additional modifications, among which a PLS transformation of the summary statistics. The complete algorithm of Wegmann et al. (2009a) is implemented in the method “Wegmann”.
Inspired by Simulated Annealing algorithms used for optimization, the
SABC algorithm from Albert et al. (2014) propagates an ensemble of
particles in the product space of parameters and model outputs and
continuously lowers the tolerance between model outputs and the data so
that the parameter marginal converges to the posterior. The tolerance is
lowered adaptively so as to minimize entropy production, which serves as
a measure for computational waste. In EasyABC, SABC is
implemented in the function SABC.
Installation and requirements
A version of R greater than or equal to 2.15.0 is required. The package has been tested on Windows (32-bit) and Linux, but not on Mac. To install the EasyABC package from R, simply type:
install.packages("EasyABC")
Once the package is installed, it needs to be loaded in the current R session to be used:
library(EasyABC)
For online help on the package content, simply type:
help(package="EasyABC")
For online help on a particular command (such as the function
ABC_sequential), simply type:
help(ABC_sequential)
The simulation code - for use on a single core
Users need to develop a simulation code with minimal compatibility constraints. The code can either be an R function or a binary executable file.
If the code is an R function, its argument must be a vector of parameter values and it must return a vector of summary statistics. If the option use_seed=TRUE is chosen, the first parameter value passed to the simulation code corresponds to the seed value to be used by the simulation code to initialize the pseudo-random number generator. The following parameters are the model parameters.
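As an illustration, a compatible simulation function could look like the following sketch (two model parameters; the returned summary statistics are arbitrary):
# sketch of a simulation function designed for use_seed=TRUE:
# x[1] is the seed passed by EasyABC, x[2] and x[3] are the model parameters
my_seeded_model<-function(x){
  set.seed(x[1]) # initialize the pseudo-random number generator
  c( x[2] + x[3] + rnorm(1,0,0.1) , x[2] * x[3] + rnorm(1,0,0.1) )
}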
If the code is a binary executable file, it needs to read the parameter values in a file named ‘input’ in which each line contains one parameter value, and to output the summary statistics in a file named ‘output’ in which the summary statistics must be separated by a space or a tabulation.
If the code is a binary executable file, a wrapper R
function named ‘binary_model’ is available to interface the executable
file with the R functions of the EasyABC
package (see section The toy model
below).
Alternatively, users may prefer building an R function calling their binary executable file. A short tutorial is provided in this section to call a C/C++ program.
NB: Currently, SABC ignores the use_seed option and requires a function whose first argument is the parameter vector.
The simulation code - for use with multiple cores
Users need to develop a simulation code with minimal compatibility constraints. The code can either be an R function or a binary executable file.
If the code is an R function, its argument must be a vector of parameter values and it must return a vector of summary statistics. The first parameter value passed to the simulation code corresponds to the seed value to be used by the simulation code to initialize the pseudo-random number generator. The following parameters are the model parameters. This means that the option use_seed must be turned to TRUE when using EasyABC with multiple cores.
If the code is a binary executable file, it needs to have as its single argument a positive integer k. It has to read the parameter values in a file named ‘inputk’ (where k is the integer passed as argument to the binary code: ‘input1’, ‘input2’…) in which each line contains one parameter value, and to output the summary statistics in a file named ‘outputk’ (where k is the integer passed as argument to the binary code: ‘output1’, ‘output2’…) in which the summary statistics must be separated by a space or a tabulation. This construction prevents multiple cores from reading/writing the same files. If the code is a binary executable file, a wrapper R function named ‘binary_model_cluster’ is available to interface the executable file with the R functions of the EasyABC package (see section The trait model below).
Alternatively, users may prefer building an R function calling their binary executable file. A short tutorial is provided in this section to call a C/C++ program.
NB: Currently, SABC does not support the use of multiple cores.
To ensure that stochastic simulations are independent, the simulation code must possess an internal way of initializing the seeds of its pseudo-random number generators each time the simulation code is launched. This can be achieved for instance by initializing the seed to the clock value. It is often desirable though to have a way to re-run some analyses with similar seed values. EasyABC offers this possibility with the default option use_seed=TRUE, seed_count=0, where seed_count can be any integer number.
If this option is chosen, a seed value is provided in the input file as a first (additional) parameter, and incremented by 1 at each call of the simulation code.
This means that the simulation code must be designed so that the
first parameter is a seed initializing value. In the worked example
(Section The trait model), the simulation
code trait_model makes use of this package option, and in
the first example (Section The toy model),
the way this option can be used with a simple R function is
demonstrated.
NB: Note that when using multiple cores with the package functions (n_cluster=x with x larger than 1), the option use_seed=TRUE is forced, since the seed value is also used to distribute the tasks to each core.
A list encoding the prior distributions used for each model parameter must be supplied by the user. Each element of the list corresponds to a model parameter and can be defined in two ways: either as a vector whose first element names the prior distribution and whose remaining elements give its parameters, for instance:
my_prior=list(c("unif",0,1),c("normal",1,2))
NB: Note that a fixed variable can be passed to the
simulation code by choosing for this fixed variable a uniform prior
distribution and a trivial range (with equal lower and upper bounds).
The EasyABC methods will not work properly if these fixed
variables are passed with other types of prior distributions (like a
normal distribution with a standard deviation equal to zero).
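For instance, an illustrative prior list passing a fixed value of 500 as a second ‘parameter’:
my_prior=list(c("unif",0,1),c("unif",500,500))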
my_prior=list(c("unif",0,1))):my_prior=list(list(c("runif",1,0,1), c("dunif",0,1)))
NB: SABC requires the prior to be specified as a sampler and as a density (see the examples below).
To add constraints to prior distributions (for instance, parameter 1 < parameter 2), users need to use the argument prior_test in the ABC functions of the package (see their online documentation). This argument prior_test will be evaluated as a logical expression; you can use all the logical operators, including "<" and ">", to define whether a parameter set respects the constraint. Each parameter should be designated by "X1", "X2", … in the same order as in the prior definition.
Here is an example where the second parameter should be greater than the first one:
prior = list(c("unif",0,1),c("unif",0,10))
ABC_rejection(model=a_model,prior=prior,nb_simul=3, prior_test="X2 > X1")
A vector containing the summary statistics of the data must be
supplied. The statistics must be in the same order as in the simulation
outputs. The function SABC allows for a semi-automatic generation of summary statistics according to Fearnhead and Prangle (2012).
Intermediary results can be written in output files in the working directory. Users solely need to choose the option verbose=TRUE when launching the EasyABC functions (otherwise, the default value for verbose is FALSE). Intermediary results consist of the progressive writing of simulation outputs for the functions ABC_rejection and ABC_mcmc, and of intermediary results at the end of each step for the function ABC_sequential. Additional details are provided in the help files of the functions.
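For instance, a minimal sketch (reusing toy_model, toy_prior and sum_stat_obs as they are defined in Section The toy model below):
ABC_out<-ABC_sequential(method="Beaumont", model=toy_model, prior=toy_prior,
  nb_simul=10, summary_stat_target=sum_stat_obs, tolerance_tab=c(1.25,0.75),
  verbose=TRUE)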
R function calling a C/C++ program
Users having a C/C++ simulation code may wish to construct an R function calling their C/C++ program, instead of using the provided wrappers (see sections The simulation code - for use on a single core and The simulation code - for use with multiple cores). The procedure is described in detail in the ‘Writing R Extensions’ manual.
In short, this can be done by:
1. enclosing your C/C++ functions in an extern "C" { … } block. Here is an excerpt of the source code of the trait model provided in this package, in the folder src:
extern "C" {
void trait_model(double *input,double *stat_to_return){
// compute output and fill the array stat_to_return
}
}
2. compiling the code into a shared library with the R CMD SHLIB command. In our example, the command for compiling the trait model and the given output are:
$ R CMD SHLIB trait_model_rc.cpp
g++ -I/usr/share/R/include -DNDEBUG -fpic -O2 -pipe -g -c trait_model_rc.cpp
-o trait_model_rc.o
g++ -shared -o trait_model_rc.so trait_model_rc.o -L/usr/lib/R/lib -lR
3. loading the shared library in the R session with the dyn.load function:
> dyn.load("trait_model_rc.so")
4. writing an R function that uses the .C function for calling your program, like we’ve done in our trait_model function:
trait_model <- function(input=c(1,1,1,1,1,1)) {
.C("trait_model",input=input,stat_to_return=array(0,4))$stat_to_return
}
Now, as our model will have two parameters with constant values (see example The trait model), we can fix them as follows:
trait_model <- function(input=c(1,1,1,1,1,1)) {
.C("trait_model",input=c(input[1], 500, input[2:3], 1, input[4:5]),
stat_to_return=array(0,4))$stat_to_return
}
fastsimcoal
This example is provided by an EasyABC user, Albert Min-Shan Ko (currently at the Department of Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany). The purpose is to plug a third-party population genetics software into the EasyABC workflow. This software needs input data in a given format, so the idea is to wrap the call to the fastsimcoal software into a script that will link EasyABC to fastsimcoal. Here are the scripts, provided by courtesy of Albert Min-Shan Ko.
First, an R script (here named mod.input.r) converts the ‘input’ file written by EasyABC into the parameter format expected by fastsimcoal:
r<-read.table('input',head=F)
sink('mod.input')
cat(paste('1','p1','unif',round(r[1,],0),round(r[1,],0),sep='\t'))
cat('\n')
cat(paste('1','p2','unif',round(r[2,],0),round(r[2,],0),sep='\t'))
cat('\n')
cat(paste('1','p3','unif',round(r[3,],0),round(r[3,],0),sep='\t'))
sink()
A shell script (run_sim.sh) invokes the latter R script and builds a parameter file for fastsimcoal (sim.est), runs fastsimcoal and computes some summary statistics with the arlequin program:
#!/bin/bash
rm -fr sim
Rscript mod.input.r
cat <(sed -n 1p template.est) <(sed -n '1,3'p mod.input) \
<(sed -n '5,$'p template.est) > sim.est
until [ -f sim/arl_output ]; do
./fastsimcoal -t sim.tpl -e sim.est -E1 -n1 -q
./arlsumstat sim/sim_1_1.arp sim/arl_output 1 0 run_silent
done
cat sim/arl_output > output
Then, the user can invoke EasyABC like this:
prior=list(c("unif",500,1000),c("unif",100,500),c("unif",50,200))
ABC_sim<-ABC_rejection(model=binary_model('./run_sim.sh'),prior=prior,nb_simul=3)
If your model runs with a Java Virtual Machine (can be written in
Java, Scala, Groovy, …), you can of course use the
binary_model wrapper to run the JVM within your model. But,
you can achieve a tighter integration that will simplify the process and
save computing time. This section proposes to use the R package
rJava. Let’s consider the toy model written in Java (in a
file named Model.java):
public class Model {
public static double[] run(double[] x) {
double[] result = new double[2];
result[0] = x[0] + x[1];
result[1] = x[0] * x[1];
return result;
}
}
We can compile it with the command: javac Model.java and
then define our wrapper in R:
mymodel <- function(x) {
library("rJava")
.jinit(classpath=".")
result = .jcall(J("Model"),"[D","run",.jarray(x))
result
}
Then, the user can invoke EasyABC like this:
prior=list(c("unif",0,1),c("normal",1,2))
ABC_sim<-ABC_rejection(model=mymodel,prior=prior,nb_simul=3)
The toy model
Here we consider a very simple stochastic model coded in the R language:
toy_model<-function(x){
c( x[1] + x[2] + rnorm(1,0,0.1) , x[1] * x[2] + rnorm(1,0,0.1) )
}
We will use two different types of prior distribution for the two model parameters (\(x[1]\) and \(x[2]\)): a uniform distribution between 0 and 1 and a normal distribution with mean 1 and standard deviation 2.
toy_prior=list(c("unif",0,1),c("normal",1,2))
And we will consider an imaginary dataset of two summary statistics that the toy_model is aiming at fitting:
sum_stat_obs=c(1.5,0.5)
A standard ABC-rejection procedure can be simply performed with the function ABC_rejection, specifying the number \(n\) of simulations to be performed and the proportion \(p\) of simulations to be retained:
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=toy_model, prior=toy_prior, nb_simul=n,
summary_stat_target=sum_stat_obs, tol=p)
ABC_rej
#> $param
#> [,1] [,2]
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.564895 0.4678920
#> [2,] 1.386153 0.2915392
#>
#> $weights
#> [1] 0.5 0.5
#>
#> $stats_normalization
#> [1] 0.7266951 0.5603033
#>
#> $nsim
#> [1] 10
#>
#> $nrec
#> [1] 2
#>
#> $computime
#> [1] 0.01295018
Alternatively, ABC_rejection can be used to solely
launch the simulations and to store the simulation outputs without
performing the rejection step. This option enables the user to make use
of the R package abc (Csilléry et al. 2012)
which offers an array of more sophisticated post-processing treatments
than the simple rejection procedure:
# Run the ABC rejection on the model
set.seed(1)
n=10
ABC_rej<-ABC_rejection(model=toy_model, prior=toy_prior, nb_simul=n)
ABC_rej
#> $param
#> [,1] [,2]
#> param 0.2655087 0.3475333
#> param 0.6607978 1.6590155
#> param 0.7698414 0.9884657
#> param 0.2121425 1.7796865
#> param 0.8696908 0.1769783
#> param 0.6684667 2.6424424
#> param 0.7829328 1.2666727
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
#> param 0.3323947 1.7753432
#>
#> $stats
#> [,1] [,2]
#> [1,] 0.7460219 0.21951603
#> [2,] 2.2377665 1.14501671
#> [3,] 1.9987724 0.83732115
#> [4,] 1.9297049 0.15607719
#> [5,] 1.0718915 0.06472432
#> [6,] 3.3702993 1.85828258
#> [7,] 2.1300244 0.98600890
#> [8,] 1.5648945 0.46789202
#> [9,] 1.3861534 0.29153921
#> [10,] 2.1023574 0.45240868
#>
#> $weights
#> [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#>
#> $stats_normalization
#> [1] 0.7266951 0.5603033
#>
#> $nsim
#> [1] 10
#>
#> $computime
#> [1] 0.0005409718
# Install the "abc" package if needed
install.packages("abc")
# Post-process the simulations outputs
library(abc)
rej<-abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol=0.2, method="rejection")
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No parameter names are given, using P1, P2, ...
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No summary statistics names are given, using S1, S2, ...
# simulations selected:
rej$unadj.values
#> [,1] [,2]
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
# their associated summary statistics:
rej$ss
#> [,1] [,2]
#> [1,] 1.564895 0.4678920
#> [2,] 1.386153 0.2915392
# their normalized euclidean distance to the data summary statistics:
rej$dist
#> [1] 1.6103923 1.9542368 1.2025193 1.0981535 1.2163667 4.6145013 1.5879393
#> [8] 0.1448057 0.4716867 1.2112846
Other functions of the EasyABC package are used in a
very similar manner. To perform the algorithm of Beaumont et al. (2009),
one needs to specify the sequence of tolerance levels \(tolerance\_tab\) and the number \(nb\_simul\) of simulations to obtain below
the tolerance level at each iteration:
n=10
tolerance=c(1.25,0.75)
ABC_Beaumont<-ABC_sequential(method="Beaumont", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance)
ABC_Beaumont
#> $param
#> [,1] [,2]
#> [1,] 0.7800180 0.4830061
#> [2,] 0.3181763 2.3673583
#> [3,] 0.1811065 2.6700808
#> [4,] 0.8456229 0.2467030
#> [5,] 0.1418500 1.9097104
#> [6,] 0.3282295 2.8256064
#> [7,] 0.1976839 1.2863106
#> [8,] 0.3323167 2.3794734
#> [9,] 0.2252921 1.5824478
#> [10,] 0.6077307 0.4225812
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.3838109 0.49279371
#> [2,] 2.6442826 0.65600856
#> [3,] 2.8649926 0.47168981
#> [4,] 1.0743703 0.19859866
#> [5,] 2.0575765 0.21200292
#> [6,] 3.0582520 0.80427662
#> [7,] 1.4783048 0.06284696
#> [8,] 2.6188539 0.64199282
#> [9,] 1.7743399 0.35304040
#> [10,] 0.8912953 0.24977382
#>
#> $weights
#> [1] 0.09597405 0.09105900 0.10164109 0.10169384 0.12145842 0.08442337
#> [7] 0.11758458 0.08987372 0.11027429 0.08601765
#>
#> $stats_normalization
#> [1] 2.290391 1.433019
#>
#> $epsilon
#> [1] 0.5079519
#>
#> $nsim
#> [1] 31
#>
#> $computime
#> [1] 0.004883766
To perform the algorithm of Drovandi and Pettitt (2011), one needs to specify four arguments: the initial number of simulations \(nb\_simul\), the final tolerance level \(tolerance\_tab\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the target proportion \(c\) of unmoved particles during the MCMC jump. Note that default values \(\alpha=0.5\) and \(c=0.01\) are used if not specified, following Drovandi and Pettitt (2011).
n=10
tolerance=0.75
c_drov=0.7
ABC_Drovandi<-ABC_sequential(method="Drovandi", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, c=c_drov)
ABC_Drovandi
#> $param
#> [,1] [,2]
#> [1,] 0.6988245 0.8190878
#> [2,] 0.6860284 0.1430693
#> [3,] 0.3080524 2.0202168
#> [4,] 0.4009210 0.8702183
#> [5,] 0.4295584 0.4770350
#> [6,] 0.6366469 0.8473816
#> [7,] 0.9931522 0.2698854
#> [8,] 0.3444874 0.5230873
#> [9,] 0.5484915 1.4070225
#> [10,] 0.3991026 0.7969931
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.5335613 0.4986674
#> [2,] 0.9236163 0.1415199
#> [3,] 2.3118317 0.6644022
#> [4,] 1.3034399 0.4532500
#> [5,] 0.8410152 0.2013222
#> [6,] 1.5010774 0.4530792
#> [7,] 1.3363126 0.3626958
#> [8,] 1.0170970 0.2974128
#> [9,] 1.9362341 0.9295290
#> [10,] 1.2003607 0.1584102
#>
#> $weights
#> [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#>
#> $stats_normalization
#> [1] 1.953418 1.214883
#>
#> $epsilon
#> [1] 0.1910321
#>
#> $nsim
#> [1] 45
#>
#> $computime
#> [1] 0.005883932
To perform the algorithm of Del Moral et al. (2012), one needs to specify five arguments: the initial number of simulations \(nb\_simul\), the number \(\alpha\) controlling the decrease in effective sample size of the particle set at each step, the number \(M\) of simulations performed for each particle, the minimal effective sample size \(nb\_threshold\) below which a resampling of particles is performed and the final tolerance level \(tolerance\_target\). Note that default values \(\alpha=0.5\), \(M=1\) and \(nb\_threshold=nb\_simul/2\) are used if not specified.
n=10
alpha_delmo=0.5
tolerance=0.75
ABC_Delmoral<-ABC_sequential(method="Delmoral", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
alpha=alpha_delmo, tolerance_target=tolerance)
ABC_Delmoral
#> $param
#> [,1] [,2]
#> [1,] 0.20305029 1.5083895
#> [2,] 0.36154201 2.4887627
#> [3,] 0.07295993 2.5253925
#> [4,] 0.34874081 1.2126412
#> [5,] 0.35066476 1.4262540
#> [6,] 0.31188902 0.8378975
#> [7,] 0.74261427 0.6621932
#> [8,] 0.17663039 1.4794365
#> [9,] 0.28307740 0.4146026
#> [10,] 0.30339615 1.2495415
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.6007504 0.34104430
#> [2,] 2.8064978 0.74901158
#> [3,] 2.3885641 0.08381627
#> [4,] 1.5005152 0.39277779
#> [5,] 1.7063282 0.56293875
#> [6,] 1.2908525 0.23887366
#> [7,] 1.4279571 0.44951689
#> [8,] 1.6061781 0.41628474
#> [9,] 0.5003865 0.16883178
#> [10,] 1.6210776 0.39343083
#>
#> $weights
#> [1] 0.1428571 0.0000000 0.0000000 0.1428571 0.1428571 0.1428571 0.1428571
#> [8] 0.1428571 0.0000000 0.1428571
#>
#> $stats_normalization
#> [1] 1.8358809 0.9232688
#>
#> $epsilon
#> [1] 0.579182
#>
#> $nsim
#> [1] 33
#>
#> $computime
#> [1] 0.007021904
To perform the algorithm of Lenormand et al. (2012), one needs to specify three arguments: the initial number of simulations \(nb\_simul\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the stopping criterion \(p\_acc\_min\). Note that default values \(\alpha=0.5\) and \(p\_acc\_min=0.05\) are used if not specified, following Lenormand et al. (2012). Also note that the method “Lenormand” is only supported with uniform prior distributions (since it performs a Latin Hypercube sampling at the beginning). Here, we therefore need to alter the prior distribution of the second model parameter:
toy_prior2=list(c("unif",0,1),c("unif",0.5,1.5))
n=10
pacc=0.4
ABC_Lenormand<-ABC_sequential(method="Lenormand", model=toy_model,
prior=toy_prior2, nb_simul=10, summary_stat_target=sum_stat_obs,
p_acc_min=pacc)
ABC_Lenormand
#> $param
#> [,1] [,2]
#> [1,] 0.4551314 1.0652461
#> [2,] 0.4266732 1.0758588
#> [3,] 0.3618080 1.3258408
#> [4,] 0.5621324 0.6516160
#> [5,] 0.7931661 0.8690392
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.496612 0.3921411
#> [2,] 1.468425 0.6092825
#> [3,] 1.619925 0.4034825
#> [4,] 1.365678 0.4030354
#> [5,] 1.703329 0.6694059
#>
#> $weights
#> [1] 0.2048501 0.2048501 0.2293697 0.2586764 0.1022538
#>
#> $stats_normalization
#> [1] 0.5378742 0.4894343
#>
#> $epsilon
#> [1] 0.2627045
#>
#> $nsim
#> [1] 20
#>
#> $computime
#> [1] 0.006312132
To perform the algorithm of Marjoram et al. (2003), one needs to specify five arguments: the number of sampled points \(n\_rec\) in the Markov chain, the number of chain points between two sampled points \(n\_between\_sampling\), the maximal distance accepted between simulations and data \(dist\_max\), a vector \(tab\_normalization\) specifying the scale of each summary statistic, and a vector \(proposal\_range\) specifying the maximal distances in each dimension of the parameter space for a jump of the MCMC. All these arguments have default values (see the package help for the function ABC_mcmc), so that ABC_mcmc will work without user-defined values.
n=10
ABC_Marjoram_original<-ABC_mcmc(method="Marjoram_original", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
#> [1] "Warning: summary statistics are normalized by default through a division by the target summary statistics - it may not be appropriate to your case."
#> [1] "Consider providing normalization constants for each summary statistics in the option 'tab_normalization' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: default values for proposal distributions are used - they may not be appropriate to your case."
#> [1] "Consider providing proposal range constants for each parameter in the option 'proposal_range' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: a default value for the tolerance has been computed - it may not be appropriate to your case."
#> [1] "Consider providing a tolerance value in the option 'dist_max' or using the method 'Marjoram' which automatically determines this value."
ABC_Marjoram_original
#> $param
#> [,1] [,2]
#> [1,] 0.2398792 1.278654
#> [2,] 0.2321289 1.575667
#> [3,] 0.2672405 1.020733
#> [4,] 0.2672405 1.020733
#> [5,] 0.2846436 1.128116
#> [6,] 0.2455528 1.366674
#> [7,] 0.2223720 1.069631
#> [8,] 0.2223720 1.069631
#> [9,] 0.2223720 1.069631
#> [10,] 0.2878243 1.252653
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.479906 0.4190810
#> [2,] 1.799097 0.5039863
#> [3,] 1.404520 0.4964135
#> [4,] 1.404520 0.4964135
#> [5,] 1.317378 0.4855191
#> [6,] 1.806580 0.4093119
#> [7,] 1.339412 0.3572251
#> [8,] 1.339412 0.3572251
#> [9,] 1.339412 0.3572251
#> [10,] 1.628605 0.4347523
#>
#> $dist
#> [1] 0.026370981 0.039823243 0.004103213 0.004103213 0.015661370 0.074671273
#> [7] 0.093000212 0.093000212 0.093000212 0.024379895
#>
#> $stats_normalization
#> [1] 1.5 0.5
#>
#> $epsilon
#> [1] 0.09300021
#>
#> $nsim
#> [1] 92
#>
#> $n_between_sampling
#> [1] 10
#>
#> $computime
#> [1] 0.008288145
To perform the algorithm of Marjoram et al. (2003) in which some of
the arguments (\(dist\_max\), \(tab\_normalization\) and \(proposal\_range\)) are automatically
determined by the algorithm via an initial calibration step, one needs
to specify three arguments: the number \(n\_calibration\) of simulations to perform
at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the
determination of \(dist\_max\) and the
scale factor \(proposal\_phi\) to
determine the proposal range. These modifications are drawn from the
algorithm of Wegmann et al. (2009a), without relying on PLS regressions.
The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\) and \(proposal\_phi=1\). This automatic determination of \(dist\_max\), \(tab\_normalization\) and \(proposal\_range\) is strongly recommended over the crude automatic determination used in the method Marjoram_original.
n=10
ABC_Marjoram<-ABC_mcmc(method="Marjoram", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
ABC_Marjoram
#> $param
#> [,1] [,2]
#> [1,] 0.8156467 0.6613202
#> [2,] 0.8132653 0.7117298
#> [3,] 0.8268796 0.7528865
#> [4,] 0.8704218 0.6560219
#> [5,] 0.7940921 0.6481498
#> [6,] 0.7758323 0.7067397
#> [7,] 0.8229934 0.7033363
#> [8,] 0.8797230 0.5543400
#> [9,] 0.8988029 0.6478492
#> [10,] 0.8988029 0.6478492
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.542392 0.5205872
#> [2,] 1.559272 0.5470707
#> [3,] 1.492039 0.4694074
#> [4,] 1.448708 0.5497803
#> [5,] 1.530955 0.5530765
#> [6,] 1.481620 0.4812322
#> [7,] 1.427922 0.5121051
#> [8,] 1.452025 0.4737984
#> [9,] 1.430581 0.5268393
#> [10,] 1.430581 0.5268393
#>
#> $dist
#> [1] 0.0007339775 0.0024096080 0.0006730610 0.0023798899 0.0022121933
#> [6] 0.0003295032 0.0013637998 0.0010410004 0.0016757382 0.0016757382
#>
#> $stats_normalization
#> [1] 2.029893 1.192913
#>
#> $epsilon
#> [1] 0.002409608
#>
#> $nsim
#> [1] 10091
#>
#> $n_between_sampling
#> [1] 10
#>
#> $computime
#> [1] 0.3375671
To perform the algorithm of Wegmann et al. (2009a), one needs to specify four arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), the scale factor \(proposal\_phi\) to determine the proposal range and the number of components \(numcomp\) to be used in PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\), \(proposal\_phi=1\) and \(numcomp=0\); this last default value encodes a choice of a number of PLS components equal to the number of summary statistics.
n=10
ABC_Wegmann<-ABC_mcmc(method="Wegmann", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
ABC_Wegmann
#> $param
#> [,1] [,2]
#> [1,] 0.2722069 1.3531304
#> [2,] 0.2717984 1.3133540
#> [3,] 0.2947780 1.2116622
#> [4,] 0.2947780 1.2116622
#> [5,] 0.3748534 1.2900388
#> [6,] 0.3748534 1.2900388
#> [7,] 0.4720616 1.0156964
#> [8,] 0.5790331 0.9456075
#> [9,] 0.4862960 0.8426098
#> [10,] 0.5470196 0.9231012
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.449920 0.4645954
#> [2,] 1.483816 0.4450861
#> [3,] 1.487523 0.4559809
#> [4,] 1.487523 0.4559809
#> [5,] 1.529353 0.5540701
#> [6,] 1.529353 0.5540701
#> [7,] 1.580289 0.5256440
#> [8,] 1.598992 0.5245697
#> [9,] 1.426881 0.5004935
#> [10,] 1.610406 0.4871551
#>
#> $dist
#> [1] 0.001559741 0.002398932 0.001538073 0.001538073 0.002464988 0.002464988
#> [7] 0.002007425 0.002740974 0.001237080 0.002931023
#>
#> $epsilon
#> [1] 0.002931023
#>
#> $nsim
#> [1] 10091
#>
#> $n_between_sampling
#> [1] 10
#>
#> $min_stats
#> [1] -6.007372 -4.873890
#>
#> $max_stats
#> [1] 9.273201 8.222326
#>
#> $lambda
#> [1] 0.6060606 0.6060606
#>
#> $geometric_mean
#> [1] 1.483339 1.406508
#>
#> $boxcox_mean
#> [1] 0.5238013 0.4351455
#>
#> $boxcox_sd
#> [1] 0.13159714 0.08956804
#>
#> $pls_transform
#> [,1] [,2]
#> [1,] -0.7122440 -0.7048392
#> [2,] -0.6566048 0.7542348
#>
#> $n_component
#> [1] 2
#>
#> $computime
#> [1] 70.3014
For the SABC algorithm by Albert et al. (2014), we need to provide the prior in the form of a sampler and a density:
r.prior <- function() c(runif(1,0,1),rnorm(1,1,2))
d.prior <- function(x) dunif(x[1],0,1)*dnorm(x[2],1,2)
Furthermore, we need to specify the size of the ensemble, the number of simulations and the initial tolerance:
n.sample <- 300
iter.max <- n.sample * 30
eps.init <- 2
Since, for this example, the prior carries relevant information, we choose the method “informative”:
ABC_Albert<-SABC(r.model=toy_model, r.prior=r.prior, d.prior=d.prior,
  n.sample=n.sample, eps.init=eps.init, iter.max=iter.max,
  method="informative", y=sum_stat_obs)
An approximate posterior parameter sample is contained in ABC_Albert$E[,1:2], e.g.:
plot(ABC_Albert$E[,1:2])
The functions of the package EasyABC can launch the
simulations on multiple cores of a computer: users have to indicate the
number of cores they wish to use in the argument n_cluster
of the functions, and they have to use the option
use_seed=TRUE. Users also need to design their code in a
slightly different way so that it is compatible with the option
use_seed=TRUE (see Section The simulation
code - for use with multiple cores for additional details). For the
toy model above, the modifications needed are the following:
toy_model_parallel<-function(x){
set.seed(x[1]) # so that each core is initialized with a different seed value.
c( x[2] + x[3] + rnorm(1,0,0.1) , x[2] * x[3] + rnorm(1,0,0.1) )
}
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=toy_model_parallel, prior=toy_prior,
nb_simul=n, summary_stat_target=sum_stat_obs, tol=p, n_cluster=2,
use_seed=TRUE)
ABC_rej
#> $param
#> [,1] [,2]
#> [1,] 0.6870228 0.4105591
#> [2,] 0.2121425 1.7796865
#>
#> $stats
#> [,1] [,2]
#> [1,] 1.013496 0.4204994
#> [2,] 1.983370 0.4615872
#>
#> $weights
#> [1] 0.5 0.5
#>
#> $stats_normalization
#> [1] 1.420082 0.687775
#>
#> $nsim
#> [1] 10
#>
#> $nrec
#> [1] 2
#>
#> $computime
#> [1] 1.685923
The trait model
We now turn to a stochastic ecological model, hereafter called trait_model, to illustrate how to use EasyABC with models not initially coded in the R language.
trait_model represents the stochastic dynamics of an
ecological community where each species is represented by a set of
traits (i.e. characteristics) which determine its competitive ability. A
detailed description and analysis of the model can be found in Jabot
(2010). The model requires four parameters: an immigration rate \(I\), and three additional parameters (\(h\), \(A\)
and \(\sigma\)) describing the way
traits determine species competitive ability. The model additionally
requires two fixed variables: the total number of individuals in the
local community \(J\) and the number of
traits used \(n\_t\).
The model outputs four summary statistics: the species richness of the community \(S\), its Shannon’s index \(H\), the mean of the trait value among individuals \(MTV\) and the skewness of the trait value distribution \(STV\).
NB: Three parameters (\(I\), \(A\)
and \(\sigma\)) have non-uniform prior
distributions: instead, their log-transformed values have a uniform
prior distribution. The simulation code trait_model
therefore takes an exponential transform of the values proposed by
EasyABC for these parameters at the beginning of each
simulation.
In the following, we will use the values \(J=500\) and \(n\_t=1\), and uniform prior distributions for \(ln(I)\) in \([3;5]\), \(h\) in \([-25;125]\), \(ln(A)\) in \([ln(0.1);ln(5)]\) and \(ln(\sigma)\) in \([ln(0.5);ln(25)]\). The simulation code trait_model reads sequentially \(J\), \(I\), \(A\), \(n\_t\), \(h\) and \(\sigma\).
NB: Note that the fixed variables \(J\) and \(n\_t\) are hard-coded (see this section for details) in the function trait_model. If they were not, we would include these constants in the prior list using uniform distributions with a trivial range, like c("unif",500,500) for example.
trait_prior=list(c("unif",3,5),c("unif",-2.3,1.6),
c("unif",-25,125), c("unif",-0.7,3.2))
We will consider an imaginary dataset whose summary statistics are \((S,H,MTV,STV) = (100,2.5,20,30000)\):
sum_stat_obs=c(100,2.5,20,30000)
A standard ABC-rejection procedure can be simply performed with the function ABC_rejection, specifying the number \(n\) of simulations to be performed and the proportion \(p\) of retained
simulations. Note that the option use_seed=TRUE is used,
since trait_model requires a seed initializing value for
its pseudo-random number generator:
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=trait_model, prior=trait_prior, nb_simul=n,
summary_stat_target=sum_stat_obs, tol=p, use_seed=TRUE)
ABC_rej
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.435237 1.568434 32.00528 2.332036
#> [2,] 3.534441 -0.794155 -22.99145 0.791313
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 116 4.104226 32.9882 3029.800
#> [2,] 84 3.994615 49.1732 1950.147
#>
#> $weights
#> [1] 0.5 0.5
#>
#> $stats_normalization
#> [1] 3.981680e+01 6.631974e-01 1.451333e+01 1.526571e+04
#>
#> $nsim
#> [1] 10
#>
#> $nrec
#> [1] 2
#>
#> $computime
#> [1] 2.71795
Alternatively, ABC_rejection can be used to solely
launch the simulations and to store the simulation outputs without
performing the rejection step. This option enables the user to make use
of the R package abc (Csilléry et al. 2012)
which offers an array of more sophisticated post-processing treatments
than the simple rejection procedure:
install.packages("abc")
library(abc)
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=trait_model, prior=trait_prior, nb_simul=n, use_seed=TRUE)
rej<-abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats,
tol=0.2, method="rejection")
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No parameter names are given, using P1, P2, ...
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No summary statistics names are given, using S1, S2, ...
# simulations selected:
rej$unadj.values
#> [,1] [,2] [,3] [,4]
#> [1,] 4.435237 1.568434 32.00528 2.332036
#> [2,] 3.534441 -0.794155 -22.99145 0.791313
# their associated summary statistics:
rej$ss
#> [,1] [,2] [,3] [,4]
#> [1,] 116 4.104226 32.9882 3029.800
#> [2,] 84 3.994615 49.1732 1950.147
# their normalized euclidean distance to the data summary statistics:
rej$dist
#> [1] 6.030324 9.400513 5.613765 8.993603 3.847707 5.885814 4.960505 5.605648
#> [9] 8.706420 7.113637
Note that a simulation code My_simulation_code can be passed to the function ABC_rejection in several ways depending on its nature:
- if it is an R function: ABC_rejection(My_simulation_code, prior, nb_simul,...)
- if it is a binary executable file: ABC_rejection(binary_model("./My_simulation_code"), prior, nb_simul, use_seed=TRUE,...)
- if it is a binary executable file to be run on multiple cores: ABC_rejection(binary_model_cluster("./My_simulation_code"), prior, nb_simul, n_cluster=2, use_seed=TRUE)
Other functions of the EasyABC package are used in a very similar manner. To perform the algorithm of Beaumont et al. (2009), one needs to specify the sequence of tolerance levels \(tolerance\_tab\) and the number \(nb\_simul\) of simulations to obtain below the tolerance level at each iteration:
n=10
tolerance=c(8,5)
ABC_Beaumont<-ABC_sequential(method="Beaumont", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, use_seed=TRUE)
ABC_Beaumont
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 3.362110 -1.9163965 23.0654334 -0.599666827
#> [2,] 3.543818 0.9455070 17.2848491 0.716968486
#> [3,] 3.260380 -0.9270426 25.3968857 0.779964720
#> [4,] 3.398492 0.4703746 15.1099903 0.938328200
#> [5,] 3.017150 -1.8711144 26.1860829 -0.002416091
#> [6,] 4.053298 -1.7809923 14.1823387 -0.159038819
#> [7,] 4.398392 1.0919829 5.7191547 2.949045622
#> [8,] 4.358355 0.7294188 14.7202497 2.627312598
#> [9,] 4.694502 1.1509709 0.9577352 3.095833860
#> [10,] 3.672407 0.7490497 13.0817551 3.067217787
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 60 2.406482 35.3478 15610.260
#> [2,] 52 1.891610 21.0018 12123.542
#> [3,] 50 1.762070 31.1942 10202.962
#> [4,] 45 2.162352 17.7182 11280.426
#> [5,] 46 1.682788 34.1128 12557.523
#> [6,] 119 3.615008 36.6096 13923.754
#> [7,] 122 4.144510 14.6302 17235.203
#> [8,] 118 4.079771 19.0042 12605.343
#> [9,] 149 4.212484 14.8760 20535.516
#> [10,] 86 3.506517 17.2494 6748.929
#>
#> $weights
#> [1] 0.11597293 0.06118753 0.07779398 0.05992568 0.15439436 0.07654255
#> [7] 0.11863352 0.08190165 0.18240961 0.07123820
#>
#> $stats_normalization
#> [1] 4.530833e+01 9.914929e-01 1.550687e+01 1.311587e+04
#>
#> $epsilon
#> [1] 4.782638
#>
#> $nsim
#> [1] 72
#>
#> $computime
#> [1] 15.67006
To perform the algorithm of Drovandi and Pettitt (2011), one needs to specify four arguments: the initial number of simulations \(nb\_simul\), the final tolerance level \(tolerance\_tab\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the target proportion \(c\) of unmoved particles during the MCMC jump. Note that default values \(\alpha=0.5\) and \(c=0.01\) are used if not specified, following Drovandi and Pettitt (2011).
n=10
tolerance=3
c_drov=0.7
ABC_Drovandi<-ABC_sequential(method="Drovandi", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, c=c_drov, use_seed=TRUE)
ABC_Drovandi
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.372120 -0.001121019 5.4715372 -0.12568710
#> [2,] 4.051867 -0.125022011 12.1207510 -0.67154259
#> [3,] 4.357840 -0.158636879 4.0214488 -0.14801890
#> [4,] 4.003453 0.155106599 11.8676334 -0.43394316
#> [5,] 4.497001 0.296629792 0.4179174 0.04238219
#> [6,] 4.614254 -0.421605678 11.7121630 -0.13055283
#> [7,] 4.255693 -0.030613125 13.7085732 -0.09558018
#> [8,] 4.320190 0.177067473 15.5170178 0.26745481
#> [9,] 4.704030 -0.007652111 9.6886842 0.34755989
#> [10,] 4.202139 -0.031661415 9.6060166 -0.11315922
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 96 2.038106 16.2344 24729.78
#> [2,] 76 1.519483 22.5206 27628.05
#> [3,] 95 2.213074 16.8628 31063.59
#> [4,] 65 1.960228 18.8398 21922.02
#> [5,] 93 2.712320 13.6572 37907.42
#> [6,] 110 2.657473 25.9604 20921.48
#> [7,] 91 1.819804 22.4596 19743.10
#> [8,] 85 2.351128 21.5434 15320.82
#> [9,] 124 2.648244 22.0450 26811.07
#> [10,] 85 2.017672 17.5428 20111.10
#>
#> $weights
#> [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#>
#> $stats_normalization
#> [1] 45.687583 1.130895 18.184548 14175.858029
#>
#> $epsilon
#> [1] 1.204596
#>
#> $nsim
#> [1] 40
#>
#> $computime
#> [1] 6.751506
To perform the algorithm of Del Moral et al. (2012), one needs to specify five arguments: the initial number of simulations \(nb\_simul\), the number \(\alpha\) controlling the decrease in effective sample size of the particle set at each step, the number \(M\) of simulations performed for each particle, the minimal effective sample size \(nb\_threshold\) below which a resampling of particles is performed and the final tolerance level \(tolerance\_target\). Note that default values \(\alpha=0.5\), \(M=1\) and \(nb\_threshold=nb\_simul/2\) are used if not specified.
n=10
alpha_delmo=0.5
tolerance=3
ABC_Delmoral<-ABC_sequential(method="Delmoral", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
alpha=alpha_delmo, tolerance_target=tolerance, use_seed=TRUE)
ABC_Delmoral
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.340312 0.27803253 12.362276 1.38257144
#> [2,] 4.142546 -0.85552836 8.562840 0.51965039
#> [3,] 4.137010 0.08351261 3.671017 2.36717400
#> [4,] 4.108617 -0.06937268 10.516238 -0.03101547
#> [5,] 4.254593 0.35733560 13.356750 1.44322454
#> [6,] 4.129874 0.13015864 10.201600 -0.10552508
#> [7,] 4.157252 -0.16565665 4.865025 1.99019890
#> [8,] 4.034561 -0.46067583 5.810586 1.22583149
#> [9,] 4.186188 -0.13743236 7.820983 1.35175359
#> [10,] 4.254593 0.35733560 13.356750 1.44322454
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 101 3.233236 20.2856 22151.89
#> [2,] 93 2.580371 22.1176 22572.67
#> [3,] 107 3.402067 15.8006 32264.44
#> [4,] 77 1.631609 19.8354 25750.84
#> [5,] 87 3.080927 19.0860 17732.19
#> [6,] 81 2.024566 17.0106 18401.83
#> [7,] 97 3.042123 14.1450 26042.48
#> [8,] 87 3.104230 14.3686 28931.67
#> [9,] 97 2.808620 17.5424 28791.39
#> [10,] 87 3.080927 19.0860 17732.19
#>
#> $weights
#> [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#>
#> $stats_normalization
#> [1] 32.616288 1.050966 14.887083 13081.213750
#>
#> $epsilon
#> [1] 1.37042
#>
#> $nsim
#> [1] 52
#>
#> $computime
#> [1] 8.676857
To perform the algorithm of Lenormand et al. (2012), one needs to specify three arguments: the initial number of simulations \(nb\_simul\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the stopping criterion \(p\_acc\_min\). Note that default values \(\alpha=0.5\) and \(p\_acc\_min=0.05\) are used if not specified, following Lenormand et al. (2012).
n=10
pacc=0.4
ABC_Lenormand<-ABC_sequential(method="Lenormand", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
p_acc_min=pacc, use_seed=TRUE)
ABC_Lenormand
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.872057 0.9787403 9.217730 -0.5341278
#> [2,] 3.292700 -1.5865848 -4.613586 2.9811316
#> [3,] 3.875684 0.8784209 47.125550 1.6683782
#> [4,] 4.216460 1.2599320 53.953265 1.7230472
#> [5,] 4.057230 -2.0734062 22.124426 0.2728228
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 112 2.531360 19.2316 24271.635
#> [2,] 72 3.091854 20.3568 39848.829
#> [3,] 72 2.655304 48.0888 1081.154
#> [4,] 98 2.943912 53.2572 -1734.814
#> [5,] 112 3.995667 44.8408 10194.798
#>
#> $weights
#> [1] 0.22554134 0.22554134 0.22554134 0.22554134 0.09783465
#>
#> $stats_normalization
#> [1] 26.6426976 0.8538883 19.4103525 20397.9850272
#>
#> $epsilon
#> [1] 5.851488
#>
#> $nsim
#> [1] 15
#>
#> $computime
#> [1] 3.721657
To perform the algorithm of Marjoram et al. (2003), one needs to specify five arguments: the number of sampled points \(n\_rec\) in the Markov chain, the number of chain points between two sampled points \(n\_between\_sampling\), the maximal distance accepted between simulations and data \(dist\_max\), a vector \(tab\_normalization\) specifying the scale of each summary statistic, and a vector \(proposal\_range\) specifying the maximal distances in each dimension of the parameter space for a jump of the MCMC. All these arguments have default values (see the package help for the function ABC_mcmc), so that ABC_mcmc will work without user-defined values.
n=10
ABC_Marjoram_original<-ABC_mcmc(method="Marjoram_original", model=trait_model,
prior=trait_prior, summary_stat_target=sum_stat_obs, n_rec=n, use_seed=TRUE)
#> [1] "Warning: summary statistics are normalized by default through a division by the target summary statistics - it may not be appropriate to your case."
#> [1] "Consider providing normalization constants for each summary statistics in the option 'tab_normalization' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: default values for proposal distributions are used - they may not be appropriate to your case."
#> [1] "Consider providing proposal range constants for each parameter in the option 'proposal_range' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: a default value for the tolerance has been computed - it may not be appropriate to your case."
#> [1] "Consider providing a tolerance value in the option 'dist_max' or using the method 'Marjoram' which automatically determines this value."
ABC_Marjoram_original
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.199740 -1.1294670 8.906919 1.789854
#> [2,] 4.236296 -0.9263885 10.258952 2.143386
#> [3,] 4.434562 -0.7351417 6.881969 2.348461
#> [4,] 4.583995 -0.8471750 2.503599 2.339180
#> [5,] 4.378520 -0.9813868 14.280402 2.219657
#> [6,] 4.455056 -1.3638328 1.880114 2.338873
#> [7,] 4.587697 -1.6464353 -2.076588 2.112828
#> [8,] 4.486547 -1.7021830 -15.758385 2.296798
#> [9,] 4.401557 -2.1317335 -12.844467 2.782585
#> [10,] 4.272777 -2.2316310 -16.090006 2.630626
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 118 3.933861 32.0702 22218.7174
#> [2,] 124 3.902011 27.1996 27035.4492
#> [3,] 130 4.091712 23.3602 25443.7013
#> [4,] 157 4.462166 26.2230 25485.6756
#> [5,] 129 4.120971 31.0466 21129.3689
#> [6,] 150 4.279180 30.3934 23215.4559
#> [7,] 171 4.680078 42.5138 4829.8046
#> [8,] 162 4.729817 49.8080 766.8731
#> [9,] 145 4.550159 50.9206 -2667.9723
#> [10,] 131 4.450486 50.5850 2537.2852
#>
#> $dist
#> [1] 0.7928533 0.5114523 0.5466613 1.0603728 0.8970070 1.0776786 3.2356483
#> [8] 4.3507541 4.4509867 3.8814095
#>
#> $stats_normalization
#> [1] 100.0 2.5 20.0 30000.0
#>
#> $epsilon
#> [1] 4.450987
#>
#> $nsim
#> [1] 92
#>
#> $n_between_sampling
#> [1] 10
#>
#> $computime
#> [1] 16.70014
To perform the algorithm of Marjoram et al. (2003) in which some of
the arguments (\(dist\_max\), \(tab\_normalization\) and \(proposal\_range\)) are automatically
determined by the algorithm via an initial calibration step, one needs
to specify three arguments: the number \(n\_calibration\) of simulations to perform
at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the
determination of \(dist\_max\) and the
scale factor \(proposal\_phi\) to
determine the proposal range. These modifications are drawn from the
algorithm of Wegmann et al. (2009a), without relying on PLS regressions.
The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\) and \(proposal\_phi=1\). This automatic determination of \(dist\_max\), \(tab\_normalization\) and \(proposal\_range\) is strongly recommended over the crude automatic determination used in the method Marjoram_original.
n=10
n_calib=10
tol_quant=0.2
ABC_Marjoram<-ABC_mcmc(method="Marjoram", model=trait_model, prior=trait_prior,
summary_stat_target=sum_stat_obs,
n_rec=n, n_calibration=n_calib, tolerance_quantile=tol_quant, use_seed=TRUE)
ABC_Marjoram
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 4.377532 -0.43685036 16.00214 2.251974
#> [2,] 3.885844 -0.28414243 15.45805 2.256866
#> [3,] 3.713414 -0.40552403 16.51290 2.392034
#> [4,] 3.352807 -0.12041734 15.13927 2.190195
#> [5,] 3.629966 0.26055628 14.04613 2.270628
#> [6,] 3.838667 0.04460109 13.16472 2.415645
#> [7,] 3.798929 -0.59853210 13.47327 2.582163
#> [8,] 3.653076 -0.71307810 12.40725 2.439392
#> [9,] 4.478488 -0.68144386 11.64735 2.461565
#> [10,] 3.427418 -0.74796895 13.09694 2.421449
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 132 4.216634 27.9314 24996.140
#> [2,] 93 3.174627 23.2592 20959.218
#> [3,] 84 3.409929 20.6658 8826.281
#> [4,] 44 2.013663 18.2780 8654.713
#> [5,] 65 3.174639 16.4488 13314.321
#> [6,] 80 3.381207 18.2916 19262.376
#> [7,] 91 3.496436 22.4586 17348.881
#> [8,] 77 3.167358 18.6004 13989.242
#> [9,] 150 4.480797 30.5738 21189.463
#> [10,] 83 3.047479 18.0238 17154.225
#>
#> $dist
#> [1] 3.000164 1.263669 5.606173 6.990029 4.119004 1.996978 2.456306 3.416704
#> [9] 5.274917 2.190825
#>
#> $stats_normalization
#> [1] 41.685196 1.270533 14.262361 9522.886173
#>
#> $epsilon
#> [1] 6.990029
#>
#> $nsim
#> [1] 101
#>
#> $n_between_sampling
#> [1] 10
#>
#> $computime
#> [1] 12.04817
To perform the algorithm of Wegmann et al. (2009a), one needs to specify four arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), the scale factor \(proposal\_phi\) to determine the proposal range and the number of components \(numcomp\) to be used in PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\), \(proposal\_phi=1\) and \(numcomp=0\); this last default value encodes a choice of a number of PLS components equal to the number of summary statistics.
n=10
n_calib=10
tol_quant=0.2
ABC_Wegmann<-ABC_mcmc(method="Wegmann", model=trait_model, prior=trait_prior,
summary_stat_target=sum_stat_obs,
n_rec=n, n_calibration=n_calib, tolerance_quantile=tol_quant, use_seed=TRUE)
ABC_Wegmann
#> $param
#> [,1] [,2] [,3] [,4]
#> [1,] 3.418997 -1.8441175 15.79981 -0.3531885
#> [2,] 3.424944 -1.4579560 15.24388 0.1872806
#> [3,] 3.460813 -1.7687966 14.61885 -0.4819984
#> [4,] 3.830361 -1.7979041 14.12632 0.9398673
#> [5,] 3.903148 -0.5012784 14.44541 0.6418533
#> [6,] 3.758177 -0.4700121 14.26322 0.2699922
#> [7,] 3.937060 1.1058046 14.35394 1.3264595
#> [8,] 4.531886 0.8582610 14.32158 1.2852569
#> [9,] 4.545157 1.0951576 14.80598 1.1290289
#> [10,] 4.559757 -0.6612092 15.26312 -0.1962148
#>
#> $stats
#> [,1] [,2] [,3] [,4]
#> [1,] 71 2.190161 27.5200 13954.961
#> [2,] 59 2.116203 30.9712 29472.847
#> [3,] 74 2.594792 26.9422 15934.574
#> [4,] 95 3.702872 32.6552 17190.443
#> [5,] 81 2.263678 23.9988 17508.712
#> [6,] 61 1.978142 21.4404 22695.913
#> [7,] 67 2.609104 17.6650 7290.894
#> [8,] 97 2.897983 19.4404 12758.479
#> [9,] 88 2.963642 20.0662 13630.234
#> [10,] 128 3.189369 30.9628 22055.391
#>
#> $dist
#> [1] 1.4784034 2.1311425 1.0755683 2.3644112 0.7036211 2.1148378 2.0914487
#> [8] 0.7833533 0.9541797 1.5737981
#>
#> $epsilon
#> [1] 2.364411
#>
#> $nsim
#> [1] 101
#>
#> $n_between_sampling
#> [1] 10
#>
#> $min_stats
#> [1] 43.000000 1.632943 18.051800 -36048.727962
#>
#> $max_stats
#> [1] 133.000000 4.417974 84.024000 21035.394747
#>
#> $lambda
#> [1] 0.6060606 0.6060606 1.8181818 0.6060606
#>
#> $geometric_mean
#> [1] 1.511961 1.539539 1.462830 1.483128
#>
#> $boxcox_mean
#> [1] 0.5767484 0.6038067 0.4759215 0.5468069
#>
#> $boxcox_sd
#> [1] 0.3602366 0.3218550 0.3699768 0.3741916
#>
#> $pls_transform
#> [,1] [,2] [,3] [,4]
#> [1,] -0.49792327 -0.46804745 -0.52229791 0.5123616
#> [2,] -0.49423098 -0.58543918 0.43815101 -0.4805024
#> [3,] 0.72267120 -0.68569737 0.06543188 0.1562168
#> [4,] -0.06593739 -0.06311256 0.76456955 0.6380458
#>
#> $n_component
#> [1] 4
#>
#> $computime
#> [1] 14.88913
For the SABC algorithm by Albert et al. (2014), we need to provide the prior in the form of a sampler and a density:
r.prior <- function() c(runif(1,3,5), runif(1,-2.3,1.6),
runif(1,-25,125),runif(1,-0.7,3.2),1)
d.prior <- function(x) dunif(x[1],3,5) * dunif(x[2],-2.3,1.6) *
dunif(x[3],-25,125) * dunif(x[4],-0.7,3.2)
Furthermore, we need to specify the size of the ensemble, the number of simulations and the initial tolerance:
n.sample <- 30
iter.max <- n.sample * 30
eps.init <- 20
Since, for this example, the prior is flat, we choose the method “noninformative”:
ABC_Albert<-SABC(r.model=trait_model, r.prior=r.prior, d.prior=d.prior,
  n.sample=n.sample, eps.init=eps.init, iter.max=iter.max,
  method="noninformative", y=sum_stat_obs)
We may plot the marginals of the posterior, using, e.g.,
hist(ABC_Albert$E[,3],breaks=100)
The functions of the package EasyABC can launch the
simulations on multiple cores of a computer: users only have to indicate
the number of cores they wish to use in the argument
n_cluster of the functions. The compatibility constraints
of the simulation code are slightly different when using multiple cores:
please refer to section The simulation
code - for use with multiple cores for more information.
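For instance, a minimal sketch for the trait model (assuming two cores are available):
ABC_rej<-ABC_rejection(model=trait_model, prior=trait_prior, nb_simul=10,
  summary_stat_target=sum_stat_obs, tol=0.2, n_cluster=2, use_seed=TRUE)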
Please send comments, suggestions and bug reports to nicolas.dumoulin@inrae.fr or franck.jabot@inrae.fr. Any new development of more efficient ABC schemes that could be included in the package is particularly welcome.
The EasyABC package makes use of a number of R tools, among which:
- the R package lhs (Carnell 2012) for Latin hypercube sampling.
- the R package MASS (Venables and Ripley 2002) for Box-Cox transformation.
- the R package mnormt (Genz and Azzalini 2012) for multivariate normal generation.
- the R package pls (Mevik and Wehrens 2011) for partial least squares regression.
- an R script for the Wegmann et al. (2009a) algorithm drawn from the ABCtoolbox documentation (Wegmann et al. 2009b).
We thank Sylvie Huet, Albert Ko, Matteo Fasiolo and Wim Delva for their suggestions and inputs in the development of this version.
References
Albert, C., Künsch, H.R., and Scheidegger, A. (2014) A simulated annealing approach to approximate Bayes computations. Statistics and Computing, 1–16, arXiv:1208.2157 [stat.CO].
Beaumont, M. A., Cornuet, J., Marin, J., and Robert, C. P. (2009) Adaptive approximate Bayesian computation. Biometrika, 96, 983–990.
Carnell, R. (2012) lhs: Latin Hypercube Samples. R package version 0.10. https://doi.org/10.32614/CRAN.package.lhs
Csilléry, K., François, O., and Blum, M.G.B. (2012) abc: an R package for approximate Bayesian computation (ABC). Methods in Ecology and Evolution, 3, 475–479.
Del Moral, P., Doucet, A., and Jasra, A. (2012) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22, 1009–1020.
Drovandi, C. C. and Pettitt, A. N. (2011) Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67, 225–233.
Fearnhead, P. and Prangle, D. (2012) Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B, 74, 419–474.
Genz, A., and Azzalini, A. (2012) mnormt: The multivariate normal and t distributions. R package version 1.4-5. https://doi.org/10.32614/CRAN.package.mnormt
Jabot, F. (2010) A stochastic dispersal-limited trait-based model of community dynamics. Journal of Theoretical Biology, 262, 650–661.
Lenormand, M., Jabot, F., and Deffuant, G. (2012) Adaptive approximate Bayesian computation for complex models. http://arxiv.org/pdf/1111.1308.pdf
Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003) Markov chain Monte Carlo without likelihoods. PNAS, 100, 15324–15328.
Mevik, B.-H., and Wehrens, R. (2011) pls: Partial Least Squares and Principal Component regression. R package version 2.3-0. https://doi.org/10.32614/CRAN.package.pls
Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A., and Feldman, M.W. (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.
Sisson, S.A., Fan, Y., and Tanaka, M.M. (2007) Sequential Monte Carlo without likelihoods. PNAS, 104, 1760–1765.
Venables, W.N., and Ripley, B.D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York.
Wegmann, D., Leuenberger, C. and Excoffier, L. (2009a) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics, 182, 1207-1218.
Wegmann, D., Leuenberger, C. and Excoffier, L. (2009b) Using ABCtoolbox. https://cmpg.unibe.ch/software/ABCtoolbox/ABCtoolbox_manual.pdf