Description: | Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, the Max-Min Hill-climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, inference. |
Type: | Package |
Title: | Bayesian Network Structure Learning from Data with Missing Values |
Version: | 1.0.15 |
Date: | 2024-01-09 |
Depends: | R (≥ 3.5.0), bitops, igraph, methods |
Suggests: | graph, Rgraphviz, qgraph, knitr, testthat |
License: | GPL-2 | GPL-3 | file LICENSE [expanded from: GPL (≥ 2) | file LICENSE] |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2024-01-09 18:29:57 UTC; alberto |
Author: | Francesco Sambo [aut], Alberto Franzin [aut, cre] |
Maintainer: | Alberto Franzin <afranzin@ulb.ac.be> |
Repository: | CRAN |
Date/Publication: | 2024-01-09 23:43:02 UTC |
BN class definition.
Description
Instantiate a BN object.
Usage
## S4 method for signature 'BN'
initialize(.Object, dataset = NULL, ...)
BN(dataset = NULL, ...)
Arguments
.Object |
a BN |
dataset |
a |
... |
potential further arguments of methods. |
Details
The constructor may be invoked without parameters – in this case an empty network will be created, and its slots will be filled manually by the user. This is usually viable only if the user already has knowledge about the network structure.
Value
BN object.
Slots
name
:name of the network
num.nodes
:number of nodes in the network
variables
:names of the variables in the network
discreteness
:TRUE if variable is discrete, FALSE if variable is continuous
node.sizes
:if variable i is discrete, node.sizes[i] contains the cardinality of i; if i is instead continuous, the value is the number of states variable i takes when discretized
cpts
:list of conditional probability tables of the network
dag
:adjacency matrix of the network
wpdag
:weighted partially directed acyclic graph
scoring.func
:scoring function used in structure learning (when performed)
struct.algo
:algorithm used in structure learning (when performed)
num.time.steps
:number of instants in which the network is observed (1, unless it is a Dynamic Bayesian Network)
Examples
## Not run:
net.1 <- BN()
dataset <- BNDataset()
dataset <- read.dataset(dataset, "file.header", "file.data")
net.2 <- BN(dataset)
## End(Not run)
BNDataset class.
Description
Contains all of the data that can be extracted from a given dataset: raw data, imputed data, raw and imputed data with bootstrap.
Usage
BNDataset(data, discreteness, variables = NULL, node.sizes = NULL, ...)
## S4 method for signature 'BNDataset'
initialize(.Object)
Arguments
.Object |
an empty BNDataset. |
data |
raw data.frame or path/name of the file containing the raw dataset (see 'Details'). |
discreteness |
a vector of booleans indicating if the variables are discrete or continuous
( |
variables |
vector of variable names. |
node.sizes |
vector of variable cardinalities (for discrete variables) or quantization ranges (for continuous variables). |
... |
further arguments for reading a dataset from files (see documentation for |
Details
There are two ways to build a BNDataset: using two files containing respectively header information and data, or manually providing the data table and the related header information (variable names, cardinality and discreteness).
The key pieces of information needed are: 1. the data; 2. the state of variables (discrete or continuous); 3. the names of the variables; 4. the cardinalities of the variables (if discrete), or the number of levels they have to be quantized into (if continuous). Names and cardinalities/levels can be guessed by looking at the data, but it is strongly advised to provide _all_ of the information, in order to avoid problems later on during the execution.
Data can be provided in form of data.frame or matrix. It can contain NAs. By default, NAs are indicated with '?'; to specify a different character for NAs, it is possible to provide also the na.string.symbol parameter.
The values contained in the data have to be numeric (real for continuous variables, integer for discrete ones). The default range of values for a discrete variable X is [1,|X|], with |X| being the cardinality of X. The same applies for the levels of quantization for continuous variables.
If the value ranges for the data are different from the expected ones, it is possible to specify a different starting value (for the whole dataset) with the starts.from parameter. E.g. by starts.from=0 we assume that the values of the variables in the dataset have range [0,|X|-1]. Please keep in mind that the internal representation of bnstruct starts from 1, and the original starting values are then lost.
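The effect of starts.from can be pictured with a few lines of base R. This is only a sketch of the shift described above, using a made-up 0-based matrix; it is not the package's internal code, and the variable names are hypothetical.

```r
# Hypothetical 0-based data: two discrete variables with cardinalities 2 and 3.
raw <- matrix(c(0, 1, 1, 0,
                2, 0, 1, 2),
              nrow = 4, ncol = 2,
              dimnames = list(NULL, c("A", "B")))
starts.from <- 0

# Shift every value so that the smallest admissible state becomes 1.
shifted <- raw - starts.from + 1

range(shifted[, "A"])  # states of A now lie in [1, 2]
range(shifted[, "B"])  # states of B now lie in [1, 3]
```

After the shift, nothing in the matrix records that the data originally started from 0, which is why the original starting value cannot be recovered later.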
It is possible to use two files, one for the data and one for the metadata, instead of manually providing all of the information. bnstruct requires the data files to be in the format described below.
The actual data has to be in a text file in tabular format, one tuple per row, with the values for each variable separated by a space or a tab. Values for each variable have to be numbers, starting from 1 in case of discrete variables. Data files can have a first row containing the names of the corresponding variables.
In addition to the data file, a header file containing additional information can also be provided. A header file has to be composed of three rows of tab-delimited values:
1. a list of the names of the variables, in the same order as in the data file;
2. a list of integers representing the cardinality of the variables, in case of discrete variables, or the number of levels each variable has to be quantized in, in case of continuous variables;
3. a list that indicates, for each variable, if the variable is continuous (c or C), and thus has to be quantized before learning, or discrete (d or D).
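As an illustration of the layout described above, the following base R sketch writes a small header file and data file to a temporary directory and reads them back. File names, variable names and values are made up for the example, and the files are parsed here with base R only, not with bnstruct.

```r
# Write a hypothetical header file and data file to a temporary directory.
dir <- tempdir()
header.path <- file.path(dir, "example.header")
data.path   <- file.path(dir, "example.data")

writeLines(c(paste(c("A", "B", "C"), collapse = "\t"),   # row 1: variable names
             paste(c(2, 3, 4),       collapse = "\t"),   # row 2: cardinalities / levels
             paste(c("d", "d", "c"), collapse = "\t")),  # row 3: discrete or continuous
           header.path)

writeLines(c("1 2 0.5",
             "2 ? 1.7",     # '?' marks a missing value
             "1 3 3.2"),
           data.path)

# Read both files back with base R to check the layout.
header <- strsplit(readLines(header.path), "\t")
data   <- as.matrix(read.table(data.path, na.strings = "?"))
```

Here the third variable is continuous, so its entry in the second header row (4) is the number of quantization levels rather than a cardinality.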
In case of need of more advanced options when reading a dataset from files, please refer to the documentation of the read.dataset method. Imputation and bootstrap are also available as separate routines (impute and bootstrap, respectively).
In case of an evolving system to be modeled as a Dynamic Bayesian Network, it is possible to specify only the description of the variables of a single instant; the information will be replicated for all the num.time.steps instants that compose the dataset, where num.time.steps needs to be set as parameter. In this case, it is assumed that the N variables v1, v2, ..., vN of a single instant appear in the dataset as v1_t1, v2_t1, ..., vN_t1, v1_t2, v2_t2, ..., in this exact order. The user can however provide information for all the variables in all the instants; if this is not the case, the names of the variables will be edited to include the instant. In case of an evolving system, the num.variables slot refers anyway to the total number of variables observed in all the instants (the number of columns in the dataset), and not to a single instant.
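The column ordering assumed for a dynamic system can be reproduced in base R. The sketch below only generates the expected names for a hypothetical single-instant description of three variables observed over two instants; it does not call the package.

```r
vars <- c("v1", "v2", "v3")   # hypothetical variables of a single instant
num.time.steps <- 2

# The instant index varies slowest, the variable index fastest.
expanded <- paste0(rep(vars, times = num.time.steps),
                   "_t",
                   rep(seq_len(num.time.steps), each = length(vars)))
expanded
# "v1_t1" "v2_t1" "v3_t1" "v1_t2" "v2_t2" "v3_t2"
```

In this example the total number of columns is length(vars) * num.time.steps = 6, which is the quantity the num.variables slot is described as holding.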
Value
a BNDataset object.
Slots
name
:name of the dataset
header.file
:name and location of the header file
data.file
:name and location of the data file
variables
:names of the variables in the network
node.sizes
:cardinality of each variable of the network
num.variables
:number of variables (columns) in the dataset
discreteness
:TRUE if variable is discrete, FALSE if variable is continuous
quantiles
:list of vectors containing the quantiles, one vector per variable. Each vector is NULL if the variable is discrete, and contains the quantiles if it is continuous
num.items
:number of observations (rows) in the dataset
has.raw.data
:TRUE if the dataset contains data read from a file
has.imputed.data
:TRUE if the dataset contains imputed data (computed from raw data)
raw.data
:matrix containing raw data
imputed.data
:matrix containing imputed data
has.boots
:dataset has bootstrap samples
boots
:list of bootstrap samples
has.imputed.boots
:dataset has imputed bootstrap samples
imp.boots
:list of imputed bootstrap samples
num.boots
:number of bootstrap samples
num.time.steps
:number of instants in which the network is observed (1, unless it is a dynamic system)
See Also
read.dataset, impute, bootstrap
Examples
## Not run:
# create from files
dataset <- BNDataset("file.data", "file.header")
# other way: create from raw dataset and metadata
data <- matrix(c(1:16), nrow = 4, ncol = 4)
dataset <- BNDataset(data = data,
discreteness = rep('d',4),
variables = c("a", "b", "c", "d"),
node.sizes = c(4,8,12,16))
## End(Not run)
InferenceEngine class.
Description
Constructor method of the InferenceEngine class.
Usage
## S4 method for signature 'InferenceEngine'
initialize(.Object, ...)
InferenceEngine(bn = NULL, observations = NULL, interventions = NULL, ...)
Arguments
.Object |
an empty InferenceEngine object. |
... |
potential further arguments of methods. |
bn |
a |
observations |
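a list of observations composed of the two following vectors: observed.vars (vector of observed variables) and observed.vals (vector of values observed for the variables in observed.vars in the corresponding position). |
interventions |
a list of interventions composed of the following two vectors: intervention.vars (vector of manipulated variables) and intervention.vals (vector of values for the variables in intervention.vars in the corresponding position). |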
Value
an InferenceEngine object.
Slots
junction.tree
:junction tree adjacency matrix.
num.nodes
:number of nodes in the junction tree.
cliques
:list of cliques composing the nodes of the junction tree.
triangulated.graph
:adjacency matrix of the original triangulated graph.
jpts
:inferred joint probability tables.
bn
:original Bayesian Network (as object of class BN) as provided by the user, or learnt from a dataset. NULL if missing.
updated.bn
:Bayesian Network (as object of class BN) as modified by a belief propagation computation. In particular, it will have different conditional probability tables with respect to its original version. NULL if missing.
observed.vars
:list of observed variables, by name or number.
observed.vals
:list of observed values for the corresponding variables in observed.vars.
intervention.vars
:list of manipulated variables, by name or number.
intervention.vals
:list of specified values for the corresponding variables in intervention.vars.
Examples
## Not run:
dataset <- BNDataset()
dataset <- read.dataset(dataset, "file.header", "file.data")
bn <- BN(dataset)
eng <- InferenceEngine(bn)
obs <- list(c("A","G","X"), c(1,2,1))
eng.2 <- InferenceEngine(bn, obs)
## End(Not run)
add further evidence to an existing list of observations of an InferenceEngine
.
Description
Add a list of observations to an InferenceEngine that already has observations, using a list composed by the two following vectors:
observed.vars
vector of observed variables;
observed.vals
vector of values observed for the variables in
observed.vars
in the corresponding position.
Usage
add.observations(x) <- value
## S4 replacement method for signature 'InferenceEngine'
add.observations(x) <- value
Arguments
x |
an |
value |
the list of observations of the |
Details
In case of multiple observations of the same variable, the last observation is the one used, as the most recent.
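The "last observation wins" rule can be sketched in base R as follows. This only illustrates the rule on two plain vectors with made-up values; how the package resolves duplicates internally is not shown here.

```r
observed.vars <- c("A", "G", "A")   # "A" is observed twice
observed.vals <- c(1, 2, 3)

# Keep, for each variable, only its last occurrence.
keep <- !duplicated(observed.vars, fromLast = TRUE)
observed.vars[keep]   # "G" "A"
observed.vals[keep]   # 2 3
```

The earlier observation A = 1 is discarded in favour of the more recent A = 3.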
load Asia
dataset.
Description
Wrapper for a loader for the Asia
dataset, with only raw data.
Usage
asia()
Details
The dataset has 10000 items, no missing data, so no imputation needs to be performed.
Value
a BNDataset containing the Asia dataset.
Examples
dataset <- asia()
print(dataset)
Asia
dataset.
Description
The Asia
dataset contains 10000 complete (no missing data, no latent variables) randomly generated items of the Asia
Bayesian Network.
No imputation needs to be performed, so only raw data is present.
Format
a BNDataset with raw data slot filled.
Details
The data the BNDataset object is built from is located in files pkg_folder/extdata/asia_10000.header
and pkg_folder/extdata/asia_10000.data
.
References
S. Lauritzen, D. Spiegelhalter. Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 50(2):157-224, 1988.
load a two-layers dataset derived from the Asia
dataset.
Description
Wrapper for a loader for a 2-layers dataset derived from the Asia
dataset, with only raw data.
Usage
asia_2_layers()
Details
The dataset has 100 items, no missing data, so no imputation needs to be performed.
Value
a BNDataset containing the 2-layers dataset derived from the Asia dataset.
Examples
dataset <- asia_2_layers()
print(dataset)
perform belief propagation.
Description
Perform belief propagation for the network of an InferenceEngine, given a set of observations.
In the current version of bnstruct, belief propagation can be computed only over a junction tree.
Usage
belief.propagation(ie, observations = NULL, return.potentials = FALSE)
## S4 method for signature 'InferenceEngine'
belief.propagation(ie, observations = NULL, return.potentials = FALSE)
Arguments
ie |
an |
observations |
list of observations, consisting in two vector, |
return.potentials |
if TRUE only the potentials are returned, instead of the default |
Value
updated InferenceEngine
object.
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
bn <- BN(dataset)
ie <- InferenceEngine(bn)
ie <- belief.propagation(ie)
observations(ie) <- list("observed.vars"=c("A","G","X"), "observed.vals"=c(1,2,1))
belief.propagation(ie)
## End(Not run)
get the BN
object contained in an InferenceEngine
.
Description
Return a network contained in an InferenceEngine.
Usage
bn(x)
## S4 method for signature 'InferenceEngine'
bn(x)
Arguments
x |
an |
Value
the BN
object contained in an InferenceEngine
.
set the original BN
object contained in an InferenceEngine
.
Description
Add an original network to an InferenceEngine.
Usage
bn(x) <- value
## S4 replacement method for signature 'InferenceEngine'
bn(x) <- value
Arguments
x |
an |
value |
the |
get selected element of bootstrap list.
Description
Given a BNDataset
, return the sample corresponding to given index.
Usage
boot(dataset, index, use.imputed.data = FALSE)
## S4 method for signature 'BNDataset,numeric'
boot(dataset, index, use.imputed.data = FALSE)
Arguments
dataset |
a |
index |
the index of the requested sample. |
use.imputed.data |
|
See Also
bootstrap
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
dataset <- bootstrap(dataset, num.boots = 1000)
for (i in 1:num.boots(dataset))
print(boot(dataset, i))
## End(Not run)
get list of bootstrap samples of a BNDataset
.
Description
Return the list of samples computed from raw data of a dataset.
Usage
boots(x)
## S4 method for signature 'BNDataset'
boots(x)
Arguments
x |
a |
Value
the list of bootstrap samples.
See Also
has.boots
, has.imputed.boots
, imp.boots
set list of bootstrap samples of a BNDataset
.
Description
Add to a dataset a list of samples from raw data computed using bootstrap.
Usage
boots(x) <- value
## S4 replacement method for signature 'BNDataset'
boots(x) <- value
Arguments
x |
a |
value |
the list of bootstrap samples. |
Perform bootstrap.
Description
Create a list of num.boots
samples of the original dataset.
Usage
bootstrap(object, num.boots = 100, seed = 0, imputation = FALSE, k.impute = 10)
## S4 method for signature 'BNDataset'
bootstrap(object, num.boots = 100, seed = 0, imputation = FALSE, k.impute = 10)
Arguments
object |
the |
num.boots |
number of sampled datasets for bootstrap. |
seed |
random seed. |
imputation |
|
k.impute |
number of neighbours to be used; for discrete variables we use mode, for continuous variables the median value is instead taken (useful only if imputation == TRUE). |
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
dataset <- bootstrap(dataset, num.boots = 1000)
## End(Not run)
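For readers unfamiliar with the technique, the idea behind bootstrap can be sketched in base R: each sample is drawn from the rows of the original data with replacement. The sketch assumes that every sample has as many rows as the original dataset; the data are made up, and this is not the package's implementation.

```r
set.seed(0)
# Hypothetical raw data: 10 observations of 2 discrete variables.
raw <- matrix(sample(1:3, 20, replace = TRUE), nrow = 10, ncol = 2)
num.boots <- 5

# Each bootstrap sample resamples the rows of raw with replacement.
boots <- lapply(seq_len(num.boots), function(b) {
  rows <- sample(nrow(raw), nrow(raw), replace = TRUE)
  raw[rows, , drop = FALSE]
})

length(boots)     # one matrix per bootstrap sample
dim(boots[[1]])   # same shape as the original data
```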
build a JunctionTree.
Description
Starting from the adjacency matrix of the directed acyclic graph of the network contained in an InferenceEngine, build a JunctionTree for the network and store it into an InferenceEngine.
Usage
build.junction.tree(object, ...)
## S4 method for signature 'InferenceEngine'
build.junction.tree(object, ...)
Arguments
object |
an |
... |
potential further arguments for methods. |
See Also
InferenceEngine
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
net <- BN(dataset)
eng <- InferenceEngine(net)
eng <- build.junction.tree(eng)
## End(Not run)
load Child
dataset.
Description
Wrapper for a loader for the Child
raw dataset; also perform imputation.
Usage
child()
Details
The dataset has 5000 items, with random missing values (no latent variables). BNDataset object contains the raw dataset and imputed dataset, with k=10
(see impute
for related explanation).
Value
a BNDataset containing the Child
dataset.
Examples
dataset <- child()
print(dataset)
Child
dataset.
Description
The Child
dataset contains 5000 randomly generated items with missing data (no latent variables) of the Child
Bayesian Network.
Imputation is performed, so both raw and imputed data is present.
Format
a BNDataset with raw and imputed data slots filled with 5000 items.
Details
The data the BNDataset object is built from is located in files pkg_folder/extdata/extdata/Child_data_na_5000.header
and pkg_folder/extdata/extdata/Child_data_na_5000.data
.
References
D. J. Spiegelhalter, R. G. Cowell (1992). Learning in probabilistic expert systems. In Bayesian Statistics 4 (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 447-466. Clarendon Press, Oxford.
Subset a BNDataset
to get only complete cases.
Description
Given a BNDataset, return a copy of the original object where the raw.data consists only of the observations that do not contain missing values.
Usage
complete(x, complete.vars = seq_len(num.variables(x)))
## S4 method for signature 'BNDataset'
complete(x, complete.vars = seq_len(num.variables(x)))
Arguments
x |
a |
complete.vars |
vector containing the indices of the variables to be considered
for the subsetting; variables not included in the vector can still contain |
Details
Non-missingness can be required on a subset of variables (by default, on all variables).
If present, imputed data and bootstrap samples are removed from the new BNDataset: when this method is used *after* impute or bootstrap, the subsetted raw.data would likely no longer correspond to the previously generated imputed.data and bootstrap samples.
Value
a copy of the original BNDataset
containing only complete observations.
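The subsetting rule can be illustrated on a plain matrix with base R. The sketch below requires non-missingness only on the first two columns, so that a kept row may still contain an NA elsewhere; the data are made up and the code does not operate on a BNDataset.

```r
# Hypothetical raw data with missing values.
raw <- matrix(c(1, NA, 2, 1,
                2, 1, NA, 1,
                1, 2, 2, NA),
              nrow = 4, ncol = 3,
              dimnames = list(NULL, c("A", "B", "C")))

complete.vars <- c(1, 2)   # require complete values only for A and B

# Keep the rows that have no NA in the selected columns.
keep <- complete.cases(raw[, complete.vars, drop = FALSE])
kept <- raw[keep, , drop = FALSE]
kept
```

Rows 1 and 4 are kept; row 4 still has a missing value in C, because C was not listed in complete.vars.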
get the list of conditional probability tables of a BN
.
Description
Return the list of conditional probability tables of the variables of a BN
object.
Each probability table is associated to the corresponding variable, and its dimensions are named according
to the variable they represent.
Usage
cpts(x)
## S4 method for signature 'BN'
cpts(x)
Arguments
x |
an object. |
Details
Each conditional probability table is represented as a multidimensional array. The ordering of the dimensions of each variable is not guaranteed to follow the actual conditional distribution. E.g. dimensions for conditional probability P(C|A,B) can be either (C,A,B) or (A,B,C), depending on whether some operations have been performed, or on how the probability table has been computed. Users should not rely on dimension numbers, but should instead select the dimensions using their names.
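Selecting dimensions by name can be done with base R alone. The sketch below builds a made-up table for P(C|A,B) stored in the order (A,B,C) and brings it to a known order by matching dimension names; the array and its values are hypothetical and are not produced by cpts().

```r
# Hypothetical CPT for P(C | A, B), stored with dimension order (A, B, C).
cpt <- array(c(0.9, 0.6, 0.7, 0.2,    # C = 1
               0.1, 0.4, 0.3, 0.8),   # C = 2
             dim = c(2, 2, 2),
             dimnames = list(A = c("1", "2"), B = c("1", "2"), C = c("1", "2")))

# Locate dimensions by name rather than by position.
wanted <- c("C", "A", "B")
reordered <- aperm(cpt, match(wanted, names(dimnames(cpt))))

# P(C | A = 2, B = 1), regardless of the original ordering.
reordered[, "2", "1"]
```

Because the permutation is computed from the names, the same code works whether the table arrives as (C,A,B) or (A,B,C).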
Value
list of the conditional probability tables of the desired object.
set the list of conditional probability tables of a network.
Description
Set the list of conditional probability tables of a BN
object.
Usage
cpts(x) <- value
## S4 replacement method for signature 'BN'
cpts(x) <- value
Arguments
x |
an object. |
value |
list of the conditional probability tables of the object. |
Details
Each conditional probability table is represented as a multidimensional array. To retrieve single dimensions (e.g. to compute marginals), users should provide dimensions names.
get adjacency matrix of a network.
Description
Return the adjacency matrix of the directed acyclic graph representing the structure of a network.
Usage
dag(x)
## S4 method for signature 'BN'
dag(x)
Arguments
x |
an object. |
Value
matrix containing the adjacency matrix of the directed acyclic graph representing the structure of the object.
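A small base R sketch may help in reading such a matrix. It assumes the common convention that entry [i, j] equal to 1 denotes an edge from node i to node j; this convention is an assumption of the example and should be checked against the package before relying on it. The network is made up.

```r
vars <- c("A", "B", "C")
dag <- matrix(0, nrow = 3, ncol = 3, dimnames = list(vars, vars))

# Assumed convention: dag[i, j] == 1 means an edge i -> j.
dag["A", "C"] <- 1   # A -> C
dag["B", "C"] <- 1   # B -> C

parents.of.C  <- names(which(dag[, "C"] == 1))   # nodes with an edge into C
children.of.A <- names(which(dag["A", ] == 1))   # nodes A points to
```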
set adjacency matrix of an object.
Description
Set the adjacency matrix of the directed acyclic graph representing the structure of a network.
Usage
dag(x) <- value
## S4 replacement method for signature 'BN'
dag(x) <- value
Arguments
x |
an object. |
value |
matrix containing the adjacency matrix of the directed acyclic graph representing the structure of the object. |
convert a DAG to a CPDAG
Description
Convert the adjacency matrix representing the DAG of a BN
into the adjacency matrix representing a CPDAG for the network.
Usage
dag.to.cpdag(dag.adj.matrix, layering = NULL, layer.struct = NULL)
Arguments
dag.adj.matrix |
the adjacency matrix representing the DAG of a |
layering |
vector containing the layers where each node belongs. |
layer.struct |
layer.struct |
Value
the adjacency matrix representing a CPDAG for the network.
Examples
## Not run:
net <- learn.network(dataset, layering=layering, layer.struct=layer.struct)
pdag <- dag.to.cpdag(dag(net), layering, layer.struct)
wpdag(net) <- pdag
## End(Not run)
get data file of a BNDataset
.
Description
Return the data filename of a dataset (with the path to its position, as given by the user). The data filename may contain a header in the first row, containing the list of names of the variables, in the same order as in the header file. After the header, if present, the file contains a data.frame with the observations, one item per row.
Usage
data.file(x)
## S4 method for signature 'BNDataset'
data.file(x)
Arguments
x |
a |
Value
data filename of the dataset.
set data file of a BNDataset
.
Description
Set the data filename of a dataset (with the path to its position, as given by the user). The data filename may contain a header in the first row, containing the list of names of the variables, in the same order as in the header file. After the header, if present, the file contains a data.frame with the observations, one item per row.
Usage
data.file(x) <- value
## S4 replacement method for signature 'BNDataset'
data.file(x) <- value
Arguments
x |
a |
value |
data filename. |
get status (discrete or continuous) of the variables of an object.
Description
Get a vector representing the status of the variables (with their names) of a BN or BNDataset.
Elements of the vector are c if the variable is continuous, and d if the variable is discrete.
Usage
discreteness(x)
## S4 method for signature 'BN'
discreteness(x)
## S4 method for signature 'BNDataset'
discreteness(x)
Arguments
x |
an object. |
Value
vector containing, for each variable of the desired object, c if the variable is continuous, and d if the variable is discrete.
set status (discrete or continuous) of the variables of an object.
Description
Set the list of variable status for the variables in a network or a dataset.
Usage
discreteness(x) <- value
## S4 replacement method for signature 'BN'
discreteness(x) <- value
## S4 replacement method for signature 'BNDataset'
discreteness(x) <- value
Arguments
x |
an object. |
value |
a vector of elements in { |
counts the edges in a WPDAG with their directionality
Description
Given a BN
with a WPDAG
, it counts the edges, with
their directionality.
Usage
edge.dir.wpdag(x, use.node.names = TRUE)
Arguments
x |
the |
use.node.names |
use node names rather than number ( |
Value
a matrix containing the node pairs with the count of the edges
between them in the WPDAG
.
expectation-maximization algorithm.
Description
Learn parameters of a network using the Expectation-Maximization algorithm.
Usage
em(x, dataset, threshold = 0.001, max.em.iterations = 10, ess = 1)
## S4 method for signature 'InferenceEngine,BNDataset'
em(x, dataset, threshold = 0.001, max.em.iterations = 10, ess = 1)
Arguments
x |
an |
dataset |
observed dataset with missing values for the Bayesian Network of |
threshold |
threshold for convergence, used as stopping criterion. |
max.em.iterations |
maximum number of iterations to run in case of no convergence. |
ess |
Equivalent Sample Size value. |
Value
a list containing: an InferenceEngine
with a new updated network ("InferenceEngine"
),
and the imputed dataset ("BNDataset"
).
Examples
## Not run:
em(x, dataset)
## End(Not run)
compute the most probable values to be observed.
Description
Return an array containing the values that each variable of the network is most likely to take, according to the CPTs. In case of ties, the first value is taken.
Usage
get.most.probable.values(x, prev.values = NULL)
## S4 method for signature 'BN'
get.most.probable.values(x, prev.values = NULL)
## S4 method for signature 'InferenceEngine'
get.most.probable.values(x, prev.values = NULL)
Arguments
x |
a |
prev.values |
vector of size |
Value
array containing, in each position, the most probable value for the corresponding variable.
Examples
## Not run:
# try with a BN object x
get.most.probable.values(x)
# now build an InferenceEngine object
eng <- InferenceEngine(x)
get.most.probable.values(eng)
## End(Not run)
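The tie-breaking rule (take the first value) is the same behaviour that base R's which.max has on a plain probability vector. The sketch below only illustrates the rule on a made-up distribution with a tie; whether the package uses which.max internally is not stated in this documentation.

```r
# Hypothetical distribution of a variable with three states and a tie.
p <- c("1" = 0.4, "2" = 0.4, "3" = 0.2)

# which.max returns the position of the first maximum.
most.probable <- unname(which.max(p))
most.probable
```

States 1 and 2 are equally likely here, and the first of the two is selected.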
check whether a BNDataset
has bootstrap samples or not.
Description
Return TRUE
if the given dataset contains samples for bootstrap, FALSE
otherwise.
Usage
has.boots(x)
## S4 method for signature 'BNDataset'
has.boots(x)
Arguments
x |
a |
Value
TRUE
if dataset has bootstrap samples.
See Also
has.imputed.boots
, boots
, imp.boots
check whether a BNDataset
has bootstrap samples from imputed data or not.
Description
Return TRUE if the given dataset contains samples for bootstrap from imputed dataset, FALSE otherwise.
Usage
has.imputed.boots(x)
## S4 method for signature 'BNDataset'
has.imputed.boots(x)
Arguments
x |
a |
Value
TRUE
if dataset has bootstrap samples from imputed data.
check if a BNDataset contains imputed data.
Description
Check whether a BNDataset
object actually contains imputed data.
Usage
has.imputed.data(x)
## S4 method for signature 'BNDataset'
has.imputed.data(x)
Arguments
x |
a |
See Also
has.raw.data
, raw.data
, imputed.data
Examples
## Not run:
x <- BNDataset()
has.imputed.data(x) # FALSE
x <- read.dataset(x, "file.header", "file.data")
has.imputed.data(x) # FALSE, since read.dataset() actually reads raw data.
x <- impute(x)
has.imputed.data(x) # TRUE
## End(Not run)
check if a BNDataset contains raw data.
Description
Check whether a BNDataset
object actually contains raw data.
Usage
has.raw.data(x)
## S4 method for signature 'BNDataset'
has.raw.data(x)
Arguments
x |
a |
See Also
has.imputed.data
, raw.data
, imputed.data
Examples
## Not run:
x <- BNDataset()
has.raw.data(x) # FALSE
x <- read.dataset(x, "file.header", "file.data")
has.raw.data(x) # TRUE, since read.dataset() actually reads raw data.
## End(Not run)
get header file of a BNDataset
.
Description
Return the header filename of a dataset (with the path to its position, as given by the user), present if the dataset has been read from a file and not manually inserted. The header file contains three rows:
1. list of names of the variables, in the same order as in the data file;
2. list of cardinalities of the variables, if discrete, or levels for quantization if continuous;
3. list of status of the variables: c for continuous variables, d for discrete ones.
Usage
header.file(x)
## S4 method for signature 'BNDataset'
header.file(x)
Arguments
x |
a |
Value
header filename of the dataset.
set header file of a BNDataset
.
Description
Set the header filename of a dataset (with the path to its position, as given by the user). The header file has to contain three rows:
1. list of names of the variables, in the same order as in the data file;
2. list of cardinalities of the variables, if discrete, or levels for quantization if continuous;
3. list of status of the variables: c for continuous variables, d for discrete ones.
Further rows are ignored.
Usage
header.file(x) <- value
## S4 replacement method for signature 'BNDataset'
header.file(x) <- value
Arguments
x |
a |
value |
header filename. |
get list of bootstrap samples from imputed data of a BNDataset
.
Description
Return the list of samples computed from imputed data of a dataset.
Usage
imp.boots(x)
## S4 method for signature 'BNDataset'
imp.boots(x)
Arguments
x |
a |
Value
the list of bootstrap samples from imputed data.
See Also
has.boots
, has.imputed.boots
, boots
set list of bootstrap samples from imputed data of a BNDataset
.
Description
Add to a dataset a list of samples from imputed data computed using bootstrap.
Usage
imp.boots(x) <- value
## S4 replacement method for signature 'BNDataset'
imp.boots(x) <- value
Arguments
x |
a |
value |
the list of bootstrap samples from imputed data. |
Impute a BNDataset
raw data with missing values.
Description
Impute a BNDataset
raw data with missing values.
Usage
impute(object, k.impute = 10)
## S4 method for signature 'BNDataset'
impute(object, k.impute = 10)
Arguments
object |
the |
k.impute |
number of neighbours to be used; for discrete variables we use mode, for continuous variables the median value is instead taken. |
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
dataset <- impute(dataset)
## End(Not run)
get imputed data of a BNDataset.
Description
Return imputed data contained in a BNDataset
object, if any.
Usage
imputed.data(x)
## S4 method for signature 'BNDataset'
imputed.data(x)
Arguments
x |
a |
See Also
has.raw.data
, has.imputed.data
, raw.data
add imputed data.
Description
Insert imputed data in a BNDataset
object.
Usage
imputed.data(x) <- value
## S4 replacement method for signature 'BNDataset'
imputed.data(x) <- value
Arguments
x |
a |
value |
a matrix of integers containing a dataset. |
See Also
has.imputed.data
, imputed.data
, read.dataset
get the list of interventions of an InferenceEngine
.
Description
Return the list of interventions added to an InferenceEngine.
Usage
interventions(x)
## S4 method for signature 'InferenceEngine'
interventions(x)
Arguments
x |
an |
Details
Output is a list in the following format:
intervention.vars
vector of manipulated variables;
intervention.vals
vector of values for the variables in intervention.vars in the corresponding position.
Value
the list of interventions of the InferenceEngine
.
set the list of interventions for an InferenceEngine
.
Description
Add a list of interventions to an InferenceEngine, using a list composed of the two following vectors:
intervention.vars
vector of the variables we manipulate;
intervention.vals
vector of values for the variables in intervention.vars in the corresponding position.
Usage
interventions(x) <- value
## S4 replacement method for signature 'InferenceEngine'
interventions(x) <- value
Arguments
x |
an |
value |
the list of interventions of the |
Details
An intervention can be applied only when building an InferenceEngine
.
In case of multiple interventions of the same variable, the last intervention is the one used.
get the list of joint probability tables compiled by an InferenceEngine
.
Description
Return the list of joint probability tables for the cliques of the junction tree obtained after belief propagation has been performed.
Usage
jpts(x)
## S4 method for signature 'InferenceEngine'
jpts(x)
Arguments
x |
an |
Details
Each joint probability table is represented as a multidimensional array. To retrieve single dimensions (e.g. to compute marginals), users should not rely on dimension numbers, but should instead select the dimensions using their names.
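A marginal can be obtained from such a table by summing over every dimension except the named one. The base R sketch below does this on a made-up joint table over a hypothetical clique {A, B}; it is not the output of jpts().

```r
# Hypothetical joint probability table over a clique {A, B}.
jpt <- array(c(0.1, 0.2, 0.3, 0.4),
             dim = c(2, 2),
             dimnames = list(A = c("1", "2"), B = c("1", "2")))

# Marginal of B: find the dimension by name, then sum over the others.
b.dim  <- which(names(dimnames(jpt)) == "B")
marg.B <- apply(jpt, b.dim, sum)
marg.B
```

Looking up the dimension by name keeps the code correct even if the table stores its variables in a different order.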
Value
the list of joint probability tables compiled by the InferenceEngine
.
set the list of joint probability tables compiled by an InferenceEngine
.
Description
Add a list of joint probability tables for the cliques of the junction tree.
Usage
jpts(x) <- value
## S4 replacement method for signature 'InferenceEngine'
jpts(x) <- value
Arguments
x |
an |
value |
the list of joint probability tables compiled by the |
Details
Each joint probability table is represented as a multidimensional array. To retrieve single dimensions (e.g. to compute marginals), users should provide dimension names.
get the list of cliques of the junction tree of an InferenceEngine
.
Description
Return the list of cliques containing the variables associated to each node of a junction tree.
Usage
jt.cliques(x)
## S4 method for signature 'InferenceEngine'
jt.cliques(x)
Arguments
x |
an |
Value
the list of cliques of the junction tree contained in the InferenceEngine
.
set the list of cliques of the junction tree of an InferenceEngine
.
Description
Add to the InferenceEngine a list containing the cliques of variables composing the nodes of the junction tree.
Usage
jt.cliques(x) <- value
## S4 replacement method for signature 'InferenceEngine'
jt.cliques(x) <- value
Arguments
x |
an |
value |
the list of cliques of the junction tree contained in the |
get the junction tree of an InferenceEngine
.
Description
Return the adjacency matrix representing the junction tree computed for a network.
Usage
junction.tree(x)
## S4 method for signature 'InferenceEngine'
junction.tree(x)
Arguments
x |
an |
Details
Rows and columns are named after the (variables in the) cliques that each node of the junction tree represents.
Value
the junction tree contained in the InferenceEngine
.
set the junction tree of an InferenceEngine
.
Description
Set the adjacency matrix of the junction tree computed for a network.
Usage
junction.tree(x) <- value
## S4 replacement method for signature 'InferenceEngine'
junction.tree(x) <- value
Arguments
x |
an |
value |
the junction tree to be inserted in the |
Perform imputation of a data frame using k-NN.
Description
Impute missing data in a data frame using the k-Nearest Neighbour algorithm. Discrete variables are imputed with the mode of the neighbours, continuous variables with their median.
Usage
knn.impute(
data,
k = 10,
cat.var = 1:ncol(data),
to.impute = 1:nrow(data),
using = 1:nrow(data)
)
Arguments
data |
a numerical matrix. |
k |
number of neighbours to be used; for categorical variables the mode of the neighbours is used, for continuous variables the median value is used instead. Default: 10. |
cat.var |
vector containing the indices of the variables to be considered as categorical. Default: all variables. |
to.impute |
vector indicating which rows of the dataset are to be imputed. Default: impute all rows. |
using |
vector indicating which rows of the dataset are to be used to search for neighbours. Default: use all rows. |
Value
imputed matrix.
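The aggregation rule (mode for categorical variables, median for continuous ones) can be sketched in base R. This is not the package's implementation: the neighbour search is skipped entirely, the rows treated as neighbours are chosen by hand, and stat.mode is a hypothetical helper.

```r
# Hypothetical helper: most frequent value of a vector.
stat.mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}

# Toy data: column 1 categorical, column 2 continuous; row 1 is missing.
data <- rbind(c(NA, NA),
              c(1, 0.5),
              c(2, 1.5),
              c(2, 4.0))
neighbours <- data[2:4, ]   # pretend these are the k = 3 nearest rows

imputed <- data
imputed[1, 1] <- stat.mode(neighbours[, 1])   # categorical: mode
imputed[1, 2] <- median(neighbours[, 2])      # continuous: median
print(imputed[1, ])
```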
return the layering of the nodes.
Description
Compute the topological ordering of the nodes of a network, in order to divide the network into layers.
Usage
layering(x)
## S4 method for signature 'BN'
layering(x)
Arguments
x |
a |
Value
a vector containing the layers into which the nodes can be divided.
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
x <- BN(dataset)
x <- learn.network(x, dataset)
layering(x)
## End(Not run)
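The idea of deriving layers from a topological ordering can be sketched on a plain adjacency matrix. The helper toy.layering is hypothetical, and the rule it uses (roots in layer 1, every other node one layer below its deepest parent) is an assumption; the package may assign layers differently.

```r
# Toy DAG as adjacency matrix: dag[i, j] == 1 means an edge i -> j.
dag <- matrix(0, 4, 4)
dag[1, 2] <- 1
dag[1, 3] <- 1
dag[2, 4] <- 1
dag[3, 4] <- 1

# Hypothetical helper: layer 1 for roots, else 1 + deepest parent layer.
toy.layering <- function(dag) {
  n <- nrow(dag)
  layers <- rep(NA_integer_, n)
  while (anyNA(layers)) {
    progressed <- FALSE
    for (j in which(is.na(layers))) {
      parents <- which(dag[, j] == 1)
      if (!anyNA(layers[parents])) {
        layers[j] <- if (length(parents) == 0) 1L else max(layers[parents]) + 1L
        progressed <- TRUE
      }
    }
    if (!progressed) stop("graph contains a cycle")
  }
  layers
}

print(toy.layering(dag))
```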
learn a dynamic network (structure and parameters) of a BN from a BNDataset.
Description
Learn a dynamic network (structure and parameters) of a BN from a BNDataset (see the Details
section).
This method is a wrapper for learn.network
to simplify the learning of a dynamic network.
It provides an automated generation of the layering
required to represent the set of time constraints
encoded in a dynamic network. In this function, it is assumed that the dataset contains the observations for each variable
in all the time steps:
V_1^{t_1}, V_2^{t_1}, ..., V_n^{t_1}, V_1^{t_2}, ..., V_n^{t_k}
.
Variables in time step j
can be parents of any variable in time steps k >= j
, but not of variables in time steps i < j
.
A layering provided for a single time step is applied within each time step, not across the whole dynamic network;
a global layering can, however, also be provided.
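The expected column order and the time constraint can be sketched in base R. The column names, the sizes n and k, and the matrix allowed are illustrative assumptions; the sketch encodes only the time-step rule and does not check acyclicity within a time step.

```r
n <- 3   # variables per time step (illustrative)
k <- 2   # number of time steps (illustrative)

# Column order: all variables of t_1 first, then all variables of t_2, ...
col.names <- paste0("V", rep(1:n, times = k), "_t", rep(1:k, each = n))
time.step <- rep(1:k, each = n)

# allowed[i, j] is TRUE when column i may be a parent of column j,
# i.e. when i's time step is not later than j's (self-loops excluded).
allowed <- outer(time.step, time.step, "<=") & !diag(n * k)
dimnames(allowed) <- list(col.names, col.names)

print(col.names)
print(allowed["V1_t1", "V2_t2"])
print(allowed["V1_t2", "V1_t1"])
```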
Usage
learn.dynamic.network(x, ...)
## S4 method for signature 'BN'
learn.dynamic.network(
x,
y = NULL,
num.time.steps = num.time.steps(y),
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(y) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(y) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
## S4 method for signature 'BNDataset'
learn.dynamic.network(
x,
num.time.steps = num.time.steps(x),
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(x) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(x) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
Arguments
x |
can be a |
... |
potential further arguments for methods. |
y |
|
num.time.steps |
the number of time steps to be represented in the dynamic BN. |
algo |
the algorithm to use. Currently, one among |
scoring.func |
the scoring function to use. Currently, one among
|
initial.network |
network structure to be used as starting point for structure search.
Can take different values:
a |
alpha |
confidence threshold (only for |
ess |
Equivalent Sample Size value. |
bootstrap |
|
layering |
vector containing the layers each node belongs to. |
max.fanin |
maximum number of parents for each node (only for |
max.fanin.layers |
matrix of available parents in each layer (only for |
max.parents |
maximum number of parents for each node (for |
max.parents.layers |
matrix of available parents in each layer (only for |
layer.struct |
|
cont.nodes |
vector containing the index of continuous variables. |
use.imputed.data |
|
use.cpc |
(when using |
mandatory.edges |
binary matrix, where a |
Details
The other available parameters are those of learn.network
; refer to the documentation of that function
for more details. See also the documentation for learn.structure
and learn.params
for more information.
Value
new BN
object with structure (DAG) and conditional probabilities
as learnt from the given dataset.
See Also
learn.network learn.structure learn.params
Examples
## Not run:
mydataset <- BNDataset("data.file", "header.file")
net <- learn.dynamic.network(mydataset, num.time.steps=2)
## End(Not run)
learn a network (structure and parameters) of a BN from a BNDataset.
Description
Learn a network (structure and parameters) of a BN from a BNDataset (see the Details
section).
Usage
learn.network(x, ...)
## S4 method for signature 'BN'
learn.network(
x,
y = NULL,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(y) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(y) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
## S4 method for signature 'BNDataset'
learn.network(
x,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(x) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(x) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
Arguments
x |
can be a |
... |
potential further arguments for methods. |
y |
|
algo |
the algorithm to use. Currently, one among |
scoring.func |
the scoring function to use. Currently, one among
|
initial.network |
network structure to be used as starting point for structure search.
Can take different values:
a |
alpha |
confidence threshold (only for |
ess |
Equivalent Sample Size value. |
bootstrap |
|
layering |
vector containing the layers each node belongs to. |
max.fanin |
maximum number of parents for each node (only for |
max.fanin.layers |
matrix of available parents in each layer (only for |
max.parents |
maximum number of parents for each node (for |
max.parents.layers |
matrix of available parents in each layer (only for |
layer.struct |
|
cont.nodes |
vector containing the index of continuous variables. |
use.imputed.data |
|
use.cpc |
(when using |
mandatory.edges |
binary matrix, where a |
Details
Learn the structure (the directed acyclic graph) of a BN
object according to a BNDataset
.
We provide five algorithms for learning the structure of the network, which can be chosen with the algo
parameter.
The first one is the Silander-Myllymäki (sm
)
exact search-and-score algorithm, which performs a complete evaluation of the search space in order to discover
the best network; this algorithm may take a very long time, and can be inapplicable when discovering networks
with more than 25–30 nodes. Even for small networks, users are strongly encouraged to provide
meaningful parameters such as the layering of the nodes, or the maximum number of parents; refer to the
documentation in the package manual for more details on the method parameters.
The second method is the constraint-based Max-Min Parents-and-Children (mmpc
), which returns the skeleton of the network.
Given the possible presence of loops, due to the non-directionality of the edges discovered, no parameter learning
is possible using this algorithm. Also note that for a very dense network with many observations, the statistical evaluation
of the search space may take a long time. This algorithm also has parameters that may need to be tuned,
mainly the confidence threshold of the statistical pruning. Please refer to the rest of this documentation for their explanation.
The third algorithm is another heuristic, the Hill-Climbing (hc
). It can start from the complete space of possibilities
(default) or from a reduced subset of possible edges, using the cpc
argument.
The fourth algorithm (and the default one) is the Max-Min Hill-Climbing heuristic (mmhc
), that performs a statistical
sieving of the search space followed by a greedy evaluation, by combining the MMPC and the HC algorithms.
It is considerably faster than the complete method, at the cost of a (likely)
lower quality. As for MMPC, the computational time depends on the density of the network, the number of observations and
the tuning of the parameters.
The fifth method is the Structural Expectation-Maximization (sem
) algorithm,
for learning a network from a dataset with missing values. It iterates a sequence of Expectation-Maximization (in order to “fill in”
the holes in the dataset) and structure learning from the guessed dataset, until convergence. The structure learning used inside SEM,
due to computational reasons, is MMHC. Convergence of SEM can be controlled with the parameters struct.threshold
and param.threshold
, for the structure and the parameter convergence, respectively.
Search-and-score methods also need a scoring function to compute an estimated measure of each configuration of nodes.
We provide three of the most popular scoring functions, BDeu
(Bayesian-Dirichlet equivalent uniform, default),
AIC
(Akaike Information Criterion) and BIC
(Bayesian Information Criterion). The scoring function
can be chosen using the scoring.func
parameter.
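As a worked illustration of a penalized-likelihood score, the sketch below computes AIC and BIC for one binary node with one binary parent, from made-up complete data. The "higher is better" sign convention and the parameter count q * (r - 1) are standard textbook choices assumed here; the package's own scoring code may use different conventions, and BDeu is not shown.

```r
# Toy complete data: binary child x, binary parent p (values made up).
x <- c(1, 1, 2, 2, 1, 2, 2, 2)
p <- c(1, 1, 1, 1, 2, 2, 2, 2)

counts <- table(p, x)                # rows: parent configs, cols: child states
theta  <- counts / rowSums(counts)   # maximum-likelihood conditional table
nonzero <- counts > 0
loglik <- sum(counts[nonzero] * log(theta[nonzero]))

n.params <- nrow(counts) * (ncol(counts) - 1)   # q * (r - 1) free parameters
N <- length(x)

aic <- loglik - n.params
bic <- loglik - 0.5 * log(N) * n.params
print(c(loglik = loglik, AIC = aic, BIC = bic))
```

With N = 8 observations the BIC penalty per parameter, log(8) / 2, is slightly larger than the AIC penalty of 1, so BIC is the lower of the two scores here.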
Structure learning sets the dag
field of the BN
under study, unless bootstrap or the mmpc
algorithm
are employed. In these cases, given the possible presence of loops, the wpdag
field is set.
In case of missing data, the default behaviour (with no other indication from the user)
is to learn the structure using mmhc
starting from the raw dataset, using only the
available cases with no imputation.
In case of learning from a dataset containing observations of a dynamic system, learn.dynamic.network
will be employed.
Then, the parameters of the network are learnt using MAP (Maximum A Posteriori) estimation (when not using bootstrap or mmpc
).
See documentation for learn.structure
and learn.params
for more information.
Value
new BN
object with structure (DAG) and conditional probabilities
as learnt from the given dataset.
See Also
learn.structure learn.params learn.dynamic.network
Examples
## Not run:
mydataset <- BNDataset("data.file", "header.file")
# starting from a BN
net <- BN(mydataset)
net <- learn.network(net, mydataset)
# start directly from the dataset
net <- learn.network(mydataset)
## End(Not run)
learn the parameters of a BN.
Description
Learn the parameters of a BN object according to a BNDataset using MAP (Maximum A Posteriori) estimation.
Usage
learn.params(bn, dataset, ess = 1, use.imputed.data = FALSE)
## S4 method for signature 'BN,BNDataset'
learn.params(bn, dataset, ess = 1, use.imputed.data = FALSE)
Arguments
bn |
a |
dataset |
a |
ess |
Equivalent Sample Size value. |
use.imputed.data |
use imputed data. |
Details
Parameter learning is not possible in case of networks learnt using the mmpc
algorithm,
or from bootstrap samples, as there may be loops.
Value
new BN
object with conditional probabilities.
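The role of the Equivalent Sample Size can be illustrated with a Dirichlet-smoothed estimate of one conditional probability table. Spreading ess uniformly as ess / (q * r) pseudo-counts per cell is an assumption made for this sketch; whether the package's MAP estimator uses exactly this formula is not stated in this documentation.

```r
# Counts for one node: rows = parent configurations (q), cols = node states (r).
counts <- matrix(c(2, 2,
                   0, 4), nrow = 2, byrow = TRUE)
ess <- 1
q <- nrow(counts)
r <- ncol(counts)

# Assumed prior: ess spread uniformly, ess / (q * r) pseudo-counts per cell.
prior <- ess / (q * r)
cpt <- (counts + prior) / (rowSums(counts) + r * prior)
print(cpt)
```

The zero count in the second row receives a small positive probability instead of exactly 0, and each row still sums to 1.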
See Also
learn.network
Examples
## Not run:
## first create a BN and learn its structure from a dataset
dataset <- BNDataset("file.data", "file.header")
bn <- BN(dataset)
bn <- learn.structure(bn, dataset)
bn <- learn.params(bn, dataset, ess=1)
## End(Not run)
learn the structure of a network.
Description
Learn the structure (the directed acyclic graph) of a BN
object according to a BNDataset
.
Usage
learn.structure(
bn,
dataset,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(dataset),
max.fanin.layers = NULL,
max.parents = num.variables(dataset),
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
## S4 method for signature 'BN,BNDataset'
learn.structure(
bn,
dataset,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(dataset) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(dataset) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
Arguments
bn |
a |
dataset |
a |
algo |
the algorithm to use. Currently, one among |
scoring.func |
the scoring function to use. Currently, one among |
initial.network |
network structure to be used as starting point for structure search.
Can take different values:
a |
alpha |
confidence threshold (only for |
ess |
Equivalent Sample Size value. |
bootstrap |
|
layering |
vector containing the layers each node belongs to (only for |
max.fanin |
maximum number of parents for each node (only for |
max.fanin.layers |
matrix of available parents in each layer (only for |
max.parents |
maximum number of parents for each node (for |
max.parents.layers |
matrix of available parents in each layer (only for |
layer.struct |
|
cont.nodes |
vector containing the index of continuous variables. |
use.imputed.data |
|
use.cpc |
(when using |
mandatory.edges |
binary matrix, where a |
... |
potential further arguments for method. |
Details
We provide five algorithms for learning the structure of the network, which can be chosen with the algo
parameter.
The first is the Silander-Myllymäki (sm
)
exact search-and-score algorithm, which performs a complete evaluation of the search space in order to discover
the best network; this algorithm may take a very long time, and can be inapplicable when discovering networks
with more than 25–30 nodes. Even for small networks, users are strongly encouraged to provide
meaningful parameters such as the layering of the nodes, or the maximum number of parents; refer to the
documentation in the package manual for more details on the method parameters.
The second method is the constraint-based Max-Min Parents-and-Children (mmpc
), which returns the skeleton of the network.
Given the possible presence of loops, due to the non-directionality of the edges discovered, no parameter learning
is possible using this algorithm. Also note that for a very dense network with many observations, the statistical evaluation
of the search space may take a long time. This algorithm also has parameters that may need to be tuned,
mainly the confidence threshold of the statistical pruning. Please refer to the rest of this documentation for their explanation.
The third algorithm is another heuristic, the Hill-Climbing (hc
). It can start from the complete space of possibilities
(default) or from a reduced subset of possible edges, using the cpc
argument.
The fourth algorithm (and the default one) is the Max-Min Hill-Climbing heuristic (mmhc
), that performs a statistical
sieving of the search space followed by a greedy evaluation, by combining the MMPC and the HC algorithms.
It is considerably faster than the complete method, at the cost of a (likely)
lower quality. As for MMPC, the computational time depends on the density of the network, the number of observations and
the tuning of the parameters.
The fifth method is the Structural Expectation-Maximization (sem
) algorithm,
for learning a network from a dataset with missing values. It iterates a sequence of Expectation-Maximization (in order to “fill in”
the holes in the dataset) and structure learning from the guessed dataset, until convergence. The structure learning used inside SEM,
due to computational reasons, is MMHC. Convergence of SEM can be controlled with the parameters struct.threshold
and param.threshold
, for the structure and the parameter convergence, respectively.
Search-and-score methods also need a scoring function to compute an estimated measure of each configuration of nodes.
We provide three of the most popular scoring functions, BDeu
(Bayesian-Dirichlet equivalent uniform, default),
AIC
(Akaike Information Criterion) and BIC
(Bayesian Information Criterion). The scoring function
can be chosen using the scoring.func
parameter.
Structure learning sets the dag
field of the BN
under study, unless bootstrap or the mmpc
algorithm
are employed. In these cases, given the possible presence of loops, the wpdag
field is set.
In case of missing data, the default behaviour (with no other indication from the user)
is to learn the structure using mmhc
starting from the raw dataset.
Value
new BN
object with DAG.
See Also
learn.network learn.dynamic.network
Examples
## Not run:
dataset <- BNDataset("file.data", "file.header")
bn <- BN(dataset)
# use MMHC
bn <- learn.structure(bn, dataset, alpha=0.05, ess=1, bootstrap=FALSE)
# now use Silander-Myllymaki
layers <- layering(bn)
# max.fanin.layers as a 5x5 matrix (shape assumed from the 25 values listed)
mfl <- matrix(c(0, 1, 1, 1, 1,
                0, 1, 1, 1, 1,
                0, 0, 8, 7, 7,
                0, 0, 0, 14, 6,
                0, 0, 0, 0, 19), nrow = 5, byrow = TRUE)
bn <- learn.structure(bn, dataset, algo='sm', max.fanin=3, cont.nodes=c(),
layering=layers, max.fanin.layers=mfl, use.imputed.data=FALSE)
## End(Not run)
compute the list of inferred marginals of a BN.
Description
Given an InferenceEngine
, it returns a list containing the marginals for the variables
in the network, according to the propagated beliefs.
Usage
marginals(x, ...)
## S4 method for signature 'InferenceEngine'
marginals(x, ...)
Arguments
x |
|
... |
potential further arguments of methods. |
Value
a list containing the marginals of each variable, as probability tables.
Examples
## Not run:
eng <- InferenceEngine(net)
marginals(eng)
## End(Not run)
get name of an object.
Description
Return the name of an object, of class BN
or BNDataset
.
Usage
name(x)
## S4 method for signature 'BN'
name(x)
## S4 method for signature 'BNDataset'
name(x)
Arguments
x |
an object. |
Value
name of the object.
set name of an object.
Description
Set the name
slot of an object of type BN
or BNDataset
.
Usage
name(x) <- value
## S4 replacement method for signature 'BN'
name(x) <- value
## S4 replacement method for signature 'BNDataset'
name(x) <- value
Arguments
x |
an object. |
value |
the new name of the object. |
get size of the variables of an object.
Description
Return a list containing the size of the variables of an object. It is the actual cardinality of discrete variables, and the cardinality of the discretized variable for continuous variables.
Usage
node.sizes(x)
## S4 method for signature 'BN'
node.sizes(x)
## S4 method for signature 'BNDataset'
node.sizes(x)
Arguments
x |
an object. |
Value
vector containing the size of each variable of the desired object.
set the size of variables of an object.
Description
Set the size of the variables of a BN or BNDataset object. It represents the actual cardinality of discrete variables, and the cardinality of the discretized variable for continuous variables.
Usage
node.sizes(x) <- value
## S4 replacement method for signature 'BN'
node.sizes(x) <- value
## S4 replacement method for signature 'BNDataset'
node.sizes(x) <- value
Arguments
x |
an object. |
value |
vector containing the size of each variable of the object. |
get number of bootstrap samples of a BNDataset
.
Description
Return the number of bootstrap samples computed from a dataset.
Usage
num.boots(x)
## S4 method for signature 'BNDataset'
num.boots(x)
Arguments
x |
a |
Value
the number of bootstrap samples.
set number of bootstrap samples of a BNDataset
.
Description
Set the length of the list of samples of a dataset computed using bootstrap.
Usage
num.boots(x) <- value
## S4 replacement method for signature 'BNDataset'
num.boots(x) <- value
Arguments
x |
a |
value |
the number of bootstrap samples. |
get number of items of a BNDataset
.
Description
Return the number of items in a dataset, that is, the number of rows in its data slot.
Usage
num.items(x)
## S4 method for signature 'BNDataset'
num.items(x)
Arguments
x |
a |
Value
number of items of the desired dataset.
set number of items of a BNDataset
.
Description
Set the number of observed items (rows) in a dataset.
Usage
num.items(x) <- value
## S4 replacement method for signature 'BNDataset'
num.items(x) <- value
Arguments
x |
a |
value |
number of items of the desired dataset. |
get number of nodes of an object.
Description
Return the number of nodes of an object, of class BN
or InferenceEngine
.
Usage
num.nodes(x)
## S4 method for signature 'BN'
num.nodes(x)
## S4 method for signature 'InferenceEngine'
num.nodes(x)
Arguments
x |
an object. |
Value
number of nodes of the desired object.
set number of nodes of an object.
Description
Set the number of nodes of an object of type BN
(number of nodes of the network)
or InferenceEngine
(where parameter contains the number of nodes of the junction tree).
Usage
num.nodes(x) <- value
## S4 replacement method for signature 'BN'
num.nodes(x) <- value
## S4 replacement method for signature 'InferenceEngine'
num.nodes(x) <- value
Arguments
x |
an object. |
value |
the number of nodes in the object. |
get number of time steps observed in a BN
or a BNDataset
.
Description
Return the number of time steps observed in a dataset.
Usage
num.time.steps(x)
## S4 method for signature 'BN'
num.time.steps(x)
## S4 method for signature 'BNDataset'
num.time.steps(x)
Arguments
x |
Value
the number of time steps.
set number of time steps of a BN
or a BNDataset
.
Description
Set the number of time steps of a dataset.
Usage
num.time.steps(x) <- value
## S4 replacement method for signature 'BN'
num.time.steps(x) <- value
## S4 replacement method for signature 'BNDataset'
num.time.steps(x) <- value
Arguments
x |
|
value |
the number of time steps. |
get number of variables of a BNDataset
.
Description
Return the number of the variables contained in a dataset. This value corresponds to the value
of num.nodes
of a network built upon the same dataset.
Usage
num.variables(x)
## S4 method for signature 'BNDataset'
num.variables(x)
Arguments
x |
a |
Value
number of variables of the desired dataset.
set number of variables of a BNDataset
.
Description
Set the number of variables observed in a dataset.
Usage
num.variables(x) <- value
## S4 replacement method for signature 'BNDataset'
num.variables(x) <- value
Arguments
x |
a |
value |
number of variables of the dataset. |
get the list of observations of an InferenceEngine
.
Description
Return the list of observations added to an InferenceEngine.
Usage
observations(x)
## S4 method for signature 'InferenceEngine'
observations(x)
Arguments
x |
an |
Details
Output is a list in the following format:
observed.vars
vector of observed variables;
observed.vals
vector of values observed for the variables in
observed.vars
in the corresponding position.
Value
the list of observations of the InferenceEngine
.
set the list of observations of an InferenceEngine
.
Description
Add a list of observations to an InferenceEngine, using a list composed of the following two vectors:
observed.vars
vector of observed variables;
observed.vals
vector of values observed for the variables in
observed.vars
in the corresponding position.
Usage
observations(x) <- value
## S4 replacement method for signature 'InferenceEngine'
observations(x) <- value
Arguments
x |
an |
value |
the list of observations of the |
Details
Replace previous list of observations, if present. In order to add evidence, and not just replace it,
one must use the add.observations<-
method.
In case of multiple observations of the same variable, the last observation is the one used, as the most recent.
plot a BN
as a picture.
Description
plot a BN
as a picture.
Usage
## S3 method for class 'BN'
plot(
x,
method = "default",
use.node.names = TRUE,
frac = 0.2,
max.weight = max(dag(x)),
node.size.lab = 14,
node.col = rep("white", num.nodes(x)),
plot.wpdag = FALSE,
...
)
Arguments
x |
a |
method |
either |
use.node.names |
|
frac |
minimum fraction [0,1] of presence of an edge to be plotted (used in case of |
max.weight |
maximum possible weight of an edge (used in case of |
node.size.lab |
font size for the node labels in the default mode. |
node.col |
list of ( |
plot.wpdag |
if |
... |
potential further arguments when using |
print a BN
, BNDataset
or InferenceEngine
to stdout
.
Description
print a BN
, BNDataset
or InferenceEngine
to stdout
.
Usage
## S3 method for class 'BN'
print(x, ...)
## S3 method for class 'BNDataset'
print(x, show.raw.data = FALSE, show.imputed.data = FALSE, ...)
## S3 method for class 'InferenceEngine'
print(x, engine = "jt", ...)
Arguments
x |
a |
... |
potential other arguments. |
show.raw.data |
if |
show.imputed.data |
if |
engine |
if |
get the list of quantiles of an object.
Description
Return the list of quantiles of a BN
or a BNDataset
. It is set when a discretization needs to be performed.
Usage
quantiles(x)
## S4 method for signature 'BN'
quantiles(x)
## S4 method for signature 'BNDataset'
quantiles(x)
Arguments
x |
a BN or a BNDataset object. |
Details
Output is a list of num.nodes
vectors, one per variable. Each vector is NULL
if the corresponding variable is discrete in the original dataset, and contains the cut points for the quantiles
if the corresponding variable is continuous.
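The shape of such a list can be sketched in base R for one discrete and one continuous variable. The data, the number of levels, and the choice to store only the interior cut points are assumptions made for illustration; the package may store the boundaries differently.

```r
set.seed(1)
continuous <- runif(20)    # a continuous variable (made-up data)
levels.wanted <- 4         # number of discretization levels (illustrative)

# Interior cut points at the 25%, 50% and 75% quantiles.
probs <- seq(0, 1, length.out = levels.wanted + 1)
cuts  <- quantile(continuous, probs = probs[-c(1, length(probs))], names = FALSE)

# One entry per variable: NULL for the discrete one, cut points for the continuous one.
quant.list <- list(NULL, cuts)

# Map each continuous value to a level in 1..levels.wanted.
discretized <- findInterval(continuous, cuts) + 1
print(table(discretized))
```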
Value
the list of quantiles of the BN
or BNDataset
.
set the list of quantiles of an object.
Description
Set the list of quantiles of a BN
or a BNDataset
.
Usage
quantiles(x) <- value
## S4 replacement method for signature 'BN'
quantiles(x) <- value
## S4 replacement method for signature 'BNDataset'
quantiles(x) <- value
Arguments
x |
|
value |
a list of vectors. |
Details
It is used when a discretization needs to be performed.
get raw data of a BNDataset.
Description
Return raw data contained in a BNDataset
object, if any.
Usage
raw.data(x)
## S4 method for signature 'BNDataset'
raw.data(x)
Arguments
x |
a |
See Also
has.raw.data
, has.imputed.data
add raw data.
Description
Insert raw data in a BNDataset
object.
Usage
raw.data(x) <- value
## S4 replacement method for signature 'BNDataset'
raw.data(x) <- value
Arguments
x |
a |
value |
a matrix of integers containing a dataset. |
See Also
has.raw.data
, raw.data
, read.dataset
Read a network from a .bif
file.
Description
Read a network described in a .bif
-formatted file, and
build a BN
object.
Usage
read.bif(x)
## S4 method for signature 'character'
read.bif(x)
Arguments
x |
the |
Details
The method relies on a coherent ordering of variable values and parameters in the file.
Value
a BN
object.
Read a dataset from file.
Description
There are two ways to build a BNDataset: using two files containing the header information and the data, respectively, or manually providing the data table and the related header information (variable names, cardinality and discreteness).
Usage
read.dataset(
object,
data.file,
header.file,
data.with.header = FALSE,
na.string.symbol = "?",
sep.symbol = "",
starts.from = 1,
num.time.steps = 1
)
## S4 method for signature 'BNDataset,character,character'
read.dataset(
object,
data.file,
header.file,
data.with.header = FALSE,
na.string.symbol = "?",
sep.symbol = "",
starts.from = 1,
num.time.steps = 1
)
Arguments
object |
the |
data.file |
the |
header.file |
the |
data.with.header |
|
na.string.symbol |
character that denotes |
sep.symbol |
separator among values in the dataset. |
starts.from |
starting value for entries in the dataset (observed values, default is 1). |
num.time.steps |
number of instants composing the observations (1, unless it is a dynamic system). |
Details
The key pieces of information needed are: 1. the data; 2. the state of the variables (discrete or continuous); 3. the names of the variables; 4. the cardinalities of the variables (if discrete), or the number of levels they have to be quantized into (if continuous). Names and cardinalities/levels can be guessed by looking at the data, but it is strongly advised to provide _all_ of this information, in order to avoid problems later on during execution.
Data can be provided in the form of a data.frame or a matrix. It can contain NAs. By default, NAs are indicated with '?';
to specify a different character for NAs, it is possible to provide also the na.string.symbol
parameter.
The values contained in the data have to be numeric (real for continuous variables, integer for discrete ones).
The default range of values for a discrete variable X
is [1,|X|]
, with |X|
being
the cardinality of X
. The same applies for the levels of quantization for continuous variables.
If the value ranges for the data are different from the expected ones, it is possible to specify a different
starting value (for the whole dataset) with the starts.from
parameter. E.g. by starts.from=0
we assume that the values of the variables in the dataset have range [0,|X|-1]
.
Please keep in mind that the internal representation of bnstruct starts from 1,
and the original starting values are then lost.
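The effect of a non-default starting value can be sketched in base R. This only mimics the re-basing described above on a made-up matrix; it is not the package's internal code.

```r
# Raw data coded from 0: a binary and a ternary variable (values made up).
raw <- matrix(c(0, 2,
                1, 0,
                0, 1), ncol = 2, byrow = TRUE)
starts.from <- 0

# Shift so that every variable takes values in [1, |X|].
shifted <- raw - starts.from + 1
print(shifted)
```

After the shift the original coding cannot be recovered from the data alone, which is why the starting value has to be remembered separately if it matters.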
It is possible to use two files, one for the data and one for the metadata,
instead of providing manually all of the info.
bnstruct requires the data files to be in the format described below.
The actual data has to be in (a text file containing data in) tabular format, one tuple per row,
with the values for each variable separated by a space or a tab. Values for each variable have to be
numbers, starting from 1
in case of discrete variables.
Data files can have a first row containing the names of the corresponding variables.
In addition to the data file, a header file containing additional information can also be provided.
A header file must consist of three rows of tab-delimited values:
1. list of names of the variables, in the same order of the data file;
2. a list of integers representing the cardinality of the variables, in case of discrete variables,
or the number of levels each variable has to be quantized in, in case of continuous variables;
3. a list that indicates, for each variable, if the variable is continuous
(c
or C
), and thus has to be quantized before learning,
or discrete (d
or D
).
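As an illustration of the layout described above, the following base-R sketch writes a toy data file and a matching header file, then reads them back. The file names, variable names and values are placeholders invented for this example, and the sketch only mirrors the textual description; it does not call any bnstruct function.

```r
# Placeholder paths in a temporary directory (names chosen for this sketch).
data.file   <- file.path(tempdir(), "toy.data")
header.file <- file.path(tempdir(), "toy.header")

# Data: one tuple per row, values separated by a space.
# A and C are discrete (values start at 1), B is continuous.
toy <- data.frame(A = c(1, 2, 2), B = c(0.3, 1.7, 2.1), C = c(3, 1, 2))
write.table(toy, data.file, sep = " ", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# Header: three tab-delimited rows
# (names; cardinalities or quantization levels; c/d flags).
writeLines(c(paste(c("A", "B", "C"), collapse = "\t"),
             paste(c(2, 4, 3),       collapse = "\t"),
             paste(c("d", "c", "d"), collapse = "\t")),
           header.file)

# Read both files back with base R to check the layout.
header <- strsplit(readLines(header.file), "\t")
values <- read.table(data.file, col.names = header[[1]])
print(header)
print(values)
```

Here the continuous variable B is declared with 4 quantization levels, while A and C are declared with their cardinalities.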
See Also
BNDataset
Examples
## Not run:
dataset <- BNDataset()
dataset <- read.dataset(dataset, "file.data", "file.header")
## End(Not run)
Read a network from a .dsc
file.
Description
Read a network described in a .dsc
-formatted file, and
build a BN
object.
Usage
read.dsc(x)
## S4 method for signature 'character'
read.dsc(x)
Arguments
x |
the |
Details
The method relies on a coherent ordering of variable values and parameters in the file.
Value
a BN
object.
Read a network from a .net
file.
Description
Read a network described in a .net
-formatted file, and
build a BN
object.
Usage
read.net(x)
## S4 method for signature 'character'
read.net(x)
Arguments
x |
the |
Details
The method relies on a coherent ordering of variable values and parameters in the file.
Value
a BN
object.
sample a BNDataset
from a network or an inference engine.
Description
sample a BNDataset
from a network or an inference engine.
Usage
sample.dataset(x, n = 100, mar = 0)
## S4 method for signature 'BN'
sample.dataset(x, n = 100, mar = 0)
## S4 method for signature 'InferenceEngine'
sample.dataset(x, n = 100)
Arguments
x |
a |
n |
number of items to sample. |
mar |
fraction [0,1] of missing values in the sampled dataset (missing at random), default value is 0 (no missing values). |
Value
sample a row vector of values for a network.
Description
sample a row vector of values for a network.
Usage
sample.row(x, mar = 0)
## S4 method for signature 'BN'
sample.row(x, mar = 0)
Arguments
x |
a |
mar |
fraction [0,1] of missing values in the sampled vector (missing at random), default value is 0 (no missing values). |
Value
a vector of values.
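The role of the mar argument can be pictured with a base-R sketch. This is an assumption about one possible reading of "fraction of missing values", in which each entry is hidden independently with probability mar; the helper `mask_at_random` is hypothetical and is not how bnstruct is known to implement it.

```r
# Hypothetical helper (not part of bnstruct): replace entries of a sampled
# row with NA, each independently with probability `mar`.
mask_at_random <- function(row, mar = 0) {
  stopifnot(mar >= 0, mar <= 1)
  row[runif(length(row)) < mar] <- NA
  row
}

set.seed(1)
full.row <- c(A = 1, B = 2, C = 1, D = 3)
print(mask_at_random(full.row, mar = 0))    # nothing is masked
print(mask_at_random(full.row, mar = 0.5))  # each entry has a 50% chance of being NA
```

With mar = 0, the default, the row is returned unchanged.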
save a BN
picture as .eps
file.
Description
Save an image of a Bayesian Network as an .eps
file.
Usage
save.to.eps(x, filename, ...)
## S4 method for signature 'BN,character'
save.to.eps(x, filename, ...)
Arguments
x |
a |
filename |
name (with path, if needed) of the file to be created |
... |
parameters for the |
See Also
Examples
## Not run:
save.to.eps(x, "out.eps")
## End(Not run)
Read the scoring function used to learn the structure of a network.
Description
Read the scoring function used in the learn.structure
method.
Outcome is meaningful only if the structure of a network has been learnt.
Usage
scoring.func(x)
## S4 method for signature 'BN'
scoring.func(x)
Arguments
x |
the |
Value
the scoring function used.
Set the scoring function used to learn the structure of a network.
Description
Set the scoring function used in the learn.structure
method.
Usage
scoring.func(x) <- value
## S4 replacement method for signature 'BN'
scoring.func(x) <- value
Arguments
x |
the |
value |
the scoring function used. |
Value
updated BN.
compute the Structural Hamming Distance between two adjacency matrices.
Description
Compute the Structural Hamming Distance between two adjacency matrices, that is, the distance, in terms of edges, between two network structures. The lower the shd, the more similar the two network structures are.
Usage
shd(g1, g2)
Arguments
g1 |
first adjacency matrix. |
g2 |
second adjacency matrix. |
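To make the notion of an edge-based distance concrete, here is a minimal base-R sketch. It assumes one common convention, in which every pair of nodes whose connection differs between the two graphs (missing, extra or reversed edge) counts as one unit; the function `shd_sketch` is hypothetical, and the exact counting used by shd may differ.

```r
# Hypothetical sketch (not the package implementation): count the node pairs
# whose connection differs between two adjacency matrices.
shd_sketch <- function(g1, g2) {
  stopifnot(is.matrix(g1), is.matrix(g2),
            all(dim(g1) == dim(g2)), nrow(g1) == ncol(g1))
  a <- (g1 != 0) * 1L          # binarize: 1 where an edge i -> j exists
  b <- (g2 != 0) * 1L
  d <- abs(a - b)              # directed entries that differ
  d <- d + t(d)                # a pair differs if either direction differs
  sum(d[upper.tri(d)] > 0)     # count each unordered pair once
}

nodes <- c("A", "B", "C")
g0 <- matrix(0L, 3, 3, dimnames = list(nodes, nodes))   # empty graph
g1 <- g0; g1["A", "B"] <- 1L; g1["B", "C"] <- 1L        # A -> B -> C
g2 <- g0; g2["A", "B"] <- 1L; g2["C", "B"] <- 1L        # A -> B <- C

print(shd_sketch(g1, g1))   # identical structures
print(shd_sketch(g1, g2))   # one reversed edge
print(shd_sketch(g1, g0))   # two missing edges
```

Under this convention a reversed edge costs the same as a missing or an extra one.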
Show method for objects.
Description
The show method provides a custom appearance for the output that is generated when the name of an instance is given as a command in an R session.
Usage
show(object)
Arguments
object |
an object. |
Read the algorithm used to learn the structure of a network.
Description
Read the algorithm used in the learn.structure
method.
Outcome is meaningful only if the structure of a network has been learnt.
Usage
struct.algo(x)
## S4 method for signature 'BN'
struct.algo(x)
Arguments
x |
the |
Value
the structure learning algorithm used.
Set the algorithm used to learn the structure of a network.
Description
Set the algorithm used in the learn.structure
method.
Usage
struct.algo(x) <- value
## S4 replacement method for signature 'BN'
struct.algo(x) <- value
Arguments
x |
the |
value |
the structure learning algorithm used. |
Value
updated BN.
check if an updated BN
is present in an InferenceEngine
.
Description
Check if an InferenceEngine actually contains an updated network, so that the original network can be used as a fallback if no belief propagation has been performed.
An InferenceEngine built specifying a set of interventions will contain an updated BN with altered structure and no conditional probability tables (unless they are computed by a belief propagation operation).
Usage
test.updated.bn(x)
## S4 method for signature 'InferenceEngine'
test.updated.bn(x)
Arguments
x |
an |
Value
TRUE
if an updated network is contained in the InferenceEngine, FALSE
otherwise.
Examples
## Not run:
dataset <- BNDataset("file.header", "file.data")
bn <- BN(dataset)
ie <- InferenceEngine(bn)
test.updated.bn(ie) # FALSE
observations(ie) <- list("observed.vars"=c("A","G","X"), "observed.vals"=c(1,2,1))
ie <- belief.propagation(ie)
test.updated.bn(ie) # TRUE
interventions <- list("intervention.vars"=c("A","G","X"), "intervention.vals"=c(1,2,1))
ie2 <- InferenceEngine(bn, interventions = interventions)
test.updated.bn(ie2) # TRUE
## End(Not run)
tune the parameter k of the knn algorithm used in imputation.
Description
tune the parameter k of the knn algorithm used in imputation.
Usage
tune.knn.impute(
data,
cat.var = 1:ncol(data),
k.min = 1,
k.max = 20,
frac.miss = 0.1,
n.iter = 20,
seed = 0
)
Arguments
data |
a numerical matrix. |
cat.var |
vector containing the categorical variables |
k.min |
minimum value for k |
k.max |
maximum value for k |
frac.miss |
fraction of missing values to add |
n.iter |
number of iterations for each k |
seed |
random seed |
Value
matrix of error distributions
get the updated BN
object contained in an InferenceEngine
.
Description
Return an updated network contained in an InferenceEngine.
Usage
updated.bn(x)
## S4 method for signature 'InferenceEngine'
updated.bn(x)
Arguments
x |
an |
Value
the updated BN
object contained in an InferenceEngine
.
set the updated BN
object contained in an InferenceEngine
.
Description
Add an updated network to an InferenceEngine.
Usage
updated.bn(x) <- value
## S4 replacement method for signature 'InferenceEngine'
updated.bn(x) <- value
Arguments
x |
an |
value |
the updated |
get variables of an object.
Description
Get the list of variables (with their names) of a BN
or BNDataset
.
Usage
variables(x)
## S4 method for signature 'BN'
variables(x)
## S4 method for signature 'BNDataset'
variables(x)
Arguments
x |
an object. |
Value
vector of the variable names of the desired object.
set variables of an object.
Description
Set the list of variable names in a BN
or BNDataset
object.
Usage
variables(x) <- value
## S4 replacement method for signature 'BN'
variables(x) <- value
## S4 replacement method for signature 'BNDataset'
variables(x) <- value
Arguments
x |
an object. |
value |
vector containing the variable names of the object.
Overwrites |
get the WPDAG of an object.
Description
Return the weighted partially directed acyclic graph of a network, when available (e.g. when bootstrap on dataset is performed).
Usage
wpdag(x)
## S4 method for signature 'BN'
wpdag(x)
Arguments
x |
an object. |
Value
matrix containing the WPDAG of the object.
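One way to picture a weighted graph built from bootstrap samples is as a matrix of edge counts. The base-R sketch below assumes that each weight counts how many learnt structures contain a given edge; the helper `edge_counts` and the toy matrices are hypothetical, and the sketch does not reproduce how bnstruct actually fills the WPDAG.

```r
# Hypothetical helper (not part of bnstruct): accumulate edge counts over a
# list of adjacency matrices, e.g. one structure learnt per bootstrap sample.
edge_counts <- function(adj.list) {
  Reduce(`+`, lapply(adj.list, function(m) (m != 0) * 1L))
}

nodes <- c("A", "B", "C")
empty <- matrix(0L, 3, 3, dimnames = list(nodes, nodes))
g1 <- empty; g1["A", "B"] <- 1L; g1["B", "C"] <- 1L
g2 <- empty; g2["A", "B"] <- 1L
g3 <- empty; g3["A", "B"] <- 1L; g3["C", "B"] <- 1L

weights <- edge_counts(list(g1, g2, g3))
print(weights)

# Keep only the edges present in at least half of the structures.
print(weights / 3 >= 0.5)
```

In this toy case only the edge from A to B, which appears in all three structures, survives the threshold.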
set WPDAG of the object.
Description
Set the weighted partially directed acyclic graph of a network (e.g. in case bootstrap on dataset is performed).
Usage
wpdag(x) <- value
## S4 replacement method for signature 'BN'
wpdag(x) <- value
Arguments
x |
an object. |
value |
matrix containing the WPDAG of the object. |
Initialize a WPDAG from a DAG.
Description
Given a BN object with a dag, return a network with its wpdag set as the CPDAG computed starting from the dag.
Usage
wpdag.from.dag(x, layering = NULL, layer.struct = NULL)
## S4 method for signature 'BN'
wpdag.from.dag(x, layering = NULL, layer.struct = NULL)
Arguments
x |
a |
layering |
vector containing the layers each node belongs to. |
layer.struct |
|
Value
a BN
object with an initialized wpdag
.
See Also
Examples
## Not run:
net <- learn.network(dataset, layering=layering, layer.struct=layer.struct)
wp.net <- wpdag.from.dag(net, layering, layer.struct=layer.struct)
## End(Not run)
Write a network saving it in a .dsc
file.
Description
Write a network on disk, saving it in a .dsc
-formatted file.
Usage
write.dsc(x, path = "./")
## S4 method for signature 'BN'
write.dsc(x, path = "./")
Arguments
x |
the |
path |
the relative or absolute path of the directory of the created file. |
Write a network saving it in an XGMML
file.
Description
Write a network on disk, saving it in an XGMML
file,
for importing it in Cytoscape.
Usage
write_xgmml(
x,
path = "./network",
write.wpdag = FALSE,
node.col = rep("white", num.nodes(x)),
frac = 0.2,
max.weight = max(wpdag(x))
)
## S4 method for signature 'BN'
write_xgmml(
x,
path = "./network",
write.wpdag = FALSE,
node.col = rep("white", num.nodes(x)),
frac = 0.2,
max.weight = max(wpdag(x))
)
Arguments
x |
the |
path |
file name with relative or absolute path to be written. |
write.wpdag |
write the weighted PDAG computed using bootstrap samples or the MMPC structure algorithm, instead of the normal dag (default FALSE). |
node.col |
vector of colors for each node of the network (in R colornames). |
frac |
minimum fraction [0,1] of presence of an edge to be plotted (used in case of |
max.weight |
maximum possible weight of an edge (used in case of |