Title: | Dynamic Ensembles for Time Series Forecasting |
Version: | 0.1.0 |
Author: | Vitor Cerqueira [aut, cre], Luis Torgo [ctb], Carlos Soares [ctb] |
Maintainer: | Vitor Cerqueira <cerqueira.vitormanuel@gmail.com> |
Description: | A framework for dynamically combining forecasting models for time series forecasting predictive tasks. It leverages machine learning models from other packages to automatically combine expert advice using metalearning and other state-of-the-art forecasting combination approaches. The predictive methods receive a data matrix as input, representing an embedded time series, and return a predictive ensemble model. The ensembles use the generic functions 'predict()' and 'forecast()' to forecast future values of the time series. Moreover, an ensemble can be updated using methods such as 'update_weights()' or 'update_base_models()'. A complete description of the methods can be found in: Cerqueira, V., Torgo, L., Pinto, F., and Soares, C. "Arbitrated Ensemble for Time Series Forecasting." to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017; and Cerqueira, V., Torgo, L., and Soares, C.: "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, 2017 <doi:10.1007/978-3-319-59153-7_62>. |
Imports: | xts, zoo, RcppRoll, methods, ranger, glmnet, earth, kernlab, Cubist, gbm, pls, monmlp, doParallel, foreach, xgboost, softImpute |
Suggests: | testthat |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
URL: | https://github.com/vcerqueira/tsensembler |
NeedsCompilation: | no |
Packaged: | 2020-10-26 09:31:22 UTC; root |
Repository: | CRAN |
Date/Publication: | 2020-10-27 14:00:02 UTC |
Arbitrated Dynamic Ensemble
Description
Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for adaptively combining forecasting models. A metalearning strategy is used that specializes base models across the time series. Each meta-learner is specifically designed to model how apt its base counterpart is to make a prediction for a given test example. This is accomplished by analysing how the error incurred by a given learning model relates to the characteristics of the data. At test time, the base-learners are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta-learners.
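As a rough illustration of the weighting step described above, the base R sketch below turns predicted losses into weights and combines the base forecasts. The matrices Y_hat and E_hat are made-up data, and the inverse-loss normalisation is an assumption chosen for simplicity; it is not taken from the package and may differ from what ADE() does internally.

```r
# Hypothetical predictions of 3 base models at 2 time points (rows = time)
Y_hat <- matrix(c(10, 12, 11,
                  20, 19, 23), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("m1", "m2", "m3")))

# Hypothetical losses predicted by the meta-learners for the same points
E_hat <- matrix(c(1, 3, 2,
                  4, 1, 5), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("m1", "m2", "m3")))

# Assumed rule: weight is proportional to the inverse of the predicted loss,
# normalised so that the weights at each time point sum to 1
W <- (1 / E_hat) / rowSums(1 / E_hat)

# Combined forecast: weighted average of the base predictions per time point
y_hat <- rowSums(Y_hat * W)
```

In this toy setting the model with the lowest predicted loss at a time point (m1 at the first, m2 at the second) receives the largest weight.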
Usage
ADE(
form,
data,
specs,
lambda = 50,
omega = 0.5,
select_best = FALSE,
all_models = FALSE,
aggregation = "linear",
sequential_reweight = FALSE,
meta_loss_fun = ae,
meta_model_type = "randomforest",
num_cores = 1
)
quickADE(
form,
data,
specs,
lambda = 50,
omega = 0.5,
select_best = FALSE,
all_models = FALSE,
aggregation = "linear",
sequential_reweight = FALSE,
meta_loss_fun = ae,
meta_model_type = "randomforest",
num_cores = 1
)
Arguments
form |
formula; |
data |
data to train the base models |
specs |
object of class model_specs-class. Contains the parameter setting information for training the base models; |
lambda |
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments; |
omega |
committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments; |
select_best |
Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE; |
all_models |
Logical. If true, at each prediction time, all base models are picked to make a prediction. The models are weighted according to their predicted loss and the aggregation function. Defaults to FALSE; |
aggregation |
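Type of aggregation used to combine the predictions of the base models. The options are: "softmax"; "erfc", the complementary Gaussian error function; and "linear", a linear scaling (the default in the usage above). |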
sequential_reweight |
Logical. Besides ensemble heterogeneity, diversity is encouraged explicitly during the aggregation of the experts' output. This is achieved by taking into account not only the performance predictions produced by the arbiters, but also the correlation among experts in a recent window of observations. Defaults to FALSE; |
meta_loss_fun |
Loss function for the meta models. Defaults to ae (absolute error), per the usage above; |
meta_model_type |
meta model to use – defaults to random forest |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
References
Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.
V. Cerqueira, L. Torgo, and C. Soares, “Arbitrated ensemble for solar radiation forecasting,” in International Work-Conference on Artificial Neural Networks. Springer, Cham, 2017, pp. 720–732
See Also
model_specs-class for setting up the ensemble parameters for an ADE model;
predict for the method that predicts new held out observations;
update_weights for the method used to update the weights of an ADE model between successive predict or forecast calls;
update_ade_meta for updating (retraining) the meta models of an ADE model;
update_base_models for updating (retraining) the base models of an ADE ensemble (and respective weights);
ade_hat-class for the object that results from predicting with an ADE model;
and update_ade to update an ADE model, combining functions update_base_models, update_ade_meta, and update_weights.
Examples
specs <- model_specs(
learner = c("bm_ppr", "bm_glm", "bm_mars"),
learner_pars = list(
bm_glm = list(alpha = c(0, .5, 1)),
bm_svr = list(kernel = c("rbfdot", "polydot"),
C = c(1, 3)),
bm_ppr = list(nterms = 4)
)
)
data("water_consumption")
train <- embed_timeseries(water_consumption, 5)
train <- train[1:300, ] # toy size for checks
model <- ADE(target ~., train, specs)
Arbitrated Dynamic Ensemble
Description
Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for adaptively combining forecasting models. A metalearning strategy is used that specializes base models across the time series. Each meta-learner is specifically designed to model how apt its base counterpart is to make a prediction for a given test example. This is accomplished by analysing how the error incurred by a given learning model relates to the characteristics of the data. At test time, the base-learners are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta-learners.
Slots
base_ensemble
object of class base_ensemble-class. It contains the base models that can be used for predicting new data or forecasting future values;
meta_model
a list containing the meta models, one for each base model. The meta-models are random forests;
form
formula;
specs
object of class model_specs-class. Contains the parameter setting information for training the base models;
lambda
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;
omega
committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;
select_best
Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE;
all_models
Logical. If true, at each prediction time, all base models are picked to make a prediction. The models are weighted according to their predicted loss and the aggregation function. Defaults to FALSE;
aggregation
Type of aggregation used to combine the predictions of the base models. The options are:
- softmax
- erfc
the complementary Gaussian error function
- linear
a linear scaling; the default in the ADE() usage
sequential_reweight
Besides ensemble heterogeneity we encourage diversity explicitly during the aggregation of the output of experts. This is achieved by taking into account not only predictions of performance produced by the arbiters, but also the correlation among experts in a recent window of observations.
recent_series
the most recent lambda observations;
out_of_bag
Out of bag observations used to train arbiters.
meta_model_type
meta model to use – defaults to random forest
References
Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.
V. Cerqueira, L. Torgo, and C. Soares, “Arbitrated ensemble for solar radiation forecasting,” in International Work-Conference on Artificial Neural Networks. Springer, Cham, 2017, pp. 720–732
See Also
model_specs-class for setting up the ensemble parameters for an ADE model;
predict for the method that predicts new held out observations;
update_weights for the method used to update the weights of an ADE model between successive predict or forecast calls;
update_ade_meta for updating (retraining) the meta models of an ADE model;
update_base_models for updating (retraining) the base models of an ADE ensemble (and respective weights);
ade_hat-class for the object that results from predicting with an ADE model;
and update_ade to update an ADE model, combining functions update_base_models, update_ade_meta, and update_weights.
Examples
specs <- model_specs(
learner = c("bm_ppr", "bm_glm", "bm_mars"),
learner_pars = list(
bm_glm = list(alpha = c(0, .5, 1)),
bm_svr = list(kernel = c("rbfdot", "polydot"),
C = c(1, 3)),
bm_ppr = list(nterms = 4)
)
)
data("water_consumption")
train <- embed_timeseries(water_consumption, 5)
train <- train[1:300, ] # toy size for checks
model <- ADE(target ~., train, specs)
Dynamic Ensemble for Time Series
Description
A Dynamic Ensemble for Time Series (DETS). The DETS ensemble method we present settles on individually pre-trained models which are dynamically combined at run-time to make a prediction. The combination rule is reactive to changes in the environment, rendering an online combined model. The main properties of the ensemble are:
- heterogeneity
Heterogeneous ensembles are those comprised of different types of base learners. By employing models that follow different learning strategies, use different features and/or data observations we expect that individual learners will disagree with each other, introducing a natural diversity into the ensemble that helps in handling different dynamic regimes in a time series forecasting setting;
- responsiveness
We promote greater responsiveness of heterogeneous ensembles in time series tasks by making the aggregation of their members' predictions time-dependent. By tracking the loss of each learner over time, we weigh the predictions of individual learners according to their recent performance using a non-linear function. This strategy may be advantageous for better detecting regime changes and also to quickly adapt the ensemble to new regimes.
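The responsiveness idea can be sketched in a few lines of base R: average each learner's loss over a recent window, then map the result to weights with a non-linear function. The data are made up, and the use of a softmax over the negated loss is an assumption for illustration; the text above does not say which non-linear function DETS applies.

```r
# Hypothetical squared errors of 3 learners over the last lambda = 4 observations
recent_loss <- cbind(m1 = c(1, 1, 2, 0),
                     m2 = c(4, 3, 5, 4),
                     m3 = c(2, 2, 1, 3))

# Average recent loss per learner
avg_loss <- colMeans(recent_loss)

# Assumed non-linear map: softmax of the negated loss,
# so a lower recent loss yields a higher weight
softmax <- function(x) {
  z <- exp(x - max(x))  # subtract the maximum for numerical stability
  z / sum(z)
}
w <- softmax(-avg_loss)
```

Because the weights depend only on the most recent window, a learner that starts to fail after a regime change loses influence within lambda observations.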
Usage
DETS(
form,
data,
specs,
lambda = 50,
omega = 0.5,
select_best = FALSE,
num_cores = 1
)
Arguments
form |
formula; |
data |
data frame to train the base models; |
specs |
object of class |
lambda |
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments; |
omega |
committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments; |
select_best |
Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE; |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
References
Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana, and Bernhard Pfahringer. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.
See Also
model_specs-class for setting up the ensemble parameters for a DETS model;
predict for the method that predicts new held out observations;
update_weights for the method used to update the weights of a DETS model between successive predict or forecast calls;
update_base_models for updating (retraining) the base models of a DETS ensemble (and respective weights);
and dets_hat-class for the object that results from predicting with a DETS model.
Examples
specs <- model_specs(
c("bm_ppr", "bm_svr"),
list(bm_ppr = list(nterms = c(2, 4)),
bm_svr = list(kernel = c("vanilladot", "polydot"), C = c(1,5)))
)
data("water_consumption")
train <- embed_timeseries(water_consumption, 5)
model <- DETS(target ~., train, specs, lambda = 30, omega = .2)
Dynamic Ensemble for Time Series
Description
A Dynamic Ensemble for Time Series (DETS). The DETS ensemble method we present settles on individually pre-trained models which are dynamically combined at run-time to make a prediction. The combination rule is reactive to changes in the environment, rendering an online combined model. The main properties of the ensemble are:
- heterogeneity
Heterogeneous ensembles are those comprised of different types of base learners. By employing models that follow different learning strategies, use different features and/or data observations we expect that individual learners will disagree with each other, introducing a natural diversity into the ensemble that helps in handling different dynamic regimes in a time series forecasting setting;
- responsiveness
We promote greater responsiveness of heterogeneous ensembles in time series tasks by making the aggregation of their members' predictions time-dependent. By tracking the loss of each learner over time, we weigh the predictions of individual learners according to their recent performance using a non-linear function. This strategy may be advantageous for better detecting regime changes and also to quickly adapt the ensemble to new regimes.
Slots
base_ensemble
object of class base_ensemble-class. It contains the base models that can be used for predicting new data or forecasting future values;
form
formula;
specs
object of class model_specs-class. Contains the parameter setting information for training the base models;
lambda
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;
omega
committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;
select_best
Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE;
recent_series
the most recent lambda observations.
References
Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana, and Bernhard Pfahringer. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.
See Also
model_specs-class for setting up the ensemble parameters for a DETS model;
predict for the method that predicts new held out observations;
update_weights for the method used to update the weights of a DETS model between successive predict or forecast calls;
update_base_models for updating (retraining) the base models of a DETS ensemble (and respective weights);
and dets_hat-class for the object that results from predicting with a DETS model.
Examples
specs <- model_specs(
c("bm_ppr", "bm_svr"),
list(bm_ppr = list(nterms = c(2, 4)),
bm_svr = list(kernel = c("vanilladot"), C = c(1,5)))
)
data("water_consumption")
train <- embed_timeseries(water_consumption, 5)[1:500,]
model <- DETS(target ~., train, specs, lambda = 30, omega = .2)
Weighting Base Models by their Moving Average Squared Error
Description
This function computes the weights of the learning models using the Moving Average Squared Error (MASE) function. This method provides a simple way to quantify the recent performance of each base learner and adapt the combined model accordingly.
Usage
EMASE(loss, lambda, pre_weights)
Arguments
loss |
Squared error of the models at each test point; |
lambda |
Number of periods to average over when computing MASE; |
pre_weights |
pre-weights of the base models computed in the train set. |
Value
The weights of the models in test time.
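The moving-average idea can be sketched in base R as follows. The loss matrix is made up, moving_avg() is a hypothetical helper, and the inverse-loss normalisation is an assumption; the sketch also ignores pre_weights, since the text above does not say how they enter the computation.

```r
# Hypothetical squared errors of 2 models at 6 consecutive test points
loss <- cbind(m1 = c(1, 4, 9, 4, 1, 0),
              m2 = c(4, 4, 4, 4, 4, 4))
lambda <- 3  # number of periods to average over

# Trailing moving average of one column; the first k - 1 values are NA
moving_avg <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}
mase <- apply(loss, 2, moving_avg, k = lambda)

# Assumed weighting: inverse of the moving average, normalised per time point
w <- (1 / mase) / rowSums(1 / mase)
```

At the last time point m1 has the lower moving-average loss, so it receives the larger weight.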
See Also
Other weighting base models: build_committee(), get_top_models(), model_recent_performance(), model_weighting(), select_best()
Predictions by an ADE ensemble
Description
Predictions produced by an ADE-class object. It contains y_hat, the combined predictions; Y_hat, the predictions of each base model; Y_committee, the base models used for prediction at each time point; and E_hat, the loss predictions by each meta-model.
Usage
ade_hat(y_hat, Y_hat, Y_committee, E_hat)
Arguments
y_hat |
combined predictions of the ensemble ADE-class. A numeric vector; |
Y_hat |
a matrix containing the predictions made by individual models; |
Y_committee |
a list describing the models selected for predictions at each time point (according to lambda and omega); |
E_hat |
predictions of the error of each base model, estimated by their associated meta models; |
See Also
ADE-class for generating an ADE ensemble.
Other ensemble predictions: ade_hat-class, dets_hat-class, dets_hat
Predictions by an ADE ensemble
Description
Predictions produced by an ADE-class object. It contains y_hat, the combined predictions; Y_hat, the predictions of each base model; Y_committee, the base models used for prediction at each time point; and E_hat, the loss predictions by each meta-model.
Slots
y_hat
combined predictions of the ensemble ADE-class. A numeric vector;
Y_hat
a matrix containing the predictions made by individual models;
Y_committee
a list describing the models selected for predictions at each time point (according to lambda and omega);
E_hat
predictions of the error of each base model, estimated by their associated meta models;
See Also
ADE for generating an ADE ensemble.
Other ensemble predictions: ade_hat, dets_hat-class, dets_hat
Computing the absolute error
Description
Element-wise computation of the absolute error loss function.
Usage
ae(y, y_hat)
Arguments
y |
A numeric vector representing the actual values. |
y_hat |
A numeric vector representing the forecasted values. |
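For reference, an element-wise absolute error can be written in one line of base R. The helper name abs_error() is hypothetical and is used here only to avoid implying that this is the package's own source for ae().

```r
# Element-wise absolute error between actual and forecasted values
abs_error <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  abs(y - y_hat)
}

y     <- c(10, 12, 9)   # actual values
y_hat <- c(11, 12, 6)   # forecasted values
abs_error(y, y_hat)     # 1 0 3
```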
See Also
Other error/performance functions: mse(), se()
base_ensemble
Description
base_ensemble is an S4 class that contains the base models comprising the ensemble. Besides the base learning algorithms (base_models), the base_ensemble class contains other meta-data used to compute predictions for new upcoming data.
Usage
base_ensemble(base_models, pre_weights, form, colnames)
Arguments
base_models |
a list comprising the base models; |
pre_weights |
normalized relative weights of the base learners according to their performance on the available data; |
form |
formula; |
colnames |
names of the columns of the data used to train the base_models; |
base_ensemble-class
Description
base_ensemble is an S4 class that contains the base models comprising the ensemble. Besides the base learning algorithms (base_models), the base_ensemble class contains other meta-data used to compute predictions for new upcoming data.
Slots
base_models
a list comprising the base models;
pre_weights
Normalized relative weights of the base learners according to their performance on the available data;
form
formula;
colnames
names of the columns of the data used to train the base_models;
N
number of base models;
model_distribution
base learner distribution with respect to the type of learner. That is, the number of Decision Trees, SVMs, etc.
Computing the error of base models
Description
Computing the error of base models
Usage
base_models_loss(Y_hat, Y, lfun = se)
Arguments
Y_hat |
predictions of the base models ("@Y_hat" slot)
from |
Y |
true values from the time series; |
lfun |
loss function to compute. Defaults to se; |
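A minimal sketch of computing a loss for every base model, assuming Y_hat holds one column of predictions per model. The data and the sq_error() helper are hypothetical stand-ins, not the package's se().

```r
# Hypothetical predictions of 2 base models (columns) at 3 time points (rows)
Y_hat <- data.frame(m1 = c(1, 2, 3), m2 = c(2, 2, 5))
Y <- c(1, 3, 3)  # true values of the series

# Stand-in for an element-wise squared-error loss
sq_error <- function(y, y_hat) (y - y_hat)^2

# Apply the loss to each model's column, giving one loss column per model
loss <- sapply(Y_hat, function(col) sq_error(Y, col))
```

The result is a matrix with the same shape as Y_hat, so row t holds the loss of every model at time point t.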
Get best PLS/PCR model
Description
Get best PLS/PCR model
Usage
best_mvr(obj, form, validation_data)
Arguments
obj |
PLS/PCR model object |
form |
formula |
validation_data |
validation data used for predicting performances of the model by number of principal components |
Prequential Procedure in Blocks
Description
Prequential Procedure in Blocks
Usage
blocked_prequential(x, nfolds, FUN, .rbind = TRUE, ...)
Arguments
x |
data to split into nfolds blocks; |
nfolds |
number of blocks to split data into; |
FUN |
function to apply to each train/test split; |
.rbind |
logical. If TRUE, the results from FUN are combined with rbind; |
... |
further parameters to FUN |
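One common form of a blocked prequential procedure can be sketched in base R as follows: the data are split into contiguous blocks, and each block is used for testing after training on all earlier blocks. The function name and the growing-window scheme are assumptions for illustration; the package's blocked_prequential() may split or combine results differently.

```r
blocked_prequential_sketch <- function(x, nfolds, FUN, ...) {
  n <- nrow(x)
  # Assign each row to one of nfolds contiguous blocks, preserving time order
  block <- ceiling(seq_len(n) * nfolds / n)
  # For i = 1, ..., nfolds - 1: train on blocks 1..i, test on block i + 1
  lapply(seq_len(nfolds - 1), function(i) {
    train <- x[block <= i, , drop = FALSE]
    test  <- x[block == i + 1, , drop = FALSE]
    FUN(train, test, ...)
  })
}

# Toy data and a FUN that only reports the size of each split
x <- data.frame(t = 1:10, y = (1:10)^2)
sizes <- blocked_prequential_sketch(
  x, nfolds = 5,
  FUN = function(train, test) c(train = nrow(train), test = nrow(test))
)
res <- do.call(rbind, sizes)
```

Unlike random cross-validation, every test block lies strictly after its training data, which respects the temporal order of the series.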
See Also
intraining_estimations function to use as FUN parameter.
Fit Cubist models (M5)
Description
Learning an M5 model from training data. Parameter setting can vary in committees and neighbors parameters.
Usage
bm_cubist(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
See cubist for a comprehensive description.
Imports learning procedure from Cubist package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_gaussianprocess; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Feedforward Neural Networks models
Description
Learning a Feedforward Neural Network model from training data.
Usage
bm_ffnn(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Parameter setting can vary in size, maxit, and decay parameters.
See nnet for a comprehensive description.
Imports learning procedure from nnet package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_gaussianprocess; bm_svr
Other base learning models: bm_cubist(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Gaussian Process models
Description
Learning a Gaussian Process model from training data. Parameter setting can vary in kernel and tolerance. See gausspr for a comprehensive description.
Usage
bm_gaussianprocess(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Imports learning procedure from kernlab package.
Value
A list containing Gaussian Processes models
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Generalized Boosted Regression models
Description
Learning a Boosted Tree Model from training data. Parameter setting can vary in interaction.depth, n.trees, and shrinkage parameters.
Usage
bm_gbm(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
See gbm for a comprehensive description.
Imports learning procedure from gbm package.
See Also
other learning models: bm_mars; bm_ppr; bm_gaussianprocess; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Generalized Linear Models
Description
Learning a Generalized Linear Model from training data. Parameter setting can vary in alpha. See glmnet for a comprehensive description.
Usage
bm_glm(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Imports learning procedure from glmnet package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_gaussianprocess; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Multivariate Adaptive Regression Splines models
Description
Learning a Multivariate Adaptive Regression Splines model from training data.
Usage
bm_mars(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Parameter setting can vary in nk, degree, and thresh parameters.
See earth for a comprehensive description.
Imports learning procedure from earth package.
See Also
other learning models: bm_gaussianprocess; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()
Fit PLS/PCR regression models
Description
Learning a Partial Least Squares or Principal Components Regression model from training data.
Usage
bm_pls_pcr(form, data, lpars)
Arguments
form |
formula |
data |
data to train the model |
lpars |
parameter setting: For this multivariate regression model the main parameter is "method". The available options are "kernelpls", "svdpc", "cppls", "widekernelpls", and "simpls" |
Details
Parameter setting can vary in method.
See mvr for a comprehensive description.
Imports learning procedure from pls package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_gaussianprocess; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_ppr(), bm_randomforest(), bm_svr()
Fit Projection Pursuit Regression models
Description
Learning a Projection Pursuit Regression model from training data. Parameter setting can vary in nterms and sm.method parameters. See ppr for a comprehensive description.
Usage
bm_ppr(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Imports learning procedure from stats package.
See Also
other learning models: bm_mars; bm_gaussianprocess; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_randomforest(), bm_svr()
Fit Random Forest models
Description
Learning a Random Forest Model from training data. Parameter setting can vary in num.trees and mtry parameters.
Usage
bm_randomforest(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
See ranger for a comprehensive description.
Imports learning procedure from ranger package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_gaussianprocess; bm_pls_pcr; bm_ffnn; bm_svr
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_svr()
Fit Support Vector Regression models
Description
Learning a Support Vector Regression model from training data.
Usage
bm_svr(form, data, lpars)
Arguments
form |
formula |
data |
training data for building the predictive model |
lpars |
a list containing the learning parameters |
Details
Parameter setting can vary in kernel, C, and epsilon parameters.
See ksvm for a comprehensive description.
Imports learning procedure from kernlab package.
See Also
other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_gaussianprocess
Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest()
Base model for XGBoost
Description
Base model for XGBoost
Usage
bm_xgb(form, data, lpars)
Arguments
form |
formula |
data |
Training data |
lpars |
list of parameters (deprecated) |
Wrapper for creating an ensemble
Description
Using the parameter specifications from model_specs-class, this function trains a set of regression models.
Usage
build_base_ensemble(form, data, specs, num_cores = 1)
Arguments
form |
formula; |
data |
data.frame for training the predictive models; |
specs |
object of class model_specs-class. Contains the parameter setting information for training the base models; |
num_cores |
number of cores |
Value
An S4 class with the following slots: base_models, a list containing the trained models; pre_weights, a numeric vector describing the weights of the base models according to their performance in the training data; and colnames, the column names of the data, used for reference.
Examples
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
specs <- model_specs(c("bm_ppr","bm_svr"), NULL)
M <- build_base_ensemble(target ~., dataset, specs, 1)
Building a committee for an ADE model
Description
Building a committee for an ADE model
Usage
build_committee(Y_hat, Y, lambda, omega)
Arguments
Y_hat |
A data.frame containing the predictions of base models; |
Y |
True values of the time interval for which to compute the committee; |
lambda |
Window size. Number of observations to take into account to build the committee; |
omega |
Committee ratio – ratio of models to dynamically weight across the data; |
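The roles of lambda and omega can be illustrated with a small base R sketch that ranks models by their loss over the last lambda observations and keeps the top omega * 100 percent. The data are made up, and both the squared-error loss and the floor-based rounding are assumptions; build_committee() itself may compute the committee at every time point and round differently.

```r
# Hypothetical predictions of 4 base models over 6 time points
Y_hat <- data.frame(m1 = c(1, 2, 3, 4, 5, 6),
                    m2 = c(1, 2, 3, 4, 7, 8),
                    m3 = c(0, 0, 0, 4, 5, 7),
                    m4 = c(9, 9, 9, 9, 9, 9))
Y <- c(1, 2, 3, 4, 5, 6)  # true values
lambda <- 3    # window size
omega  <- 0.5  # committee ratio

# Mean squared error of each model over the last lambda observations
recent <- tail(seq_along(Y), lambda)
recent_loss <- colMeans((as.matrix(Y_hat)[recent, ] - Y[recent])^2)

# Keep the top omega * 100 percent of models (at least one)
n_keep <- max(1, floor(omega * ncol(Y_hat)))
committee <- names(sort(recent_loss))[seq_len(n_keep)]
```

Here m1 and m3 have the lowest recent loss, so they form the committee for the next prediction.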
See Also
Other weighting base models: EMASE(), get_top_models(), model_recent_performance(), model_weighting(), select_best()
Combining the predictions of several models
Description
This function simply applies a weighted average, where the predictions of the base models Y_hat are weighted according to their weights W. If a committee is specified, only models in the committee are weighted.
Usage
combine_predictions(Y_hat, W, committee = NULL)
Arguments
Y_hat |
a data.frame with the predictions of the base models; |
W |
a matrix or data.frame with the weights of the base models; |
committee |
A list containing the ids of the models in the committee. |
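A minimal base R sketch of such a weighted average is given here. The function name combine_sketch() is hypothetical, and two details are assumptions: that committee holds one vector of column ids per time point, and that weights are renormalised over the selected models.

```r
combine_sketch <- function(Y_hat, W, committee = NULL) {
  Y_hat <- as.matrix(Y_hat)
  W <- as.matrix(W)
  vapply(seq_len(nrow(Y_hat)), function(i) {
    # Use all models unless a committee is given for this time point
    ids <- if (is.null(committee)) seq_len(ncol(Y_hat)) else committee[[i]]
    w <- W[i, ids] / sum(W[i, ids])  # renormalise over the selected models
    sum(Y_hat[i, ids] * w)
  }, numeric(1))
}

Y_hat <- data.frame(m1 = c(10, 20), m2 = c(12, 22), m3 = c(14, 30))
W <- matrix(c(0.5, 0.25, 0.25,
              0.2, 0.2,  0.6), nrow = 2, byrow = TRUE)

all_models <- combine_sketch(Y_hat, W)
with_committee <- combine_sketch(Y_hat, W, committee = list(c(1, 2), 3))
```

With a committee, models outside it contribute nothing at that time point, so the second forecast in with_committee equals the prediction of m3 alone.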
Compute the predictions of base models
Description
This function is used to predict new observations using the predictive models comprising an ensemble. It calls on the respective method based on the type of model, and returns the predictions as a list.
Usage
compute_predictions(M, form, data)
Arguments
M |
list of base models; |
form |
formula; |
data |
new data to predict; |
Predictions by a DETS ensemble
Description
Predictions by a DETS ensemble
Usage
dets_hat(y_hat, Y_hat, Y_committee, W)
Arguments
y_hat |
combined predictions of the ensemble DETS-class. A numeric vector; |
Y_hat |
a matrix containing the predictions made by individual models; |
Y_committee |
a list describing the models selected for predictions at each time point (according to lambda and omega); |
W |
a matrix with the weights of the base models at each prediction time. |
Value
Set of results from predicting with a DETS ensemble
See Also
Other ensemble predictions: ade_hat-class, ade_hat, dets_hat-class
Predictions by a DETS ensemble
Description
Predictions by a DETS ensemble
Slots
y_hat
combined predictions of the ensemble DETS-class. A numeric vector;
Y_hat
a matrix containing the predictions made by individual models;
Y_committee
a list describing the models selected for predictions at each time point (according to lambda and omega);
W
a matrix with the weights of the base models at each prediction time.
See Also
Other ensemble predictions: ade_hat-class, ade_hat, dets_hat
Embedding a Time Series
Description
This function embeds a time series into a Euclidean space. The implementation is based on the embed function of the stats package and is theoretically grounded in the reconstruction of attractors (see Takens, 1981). This transformation of the series allows any available regression tool to be used to learn the time series. The assumption is that there are no long-term dependencies in the data.
Usage
embed_timeseries(timeseries, embedding.dimension)
Arguments
timeseries |
a time series of class "xts". |
embedding.dimension |
an integer specifying the embedding dimension. |
Value
An embedded time series
See Also
embed for the details of the embedding procedure.
Examples
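## Not run:
library(xts)
# 100 random values indexed by randomly jittered dates
ts <- as.xts(rnorm(100L), order.by = Sys.Date() + rnorm(100L))
# embed the series with an embedding dimension of 20
embedded.ts <- embed_timeseries(ts, 20L)
## End(Not run)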
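To make the shape transformation concrete, the following sketch applies stats::embed, on which this function is said to be based, to a short numeric vector. The column names are illustrative assumptions and need not match those produced by embed_timeseries.

```r
# Time-delay embedding with stats::embed(); column names are assumptions.
x <- c(5, 7, 9, 11, 13, 15)
k <- 3L
E <- stats::embed(x, k)
# Each row holds x[t], x[t-1], ..., x[t-k+1]; the first column can then
# serve as the regression target and the remaining columns as predictors.
colnames(E) <- c("target", paste0("lag", seq_len(k - 1L)))
E
```

A series of length 6 embedded with dimension 3 yields 4 rows, so any standard regression learner can be trained on the resulting matrix.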
Get the target from a formula
Description
Get the target from a formula
Usage
get_target(form)
Arguments
form |
formula |
Value
the target variable as character
Extract top learners from their weights
Description
This function extracts the top learners at each test point from a score matrix, according to the committee ratio omega.
Usage
get_top_models(scores, omega)
Arguments
scores |
data frame containing the weights; |
omega |
committee ratio of top base learners |
Value
A list containing the top base models
See Also
Other weighting base models: EMASE(), build_committee(), model_recent_performance(), model_weighting(), select_best()
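A minimal sketch of selecting the top learners per test point is given below. It is not the package's code: top_models_sketch is a hypothetical helper, and rounding the committee size up with ceiling() is an assumption.

```r
# Hypothetical helper: keep the highest-scoring share omega of models per row.
top_models_sketch <- function(scores, omega) {
  scores <- as.matrix(scores)
  # Assumption: the committee size is rounded up and is at least one.
  n_top <- max(1L, ceiling(omega * ncol(scores)))
  lapply(seq_len(nrow(scores)), function(i) {
    # Rank the models of row i from the highest to the lowest score.
    ranked <- order(scores[i, ], decreasing = TRUE)
    sort(ranked[seq_len(n_top)])
  })
}

scores <- data.frame(m1 = c(0.1, 0.5), m2 = c(0.6, 0.2),
                     m3 = c(0.3, 0.3), m4 = c(0.0, 0.0))
committee <- top_models_sketch(scores, omega = 0.5)
committee
```

The result is a list with one vector of column indices per test point, which is the shape a committee argument would need in a row-wise combination step.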
Get the response values from a data matrix
Description
Given a formula and a data set, the get_y function retrieves the response values.
Usage
get_y(data, form)
Arguments
data |
data set with the response values; |
form |
formula |
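The behaviour documented for get_target and get_y can be approximated in base R as shown below. Both helpers are hypothetical stand-ins written for illustration, assuming a two-sided formula whose left-hand side is a single variable.

```r
# Hypothetical helpers, not the package's implementation.
target_sketch <- function(form) {
  stopifnot(inherits(form, "formula"), length(form) == 3L)
  # all.vars() lists the variables in order of appearance, so the
  # left-hand side of a two-sided formula comes first.
  all.vars(form)[1]
}
y_sketch <- function(data, form) data[[target_sketch(form)]]

df <- data.frame(target = c(3, 1, 2), lag1 = c(9, 8, 7))
tgt <- target_sketch(target ~ .)
y <- y_sketch(df, target ~ .)
```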
Holdout
Description
Holdout
Usage
holdout(x, beta, FUN, ...)
Arguments
x |
data to split into training and test sets |
beta |
ratio of observations for training |
FUN |
function to apply to train/test split |
... |
further arguments to FUN |
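A holdout procedure of this kind might look like the sketch below. It assumes a chronological split in which the first beta share of rows is used for training, and that FUN accepts the training and test sets as its first two arguments; holdout_sketch is a hypothetical name and neither assumption is confirmed by the documentation above.

```r
# Hypothetical helper: chronological train/test split followed by FUN.
holdout_sketch <- function(x, beta, FUN, ...) {
  n <- nrow(x)
  n_train <- floor(beta * n)
  stopifnot(n_train >= 1L, n_train < n)
  # Assumption: the earliest observations are used for training.
  train <- x[seq_len(n_train), , drop = FALSE]
  test <- x[(n_train + 1L):n, , drop = FALSE]
  FUN(train, test, ...)
}

df <- data.frame(target = 1:10, lag1 = 0:9)
sizes <- holdout_sketch(df, beta = 0.8,
                        FUN = function(train, test) {
                          c(train = nrow(train), test = nrow(test))
                        })
sizes
```

Keeping the split chronological avoids training on observations that occur after the test period, which matters for time series data.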
Out-of-bag loss estimations
Description
A pipeline for retrieving out-of-bag loss estimations
Usage
intraining_estimations(train, test, form, specs, lfun, num_cores)
Arguments
train |
train set from the training set; |
test |
test set from the training set; |
form |
formula; |
specs |
object of class model_specs-class; |
lfun |
loss function for metalearning. Defaults to ae – absolute error. |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
Value
A list containing three objects:
- mloss
loss of base models in test
- oob
out-of-bag test samples
- Y_hat
predictions by base models
See Also
Other out-of-bag functions:
intraining_predictions()
Out-of-bag predictions
Description
A pipeline for retrieving out-of-bag predictions from the base models
Usage
intraining_predictions(train, test, form, specs)
Arguments
train |
train set from the training set; |
test |
test set from the training set; |
form |
formula; |
specs |
object of class model_specs-class; |
See Also
Other out-of-bag functions:
intraining_estimations()
Applying lapply on the rows
Description
Wrapper function used to compute lapply on the rows of a data.frame
Usage
l1apply(obj, FUN, ...)
Arguments
obj |
a data.frame object to apply the function. |
FUN |
function to apply to each row of obj |
... |
Further parameters to lapply |
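A row-wise lapply can be written in a few lines of base R. The helper below, rowwise_lapply, is a hypothetical stand-in rather than the package's l1apply, and it assumes that each row is passed to FUN as a one-row data.frame.

```r
# Hypothetical helper: apply FUN to every row of a data.frame.
rowwise_lapply <- function(obj, FUN, ...) {
  # drop = FALSE keeps each row as a one-row data.frame.
  lapply(seq_len(nrow(obj)), function(i) FUN(obj[i, , drop = FALSE], ...))
}

df <- data.frame(a = 1:3, b = c(10, 20, 30))
row_sums <- rowwise_lapply(df, function(row) row$a + row$b)
row_sums
```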
Training the base models of an ensemble
Description
This function uses train to build a set of predictive models, according to specs.
Usage
learning_base_models(train, form, specs, num_cores)
Arguments
train |
training set to build the predictive models; |
form |
formula; |
specs |
object of class model_specs-class; |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
Value
A series of predictive models (base_model), and the weights of the models computed in the training data (preweights).
Examples
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
specs <- model_specs(c("bm_ppr","bm_svr"), NULL)
M <- build_base_ensemble(target ~., dataset, specs, 1)
Training an arbiter
Description
Training an arbiter
Usage
loss_meta_learn(form, data, meta_model)
Arguments
form |
formula |
data |
data |
meta_model |
learning algorithm – either a "randomforest", a "lasso", or a "gaussianprocess". |
Training an RBR arbiter
Description
Training an RBR arbiter
Usage
meta_cubist(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via Cubist
Description
Arbiter predictions via Cubist
Usage
meta_cubist_predict(meta_model, newdata)
Arguments
meta_model |
arbiter – a Cubist model |
newdata |
new data to predict |
Training a feedforward neural network arbiter
Description
Training a feedforward neural network arbiter
Usage
meta_ffnn(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via feedforward neural network model
Description
Arbiter predictions via feedforward neural network model
Usage
meta_ffnn_predict(model, newdata)
Arguments
model |
arbiter – a feedforward neural network model |
newdata |
new data to predict loss |
Training a Gaussian process arbiter
Description
Training a Gaussian process arbiter
Usage
meta_gp(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via Gaussian process model
Description
Arbiter predictions via Gaussian process model
Usage
meta_gp_predict(model, newdata)
Arguments
model |
arbiter – a Gaussian process model |
newdata |
new data to predict loss |
Training a LASSO arbiter
Description
Training a LASSO arbiter
Usage
meta_lasso(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via linear model
Description
Arbiter predictions via linear model
Usage
meta_lasso_predict(meta_model, newdata)
Arguments
meta_model |
arbiter – a glmnet object |
newdata |
new data to predict |
Training a MARS arbiter
Description
Training a MARS arbiter
Usage
meta_mars(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via MARS model
Description
Arbiter predictions via MARS model
Usage
meta_mars_predict(model, newdata)
Arguments
model |
arbiter – a MARS model |
newdata |
new data to predict loss |
Training a PLS arbiter
Description
Training a PLS arbiter
Usage
meta_pls(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via PLS model
Description
Arbiter predictions via PLS model
Usage
meta_pls_predict(model, newdata)
Arguments
model |
arbiter – a PLS model |
newdata |
new data to predict loss |
Training a PPR arbiter
Description
Training a PPR arbiter
Usage
meta_ppr(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via PPR model
Description
Arbiter predictions via PPR model
Usage
meta_ppr_predict(model, newdata)
Arguments
model |
arbiter – a PPR model |
newdata |
new data to predict loss |
Predicting loss using arbiter
Description
Predicting loss using arbiter
Usage
meta_predict(model, newdata, meta_model)
Arguments
model |
arbiter model |
newdata |
new data to predict loss |
meta_model |
learning algorithm – either a "randomforest", a "lasso", or a "gaussianprocess". |
Training a random forest arbiter
Description
Training a random forest arbiter
Usage
meta_rf(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via ranger
Description
Arbiter predictions via ranger
Usage
meta_rf_predict(meta_model, newdata)
Arguments
meta_model |
arbiter – a ranger object |
newdata |
new data to predict |
Training an SVR arbiter
Description
Training an SVR arbiter
Usage
meta_svr(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via SVR model
Description
Arbiter predictions via SVR model
Usage
meta_svr_predict(model, newdata)
Arguments
model |
arbiter – an SVR model |
newdata |
new data to predict loss |
Training an xgb arbiter
Description
Training an xgb arbiter
Usage
meta_xgb(form, data)
Arguments
form |
formula |
data |
data |
Arbiter predictions via xgb
Description
Arbiter predictions via xgb
Usage
meta_xgb_predict(meta_model, newdata)
Arguments
meta_model |
arbiter – an xgboost model |
newdata |
new data to predict |
Recent performance of models using EMASE
Description
This function computes EMASE, Erfc Moving Average Squared Error, to quantify the recent performance of the base models.
Usage
model_recent_performance(Y_hat, Y, lambda, omega, pre_weights)
Arguments
Y_hat |
a data.frame containing the predictions of the base models; |
Y |
known true values from past data to compare the predictions to; |
lambda |
Window size. Number of periods to average over when computing MASE; |
omega |
Ratio of top models in the committee; |
pre_weights |
The initial weights of the models, computed in the available data during the learning phase; |
Value
A list containing two objects:
- model_scores
The weights of the models in each time point
- top_models
Models in the committee in each time point
See Also
Other weighting base models: EMASE(), build_committee(), get_top_models(), model_weighting(), select_best()
Setup base learning models
Description
This class sets up the base learning models and their respective parameter settings for learning the ensemble.
Usage
model_specs(learner, learner_pars = NULL)
Arguments
learner |
character vector with the base learners to be trained. Currently available models are: bm_gaussianprocess, bm_ppr, bm_glm, bm_gbm, bm_randomforest, bm_cubist, bm_mars, bm_svr, bm_ffnn, and bm_pls_pcr. |
learner_pars |
a list with the parameter settings for the learner. For each model, an inner list should be created with the specified parameters. Check each implementation to see the possible variations of parameters (also exemplified below). |
Examples
# A PPR model and a GLM model with default parameters
model_specs(learner = c("bm_ppr", "bm_glm"), learner_pars = NULL)
# A PPR model and a SVR model. The listed parameters are combined
# with a cartesian product.
# With these specifications an ensemble with 6 predictive base
# models will be created. Two PPR models, one with 2 nterms
# and another with 4; and 4 SVR models, combining the kernel
# and C parameters.
specs <- model_specs(
  c("bm_ppr", "bm_svr"),
  list(bm_ppr = list(nterms = c(2, 4)),
       bm_svr = list(kernel = c("vanilladot", "polydot"), C = c(1, 5)))
)
# All parameters currently available (parameter values can differ)
model_specs(
  learner = c("bm_ppr", "bm_svr", "bm_randomforest",
              "bm_gaussianprocess", "bm_cubist", "bm_glm",
              "bm_gbm", "bm_pls_pcr", "bm_ffnn", "bm_mars"
  ),
  learner_pars = list(
    bm_ppr = list(
      nterms = c(2, 4),
      sm.method = "supsmu"
    ),
    bm_svr = list(
      kernel = "rbfdot",
      C = c(1, 5),
      epsilon = .01
    ),
    bm_glm = list(
      alpha = c(1, 0)
    ),
    bm_randomforest = list(
      num.trees = 500
    ),
    bm_gbm = list(
      interaction.depth = 1,
      shrinkage = c(.01, .005),
      n.trees = c(100)
    ),
    bm_mars = list(
      nk = 15,
      degree = 3,
      thresh = .001
    ),
    bm_ffnn = list(
      size = 30,
      decay = .01
    ),
    bm_pls_pcr = list(
      method = c("kernelpls", "simpls", "cppls")
    ),
    bm_gaussianprocess = list(
      kernel = "vanilladot",
      tol = .01
    ),
    bm_cubist = list(
      committees = 50,
      neighbors = 0
    )
  )
)
Setup base learning models
Description
This class sets up the base learning models and their respective parameter settings for learning the ensemble.
Slots
learner
character vector with the base learners to be trained. Currently available models are:
- bm_gaussianprocess
Gaussian Process models, from the kernlab package. See gausspr for a complete description and possible parametrization. See bm_gaussianprocess for the function implementation.
- bm_ppr
Projection Pursuit Regression models, from the stats package. See ppr for a complete description and possible parametrization. See bm_ppr for the function implementation.
- bm_glm
Generalized Linear Models, from the glmnet package. See glmnet for a complete description and possible parametrization. See bm_glm for the function implementation.
- bm_gbm
Generalized Boosted Regression models, from the gbm package. See gbm for a complete description and possible parametrization. See bm_gbm for the function implementation.
- bm_randomforest
Random Forest models, from the ranger package. See ranger for a complete description and possible parametrization. See bm_randomforest for the function implementation.
- bm_cubist
M5 tree models, from the Cubist package. See cubist for a complete description and possible parametrization. See bm_cubist for the function implementation.
- bm_mars
Multivariate Adaptive Regression Splines models, from the earth package. See earth for a complete description and possible parametrization. See bm_mars for the function implementation.
- bm_svr
Support Vector Regression models, from the kernlab package. See ksvm for a complete description and possible parametrization. See bm_svr for the function implementation.
- bm_ffnn
Feedforward Neural Network models, from the nnet package. See nnet for a complete description and possible parametrization. See bm_ffnn for the function implementation.
- bm_pls_pcr
Partial Least Squares Regression and Principal Component Regression models, from the pls package. See mvr for a complete description and possible parametrization. See bm_pls_pcr for the function implementation.
learner_pars
a list with the parameter settings for the learner. For each model, an inner list should be created with the specified parameters. Check each implementation to see the possible variations of parameters (also exemplified in the examples of model_specs()).
Model weighting
Description
This is a utility function that takes the raw errors of the models and scales them into a 0-1 range according to one of three strategies:
Usage
model_weighting(x, trans = "softmax", ...)
Arguments
x |
An object describing the loss of each base model |
trans |
Character value describing the transformation type. The available options are softmax, linear and erfc. The softmax and erfc provide a non-linear transformation where the weights decay exponentially as the relative loss of a given model increases (with respect to all available models). The linear transformation is a simple normalization of values using the max-min method. |
... |
Further arguments to |
Details
- erfc
using the complementary Gaussian error function
- softmax
using a softmax function
- linear
A simple normalization using max-min method
These transformations culminate in the final weights of the models.
Value
An object describing the weights of models
See Also
Other weighting base models: EMASE(), build_committee(), get_top_models(), model_recent_performance(), select_best()
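The three transformations can be compared side by side with the sketch below. It is a hedged illustration only: weights_sketch, erfc and minmax are hypothetical helpers, and the exact scaling applied before each transformation (for example, negating the loss before the softmax) is an assumption rather than the package's documented behaviour.

```r
# Hypothetical sketch: map losses to weights so that lower loss gives
# higher weight. Assumes the losses are not all equal.
erfc <- function(x) 2 * pnorm(-sqrt(2) * x)   # complementary error function
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

weights_sketch <- function(loss, trans = c("softmax", "linear", "erfc")) {
  trans <- match.arg(trans)
  w <- switch(trans,
    # Shift by the maximum before exponentiating for numerical stability.
    softmax = { z <- -loss; exp(z - max(z)) },
    linear  = 1 - minmax(loss),
    erfc    = erfc(minmax(loss))
  )
  w / sum(w)   # weights sum to one
}

loss <- c(m1 = 0.2, m2 = 0.5, m3 = 1.1)
W_all <- sapply(c("softmax", "linear", "erfc"),
                function(tr) weights_sketch(loss, tr))
W_all
```

In this sketch the linear option assigns a weight of zero to the worst model, while the two non-linear options keep every model in play with a rapidly shrinking share.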
Computing the mean squared error
Description
Utility function to compute mean squared error (MSE)
Usage
mse(y, y_hat)
Arguments
y |
A numeric vector representing the actual values. |
y_hat |
A numeric vector representing the forecasted values. |
See Also
Other error/performance functions: ae(), se()
Scale a numeric vector using max-min
Description
Utility function used to linearly normalize a numeric vector
Usage
normalize(x)
Arguments
x |
a numeric vector. |
Value
a linearly normalized vector
Examples
normalize(rnorm(4L))
normalize(1:10)
Predicting new observations using an ensemble
Description
Initially, the predictions of the base models are collected. Then, the predictions of the loss to be incurred by the base models E_hat (estimated by their associate meta models) are computed. The weights of the base models are then estimated according to E_hat and the committee of top models. The committee is built according to the lambda and omega parameters. Finally, the predictions are combined according to the weights and the committee setup.
Usage
## S4 method for signature 'ADE'
predict(object, newdata)
## S4 method for signature 'DETS'
predict(object, newdata)
## S4 method for signature 'base_ensemble'
predict(object, newdata)
Arguments
object |
an object of class ADE, DETS, or base_ensemble; |
newdata |
new data to predict |
Examples
###### Predicting with an ADE ensemble
specs <- model_specs(
learner = c("bm_glm", "bm_mars"),
learner_pars = NULL
)
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:1000, ]
test <- dataset[1001:1500, ]
model <- ADE(target ~., train, specs)
preds <- predict(model, test)
## Not run:
###### Predicting with a DETS ensemble
specs <- model_specs(
learner = c("bm_svr", "bm_glm", "bm_mars"),
learner_pars = NULL
)
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:700, ]
test <- dataset[701:1000, ]
model <- DETS(target ~., train, specs, lambda = 50, omega = .2)
preds <- predict(model, test)
## End(Not run)
## Not run:
###### Predicting with a base ensemble
model <- ADE(target ~., train, specs)
basepreds <- predict(model@base_ensemble, test)
## End(Not run)
predict method for pls/pcr
Description
predict method for pls/pcr
Usage
predict_pls_pcr(model, newdata)
Arguments
model |
pls/pcr model |
newdata |
new data |
Computing the proportions of a numeric vector
Description
Utility function used to compute the proportion of the values of a vector. The proportion of a value is its ratio relative to the sum of the vector.
Usage
proportion(x)
Arguments
x |
a numeric vector; |
Value
A vector of proportions
Examples
proportion(rnorm(5L))
proportion(1:10)
rbind with do.call syntax
Description
rbind with do.call syntax
Usage
rbind_l(x)
Arguments
x |
object to call rbind on |
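The do.call idiom referred to in the title can be illustrated with base R alone. The list of one-row data frames below is made-up example data.

```r
# Stack a list of data frames row-wise with do.call(rbind, ...).
rows <- list(data.frame(a = 1, b = "x"),
             data.frame(a = 2, b = "y"))
stacked <- do.call(rbind, rows)
stacked
```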
Get most recent lambda observations
Description
Get most recent lambda observations
Usage
recent_lambda_observations(data, lambda)
Arguments
data |
time series data as data.frame |
lambda |
number of observations to keep |
Computing the root mean squared error
Description
Utility function to compute Root Mean Squared Error (RMSE)
Usage
rmse(y, y_hat)
Arguments
y |
A numeric vector representing the actual values. |
y_hat |
A numeric vector representing the forecasted values. |
Computing the rolling mean of the columns of a matrix
Description
Computing the rolling mean of the columns of a matrix
Usage
roll_mean_matrix(x, lambda)
Arguments
x |
a numeric data.frame; |
lambda |
periods to average over when computing the moving average. |
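A column-wise moving average can be sketched with stats::filter, as below. This is an assumption-laden illustration: roll_mean_sketch is a hypothetical helper, the window is taken to be trailing, and the first lambda - 1 rows are left as NA, which may differ from how the package treats incomplete windows.

```r
# Hypothetical helper: trailing moving average of each column.
roll_mean_sketch <- function(x, lambda) {
  x <- as.matrix(x)
  out <- apply(x, 2, function(col) {
    # sides = 1 averages the current value and the lambda - 1 before it.
    as.numeric(stats::filter(col, rep(1 / lambda, lambda), sides = 1))
  })
  dimnames(out) <- dimnames(x)
  out
}

x <- data.frame(m1 = c(1, 2, 3, 4), m2 = c(4, 4, 8, 8))
rm2 <- roll_mean_sketch(x, lambda = 2)
rm2
```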
Computing the squared error
Description
Utility function to compute pointwise squared error (SE)
Usage
se(y, y_hat)
Arguments
y |
A numeric vector representing the actual values. |
y_hat |
A numeric vector representing the forecasted values. |
Value
squared error of forecasted values.
See Also
Other error/performance functions: ae(), mse()
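The relationship between the pointwise and aggregated error functions can be sketched as follows. The *_sketch functions are hypothetical re-implementations written for illustration and are named so that they do not mask the package's own ae, se, mse and rmse.

```r
# Hypothetical re-implementations of the error utilities.
ae_sketch   <- function(y, y_hat) abs(y - y_hat)        # pointwise absolute error
se_sketch   <- function(y, y_hat) (y - y_hat)^2         # pointwise squared error
mse_sketch  <- function(y, y_hat) mean(se_sketch(y, y_hat))
rmse_sketch <- function(y, y_hat) sqrt(mse_sketch(y, y_hat))

y <- c(3, 5, 7)
y_hat <- c(2, 5, 10)
se_sketch(y, y_hat)    # 1 0 9
mse_sketch(y, y_hat)   # 10 / 3
rmse_sketch(y, y_hat)  # sqrt(10 / 3)
```

The pointwise functions return one value per observation, which is the form needed when a loss is tracked over time, whereas the aggregated versions return a single number.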
Selecting best model according to weights
Description
This function selects the best model from a matrix of data x models. For each row (data point), the model with maximum weight is assigned a weight of 1, while the remaining models are assigned a weight of 0.
Usage
select_best(model_scores)
Arguments
model_scores |
matrix containing the model weights across the observations |
See Also
Other weighting base models: EMASE(), build_committee(), get_top_models(), model_recent_performance(), model_weighting()
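The winner-takes-all rule described above can be sketched with max.col from base R. The helper select_best_sketch is hypothetical, and breaking ties in favour of the first model is an assumption.

```r
# Hypothetical helper: one-hot encode the highest-weighted model per row.
select_best_sketch <- function(model_scores) {
  model_scores <- as.matrix(model_scores)
  # Assumption: ties are resolved in favour of the first column.
  best <- max.col(model_scores, ties.method = "first")
  out <- matrix(0, nrow(model_scores), ncol(model_scores),
                dimnames = dimnames(model_scores))
  # Index with a two-column matrix of (row, column) pairs.
  out[cbind(seq_len(nrow(out)), best)] <- 1
  out
}

scores <- matrix(c(0.2, 0.7, 0.1,
                   0.5, 0.1, 0.4), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("m1", "m2", "m3")))
best <- select_best_sketch(scores)
best
```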
Sequential Re-weighting for controlling predictions' redundancy
Description
Besides ensemble heterogeneity we encourage diversity explicitly during the aggregation of the output of experts. This is achieved by taking into account not only predictions of performance produced by the arbiters, but also the correlation among experts in a recent window of observations.
Usage
sequential_reweighting(sliding_similarity, W)
Arguments
sliding_similarity |
list of pairwise similarity values. See sliding_similarity |
W |
weights before re-weighting |
Sliding similarity via Pearson's correlation
Description
Sliding similarity via Pearson's correlation
Usage
sliding_similarity(Y_hat_ext, lambda)
Arguments
Y_hat_ext |
Predictions from the base-learners across the examples. |
lambda |
window size for computing correlations |
Value
a list with a correlation matrix for each prediction point
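A sliding Pearson correlation between base-learner predictions might be computed as in the sketch below. It is not the package's implementation: sliding_cor_sketch is a hypothetical helper, the window is assumed to be trailing, and only complete windows are returned, so the handling of the first lambda - 1 points is an assumption.

```r
# Hypothetical helper: correlation among models over a trailing window.
sliding_cor_sketch <- function(Y_hat_ext, lambda) {
  Y <- as.matrix(Y_hat_ext)
  stopifnot(lambda >= 2L, lambda <= nrow(Y))
  idx <- seq(lambda, nrow(Y))
  lapply(idx, function(t) {
    # Rows t - lambda + 1 to t form the window ending at point t.
    window <- Y[(t - lambda + 1):t, , drop = FALSE]
    stats::cor(window, method = "pearson")
  })
}

set.seed(1)
Y_hat_ext <- data.frame(m1 = rnorm(8), m2 = rnorm(8), m3 = rnorm(8))
sims <- sliding_cor_sketch(Y_hat_ext, lambda = 5)
length(sims)   # one correlation matrix per complete window
sims[[1]]
```

Each element is a symmetric models-by-models matrix, which is the kind of pairwise similarity input a re-weighting step could consume.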
Soft Imputation
Description
Soft Imputation
Usage
soft.completion(x)
Arguments
x |
data |
Computing the softmax
Description
This function computes the softmax function of a numeric vector
Usage
softmax(x)
Arguments
x |
numeric vector |
Splitting expressions by pattern
Description
This is a utility function that can be used to split expressions. It is based on the strsplit function.
split_by is the general purpose splitter
split_by_ splits expressions by "_"
split_by. splits expressions by a dot
Usage
split_by(expr, split, unlist. = TRUE, ...)
split_by_(expr, ...)
split_by.(expr, ...)
Arguments
expr |
character expression to split; |
split |
expression to split expr by; |
unlist. |
Logical. If TRUE, the split expr is unlisted; |
... |
Further parameters to pass to strsplit |
Value
a list or vector with a split expression
Examples
split_by_("time_series")
split_by.("time.series")
split_by("born2bewild", "2")
Training procedure for ADE
Description
Base level models are trained according to specs, and meta level models are trained using a blocked prequential procedure in out-of-bag samples from the training data.
Usage
train_ade(form, train, specs, lambda, lfun, meta_model_type, num_cores)
Arguments
form |
formula; |
train |
training data as a data frame; |
specs |
a model_specs-class object; |
lambda |
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments; |
lfun |
meta loss function - defaults to ae (absolute error) |
meta_model_type |
algorithm used to train meta models. Defaults to a random forest (using ranger package) |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
ADE training (poor version): train meta-models on the training data, as opposed to using a validation dataset
Description
Saves time by not computing out-of-bag predictions. The computational costs at testing time are the same.
Usage
train_ade_quick(form, train, specs, lambda, lfun, meta_model_type, num_cores)
Arguments
form |
formula |
train |
training data |
specs |
a model_specs-class object; |
lambda |
window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments; |
lfun |
meta loss function - defaults to ae (absolute error) |
meta_model_type |
algorithm used to train meta models. Defaults to a random forest (using ranger package) |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
Dynamic Ensembles for Time Series Forecasting
Description
This package implements ensemble methods for time series forecasting tasks. Dynamically combining different forecasting models is a common approach to tackle these problems.
Details
The main methods in tsensembler are in ADE-class and DETS-class:
- ADE
Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for dynamically combining forecasting models using a metalearning strategy called arbitrating. A meta model is trained for each base model in the ensemble. Each meta-learner is specifically designed to model the error of its associate across the time series. At forecasting time, the base models are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta models.
- DETS
Dynamic Ensemble for Time Series (DETS) is similar to ADE in the sense that it adaptively combines the base models in an ensemble for time series forecasting. DETS follows a more traditional approach for forecaster combination. It pre-trains a set of heterogeneous base models, and at run-time weights them dynamically according to recent performance. Like ADE, the ensemble includes a committee, which dynamically selects a subset of base models that are weighted with a non-linear function.
The ensemble methods can be used to predict new observations or forecast future values of a time series. They can also be updated using generic functions (check the See Also section).
References
Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.
Cerqueira, Vitor; Torgo, Luis; and Soares, Carlos. "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, 2017, pp. 720–732.
Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana; and Pfahringer, Bernhard. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.
See Also
ADE-class for setting up an ADE model; and DETS-class for setting up a DETS model; see also update_weights and update_base_models to check the generic function for updating the predictive models in an ensemble.
Examples
## Not run:
data("water_consumption")
# embedding time series into a matrix
dataset <- embed_timeseries(water_consumption, 5)
# splitting data into train/test
train <- dataset[1:1000,]
test <- dataset[1001:1020, ]
# setting up base model parameters
specs <- model_specs(
learner = c("bm_ppr","bm_glm","bm_svr","bm_mars"),
learner_pars = list(
bm_glm = list(alpha = c(0, .5, 1)),
bm_svr = list(kernel = c("rbfdot", "polydot"),
C = c(1,3)),
bm_ppr = list(nterms = 4)
))
# building the ensemble
model <- ADE(target ~., train, specs)
# forecast next value and update base and meta models
# every three points;
# in the other points, only the weights are updated
predictions <- numeric(nrow(test))
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat
  if (i %% 3 == 0) {
    # retrain base and meta models on all data seen so far
    seen <- rbind.data.frame(train, test[seq_len(i), ])
    model <- update_base_models(model, seen)
    model <- update_ade_meta(model, seen)
  } else {
    # otherwise only refresh the weights with the latest observation
    model <- update_weights(model, test[i, ])
  }
}
point_forecast <- forecast(model, h = 5)
# setting up an ensemble of support vector machines
specs2 <-
model_specs(learner = c("bm_svr"),
learner_pars = list(
bm_svr = list(kernel = c("vanilladot", "polydot",
"rbfdot"),
C = c(1,3,6))
))
model <- DETS(target ~., train, specs2)
preds <- predict(model, test)@y_hat
## End(Not run)
Updating an ADE model
Description
update_ade is a generic function that combines update_base_models, update_ade_meta, and update_weights.
Usage
update_ade(object, newdata, num_cores = 1)
## S4 method for signature 'ADE'
update_ade(object, newdata, num_cores = 1)
Arguments
object |
an ADE-class object; |
newdata |
data used to update the ADE model. This should be the data used to initially train the models (training set), together with new observations (for example, validation set). Each model is retrained using newdata. |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
See Also
ADE-class for building an ADE model; update_weights for updating the weights of the ensemble (without retraining the models); update_base_models for updating the base models of an ensemble; and update_ade_meta for updating the meta-models of an ADE model.
Other updating models: update_ade_meta(), update_weights()
Examples
specs <- model_specs(
learner = c("bm_svr", "bm_glm", "bm_mars"),
learner_pars = NULL
)
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
# toy size for checks
train <- dataset[1:300, ]
validation <- dataset[301:400, ]
test <- dataset[401:500, ]
model <- ADE(target ~., train, specs)
preds_val <- predict(model, validation)
model <- update_ade(model, rbind.data.frame(train, validation))
preds_test <- predict(model, test)
Updating the metalearning layer of an ADE model
Description
The update_ade_meta function uses new information to update the meta models of an ADE-class ensemble. As input it receives an ADE-class model object and a new dataset for updating the meta models, which determine the weights of the base models in the ensemble. This new data should have the same structure as the one used to build the ensemble. Updating the base models of the ensemble is done using the update_base_models function.
Usage
update_ade_meta(object, newdata, num_cores = 1)
## S4 method for signature 'ADE'
update_ade_meta(object, newdata, num_cores = 1)
Arguments
object |
an ADE-class object; |
newdata |
data used to update the meta models. This should be the data used to initially train the meta-models (training set), together with new observations (for example, validation set). Each meta model is retrained using newdata. |
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
See Also
ADE-class for building an ADE model; update_weights for updating the weights of the ensemble (without retraining the models); and update_base_models for updating the base models of an ensemble.
Other updating models: update_ade(), update_weights()
Examples
## Not run:
specs <- model_specs(
learner = c("bm_svr", "bm_glm", "bm_mars"),
learner_pars = NULL
)
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:1000, ]
validation <- dataset[1001:1200, ]
test <- dataset[1201:1500, ]
model <- ADE(target ~., train, specs)
preds_val <- predict(model, validation)
model <- update_ade_meta(model, rbind.data.frame(train, validation))
preds_test <- predict(model, test)
## End(Not run)
Update the base models of an ensemble
Description
This is a generic function for updating the base models comprising an ensemble.
Usage
update_base_models(object, newdata, num_cores = 1)
## S4 method for signature 'ADE'
update_base_models(object, newdata, num_cores = 1)
## S4 method for signature 'DETS'
update_base_models(object, newdata, num_cores = 1)
Arguments
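object |
an ensemble object, of class ADE or DETS |
newdata |
new data used to update the models. Each base model is retrained, so this data should contain the observations used to initially train the models together with the new observations, as in the examples below. |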
num_cores |
A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores. |
Details
The update_base_models function receives a model object and a new dataset for retraining the base models. This new data should have the same structure as the data used to build the ensemble.
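The "same structure" requirement can be checked before calling an update function. The following is a minimal base-R sketch, not part of tsensembler: the helper same_structure and the toy column names (target, Tm1, Tm2) are assumptions made for illustration only.

```r
# Hypothetical helper (not a tsensembler function): TRUE when `newdata`
# has the same column names, in the same order, and the same column
# classes as `train`.
same_structure <- function(train, newdata) {
  first_class <- function(df) {
    unname(vapply(df, function(col) class(col)[1], character(1)))
  }
  identical(colnames(train), colnames(newdata)) &&
    identical(first_class(train), first_class(newdata))
}

# Toy embedded series: a target column plus two lagged predictors
train <- data.frame(target = c(3, 4, 5), Tm1 = c(2, 3, 4), Tm2 = c(1, 2, 3))
new_ok <- data.frame(target = 6, Tm1 = 5, Tm2 = 4)
new_bad <- data.frame(target = 6, Tm1 = 5)  # one lag column is missing

same_structure(train, new_ok)
same_structure(train, new_bad)

# Append the new rows only after the check passes
if (same_structure(train, new_ok)) {
  updated <- rbind.data.frame(train, new_ok)
}
```

A data frame built this way could then be passed as newdata, in the same manner as the rbind.data.frame(train, ...) calls in the examples.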
See Also
ADE-class for the ADE model information, and DETS-class for the DETS model information; update_ade_meta for updating the meta models of an ADE ensemble.
See update_weights for the method used to update the weights of the ensemble. Updating the weights only changes the information about the recent observations for computing the weights of the base models, while updating the model uses that information to retrain the models.
Examples
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
# toy size to keep the execution time of checks short
train <- dataset[1:300, ]
test <- dataset[301:305, ]
specs <- model_specs(c("bm_ppr", "bm_glm", "bm_mars"), NULL)
model <- ADE(target ~ ., train, specs)
predictions <- numeric(nrow(test))
# predict one point at a time and retrain the base models after each one
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat
  model <-
    update_base_models(model,
                       rbind.data.frame(train, test[seq_len(i), ]))
}
####
specs2 <- model_specs(c("bm_ppr", "bm_randomforest", "bm_svr"), NULL)
modeldets <- DETS(target ~ ., train, specs2)
predictions <- numeric(nrow(test))
# predict new data and update the models every three points;
# in the remaining points, only the weights are updated
for (i in seq_along(predictions)) {
  predictions[i] <- predict(modeldets, test[i, ])@y_hat
  if (i %% 3 == 0) {
    modeldets <-
      update_base_models(modeldets,
                         rbind.data.frame(train, test[seq_len(i), ]))
  } else {
    modeldets <- update_weights(modeldets, test[seq_len(i), ])
  }
}
Updating the weights of base models
Description
Update the weights of the base models of an ADE-class or DETS-class ensemble.
This is accomplished by computing the loss of the base models on the most recent observations.
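To make the idea of loss-based weighting concrete, here is a small base-R sketch. It is an illustration under stated assumptions, not the weighting scheme implemented by ADE or DETS: the inverse-error weights, the toy values, and the names model_a and model_b are all hypothetical.

```r
# Illustrative only: NOT the weighting scheme implemented in tsensembler.
# Rows are recent observations, columns are base-model predictions.
y_recent <- c(10, 12, 11)
preds_recent <- cbind(
  model_a = c(10.5, 11.5, 11.0),
  model_b = c(8.0, 14.0, 13.0)
)

# Mean absolute error of each base model over the recent window
recent_loss <- colMeans(abs(preds_recent - y_recent))

# Lower loss gives a higher weight; weights are normalised to sum to one
raw <- 1 / (recent_loss + 1e-8)
weights <- raw / sum(raw)

# Weighted combination of the base-model predictions for a new point
new_preds <- c(model_a = 12.2, model_b = 13.0)
combined <- sum(weights * new_preds)
```

In this toy case model_a has the smaller recent error, so it receives the larger weight and the combined forecast lies closer to its prediction.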
Usage
update_weights(object, newdata)
## S4 method for signature 'ADE'
update_weights(object, newdata)
## S4 method for signature 'DETS'
update_weights(object, newdata)
Arguments
object |
an ADE-class or DETS-class model object |
newdata |
new data used to update the most recent observations of the time series. At prediction time, these observations are used to compute the weights of the base models. |
Note
Updating the weights of an ensemble is only necessary between different calls of the functions predict or forecast.
Otherwise, if consecutive known observations are predicted (e.g. a validation/test set), the updating is done automatically internally.
See Also
update_weights for the weight updating method for an ADE model, and update_weights for the same method for a DETS model.
Other updating models: update_ade_meta(), update_ade()
Examples
data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
# toy size for checks
train <- dataset[1:300, ]
test <- dataset[301:305, ]
specs <- model_specs(c("bm_ppr", "bm_glm", "bm_mars"), NULL)
## same with model <- DETS(target ~., train, specs)
model <- ADE(target ~ ., train, specs)
# if consecutive known observations are predicted (e.g. a validation/test set)
# the updating is automatically done internally.
predictions1 <- predict(model, test)@y_hat
# otherwise, the weights need to be updated between calls
predictions <- numeric(nrow(test))
# predict new data and update the weights of the model
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat
  model <- update_weights(model, test[i, ])
}
#all.equal(predictions1, predictions)
Water Consumption in Oporto city (Portugal) area.
Description
A time series of classes xts and zoo containing the water consumption levels at a specific delivery point in the city of Oporto, Portugal.
Usage
water_consumption
Format
The time series has 1741 values, from January 2012 to October 2016, at daily granularity.
- consumption
consumption of water, raw value from sensor
Source
XGB optimizer
Description
XGB optimizer
Usage
xgb_optimizer(X, y, gsearch)
Arguments
X |
Covariates |
y |
Target values |
gsearch |
Grid search |
XGBoost predict function
Description
XGBoost predict function
Usage
xgb_predict(model, newdata)
Arguments
model |
Model from bm_xgb |
newdata |
Test data |
asdasd
Description
asdasd
Usage
xgb_predict_(model, newdata)
Arguments
model |
mode |
newdata |
s |