Title: Dynamic Ensembles for Time Series Forecasting
Version: 0.1.0
Author: Vitor Cerqueira [aut, cre], Luis Torgo [ctb], Carlos Soares [ctb]
Maintainer: Vitor Cerqueira <cerqueira.vitormanuel@gmail.com>
Description: A framework for dynamically combining forecasting models for time series forecasting tasks. It leverages machine learning models from other packages to automatically combine expert advice using metalearning and other state-of-the-art forecasting combination approaches. The predictive methods receive a data matrix as input, representing an embedded time series, and return a predictive ensemble model. The ensembles use the generic functions 'predict()' and 'forecast()' to forecast future values of the time series. Moreover, an ensemble can be updated using methods such as 'update_weights()' or 'update_base_models()'. A complete description of the methods can be found in: Cerqueira, V., Torgo, L., Pinto, F., and Soares, C. "Arbitrated Ensemble for Time Series Forecasting." To appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017; and Cerqueira, V., Torgo, L., and Soares, C.: "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, 2017 <doi:10.1007/978-3-319-59153-7_62>.
Imports: xts, zoo, RcppRoll, methods, ranger, glmnet, earth, kernlab, Cubist, gbm, pls, monmlp, doParallel, foreach, xgboost, softImpute
Suggests: testthat
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
URL: https://github.com/vcerqueira/tsensembler
NeedsCompilation: no
Packaged: 2020-10-26 09:31:22 UTC; root
Repository: CRAN
Date/Publication: 2020-10-27 14:00:02 UTC

Arbitrated Dynamic Ensemble

Description

Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for adaptively combining forecasting models. A metalearning strategy is used that specializes base models across the time series. Each meta-learner is specifically designed to model how apt its base counterpart is to make a prediction for a given test example. This is accomplished by analysing how the error incurred by a given learning model relates to the characteristics of the data. At test time, the base-learners are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta-learners.

Usage

ADE(
  form,
  data,
  specs,
  lambda = 50,
  omega = 0.5,
  select_best = FALSE,
  all_models = FALSE,
  aggregation = "linear",
  sequential_reweight = FALSE,
  meta_loss_fun = ae,
  meta_model_type = "randomforest",
  num_cores = 1
)

quickADE(
  form,
  data,
  specs,
  lambda = 50,
  omega = 0.5,
  select_best = FALSE,
  all_models = FALSE,
  aggregation = "linear",
  sequential_reweight = FALSE,
  meta_loss_fun = ae,
  meta_model_type = "randomforest",
  num_cores = 1
)

Arguments

form

formula;

data

data to train the base models

specs

object of class model_specs-class. Contains the parameter setting information for training the base models;

lambda

window size. Number of observations used to compute the recent performance of the base models. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

omega

committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;

select_best

Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE;

all_models

Logical. If true, at each prediction time, all base models are picked to make a prediction. The models are weighted according to their predicted loss and the aggregation function. Defaults to FALSE;

aggregation

Type of aggregation used to combine the predictions of the base models. The options are:

softmax

the softmax function

erfc

the complementary Gaussian error function

linear

a linear scaling (default)

sequential_reweight

Logical. If TRUE, diversity is encouraged explicitly during the aggregation of the experts' outputs, in addition to the diversity that comes from ensemble heterogeneity. This is achieved by taking into account not only the predictions of performance produced by the arbiters, but also the correlation among experts in a recent window of observations. Defaults to FALSE;

meta_loss_fun

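loss function used to compute the errors of the base models that the meta models learn to predict. Defaults to ae, the absolute error;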

meta_model_type

meta model to use. Defaults to random forest ("randomforest");

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.
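The three aggregation options map a vector of predicted losses (one per base model, as estimated by the meta models) to weights. The base R sketch below illustrates one way to do that. The exact transformations used inside ADE are not specified on this page, so the formulas and the helper name loss_to_weights are hypothetical; erfc is written via pnorm because base R has no erfc function.

```r
# Hypothetical helper: map predicted losses to weights that sum to one.
# Lower predicted loss -> higher weight. All three formulas are assumptions.
loss_to_weights <- function(e_hat, aggregation = c("linear", "softmax", "erfc")) {
  aggregation <- match.arg(aggregation)
  # complementary error function, erfc(x) = 2 * pnorm(-x * sqrt(2))
  erfc <- function(x) 2 * pnorm(-x * sqrt(2))
  w <- switch(aggregation,
    linear = {
      # rescale losses to [0, 1] and invert, so the worst model gets weight 0
      rng <- max(e_hat) - min(e_hat)
      if (rng == 0) rep(1, length(e_hat)) else (max(e_hat) - e_hat) / rng
    },
    # subtracting the minimum avoids underflow and leaves the result unchanged
    softmax = exp(-(e_hat - min(e_hat))),
    # losses scaled by their maximum, so erfc is evaluated on [0, 1]
    erfc = erfc(e_hat / max(max(e_hat), .Machine$double.eps))
  )
  w / sum(w)
}

loss_to_weights(c(0.1, 0.5, 0.9), "linear")  # approximately 0.667 0.333 0.000
```

In this sketch the linear scaling gives the worst model a weight of zero, so it partly acts as a selection rule, whereas softmax and erfc keep every model with a positive weight.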

References

Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.

Cerqueira, Vitor; Torgo, Luis; and Soares, Carlos. "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, Cham, 2017, pp. 720–732.

See Also

model_specs-class for setting up the ensemble parameters for an ADE model; predict for the method that predicts new held-out observations; update_weights for the method used to update the weights of an ADE model between successive predict or forecast calls; update_ade_meta for updating (retraining) the meta models of an ADE model; update_base_models for updating (retraining) the base models of an ADE ensemble (and their weights); ade_hat-class for the object that results from predicting with an ADE model; and update_ade to update an ADE model, combining the functions update_base_models, update_ade_meta, and update_weights.

Examples

specs <- model_specs(
  learner = c("bm_ppr", "bm_glm", "bm_mars"),
  learner_pars = list(
    bm_glm = list(alpha = c(0, .5, 1)),
    bm_svr = list(kernel = c("rbfdot", "polydot"),
                  C = c(1, 3)),
    bm_ppr = list(nterms = 4)
  )
)

data("water_consumption")
train <- embed_timeseries(water_consumption, 5)
train <- train[1:300, ] # toy size for checks

model <- ADE(target ~., train, specs)


Arbitrated Dynamic Ensemble

Description

Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for adaptively combining forecasting models. A metalearning strategy is used that specializes base models across the time series. Each meta-learner is specifically designed to model how apt its base counterpart is to make a prediction for a given test example. This is accomplished by analysing how the error incurred by a given learning model relates to the characteristics of the data. At test time, the base-learners are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta-learners.

Slots

base_ensemble

object of class base_ensemble-class. It contains the base models, which can be used for predicting new data or forecasting future values;

meta_model

a list containing the meta models, one for each base model. By default (meta_model_type = "randomforest"), the meta models are random forests;

form

formula;

specs

object of class model_specs-class. Contains the parameter setting information for training the base models;

lambda

window size. Number of observations used to compute the recent performance of the base models. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

omega

committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;

select_best

Logical. If true, at each prediction time, a single base model is picked to make a prediction. The picked model is the one that has the lowest loss prediction from the meta models. Defaults to FALSE;

all_models

Logical. If true, at each prediction time, all base models are picked to make a prediction. The models are weighted according to their predicted loss and the aggregation function. Defaults to FALSE;

aggregation

Type of aggregation used to combine the predictions of the base models. The options are:

softmax

the softmax function

erfc

the complementary Gaussian error function

linear

a linear scaling (default)

sequential_reweight

Logical. If TRUE, diversity is encouraged explicitly during the aggregation of the experts' outputs, in addition to the diversity that comes from ensemble heterogeneity. This is achieved by taking into account not only the predictions of performance produced by the arbiters, but also the correlation among experts in a recent window of observations.

recent_series

the most recent lambda observations.

out_of_bag

Out of bag observations used to train arbiters.

meta_model_type

meta model to use. Defaults to random forest ("randomforest");

References

Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.

Cerqueira, Vitor; Torgo, Luis; and Soares, Carlos. "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, Cham, 2017, pp. 720–732.

See Also

model_specs-class for setting up the ensemble parameters for an ADE model; predict for the method that predicts new held-out observations; update_weights for the method used to update the weights of an ADE model between successive predict or forecast calls; update_ade_meta for updating (retraining) the meta models of an ADE model; update_base_models for updating (retraining) the base models of an ADE ensemble (and their weights); ade_hat-class for the object that results from predicting with an ADE model; and update_ade to update an ADE model, combining the functions update_base_models, update_ade_meta, and update_weights.

Examples

specs <- model_specs(
  learner = c("bm_ppr", "bm_glm", "bm_mars"),
  learner_pars = list(
    bm_glm = list(alpha = c(0, .5, 1)),
    bm_svr = list(kernel = c("rbfdot", "polydot"),
                  C = c(1, 3)),
    bm_ppr = list(nterms = 4)
  )
)

data("water_consumption")
train <- embed_timeseries(water_consumption, 5)
train <- train[1:300, ] # toy size for checks

model <- ADE(target ~., train, specs)


Dynamic Ensemble for Time Series

Description

A Dynamic Ensemble for Time Series (DETS). The DETS ensemble method we present settles on individually pre-trained models which are dynamically combined at run-time to make a prediction. The combination rule is reactive to changes in the environment, rendering an online combined model. The main properties of the ensemble are:

heterogeneity

Heterogeneous ensembles are those composed of different types of base learners. By employing models that follow different learning strategies, or that use different features and/or data observations, we expect individual learners to disagree with each other. This introduces a natural diversity into the ensemble that helps in handling different dynamic regimes in a time series forecasting setting;

responsiveness

We promote greater responsiveness of heterogeneous ensembles in time series tasks by making the aggregation of their members' predictions time-dependent. By tracking the loss of each learner over time, we weigh the predictions of individual learners according to their recent performance using a non-linear function. This strategy may help to detect regime changes and to adapt the ensemble quickly to new regimes.

Usage

DETS(
  form,
  data,
  specs,
  lambda = 50,
  omega = 0.5,
  select_best = FALSE,
  num_cores = 1
)

Arguments

form

formula;

data

data frame to train the base models;

specs

object of class model_specs-class. Contains the parameter setting information for training the base models;

lambda

window size. Number of observations used to compute the recent performance of the base models. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

omega

committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;

select_best

Logical. If TRUE, at each prediction time, a single base model is picked to make a prediction. The picked model is the one with the best recent performance, that is, the lowest loss over the last lambda observations. Defaults to FALSE;

num_cores

A numeric value to specify the number of cores used to train the base models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.

References

Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana; and Pfahringer, Bernhard. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.

See Also

model_specs-class for setting up the ensemble parameters for a DETS model; predict for the method that predicts new held-out observations; update_weights for the method used to update the weights of a DETS model between successive predict or forecast calls; update_base_models for updating (retraining) the base models of a DETS ensemble (and their weights); and dets_hat-class for the object that results from predicting with a DETS model.

Examples

specs <- model_specs(
 c("bm_ppr", "bm_svr"),
 list(bm_ppr = list(nterms = c(2, 4)),
      bm_svr = list(kernel = c("vanilladot", "polydot"), C = c(1,5)))
)

data("water_consumption")
train <- embed_timeseries(water_consumption, 5)

model <- DETS(target ~., train, specs, lambda = 30, omega = .2)


Dynamic Ensemble for Time Series

Description

A Dynamic Ensemble for Time Series (DETS). The DETS ensemble method we present settles on individually pre-trained models which are dynamically combined at run-time to make a prediction. The combination rule is reactive to changes in the environment, rendering an online combined model. The main properties of the ensemble are:

heterogeneity

Heterogeneous ensembles are those composed of different types of base learners. By employing models that follow different learning strategies, or that use different features and/or data observations, we expect individual learners to disagree with each other. This introduces a natural diversity into the ensemble that helps in handling different dynamic regimes in a time series forecasting setting;

responsiveness

We promote greater responsiveness of heterogeneous ensembles in time series tasks by making the aggregation of their members' predictions time-dependent. By tracking the loss of each learner over time, we weigh the predictions of individual learners according to their recent performance using a non-linear function. This strategy may help to detect regime changes and to adapt the ensemble quickly to new regimes.

Slots

base_ensemble

object of class base_ensemble-class. It contains the base models, which can be used for predicting new data or forecasting future values;

form

formula;

specs

object of class model_specs-class. Contains the parameter setting information for training the base models;

lambda

window size. Number of observations used to compute the recent performance of the base models. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

omega

committee ratio size. Essentially, the top omega * 100 percent of models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to .5 according to empirical experiments;

select_best

Logical. If TRUE, at each prediction time, a single base model is picked to make a prediction. The picked model is the one with the best recent performance, that is, the lowest loss over the last lambda observations. Defaults to FALSE;

recent_series

the most recent lambda observations.

References

Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana; and Pfahringer, Bernhard. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.

See Also

model_specs-class for setting up the ensemble parameters for a DETS model; predict for the method that predicts new held-out observations; update_weights for the method used to update the weights of a DETS model between successive predict or forecast calls; update_base_models for updating (retraining) the base models of a DETS ensemble (and their weights); and dets_hat-class for the object that results from predicting with a DETS model.

Examples

specs <- model_specs(
 c("bm_ppr", "bm_svr"),
 list(bm_ppr = list(nterms = c(2, 4)),
      bm_svr = list(kernel = c("vanilladot"), C = c(1,5)))
)

data("water_consumption")
train <- embed_timeseries(water_consumption, 5)[1:500,]

model <- DETS(target ~., train, specs, lambda = 30, omega = .2)


Weighting Base Models by their Moving Average Squared Error

Description

This function computes the weights of the learning models using the Moving Average Squared Error (MASE) function. This method provides a simple way to quantify the recent performance of each base learner and to adapt the combined model accordingly.

Usage

EMASE(loss, lambda, pre_weights)

Arguments

loss

Squared error of the models at each test point;

lambda

Number of periods to average over when computing MASE;

pre_weights

pre-weights of the base models computed in the train set.

Value

The weights of the models in test time.
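A minimal base R sketch of this weighting, assuming that the weights for each test point are the normalized inverse of each model's mean squared error over the previous lambda points, and that pre_weights are used where no test errors are available yet. The inverse transformation and the name mase_weights are assumptions for illustration; the package may map MASE to weights with a different (for instance, non-linear) function.

```r
# Hypothetical sketch: weights from the moving average of squared errors.
# loss: matrix of squared errors (rows = test points, columns = base models).
mase_weights <- function(loss, lambda, pre_weights) {
  loss <- as.matrix(loss)
  stopifnot(length(pre_weights) == ncol(loss), lambda >= 1)
  n <- nrow(loss)
  W <- matrix(NA_real_, nrow = n, ncol = ncol(loss),
              dimnames = list(NULL, colnames(loss)))
  # before any test error is observed, fall back on the training-set pre-weights
  W[1, ] <- pre_weights / sum(pre_weights)
  if (n > 1) {
    for (t in 2:n) {
      # the (at most) lambda most recent errors known before point t
      idx <- max(1, t - lambda):(t - 1)
      mase <- colMeans(loss[idx, , drop = FALSE])
      # lower average error -> larger weight; eps guards against division by 0
      inv <- 1 / (mase + .Machine$double.eps)
      W[t, ] <- inv / sum(inv)
    }
  }
  W
}
```

Each row of the result sums to one and depends only on errors observed before that row, which keeps the weighting usable in an online setting.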

See Also

Other weighting base models: build_committee(), get_top_models(), model_recent_performance(), model_weighting(), select_best()


Predictions by an ADE ensemble

Description

Predictions produced by an ADE-class object. It contains y_hat, the combined predictions, Y_hat, the predictions of each base model, Y_committee, the base models used for prediction at each time point, and E_hat, the loss predictions by each meta-model.

Usage

ade_hat(y_hat, Y_hat, Y_committee, E_hat)

Arguments

y_hat

combined predictions of the ensemble ADE. A numeric vector;

Y_hat

a matrix containing the predictions made by individual models;

Y_committee

a list describing the models selected for predictions at each time point (according to lambda and omega);

E_hat

predictions of the error of each base model, estimated by its associated meta model;

See Also

ADE-class for generating an ADE ensemble.

Other ensemble predictions: ade_hat-class, dets_hat-class, dets_hat


Predictions by an ADE ensemble

Description

Predictions produced by an ADE-class object. It contains y_hat, the combined predictions, Y_hat, the predictions of each base model, Y_committee, the base models used for prediction at each time point, and E_hat, the loss predictions by each meta-model.

Slots

y_hat

combined predictions of the ensemble ADE-class. A numeric vector;

Y_hat

a matrix containing the predictions made by individual models;

Y_committee

a list describing the models selected for predictions at each time point (according to lambda and omega);

E_hat

predictions of the error of each base model, estimated by its associated meta model;

See Also

ADE for generating an ADE ensemble.

Other ensemble predictions: ade_hat, dets_hat-class, dets_hat


Computing the absolute error

Description

Element-wise computation of the absolute error loss function.

Usage

ae(y, y_hat)

Arguments

y

A numeric vector representing the actual values.

y_hat

A numeric vector representing the forecasted values.
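The absolute error is computed element-wise as |y - y_hat|. The base R sketch below shows that computation next to the squared error and its mean (the se and mse functions listed in See Also). It is written to match the descriptions on this page and is not necessarily the package's source; in particular, treating se as the element-wise squared error is an assumption.

```r
# Element-wise absolute error
ae <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  abs(y - y_hat)
}

# Element-wise squared error (assumed meaning of se)
se <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  (y - y_hat)^2
}

# Mean squared error: a single number summarizing se
mse <- function(y, y_hat) mean(se(y, y_hat))

ae(c(1, 2, 3), c(1.5, 2, 2))  # returns c(0.5, 0, 1)
```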

See Also

Other error/performance functions: mse(), se()


base_ensemble

Description

base_ensemble is an S4 class that contains the base models comprising the ensemble. Besides the base learning algorithms (base_models), the base_ensemble class contains other meta-data used to compute predictions for new incoming data.

Usage

base_ensemble(base_models, pre_weights, form, colnames)

Arguments

base_models

a list comprising the base models;

pre_weights

normalized relative weights of the base learners according to their performance on the available data;

form

formula;

colnames

names of the columns of the data used to train the base_models;


base_ensemble-class

Description

base_ensemble is an S4 class that contains the base models comprising the ensemble. Besides the base learning algorithms (base_models), the base_ensemble class contains other meta-data used to compute predictions for new incoming data.

Slots

base_models

a list comprising the base models;

pre_weights

Normalized relative weights of the base learners according to their performance on the available data;

form

formula;

colnames

names of the columns of the data used to train the base_models;

N

number of base models;

model_distribution

base learner distribution with respect to the type of learner. That is, the number of Decision Trees, SVMs, etc.


Computing the error of base models

Description

Computing the error of base models

Usage

base_models_loss(Y_hat, Y, lfun = se)

Arguments

Y_hat

predictions of the base models ("@Y_hat" slot) from base_ensemble-class object;

Y

true values from the time series;

lfun

loss function to compute. Defaults to se, the squared error;
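The following base R sketch shows what this computation amounts to under the descriptions above: the loss function is applied to the predictions of each base model against the true values, giving one column of losses per model. The name models_loss_sketch is hypothetical, and se is re-implemented here as the element-wise squared error so the block runs on its own.

```r
# Squared error, used as the default loss below
se <- function(y, y_hat) (y - y_hat)^2

# Hypothetical sketch: apply a loss function to each base model's predictions
models_loss_sketch <- function(Y_hat, Y, lfun = se) {
  Y_hat <- as.data.frame(Y_hat)
  stopifnot(nrow(Y_hat) == length(Y))
  # one column of losses per base model, one row per observation
  as.data.frame(lapply(Y_hat, function(y_hat) lfun(Y, y_hat)))
}
```

Passing a different lfun, such as an absolute error function, changes the loss without changing the layout of the result.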


Get best PLS/PCR model

Description

Get best PLS/PCR model

Usage

best_mvr(obj, form, validation_data)

Arguments

obj

PLS/PCR model object

form

formula

validation_data

validation data used to estimate the performance of the model for each number of principal components


Prequential Procedure in Blocks

Description

Prequential Procedure in Blocks

Usage

blocked_prequential(x, nfolds, FUN, .rbind = TRUE, ...)

Arguments

x

data to split into nfolds blocks;

nfolds

number of blocks to split data into;

FUN

function to apply to each train/test split;

.rbind

logical. If TRUE, the results from FUN are rbinded;

...

further parameters to FUN
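One common reading of a blocked prequential procedure is a growing window: the data is split into nfolds contiguous blocks, and block i is used for testing after training on blocks 1 to i-1. The base R sketch below follows that assumption; blocked_prequential_sketch is a hypothetical name, and the package's function may split or iterate over the blocks differently.

```r
# Hypothetical sketch of a growing-window prequential evaluation in blocks
blocked_prequential_sketch <- function(x, nfolds, FUN, .rbind = TRUE, ...) {
  stopifnot(nfolds >= 2)
  x <- as.data.frame(x)
  # assign each row, in time order, to one of nfolds contiguous blocks
  block_id <- cut(seq_len(nrow(x)), breaks = nfolds, labels = FALSE)
  # block i is the test set; all earlier blocks form the training set
  results <- lapply(2:nfolds, function(i) {
    train <- x[block_id < i, , drop = FALSE]
    test <- x[block_id == i, , drop = FALSE]
    FUN(train, test, ...)
  })
  if (.rbind) do.call(rbind, results) else results
}
```

Because the test block always follows its training blocks in time, no future observations leak into training.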

See Also

intraining_estimations function to use as FUN parameter.


Fit Cubist models (M5)

Description

Learning an M5 model from training data. Parameter setting can vary in the committees and neighbors parameters.

Usage

bm_cubist(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

See cubist for a comprehensive description.

Imports learning procedure from Cubist package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_gaussianprocess; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Feedforward Neural Networks models

Description

Learning a Feedforward Neural Network model from training data.

Usage

bm_ffnn(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Parameter setting can vary in size, maxit, and decay parameters.

See nnet for a comprehensive description.

Imports learning procedure from nnet package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_gaussianprocess; bm_svr

Other base learning models: bm_cubist(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Gaussian Process models

Description

Learning a Gaussian Process model from training data. Parameter setting can vary in kernel and tolerance. See gausspr for a comprehensive description.

Usage

bm_gaussianprocess(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Imports learning procedure from kernlab package.

Value

A list containing Gaussian Process models

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Generalized Boosted Regression models

Description

Learning a Boosted Tree Model from training data. Parameter setting can vary in interaction.depth, n.trees, and shrinkage parameters.

Usage

bm_gbm(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

See gbm for a comprehensive description.

Imports learning procedure from gbm package.

See Also

other learning models: bm_mars; bm_ppr; bm_gaussianprocess; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Generalized Linear Models

Description

Learning a Generalized Linear Model from training data. Parameter setting can vary in alpha. See glmnet for a comprehensive description.

Usage

bm_glm(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Imports learning procedure from glmnet package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_gaussianprocess; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Multivariate Adaptive Regression Splines models

Description

Learning a Multivariate Adaptive Regression Splines model from training data.

Usage

bm_mars(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Parameter setting can vary in nk, degree, and thresh parameters.

See earth for a comprehensive description.

Imports learning procedure from earth package.

See Also

other learning models: bm_gaussianprocess; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_pls_pcr(), bm_ppr(), bm_randomforest(), bm_svr()


Fit PLS/PCR regression models

Description

Learning a Partial Least Squares or Principal Components Regression model from training data.

Usage

bm_pls_pcr(form, data, lpars)

Arguments

form

formula

data

data to train the model

lpars

parameter setting: For this multivariate regression model the main parameter is "method". The available options are "kernelpls", "svdpc", "cppls", "widekernelpls", and "simpls"

Details

Parameter setting can vary in the method parameter.

See mvr for a comprehensive description.

Imports learning procedure from pls package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_gaussianprocess; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_ppr(), bm_randomforest(), bm_svr()


Fit Projection Pursuit Regression models

Description

Learning a Projection Pursuit Regression model from training data. Parameter setting can vary in nterms and sm.method parameters. See ppr for a comprehensive description.

Usage

bm_ppr(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Imports learning procedure from stats package.

See Also

other learning models: bm_mars; bm_gaussianprocess; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_randomforest(), bm_svr()


Fit Random Forest models

Description

Learning a Random Forest Model from training data. Parameter setting can vary in num.trees and mtry parameters.

Usage

bm_randomforest(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

See ranger for a comprehensive description.

Imports learning procedure from ranger package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_gaussianprocess; bm_pls_pcr; bm_ffnn; bm_svr

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_svr()


Fit Support Vector Regression models

Description

Learning a Support Vector Regression model from training data.

Usage

bm_svr(form, data, lpars)

Arguments

form

formula

data

training data for building the predictive model

lpars

a list containing the learning parameters

Details

Parameter setting can vary in kernel, C, and epsilon parameters.

See ksvm for a comprehensive description.

Imports learning procedure from kernlab package.

See Also

other learning models: bm_mars; bm_ppr; bm_gbm; bm_glm; bm_cubist; bm_randomforest; bm_pls_pcr; bm_ffnn; bm_gaussianprocess

Other base learning models: bm_cubist(), bm_ffnn(), bm_gaussianprocess(), bm_gbm(), bm_glm(), bm_mars(), bm_pls_pcr(), bm_ppr(), bm_randomforest()


Base model for XGBoost

Description

Base model for XGBoost

Usage

bm_xgb(form, data, lpars)

Arguments

form

formula

data

Training data

lpars

list of parameters (deprecated)


Wrapper for creating an ensemble

Description

Using the parameter specifications from model_specs-class, this function trains a set of regression models.

Usage

build_base_ensemble(form, data, specs, num_cores = 1)

Arguments

form

formula;

data

data.frame for training the predictive models;

specs

object of class model_specs-class. Contains the information about the parameter setting of the models to train.

num_cores

number of cores used to train the base models;

Value

An object of class base_ensemble-class, whose slots include: base_models, a list containing the trained models; pre_weights, a numeric vector describing the weights of the base models according to their performance in the training data; and colnames, the column names of the data, used for reference.

Examples

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
specs <- model_specs(c("bm_ppr","bm_svr"), NULL)
M <- build_base_ensemble(target ~., dataset, specs, 1)


Building a committee for an ADE model

Description

Building a committee for an ADE model

Usage

build_committee(Y_hat, Y, lambda, omega)

Arguments

Y_hat

A data.frame containing the predictions of base models;

Y

True values of the time interval for which to compute the committee;

lambda

Window size. Number of observations to take into account to build the committee;

omega

Committee ratio, i.e., the ratio of models to dynamically weight across the data;
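A base R sketch of one plausible way to build a committee from these arguments: at each time point, rank the models by their mean absolute error over the previous lambda observations and keep the top ceiling(omega * number of models). The error measure, the rounding rule, keeping all models at the first point, and the name committee_sketch are assumptions, not the package's implementation.

```r
# Hypothetical sketch: one committee (vector of model ids) per time point.
committee_sketch <- function(Y_hat, Y, lambda, omega) {
  Y_hat <- as.matrix(Y_hat)
  stopifnot(nrow(Y_hat) == length(Y), omega > 0, omega <= 1)
  n_models <- ncol(Y_hat)
  n_keep <- max(1, ceiling(omega * n_models))
  # absolute error of every model at every time point (Y recycled by column)
  abs_err <- abs(Y_hat - Y)
  lapply(seq_len(nrow(Y_hat)), function(t) {
    # no past errors at the first point: keep all models
    if (t == 1) return(seq_len(n_models))
    # rank models by their mean error over the previous (at most) lambda points
    idx <- max(1, t - lambda):(t - 1)
    recent <- colMeans(abs_err[idx, , drop = FALSE])
    sort(order(recent)[seq_len(n_keep)])
  })
}
```

With omega = 0.5 and three models, each committee in this sketch holds the two models with the lowest recent error.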

See Also

Other weighting base models: EMASE(), get_top_models(), model_recent_performance(), model_weighting(), select_best()


Combining the predictions of several models

Description

This function simply applies a weighted average, where the predictions of the base models Y_hat are weighted according to their weights W. If a committee is specified, only models in the committee are weighted.

Usage

combine_predictions(Y_hat, W, committee = NULL)

Arguments

Y_hat

a data.frame with the predictions of the base models;

W

a matrix or data.frame with the weights of the base models;

committee

A list containing the ids of the models in the committee.


Compute the predictions of base models

Description

This function is used to predict new observations using the predictive models comprising an ensemble. It calls on the respective method based on the type of model, and returns the predictions as a list.

Usage

compute_predictions(M, form, data)

Arguments

M

list of base models;

form

formula;

data

new data to predict;


Predictions by a DETS ensemble

Description

Predictions by a DETS ensemble

Usage

dets_hat(y_hat, Y_hat, Y_committee, W)

Arguments

y_hat

combined predictions of the ensemble DETS. A numeric vector;

Y_hat

a matrix containing the predictions made by individual models;

Y_committee

a list describing the models selected for predictions at each time point (according to lambda and omega);

W

a matrix with the weights of the base models at each prediction time.

Value

Set of results from predicting with a DETS ensemble

See Also

Other ensemble predictions: ade_hat-class, ade_hat, dets_hat-class


Predictions by a DETS ensemble

Description

Predictions by a DETS ensemble

Slots

y_hat

combined predictions of the ensemble DETS-class. A numeric vector;

Y_hat

a matrix containing the predictions made by individual models;

Y_committee

a list describing the models selected for predictions at each time point (according to lambda and omega);

W

a matrix with the weights of the base models at each prediction time.

See Also

Other ensemble predictions: ade_hat-class, ade_hat, dets_hat


Embedding a Time Series

Description

This function embeds a time series into a Euclidean space. This implementation is based on the embed function of the stats package and has its theoretical background in the reconstruction of attractors (see Takens, 1981). This shape transformation of the series allows any regression tool to be used to learn the time series. The assumption is that there are no long-term dependencies in the data.
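To make the transformation concrete, the sketch below builds an embedded data frame with stats::embed, which places the most recent value in the first column. It is an illustration under stated assumptions rather than the package's implementation: the lag column names (lag_k) and embed_sketch are hypothetical, while the last column is called target because the examples in this manual regress on a column with that name.

```r
# Hypothetical sketch of a time delay embedding; not the tsensembler source.
embed_sketch <- function(x, k) {
  m <- stats::embed(as.numeric(x), k)      # row i holds x[i+k-1], x[i+k-2], ..., x[i]
  m <- m[, rev(seq_len(k)), drop = FALSE]  # reorder so the oldest lag comes first
  df <- as.data.frame(m)
  colnames(df) <- c(paste0("lag_", rev(seq_len(k - 1))), "target")
  df
}

e <- embed_sketch(1:6, 3)
e
#   lag_2 lag_1 target
# 1     1     2      3
# 2     2     3      4
# 3     3     4      5
# 4     4     5      6
```

Each row pairs a value with its k - 1 predecessors, so a call such as lm(target ~ ., data = e) treats forecasting as ordinary regression.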

Usage

embed_timeseries(timeseries, embedding.dimension)

Arguments

timeseries

a time series of class "xts".

embedding.dimension

an integer specifying the embedding dimension.

Value

An embedded time series

See Also

embed for the details of the embedding procedure.

Examples

## Not run: 
require(xts)
series <- xts(rnorm(100L), order.by = Sys.Date() + seq_len(100L))
embedded.ts <- embed_timeseries(series, 20L)

## End(Not run)


Get the target from a formula

Description

Get the target from a formula

Usage

get_target(form)

Arguments

form

formula

Value

the target variable as character


Extract top learners from their weights

Description

This function extracts the top learners at each test point from a score matrix, according to the committee ratio omega.

Usage

get_top_models(scores, omega)

Arguments

scores

data frame containing the weights;

omega

committee ratio of top base learners

Value

A list containing the top base models
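One way to read the committee ratio: at each test point, keep the ceiling(omega * m) models with the largest weight, where m is the number of models. The sketch below implements that reading in base R; the rounding rule and the name get_top_models_sketch are assumptions, not taken from the package.

```r
# Hypothetical sketch; the ceiling() rounding of omega * m is an assumption.
get_top_models_sketch <- function(scores, omega) {
  scores <- as.matrix(scores)
  n_top <- max(1, ceiling(omega * ncol(scores)))   # committee size, at least one model
  lapply(seq_len(nrow(scores)), function(i)
    order(scores[i, ], decreasing = TRUE)[seq_len(n_top)])  # indices of the best models
}

scores <- rbind(c(0.1, 0.6, 0.3), c(0.5, 0.2, 0.3))
top <- get_top_models_sketch(scores, omega = 0.5)
# row 1: models 2 and 3; row 2: models 1 and 3
```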

See Also

Other weighting base models: EMASE(), build_committee(), model_recent_performance(), model_weighting(), select_best()


Get the response values from a data matrix

Description

Given a formula and a data set, the get_y function retrieves the response values.

Usage

get_y(data, form)

Arguments

data

data set with the response values;

form

formula
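For a two-sided formula such as target ~ ., the response is the left-hand side of the formula, so get_target and get_y can each be expressed in one line of base R. The functions below are hypothetical sketches (the _sketch names are not part of the package).

```r
# Hypothetical sketches of get_target() and get_y(); not the tsensembler source.
get_target_sketch <- function(form) deparse(form[[2]])   # left-hand side as character
get_y_sketch <- function(data, form) data[[get_target_sketch(form)]]

df <- data.frame(lag_1 = c(1, 2), target = c(10, 20))
get_target_sketch(target ~ .)   # "target"
get_y_sketch(df, target ~ .)    # 10 20
```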


Holdout

Description

Holdout

Usage

holdout(x, beta, FUN, ...)

Arguments

x

data to split into a training block and a test block;

beta

ratio of observations for training

FUN

function to apply to train/test split

...

further arguments to FUN
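A holdout split for time series usually keeps the temporal order: the first beta share of the rows is used for training, the remaining rows for testing, and FUN receives both parts. The sketch below assumes that ordering and flooring of beta * nrow(x); holdout_sketch is a hypothetical name, not the package function.

```r
# Hypothetical sketch of an order-preserving holdout; not the tsensembler source.
holdout_sketch <- function(x, beta, FUN, ...) {
  idx <- seq_len(nrow(x))
  n_train <- floor(beta * nrow(x))
  train <- x[idx <= n_train, , drop = FALSE]  # first beta share of the rows
  test <- x[idx > n_train, , drop = FALSE]    # remaining, more recent rows
  FUN(train, test, ...)
}

x <- data.frame(a = 1:8)
holdout_sketch(x, beta = 0.75, FUN = function(train, test) c(nrow(train), nrow(test)))
# [1] 6 2
```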


Out-of-bag loss estimations

Description

A pipeline for retrieving out-of-bag loss estimations

Usage

intraining_estimations(train, test, form, specs, lfun, num_cores)

Arguments

train

training split of the training set;

test

test split of the training set;

form

formula;

specs

object of class model_specs-class. Contains the specifications of the base models.

lfun

loss function for metalearning. Defaults to ae (absolute error).

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.

Value

A list containing three objects:

mloss

loss of base models in test

oob

out-of-bag test samples

Y_hat

predictions by base models

See Also

Other out-of-bag functions: intraining_predictions()


Out-of-bag predictions

Description

A pipeline for retrieving out-of-bag predictions from the base models

Usage

intraining_predictions(train, test, form, specs)

Arguments

train

training split of the training set;

test

test split of the training set;

form

formula;

specs

object of class model_specs-class. Contains the specifications of the base models.

See Also

Other out-of-bag functions: intraining_estimations()


Applying lapply on the rows

Description

Wrapper function used to compute lapply on the rows of a data.frame

Usage

l1apply(obj, FUN, ...)

Arguments

obj

a data.frame object to apply the function.

FUN

function to apply to each row of obj

...

Further parameters to lapply


Training the base models of an ensemble

Description

This function uses train to build a set of predictive models, according to specs.

Usage

learning_base_models(train, form, specs, num_cores)

Arguments

train

training set to build the predictive models;

form

formula;

specs

object of class model_specs-class

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.

Value

A series of predictive models (base_model), and the weights of the models computed in the training data (preweights).

See Also

build_base_ensemble.

Examples

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
specs <- model_specs(c("bm_ppr","bm_svr"), NULL)
M <- build_base_ensemble(target ~., dataset, specs, 1)


Training an arbiter

Description

Training an arbiter

Usage

loss_meta_learn(form, data, meta_model)

Arguments

form

formula

data

data

meta_model

learning algorithm – either a "randomforest", a "lasso", or a "gaussianprocess".


Training a RBR arbiter

Description

Training a RBR arbiter

Usage

meta_cubist(form, data)

Arguments

form

formula

data

data


Arbiter predictions via Cubist

Description

Arbiter predictions via Cubist

Usage

meta_cubist_predict(meta_model, newdata)

Arguments

meta_model

arbiter: a Cubist model

newdata

new data to predict


Training a feedforward neural network arbiter

Description

Training a feedforward neural network arbiter

Usage

meta_ffnn(form, data)

Arguments

form

formula

data

data


Arbiter predictions via a feedforward neural network

Description

Arbiter predictions via a feedforward neural network

Usage

meta_ffnn_predict(model, newdata)

Arguments

model

arbiter: a feedforward neural network model

newdata

new data to predict loss


Training a Gaussian process arbiter

Description

Training a Gaussian process arbiter

Usage

meta_gp(form, data)

Arguments

form

formula

data

data


Arbiter predictions via a Gaussian process model

Description

Arbiter predictions via a Gaussian process model

Usage

meta_gp_predict(model, newdata)

Arguments

model

arbiter – a Gaussian process model

newdata

new data to predict loss


Training a LASSO arbiter

Description

Training a LASSO arbiter

Usage

meta_lasso(form, data)

Arguments

form

formula

data

data


Arbiter predictions via linear model

Description

Arbiter predictions via linear model

Usage

meta_lasso_predict(meta_model, newdata)

Arguments

meta_model

arbiter – a glmnet object

newdata

new data to predict


Training a MARS arbiter

Description

Training a MARS arbiter

Usage

meta_mars(form, data)

Arguments

form

formula

data

data


Arbiter predictions via mars model

Description

Arbiter predictions via mars model

Usage

meta_mars_predict(model, newdata)

Arguments

model

arbiter: a MARS model

newdata

new data to predict loss


Training a PLS arbiter

Description

Training a PLS arbiter

Usage

meta_pls(form, data)

Arguments

form

formula

data

data


Arbiter predictions via pls model

Description

Arbiter predictions via pls model

Usage

meta_pls_predict(model, newdata)

Arguments

model

arbiter: a PLS model

newdata

new data to predict loss


Training a PPR arbiter

Description

Training a PPR arbiter

Usage

meta_ppr(form, data)

Arguments

form

formula

data

data


Arbiter predictions via ppr model

Description

Arbiter predictions via ppr model

Usage

meta_ppr_predict(model, newdata)

Arguments

model

arbiter: a PPR model

newdata

new data to predict loss


Predicting loss using arbiter

Description

Predicting loss using arbiter

Usage

meta_predict(model, newdata, meta_model)

Arguments

model

arbiter model

newdata

new data to predict loss

meta_model

learning algorithm – either a "randomforest", a "lasso", or a "gaussianprocess".


Training a random forest arbiter

Description

Training a random forest arbiter

Usage

meta_rf(form, data)

Arguments

form

formula

data

data


Arbiter predictions via ranger

Description

Arbiter predictions via ranger

Usage

meta_rf_predict(meta_model, newdata)

Arguments

meta_model

arbiter – a ranger object

newdata

new data to predict


Training an SVR arbiter

Description

Training an SVR arbiter

Usage

meta_svr(form, data)

Arguments

form

formula

data

data


Arbiter predictions via an SVR model

Description

Arbiter predictions via an SVR model

Usage

meta_svr_predict(model, newdata)

Arguments

model

arbiter: an SVR model

newdata

new data to predict loss


Training an XGBoost arbiter

Description

Training an XGBoost arbiter

Usage

meta_xgb(form, data)

Arguments

form

formula

data

data


Arbiter predictions via xgb

Description

Arbiter predictions via xgb

Usage

meta_xgb_predict(meta_model, newdata)

Arguments

meta_model

arbiter: an XGBoost model

newdata

new data to predict


Recent performance of models using EMASE

Description

This function computes EMASE, Erfc Moving Average Squared Error, to quantify the recent performance of the base models.

Usage

model_recent_performance(Y_hat, Y, lambda, omega, pre_weights)

Arguments

Y_hat

A data.frame containing the predictions of each base model;

Y

known true values from past data to compare the predictions to;

lambda

Window size. Number of periods to average over when computing the moving average squared error;

omega

Ratio of top models in the committee;

pre_weights

The initial weights of the models, computed in the available data during the learning phase;

Value

A list containing two objects:

model_scores

The weights of the models in each time point

top_models

Models in the committee in each time point
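The idea behind EMASE can be sketched as follows: compute each model's squared error, average it over the last lambda points, map the averaged loss to a weight with the complementary error function (erfc), and keep only the top omega share of models. This is a simplified illustration, not the package's EMASE. Scaling the losses to [0, 1] before applying erfc, rounding the committee size up, dropping the first lambda - 1 rows, and ignoring pre_weights are all assumptions, and recent_weights_sketch is a hypothetical name.

```r
# Hypothetical, simplified sketch of EMASE-style weighting; not the tsensembler source.
erfc <- function(x) 2 * pnorm(-sqrt(2) * x)   # complementary error function

recent_weights_sketch <- function(Y_hat, Y, lambda, omega) {
  Y_hat <- as.matrix(Y_hat)
  se <- (Y_hat - Y)^2                                  # squared error of each model
  ma <- apply(se, 2, function(e)                       # mean over the last lambda points
    as.numeric(stats::filter(e, rep(1 / lambda, lambda), sides = 1)))
  ma <- ma[stats::complete.cases(ma), , drop = FALSE]  # drop the first lambda - 1 rows
  n_top <- max(1, ceiling(omega * ncol(ma)))           # committee size
  t(apply(ma, 1, function(l) {
    rng <- max(l) - min(l)
    s <- if (rng > 0) (l - min(l)) / rng else rep(0, length(l))  # scale to [0, 1]
    w <- erfc(s)                                       # lower recent loss -> larger weight
    w[-order(w, decreasing = TRUE)[seq_len(n_top)]] <- 0  # keep the committee only
    w / sum(w)
  }))
}

Y <- as.numeric(1:5)
Y_hat <- cbind(good = Y, biased = Y + 1)
W <- recent_weights_sketch(Y_hat, Y, lambda = 2, omega = 1)     # both models, "good" favoured
W2 <- recent_weights_sketch(Y_hat, Y, lambda = 2, omega = 0.5)  # committee of one model
```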

See Also

Other weighting base models: EMASE(), build_committee(), get_top_models(), model_weighting(), select_best()


Setup base learning models

Description

This class sets up the base learning models and their respective parameter settings to learn the ensemble.

Usage

model_specs(learner, learner_pars = NULL)

Arguments

learner

character vector with the base learners to be trained. Currently available models are:

bm_gaussianprocess

Gaussian Process models, from the kernlab package. See gausspr for a complete description and possible parametrization. See bm_gaussianprocess for the function implementation.

bm_ppr

Projection Pursuit Regression models, from the stats package. See ppr for a complete description and possible parametrization. See bm_ppr for the function implementation.

bm_glm

Generalized Linear Models, from the glmnet package. See glmnet for a complete description and possible parametrization. See bm_glm for the function implementation.

bm_gbm

Generalized Boosted Regression models, from the gbm package. See gbm for a complete description and possible parametrization. See bm_gbm for the function implementation.

bm_randomforest

Random Forest models, from the ranger package. See ranger for a complete description and possible parametrization. See bm_randomforest for the function implementation.

bm_cubist

M5 tree models, from the Cubist package. See cubist for a complete description and possible parametrization. See bm_cubist for the function implementation.

bm_mars

Multivariate Adaptive Regression Splines models, from the earth package. See earth for a complete description and possible parametrization. See bm_mars for the function implementation.

bm_svr

Support Vector Regression models, from the kernlab package. See ksvm for a complete description and possible parametrization. See bm_svr for the function implementation.

bm_ffnn

Feedforward Neural Network models, from the nnet package. See nnet for a complete description and possible parametrization. See bm_ffnn for the function implementation.

bm_pls_pcr

Partial Least Squares Regression and Principal Component Regression models, from the pls package. See mvr for a complete description and possible parametrization. See bm_pls_pcr for the function implementation.

learner_pars

a list with the parameter settings for the learners. For each model, an inner list should be created with the specified parameters.

Check each implementation to see the possible variations of parameters (also exemplified below).

Examples

# A PPR model and a GLM model with default parameters
model_specs(learner = c("bm_ppr", "bm_glm"), learner_pars = NULL)


# A PPR model and a SVR model. The listed parameters are combined
# with a cartesian product.
# With these specifications an ensemble with 6 predictive base
# models will be created. Two PPR models, one with 2 nterms
# and another with 4; and 4 SVR models, combining the kernel
# and C parameters.
specs <- model_specs(
 c("bm_ppr", "bm_svr"),
 list(bm_ppr = list(nterms = c(2, 4)),
      bm_svr = list(kernel = c("vanilladot", "polydot"), C = c(1,5)))
)

# All parameters currently available (parameter values can differ)
model_specs(
 learner = c("bm_ppr", "bm_svr", "bm_randomforest",
             "bm_gaussianprocess", "bm_cubist", "bm_glm",
             "bm_gbm", "bm_pls_pcr", "bm_ffnn", "bm_mars"
         ),
 learner_pars = list(
    bm_ppr = list(
       nterms = c(2,4),
       sm.method = "supsmu"
     ),
    bm_svr = list(
       kernel = "rbfdot",
       C = c(1,5),
       epsilon = .01
     ),
    bm_glm = list(
       alpha = c(1, 0)
     ),
    bm_randomforest = list(
       num.trees = 500
     ),
    bm_gbm = list(
       interaction.depth = 1,
       shrinkage = c(.01, .005),
       n.trees = c(100)
     ),
    bm_mars = list(
       nk = 15,
       degree = 3,
       thresh = .001
     ),
    bm_ffnn = list(
       size = 30,
       decay = .01
     ),
    bm_pls_pcr = list(
       method = c("kernelpls", "simpls", "cppls")
     ),
    bm_gaussianprocess = list(
       kernel = "vanilladot",
       tol = .01
     ),
    bm_cubist = list(
       committees = 50,
       neighbors = 0
     )
  )
)


Setup base learning models

Description

This class sets up the base learning models and their respective parameter settings to learn the ensemble.

Slots

learner

character vector with the base learners to be trained. Currently available models are:

bm_gaussianprocess

Gaussian Process models, from the kernlab package. See gausspr for a complete description and possible parametrization. See bm_gaussianprocess for the function implementation.

bm_ppr

Projection Pursuit Regression models, from the stats package. See ppr for a complete description and possible parametrization. See bm_ppr for the function implementation.

bm_glm

Generalized Linear Models, from the glmnet package. See glmnet for a complete description and possible parametrization. See bm_glm for the function implementation.

bm_gbm

Generalized Boosted Regression models, from the gbm package. See gbm for a complete description and possible parametrization. See bm_gbm for the function implementation.

bm_randomforest

Random Forest models, from the ranger package. See ranger for a complete description and possible parametrization. See bm_randomforest for the function implementation.

bm_cubist

M5 tree models, from the Cubist package. See cubist for a complete description and possible parametrization. See bm_cubist for the function implementation.

bm_mars

Multivariate Adaptive Regression Splines models, from the earth package. See earth for a complete description and possible parametrization. See bm_mars for the function implementation.

bm_svr

Support Vector Regression models, from the kernlab package. See ksvm for a complete description and possible parametrization. See bm_svr for the function implementation.

bm_ffnn

Feedforward Neural Network models, from the nnet package. See nnet for a complete description and possible parametrization. See bm_ffnn for the function implementation.

bm_pls_pcr

Partial Least Squares Regression and Principal Component Regression models, from the pls package. See mvr for a complete description and possible parametrization. See bm_pls_pcr for the function implementation.

learner_pars

a list with the parameter settings for the learners. For each model, an inner list should be created with the specified parameters.

Check each implementation to see the possible variations of parameters (also exemplified below).

Examples

# A PPR model and a GLM model with default parameters
model_specs(learner = c("bm_ppr", "bm_glm"), learner_pars = NULL)


# A PPR model and a SVR model. The listed parameters are combined
# with a cartesian product.
# With these specifications an ensemble with 6 predictive base
# models will be created. Two PPR models, one with 2 nterms
# and another with 4; and 4 SVR models, combining the kernel
# and C parameters.
specs <- model_specs(
 c("bm_ppr", "bm_svr"),
 list(bm_ppr = list(nterms = c(2, 4)),
      bm_svr = list(kernel = c("vanilladot", "polydot"), C = c(1,5)))
)

# All parameters currently available (parameter values can differ)
model_specs(
 learner = c("bm_ppr", "bm_svr", "bm_randomforest",
             "bm_gaussianprocess", "bm_cubist", "bm_glm",
             "bm_gbm", "bm_pls_pcr", "bm_ffnn", "bm_mars"
         ),
 learner_pars = list(
    bm_ppr = list(
       nterms = c(2,4),
       sm.method = "supsmu"
     ),
    bm_svr = list(
       kernel = "rbfdot",
       C = c(1,5),
       epsilon = .01
     ),
    bm_glm = list(
       alpha = c(1, 0)
     ),
    bm_randomforest = list(
       num.trees = 500
     ),
    bm_gbm = list(
       interaction.depth = 1,
       shrinkage = c(.01, .005),
       n.trees = c(100)
     ),
    bm_mars = list(
       nk = 15,
       degree = 3,
       thresh = .001
     ),
    bm_ffnn = list(
       size = 30,
       decay = .01
     ),
    bm_pls_pcr = list(
       method = c("kernelpls", "simpls", "cppls")
     ),
    bm_gaussianprocess = list(
       kernel = "vanilladot",
       tol = .01
     ),
    bm_cubist = list(
       committees = 50,
       neighbors = 0
     )
  )
)


Model weighting

Description

This is a utility function that takes the raw error of the models and scales it into a 0-1 range according to one of three strategies:

Usage

model_weighting(x, trans = "softmax", ...)

Arguments

x

A object describing the loss of each base model

trans

Character value describing the transformation type. The available options are softmax, linear and erfc. The softmax and erfc provide a non-linear transformation where the weights decay exponentially as the relative loss of a given model increases (with respect to all available models). The linear transformation is a simple normalization of values using the max-min method.

...

Further arguments to the normalize and proportion functions (na.rm = TRUE)

Details

erfc

using the complementary Gaussian error function

softmax

using a softmax function

linear

A simple normalization using max-min method

These transformations yield the final weights of the models.
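The three strategies can be sketched for a single vector of losses as follows. This is an illustration under assumptions, not the package code: softmax is applied to the negative loss; the linear option is max-min scaling of the negated loss, so the best model gets the largest weight and the worst model gets zero; and every option ends with a proportion step so the weights sum to 1. The name weights_from_loss is hypothetical, and the sketch does not handle the case where all losses are equal.

```r
# Hypothetical sketch of three loss-to-weight transformations; not the tsensembler source.
weights_from_loss <- function(loss, trans = c("softmax", "linear", "erfc")) {
  trans <- match.arg(trans)
  rng <- max(loss) - min(loss)
  w <- switch(trans,
    softmax = exp(-(loss - min(loss))),                 # softmax of -loss, shifted for stability
    linear  = (max(loss) - loss) / rng,                 # max-min scaling of -loss: best 1, worst 0
    erfc    = 2 * pnorm(-sqrt(2) * (loss - min(loss)) / rng)  # erfc of loss scaled to [0, 1]
  )
  w / sum(w)                                            # proportions: weights sum to 1
}

loss <- c(1, 2, 3)
weights_from_loss(loss, "softmax")
weights_from_loss(loss, "linear")   # 0.6666667 0.3333333 0.0000000
weights_from_loss(loss, "erfc")
```

Softmax and erfc both give every model a positive weight that shrinks quickly as loss grows, while the linear option assigns exactly zero weight to the worst model.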

Value

An object describing the weights of models

See Also

Other weighting base models: EMASE(), build_committee(), get_top_models(), model_recent_performance(), select_best()


Computing the mean squared error

Description

Utility function to compute mean squared error (MSE)

Usage

mse(y, y_hat)

Arguments

y

A numeric vector representing the actual values.

y_hat

A numeric vector representing the forecasted values.
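For reference, the pointwise and aggregate error measures in this family follow their standard definitions; a base R sketch is below. The function names are hypothetical and chosen to avoid masking the package's own ae(), se(), mse() and rmse().

```r
# Standard error measures in base R; hypothetical names chosen to avoid masking the package.
abs_error <- function(y, y_hat) abs(y - y_hat)               # pointwise absolute error
sq_error  <- function(y, y_hat) (y - y_hat)^2                # pointwise squared error
mean_sq_error <- function(y, y_hat) mean(sq_error(y, y_hat))
root_mean_sq_error <- function(y, y_hat) sqrt(mean_sq_error(y, y_hat))

y <- c(1, 2, 3)
y_hat <- c(1, 2, 5)
sq_error(y, y_hat)            # 0 0 4
mean_sq_error(y, y_hat)       # 1.333333
root_mean_sq_error(y, y_hat)  # 1.154701
```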

See Also

Other error/performance functions: ae(), se()


Scale a numeric vector using max-min

Description

Utility function used to linearly normalize a numeric vector

Usage

normalize(x)

Arguments

x

a numeric vector.

Value

a linearly normalized vector

Examples

normalize(rnorm(4L))
normalize(1:10)
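Max-min scaling maps the smallest value to 0 and the largest to 1. A base R sketch follows; normalize_sketch is a hypothetical name, and its na.rm argument mirrors the na.rm = TRUE mentioned for model_weighting but is an assumption here.

```r
# Hypothetical sketch of max-min normalisation; not the tsensembler source.
normalize_sketch <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)        # c(min, max)
  (x - rng[1]) / (rng[2] - rng[1])      # NaN when all values are equal
}

normalize_sketch(c(2, 4, 6))   # 0.0 0.5 1.0
```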


Predicting new observations using an ensemble

Description

Initially, the predictions of the base models are collected. Then, the predictions of the loss to be incurred by the base models E_hat (estimated by their associate meta models) are computed. The weights of the base models are then estimated according to E_hat and the committee of top models. The committee is built according to the lambda and omega parameters. Finally, the predictions are combined according to the weights and the committee setup.
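The last two steps (weighting by predicted loss, then combining within the committee) can be sketched for a batch of test points as below. Assumptions: E_hat holds the arbiters' predicted losses with one column per base model, committee is a list of model indices per row (already built from lambda and omega), and the weights are a softmax of the negative predicted loss. ade_predict_sketch is a hypothetical name, not the package's predict() method.

```r
# Hypothetical sketch of the weighting and combination steps; not the package's predict().
ade_predict_sketch <- function(Y_hat, E_hat, committee) {
  Y_hat <- as.matrix(Y_hat)
  E_hat <- as.matrix(E_hat)
  y_hat <- numeric(nrow(Y_hat))
  for (i in seq_len(nrow(Y_hat))) {
    members <- committee[[i]]                 # models allowed to contribute at row i
    e <- E_hat[i, members]                    # predicted loss of each committee member
    w <- exp(-(e - min(e)))                   # softmax of the negative predicted loss
    w <- w / sum(w)
    y_hat[i] <- sum(w * Y_hat[i, members])    # weighted average of the base predictions
  }
  y_hat
}

Y_hat <- rbind(c(10, 20, 30), c(10, 20, 30))
E_hat <- rbind(c(0, 0, 5), c(0, 0, 5))
preds <- ade_predict_sketch(Y_hat, E_hat, committee = list(c(1, 2), 1:3))
# preds[1] is 15; preds[2] is slightly above 15 because model 3 gets a tiny weight
```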

Usage

## S4 method for signature 'ADE'
predict(object, newdata)

## S4 method for signature 'DETS'
predict(object, newdata)

## S4 method for signature 'base_ensemble'
predict(object, newdata)

Arguments

object

an object of class ADE-class, DETS-class, or base_ensemble;

newdata

new data to predict

Examples


###### Predicting with an ADE ensemble

specs <- model_specs(
 learner = c("bm_glm", "bm_mars"),
 learner_pars = NULL
)

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:1000, ]
test <- dataset[1001:1500, ]

model <- ADE(target ~., train, specs)

preds <- predict(model, test)


## Not run: 

###### Predicting with a DETS ensemble

specs <- model_specs(
 learner = c("bm_svr", "bm_glm", "bm_mars"),
 learner_pars = NULL
)

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:700, ]
test <- dataset[701:1000, ]

model <- DETS(target ~., train, specs, lambda = 50, omega = .2)

preds <- predict(model, test)

## End(Not run)


## Not run: 
###### Predicting with a base ensemble

model <- ADE(target ~., train, specs)

basepreds <- predict(model@base_ensemble, test)

## End(Not run)



predict method for pls/pcr

Description

predict method for pls/pcr

Usage

predict_pls_pcr(model, newdata)

Arguments

model

pls/pcr model

newdata

new data


Computing the proportions of a numeric vector

Description

Utility function used to compute the proportion of the values of a vector. The proportion of a value is its ratio relative to the sum of the vector.

Usage

proportion(x)

Arguments

x

a numeric vector;

Value

A vector of proportions

Examples

proportion(rnorm(5L))
proportion(1:10)


rbind with do.call syntax

Description

rbind with do.call syntax

Usage

rbind_l(x)

Arguments

x

a list of objects to bind by rows with rbind.


Get most recent lambda observations

Description

Get most recent lambda observations

Usage

recent_lambda_observations(data, lambda)

Arguments

data

time series data as data.frame

lambda

number of observations to keep


Computing the root mean squared error

Description

Utility function to compute Root Mean Squared Error (RMSE)

Usage

rmse(y, y_hat)

Arguments

y

A numeric vector representing the actual values.

y_hat

A numeric vector representing the forecasted values.


Computing the rolling mean of the columns of a matrix

Description

Computing the rolling mean of the columns of a matrix

Usage

roll_mean_matrix(x, lambda)

Arguments

x

a numeric data.frame;

lambda

periods to average over when computing the moving average.
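A base R version of a column-wise rolling mean can use stats::filter with equal weights 1/lambda and sides = 1, so each value averages the current row and the previous lambda - 1 rows. This is a sketch under assumptions: roll_mean_matrix_sketch is a hypothetical name, the first lambda - 1 rows are NA here, and how the package pads or drops those rows is not specified in this section. stats::filter also stops with an error when lambda exceeds the number of rows.

```r
# Hypothetical sketch of a column-wise rolling mean; not the tsensembler source.
roll_mean_matrix_sketch <- function(x, lambda) {
  x <- as.matrix(x)
  apply(x, 2, function(col)
    as.numeric(stats::filter(col, rep(1 / lambda, lambda), sides = 1)))  # trailing mean
}

x <- data.frame(a = c(1, 2, 3, 4), b = c(2, 2, 2, 2))
r <- roll_mean_matrix_sketch(x, lambda = 2)
# column a: NA 1.5 2.5 3.5; column b: NA 2 2 2
```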


Computing the squared error

Description

Utility function to compute pointwise squared error (SE)

Usage

se(y, y_hat)

Arguments

y

A numeric vector representing the actual values.

y_hat

A numeric vector representing the forecasted values.

Value

squared error of forecasted values.

See Also

Other error/performance functions: ae(), mse()


Selecting best model according to weights

Description

This function selects the best model from a matrix of data x models. For each row (data point), the model with maximum weight is assigned a weight of 1, while the remaining models are assigned a weight of 0.
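The one-hot selection can be written with max.col, which returns the column index of the maximum in each row. Breaking ties with ties.method = "first" is an assumption, and select_best_sketch is a hypothetical name.

```r
# Hypothetical sketch of one-hot best-model selection; not the tsensembler source.
select_best_sketch <- function(model_scores) {
  model_scores <- as.matrix(model_scores)
  best <- max.col(model_scores, ties.method = "first")  # column of the largest weight per row
  out <- matrix(0, nrow(model_scores), ncol(model_scores),
                dimnames = dimnames(model_scores))
  out[cbind(seq_len(nrow(out)), best)] <- 1             # weight 1 for the best model, 0 otherwise
  out
}

scores <- rbind(c(0.2, 0.5, 0.3), c(0.6, 0.1, 0.3))
select_best_sketch(scores)
```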

Usage

select_best(model_scores)

Arguments

model_scores

matrix containing the model weights across the observations

See Also

Other weighting base models: EMASE(), build_committee(), get_top_models(), model_recent_performance(), model_weighting()


Sequential Re-weighting for controlling predictions' redundancy

Description

Besides ensemble heterogeneity we encourage diversity explicitly during the aggregation of the output of experts. This is achieved by taking into account not only predictions of performance produced by the arbiters, but also the correlation among experts in a recent window of observations.

Usage

sequential_reweighting(sliding_similarity, W)

Arguments

sliding_similarity

list of pairwise similarity values. See sliding_similarity

W

weights before re-weighting


Sliding similarity via Pearson's correlation

Description

Sliding similarity via Pearson's correlation

Usage

sliding_similarity(Y_hat_ext, lambda)

Arguments

Y_hat_ext

Predictions from the base-learners across the examples.

lambda

window size for computing correlations

Value

a list with a correlation matrix for each prediction point
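A minimal sketch of a sliding Pearson correlation: for each point t from lambda onwards, correlate the base-learners' predictions over rows t - lambda + 1 to t. Where the window starts and what is returned for the first lambda - 1 points are assumptions, and sliding_similarity_sketch is a hypothetical name.

```r
# Hypothetical sketch of a sliding Pearson correlation; not the tsensembler source.
sliding_similarity_sketch <- function(Y_hat_ext, lambda) {
  Y_hat_ext <- as.matrix(Y_hat_ext)
  stopifnot(nrow(Y_hat_ext) >= lambda)
  lapply(seq(lambda, nrow(Y_hat_ext)), function(t) {
    window <- Y_hat_ext[(t - lambda + 1):t, , drop = FALSE]  # last lambda predictions
    stats::cor(window, method = "pearson")                  # model x model correlation matrix
  })
}

preds <- cbind(a = 1:5, b = 2 * (1:5), c = 5:1)
sims <- sliding_similarity_sketch(preds, lambda = 3)
length(sims)         # 3
sims[[1]]["a", "c"]  # -1
```

A constant prediction within a window would make cor() return NA for that model, so a real implementation needs a rule for that case.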


Soft Imputation

Description

Soft Imputation

Usage

soft.completion(x)

Arguments

x

data


Computing the softmax

Description

This function computes the softmax function on a numeric vector.

Usage

softmax(x)

Arguments

x

numeric vector
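A numerically stable softmax subtracts the maximum before exponentiating: the result is mathematically unchanged, but large inputs no longer overflow. This is a generic sketch (softmax_sketch is a hypothetical name), not necessarily how the package computes it.

```r
# Generic, numerically stable softmax; not the tsensembler source.
softmax_sketch <- function(x) {
  z <- exp(x - max(x))   # shift by max(x) so the largest exponent is exp(0) = 1
  z / sum(z)             # normalise to sum to 1
}

softmax_sketch(c(1, 2, 3))
softmax_sketch(c(1000, 1001, 1002))   # same values; exp(1000) alone would overflow to Inf
```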


Splitting expressions by pattern

Description

This is a utility function that can be used to split expressions. It is based on the strsplit function. split_by is the general-purpose splitter; split_by_ splits expressions by "_"; and split_by. splits expressions by a dot.

Usage

split_by(expr, split, unlist. = TRUE, ...)

split_by_(expr, ...)

split_by.(expr, ...)

Arguments

expr

character expression to split;

split

expression to split expr by;

unlist.

Logical. If TRUE, the split expr is unlisted;

...

Further parameters to pass to strsplit;

Value

a list or vector with the split expression

Examples

split_by_("time_series")
split_by.("time.series")
split_by("born2bewild", "2")


Training procedure for ADE

Description

Base level models are trained according to specs, and meta level models are trained using a blocked prequential procedure in out-of-bag samples from the training data.
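Under the usual definition of blocked prequential evaluation, the training data is cut into k contiguous blocks, and for each block i >= 2 the base models are fitted on blocks 1 to i - 1 and evaluated on block i; the resulting out-of-bag predictions and losses are what the meta models learn from. The sketch below only generates the index splits. The number of blocks and the name blocked_prequential_sketch are assumptions, not package settings.

```r
# Hypothetical sketch of blocked prequential index splits; not the tsensembler source.
blocked_prequential_sketch <- function(n, k) {
  block <- cut(seq_len(n), breaks = k, labels = FALSE)  # k contiguous, equal-width blocks
  lapply(seq(2, k), function(i) list(
    train = which(block < i),   # all blocks before block i
    test  = which(block == i)   # block i provides out-of-bag estimates
  ))
}

splits <- blocked_prequential_sketch(n = 10, k = 5)
splits[[1]]   # train: 1 2; test: 3 4
splits[[4]]   # train: 1 to 8; test: 9 10
```

Because every test block lies strictly after its training blocks, the out-of-bag losses respect the temporal order of the series.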

Usage

train_ade(form, train, specs, lambda, lfun, meta_model_type, num_cores)

Arguments

form

formula;

train

training data as a data frame;

specs

a model_specs-class object. It contains the parameter setting specifications for training the ensemble;

lambda

window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

lfun

meta loss function - defaults to ae (absolute error)

meta_model_type

algorithm used to train meta models. Defaults to a random forest (using ranger package)

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.


ADE training, quick version: meta-models are trained on the training data, as opposed to a validation dataset

Description

Saves time by not computing out-of-bag predictions. The computational cost at test time is the same.

Usage

train_ade_quick(form, train, specs, lambda, lfun, meta_model_type, num_cores)

Arguments

form

formula

train

training data

specs

a model_specs-class object. It contains the parameter setting specifications for training the ensemble;

lambda

window size. Number of observations to compute the recent performance of the base models, according to the committee ratio omega. Essentially, the top omega models are selected and weighted at each prediction instance, according to their performance in the last lambda observations. Defaults to 50 according to empirical experiments;

lfun

meta loss function - defaults to ae (absolute error)

meta_model_type

algorithm used to train meta models. Defaults to a random forest (using ranger package)

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.


Dynamic Ensembles for Time Series Forecasting

Description

This package implements ensemble methods for time series forecasting tasks. Dynamically combining different forecasting models is a common approach to tackle these problems.

Details

The main methods in tsensembler are in ADE-class and DETS-class:

ADE

Arbitrated Dynamic Ensemble (ADE) is an ensemble approach for dynamically combining forecasting models using a metalearning strategy called arbitrating. A meta model is trained for each base model in the ensemble. Each meta-learner is specifically designed to model the error of its associated base model across the time series. At forecasting time, the base models are weighted according to their degree of competence in the input observation, estimated by the predictions of the meta models.

DETS

Dynamic Ensemble for Time Series (DETS) is similar to ADE in the sense that it adaptively combines the base models in an ensemble for time series forecasting. DETS follows a more traditional approach for forecaster combination. It pre-trains a set of heterogeneous base models, and at run-time weights them dynamically according to recent performance. Like ADE, the ensemble includes a committee, which dynamically selects a subset of base models that are weighted with a non-linear function.

The ensemble methods can be used to predict new observations or forecast future values of a time series. They can also be updated using generic functions (see the See Also section).

References

Cerqueira, Vitor; Torgo, Luis; Pinto, Fabio; and Soares, Carlos. "Arbitrated Ensemble for Time Series Forecasting" to appear at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017.

Cerqueira, Vitor; Torgo, Luis; and Soares, Carlos. "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, 2017, pp. 720-732.

Cerqueira, Vitor; Torgo, Luis; Oliveira, Mariana; and Pfahringer, Bernhard. "Dynamic and Heterogeneous Ensembles for Time Series Forecasting." Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on. IEEE, 2017.

See Also

ADE-class for setting up an ADE model; DETS-class for setting up a DETS model; see also update_weights and update_base_models for the generic functions that update the predictive models in an ensemble.

Examples


## Not run: 

data("water_consumption")
# embedding time series into a matrix
dataset <- embed_timeseries(water_consumption, 5)

# splitting data into train/test
train <- dataset[1:1000,]
test <- dataset[1001:1020, ]

# setting up base model parameters
specs <- model_specs(
  learner = c("bm_ppr","bm_glm","bm_svr","bm_mars"),
  learner_pars = list(
    bm_glm = list(alpha = c(0, .5, 1)),
    bm_svr = list(kernel = c("rbfdot", "polydot"),
                  C = c(1,3)),
    bm_ppr = list(nterms = 4)
  ))

# building the ensemble
model <- ADE(target ~., train, specs)


# predict the next value; every three points, update the base models
# and the meta models; at the other points, only the weights are updated
predictions <- numeric(nrow(test))
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat
  if (i %% 3 == 0) {
    model <-
      update_base_models(model,
                         rbind.data.frame(train, test[seq_len(i), ]))

    model <- update_ade_meta(model, rbind.data.frame(train, test[seq_len(i), ]))
  } else {
    model <- update_weights(model, test[i, ])
  }
}

point_forecast <- forecast(model, h = 5)

# setting up an ensemble of support vector machines
specs2 <-
  model_specs(learner = c("bm_svr"),
              learner_pars = list(
                bm_svr = list(kernel = c("vanilladot", "polydot",
                                         "rbfdot"),
                              C = c(1,3,6))
              ))

model <- DETS(target ~., train, specs2)
preds <- predict(model, test)@y_hat


## End(Not run)
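The examples start from embed_timeseries(), which turns a series into a matrix of lagged values with a target column. A minimal base-R sketch of time-delay embedding using stats::embed() is shown below. It assumes that k counts the target plus k - 1 lags, as base R's embed() does; the lag column names are hypothetical, and only "target" matches the formula target ~ . used in the examples. The exact layout returned by embed_timeseries() may differ.

```r
# Minimal base-R sketch of time-delay embedding (illustrative; the lag column
# names are assumptions, only "target" matches the formulas in the examples).
embed_sketch <- function(x, k) {
  stopifnot(k >= 2)  # need at least one lag plus the target
  # embed() returns one row per time t >= k: (x_t, x_{t-1}, ..., x_{t-k+1})
  m <- embed(as.numeric(x), k)
  # Reverse the columns so lags run oldest -> newest and the target is last
  m <- m[, k:1, drop = FALSE]
  df <- as.data.frame(m)
  names(df) <- c(paste0("lag", (k - 1):1), "target")
  df
}

ds <- embed_sketch(1:8, 5)
print(ds)
```

A series of length n yields n - k + 1 rows, so the first k - 1 observations only ever appear as inputs.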



Updating an ADE model

Description

update_ade is a generic function that combines update_base_models, update_ade_meta, and update_weights.

Usage

update_ade(object, newdata, num_cores = 1)

## S4 method for signature 'ADE'
update_ade(object, newdata, num_cores = 1)

Arguments

object

an ADE-class object.

newdata

data used to update the ADE model. This should be the data used to initially train the models (training set), together with new observations (for example, a validation set). Each model is retrained using newdata.

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.
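The sketch below illustrates, under stated assumptions, how a num_cores argument can switch between sequential and parallel fitting of base models. It uses parallel::mclapply (fork-based, so it falls back to sequential fitting on Windows); the learner functions are hypothetical stand-ins. tsensembler imports doParallel and foreach, so its actual parallel back end likely differs from this sketch.

```r
# Minimal sketch of dispatching base-model training on num_cores
# (illustrative; tsensembler's own parallel back end may differ).
fit_base_models_sketch <- function(formula, data, learners, num_cores = 1) {
  fit_one <- function(learner) learner(formula, data)
  if (num_cores > 1 && .Platform$OS.type != "windows") {
    # Fork-based parallelism: learners are fitted in separate processes
    parallel::mclapply(learners, fit_one, mc.cores = num_cores)
  } else {
    # num_cores = 1 (or Windows): fit the learners one after another
    lapply(learners, fit_one)
  }
}

# Two hypothetical "learners": functions taking (formula, data)
learners <- list(
  linear    = function(f, d) lm(f, data = d),
  quadratic = function(f, d) lm(update(f, . ~ . + I(x^2)), data = d)
)
set.seed(3)
d <- data.frame(x = 1:20)
d$y <- d$x^2 + rnorm(20)
models <- fit_base_models_sketch(y ~ x, d, learners, num_cores = 1)
sapply(models, function(m) summary(m)$r.squared)
```

Because each base model is trained independently, the work splits naturally across cores; the results are the same list of fitted models either way.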

See Also

ADE-class for building an ADE model; update_weights for updating the weights of the ensemble (without retraining the models); update_base_models for updating the base models of an ensemble; and update_ade_meta for updating the meta-models of an ADE model.

Other updating models: update_ade_meta(), update_weights()

Examples

specs <- model_specs(
 learner = c("bm_svr", "bm_glm", "bm_mars"),
 learner_pars = NULL
)

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
# toy size for checks
train <- dataset[1:300, ]
validation <- dataset[301:400, ]
test <- dataset[401:500, ]

model <- ADE(target ~., train, specs)

preds_val <- predict(model, validation)
model <- update_ade(model, rbind.data.frame(train, validation))

preds_test <- predict(model, test)



Updating the metalearning layer of an ADE model

Description

The update_ade_meta function uses new information to update the meta models of an ADE-class ensemble. As input, it receives an ADE-class model object and a new dataset used to retrain the meta models, which determine the weights of the base models in the ensemble. This new data should have the same structure as the one used to build the ensemble. The base models of the ensemble are updated separately, using the update_base_models function.
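To make the role of the meta models concrete, here is a minimal base-R sketch of an arbitration layer. It assumes one lm meta-model per base model, trained to predict that model's absolute error from the input features, and a softmax over negative predicted errors as the weighting. These choices are illustrative assumptions, not ADE's actual meta-learners or weighting function; the toy "base models" are hypothetical.

```r
# Minimal sketch of an ADE-style metalearning layer (illustrative assumptions:
# lm meta-models on absolute error and softmax weighting; not tsensembler code).
# One meta-model per base model learns how its absolute error relates to the
# input features; "updating" the layer means refitting on more data.
fit_meta_sketch <- function(features, base_preds, y) {
  lapply(seq_len(ncol(base_preds)), function(j) {
    meta_df <- data.frame(abs_err = abs(y - base_preds[, j]), features)
    lm(abs_err ~ ., data = meta_df)
  })
}

arbitrate_sketch <- function(meta_models, newfeatures) {
  # Predicted absolute error of each base model (columns) for each row
  pred_err <- matrix(sapply(meta_models, predict, newdata = newfeatures),
                     nrow = nrow(newfeatures))
  # Softmax over negative predicted errors; subtracting the row minimum
  # keeps exp() numerically stable without changing the weights
  z <- exp(-(pred_err - apply(pred_err, 1, min)))
  z / rowSums(z)
}

set.seed(42)
n <- 200
features <- data.frame(lag1 = rnorm(n), lag2 = rnorm(n))
y <- 2 * features$lag1 + rnorm(n, sd = 0.1)
# Two toy base models: one uses lag1 correctly, the other always predicts 0
base_preds <- cbind(good = 2 * features$lag1, poor = rep(0, n))
meta <- fit_meta_sketch(features, base_preds, y)
W <- arbitrate_sketch(meta, features[1:5, ])
print(round(W, 3))
```

In this sketch, updating the metalearning layer amounts to calling fit_meta_sketch() again on the old rows plus the new observations, mirroring the requirement that newdata contain both.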

Usage

update_ade_meta(object, newdata, num_cores = 1)

## S4 method for signature 'ADE'
update_ade_meta(object, newdata, num_cores = 1)

Arguments

object

an ADE-class object.

newdata

data used to update the meta models. This should be the data used to initially train the meta-models (training set), together with new observations (for example, a validation set). Each meta model is retrained using newdata.

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.

See Also

ADE-class for building an ADE model; update_weights for updating the weights of the ensemble (without retraining the models); and update_base_models for updating the base models of an ensemble.

Other updating models: update_ade(), update_weights()

Examples

## Not run: 
specs <- model_specs(
 learner = c("bm_svr", "bm_glm", "bm_mars"),
 learner_pars = NULL
)

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
train <- dataset[1:1000, ]
validation <- dataset[1001:1200, ]
test <- dataset[1201:1500, ]

model <- ADE(target ~., train, specs)

preds_val <- predict(model, validation)
model <- update_ade_meta(model, rbind.data.frame(train, validation))

preds_test <- predict(model, test)

## End(Not run)


Update the base models of an ensemble

Description

This is a generic function for updating the base models comprising an ensemble.

Usage

update_base_models(object, newdata, num_cores = 1)

## S4 method for signature 'ADE'
update_base_models(object, newdata, num_cores = 1)

## S4 method for signature 'DETS'
update_base_models(object, newdata, num_cores = 1)

Arguments

object

an ensemble object, of class DETS-class or ADE-class;

newdata

new data used to update the models. Each base model is retrained, so newdata should be the past data used for initially training the models along with any further available observations.

num_cores

A numeric value to specify the number of cores used to train base and meta models. num_cores = 1 leads to sequential training of models. num_cores > 1 splits the training of the base models across num_cores cores.

Details

The update_base_models function receives a model object and a new dataset for retraining the base models. This new data should have the same structure as the one used to build the ensemble.
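A minimal sketch of what retraining on new data involves is shown below, assuming a plain list as a hypothetical stand-in for the ensemble object (the real objects are S4 classes) and lm-based learners. It checks that newdata has the same columns as the original training data and then refits every base model from scratch on all rows of newdata.

```r
# Minimal sketch of retraining every base model on new data after checking
# that it has the same structure as the training data (illustrative only;
# the list layout of `ensemble` is a hypothetical stand-in for the S4 object).
make_ensemble_sketch <- function(formula, data, learners) {
  list(formula = formula, columns = names(data), learners = learners,
       models = lapply(learners, function(fit) fit(formula, data)),
       n_train = nrow(data))
}

retrain_sketch <- function(ensemble, newdata) {
  if (!identical(names(newdata), ensemble$columns))
    stop("newdata must have the same columns as the training data")
  # Every base model is refitted from scratch on the full newdata
  ensemble$models <- lapply(ensemble$learners,
                            function(fit) fit(ensemble$formula, newdata))
  ensemble$n_train <- nrow(newdata)
  ensemble
}

set.seed(5)
d <- data.frame(lag1 = rnorm(60))
d$target <- 0.8 * d$lag1 + rnorm(60, sd = 0.2)
train <- d[1:50, ]
new_obs <- d[51:60, ]
learners <- list(ar1 = function(f, x) lm(f, data = x),
                 mean_only = function(f, x) lm(target ~ 1, data = x))
ens <- make_ensemble_sketch(target ~ lag1, train, learners)
# Pass past data together with the new observations, as newdata requires
ens <- retrain_sketch(ens, rbind(train, new_obs))
ens$n_train
```

Since each call refits from scratch, passing only the new observations would discard the history the models were originally trained on.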

See Also

ADE-class for the ADE model information and DETS-class for the DETS model information; update_ade_meta for updating the meta models of an ADE ensemble; and update_weights for the method used to update the weights of the ensemble. Updating the weights only refreshes the recent observations used to compute the weights of the base models, whereas updating the models uses the new data to retrain them.

Examples

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)
# toy size to keep check execution time short
train <- dataset[1:300,]
test <- dataset[301:305, ]

specs <- model_specs(c("bm_ppr","bm_glm","bm_mars"), NULL)

model <- ADE(target ~., train, specs)

predictions <- numeric(nrow(test))
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat
  model <-
    update_base_models(model,
                       rbind.data.frame(train, test[seq_len(i), ]))
}

####

specs2 <- model_specs(c("bm_ppr","bm_randomforest","bm_svr"), NULL)

modeldets <- DETS(target ~., train, specs2)

predictions <- numeric(nrow(test))
# predict new data and update the models every three points;
# at the remaining points, only the weights are updated
for (i in seq_along(predictions)) {
  predictions[i] <- predict(modeldets, test[i, ])@y_hat

  if (i %% 3 == 0) {
    modeldets <-
      update_base_models(modeldets,
                         rbind.data.frame(train, test[seq_len(i), ]))
  } else {
    modeldets <- update_weights(modeldets, test[i, ])
  }
}



Updating the weights of base models

Description

Update the weights of the base models of an ADE-class or DETS-class ensemble. This is accomplished by computing the loss of the base models on new, recent observations.

Usage

update_weights(object, newdata)

## S4 method for signature 'ADE'
update_weights(object, newdata)

## S4 method for signature 'DETS'
update_weights(object, newdata)

Arguments

object

an ADE-class or DETS-class model object;

newdata

new data used to update the most recent observations of the time series. At prediction time, these observations are used to compute the weights of the base models.

Note

Updating the weights of an ensemble is only necessary between different calls of the functions predict or forecast. Otherwise, if consecutive known observations are predicted (e.g. a validation or test set), the updating is done automatically and internally.
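The note can be illustrated with a minimal base-R sketch, assuming a hypothetical list-based ensemble that keeps a window of the last lambda observations and weights its base models by inverse mean absolute error on that window (both the window size and the weighting rule are assumptions, not the package's). Predicting a block of known rows at once, with the window refreshed internally after each row, gives the same result as predicting row by row and updating the weights manually.

```r
# Minimal sketch of weight updating from recent observations (illustrative;
# the inverse-error weighting and the lambda window are assumptions).
update_weights_sketch <- function(ens, newdata) {
  # Append the newly observed rows and keep only the last `lambda` of them
  ens$recent <- tail(rbind(ens$recent, newdata), ens$lambda)
  ens
}

weights_sketch <- function(ens) {
  # Mean absolute error of each base model on the recent observations
  mae <- sapply(ens$models, function(m)
    mean(abs(ens$recent$target - predict(m, newdata = ens$recent))))
  w <- 1 / (mae + 1e-8)   # better recent performance -> larger weight
  w / sum(w)
}

predict_one_sketch <- function(ens, row) {
  preds <- sapply(ens$models, predict, newdata = row)
  sum(weights_sketch(ens) * preds)
}

predict_batch_sketch <- function(ens, newdata) {
  out <- numeric(nrow(newdata))
  for (i in seq_len(nrow(newdata))) {
    out[i] <- predict_one_sketch(ens, newdata[i, , drop = FALSE])
    # Known observations: refresh the recent window internally after each row
    ens <- update_weights_sketch(ens, newdata[i, , drop = FALSE])
  }
  out
}

set.seed(7)
d <- data.frame(lag1 = rnorm(110))
d$target <- 1.5 * d$lag1 + rnorm(110, sd = 0.3)
train <- d[1:100, ]
test <- d[101:110, ]
ens <- list(models = list(lm(target ~ lag1, data = train),
                          lm(target ~ 1, data = train)),
            recent = tail(train, 10), lambda = 10)

p_batch <- predict_batch_sketch(ens, test)
# Same result when predicting row by row and updating the weights manually
p_loop <- numeric(nrow(test))
for (i in seq_len(nrow(test))) {
  p_loop[i] <- predict_one_sketch(ens, test[i, , drop = FALSE])
  ens <- update_weights_sketch(ens, test[i, , drop = FALSE])
}
all.equal(p_batch, p_loop)
```

The manual update matters only when predictions are made in separate calls, because each call otherwise starts from a window that does not yet include the newly observed values.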

See Also

update_weights (S4 method for signature 'ADE') for the weight updating method of an ADE model, and update_weights (S4 method for signature 'DETS') for the same method applied to a DETS model.

Other updating models: update_ade_meta(), update_ade()

Examples

data("water_consumption")
dataset <- embed_timeseries(water_consumption, 5)

# toy size for checks
train <- dataset[1:300,]
test <- dataset[301:305, ]

specs <- model_specs(c("bm_ppr","bm_glm","bm_mars"), NULL)
## the same applies to model <- DETS(target ~., train, specs)
model <- ADE(target ~., train, specs)

# if consecutive known observations are predicted (e.g. a validation/test set),
# the updating is done automatically and internally.
predictions1 <- predict(model, test)@y_hat

# otherwise, the models need to be updated
predictions <- numeric(nrow(test))
# predict new data and update the weights of the model
for (i in seq_along(predictions)) {
  predictions[i] <- predict(model, test[i, ])@y_hat

  model <- update_weights(model, test[i, ])
}

#all.equal(predictions1, predictions)



Water Consumption in the Oporto (Portugal) Area.

Description

A time series of classes xts and zoo containing the water consumption levels at a specific delivery point in Oporto, Portugal.

Usage

water_consumption

Format

The time series has 1741 daily values, from January 2012 to October 2016.

consumption

consumption of water, raw value from sensor

Source

https://www.addp.pt/home.php


XGB optimizer

Description

XGB optimizer

Usage

xgb_optimizer(X, y, gsearch)

Arguments

X

Covariates

y

Target values

gsearch

Grid search


XGBoost predict function

Description

XGBoost predict function

Usage

xgb_predict(model, newdata)

Arguments

model

Model from bm_xgb

newdata

Test data


XGBoost predict function (internal variant)

Description

XGBoost predict function (internal variant).

Usage

xgb_predict_(model, newdata)

Arguments

model

Model from bm_xgb

newdata

Test data
