Type: | Package |
Title: | Feature-Based Forecast Model Selection |
Version: | 1.1.8 |
Maintainer: | Thiyanga Talagala <tstalagala@gmail.com> |
Description: | A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. 'seer' package is the implementation of the FFORMS algorithm. For more details see our paper at https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf. |
License: | GPL-3 |
URL: | https://thiyangt.github.io/seer/ |
BugReports: | https://github.com/thiyangt/seer/issues |
Depends: | R (≥ 3.2.3) |
Imports: | stats, urca, forecast (≥ 8.3), dplyr, magrittr, randomForest, forecTheta, stringr, tibble, purrr, future, furrr, utils, tsfeatures |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
Suggests: | testthat (≥ 2.1.0), covr, repmis, knitr, rmarkdown, ggplot2, tidyr, Mcomp, GGally |
NeedsCompilation: | no |
Packaged: | 2022-10-01 06:53:39 UTC; thiyangashaminitalagala |
Author: | Thiyanga Talagala |
Repository: | CRAN |
Date/Publication: | 2022-10-01 07:10:02 UTC |
Calculate accuracy measure based on ARIMA models
Description
Calculate accuracy measure based on ARIMA models
Usage
accuracy_arima(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
a list which contains the accuracy and name of the specific ARIMA model.
Forecast-accuracy calculation
Description
Calculate accuracy measure based on ETS models
Usage
accuracy_ets(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
a list which contains the accuracy and name of the specific ETS model.
Calculate accuracy based on MSTL
Description
Calculate accuracy based on MSTL
Usage
accuracy_mstl(ts_info, function_name, length_out, mtd)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
mtd |
Method to use for forecasting the seasonally adjusted series |
Value
accuracy measure calculated based on multiple seasonal decomposition
Calculate accuracy measure calculated based on neural network forecasts
Description
Calculate accuracy measure calculated based on neural network forecasts
Usage
accuracy_nn(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
accuracy measure calculated based on neural network forecasts
Calculate accuracy measure based on random walk models
Description
Calculate accuracy measure based on random walk models
Usage
accuracy_rw(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
returns accuracy measure calculated based on random walk model
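The interplay of ts_info, function_name and length_out can be illustrated with a minimal base-R sketch. It is not the package implementation: the element names `training` and `test`, the helper `accuracy_rw_sketch` and the toy accuracy function `mae` are all assumptions made for illustration.

```r
# Hypothetical stand-in for accuracy_rw(); names below are assumptions.
accuracy_rw_sketch <- function(ts_info, function_name, length_out) {
  training <- ts_info$training  # assumed element name
  test <- ts_info$test          # assumed element name
  h <- length(test)
  # Random walk (naive) forecast: repeat the last training observation.
  fc <- rep(as.numeric(tail(training, 1)), h)
  acc <- function_name(training = training, test = test, forecast = fc)
  # length_out is the number of measures the accuracy function returns.
  stopifnot(length(acc) == length_out)
  acc
}

# Toy accuracy function (mean absolute error) with the expected arguments.
mae <- function(training, test, forecast) {
  mean(abs(as.numeric(test) - forecast))
}

info <- list(training = ts(c(1, 2, 3, 4, 5)), test = ts(c(6, 7), start = 6))
accuracy_rw_sketch(info, mae, length_out = 1)  # 1.5
```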
Calculate accuracy measure based on random walk with drift
Description
Calculate accuracy measure based on random walk with drift
Usage
accuracy_rwd(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
accuracy measure calculated based on random walk with drift model
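As a rough illustration of the forecast being scored here, a random walk with drift extends the last observation along the average historical change. The helper `rwd_forecast_sketch` below is hypothetical and written in base R; the package may compute the forecast differently.

```r
# Hypothetical helper: point forecasts from a random walk with drift.
rwd_forecast_sketch <- function(training, h) {
  y <- as.numeric(training)
  n <- length(y)
  # Drift is the average one-step change over the training period.
  drift <- (y[n] - y[1]) / (n - 1)
  y[n] + drift * seq_len(h)
}

rwd_forecast_sketch(c(2, 4, 6, 8), h = 3)  # 10 12 14
```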
Calculate accuracy measure based on snaive method
Description
Calculate accuracy measure based on snaive method
Usage
accuracy_snaive(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
accuracy measure calculated based on snaive method
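For orientation, the seasonal naive method forecasts each future period with the value observed in the same season of the last complete cycle. The base-R helper `snaive_forecast_sketch` below is a hypothetical sketch of that idea, not the code used by accuracy_snaive.

```r
# Hypothetical helper: seasonal naive point forecasts.
snaive_forecast_sketch <- function(training, h) {
  m <- frequency(training)
  # Repeat the last m observations for as many steps as needed.
  last_season <- as.numeric(tail(training, m))
  rep(last_season, length.out = h)
}

y_quarterly <- ts(c(1, 2, 3, 4, 5, 6, 7, 8), frequency = 4)
snaive_forecast_sketch(y_quarterly, h = 6)  # 5 6 7 8 5 6
```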
Calculate accuracy measure based on STL-AR method
Description
Calculate accuracy measure based on STL-AR method
Usage
accuracy_stlar(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
accuracy measure calculated based on stlar method
Calculate accuracy measure based on TBATS
Description
Calculate accuracy measure based on TBATS
Usage
accuracy_tbats(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
accuracy measure calculated based on TBATS models
Calculate accuracy measure based on Theta method
Description
Calculate accuracy measure based on Theta method
Usage
accuracy_theta(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
returns accuracy measure calculated based on theta method
Calculate accuracy measure based on white noise process
Description
Calculate accuracy measure based on white noise process
Usage
accuracy_wn(ts_info, function_name, length_out)
Arguments
ts_info |
list containing training and test part of a time series |
function_name |
function used to calculate the accuracy measure; its arguments should be the forecast, training and test sets of the time series |
length_out |
number of measures calculated by the function |
Value
returns accuracy measure calculated based on white noise process
Autocorrelation-based features
Description
Computes various measures based on autocorrelation coefficients of the original series, first-differenced series and second-differenced series
Usage
acf5(y)
Arguments
y |
a univariate time series |
Value
A vector of 3 values: sum of squares of the first five autocorrelation coefficients of the original series, the first-differenced series, and the twice-differenced series.
Author(s)
Thiyanga Talagala
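The three values described for acf5 can be approximated in base R as follows. This is a sketch under the assumption that plain sample autocorrelations from stats::acf are used; the function name `acf5_sketch` and the element names are hypothetical.

```r
# Hypothetical re-creation of the acf5 features with base R.
acf5_sketch <- function(y) {
  sum_sq_acf <- function(x) {
    # Drop lag 0, keep lags 1 to 5, then sum the squares.
    r <- stats::acf(x, lag.max = 5L, plot = FALSE)$acf[-1]
    sum(r^2)
  }
  c(
    y_acf5 = sum_sq_acf(y),
    diff1y_acf5 = sum_sq_acf(diff(y)),
    diff2y_acf5 = sum_sq_acf(diff(y, differences = 2))
  )
}

acf5_sketch(AirPassengers)
```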
Autocorrelation coefficients based on seasonally differenced series
Description
Autocorrelation coefficients based on seasonally differenced series
Usage
acf_seasonalDiff(y, m, lagmax)
Arguments
y |
a univariate time series |
m |
frequency of the time series |
lagmax |
maximum lag at which to calculate the acf |
Value
A vector of 3 values: first ACF value of seasonally-differenced series, ACF value at the first seasonal lag of seasonally-differenced series, sum of squares of first 5 autocorrelation coefficients of seasonally-differenced series.
Author(s)
Thiyanga Talagala
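A base-R sketch of the three values listed above is given here. It assumes lagmax is at least as large as both m and 5; the function name `acf_seasonal_diff_sketch` and the element names are hypothetical, and the package code may differ in detail.

```r
# Hypothetical re-creation of the seasonally differenced ACF features.
acf_seasonal_diff_sketch <- function(y, m, lagmax) {
  stopifnot(lagmax >= max(m, 5))
  sdiff <- diff(y, lag = m)  # seasonal differencing at lag m
  r <- stats::acf(sdiff, lag.max = lagmax, plot = FALSE)$acf[-1]  # drop lag 0
  c(
    sediff_acf1 = r[1],            # first ACF value
    sediff_seacf1 = r[m],          # ACF at the first seasonal lag
    sediff_acf5 = sum(r[1:5]^2)    # sum of squares of the first 5 values
  )
}

acf_seasonal_diff_sketch(AirPassengers, m = 12, lagmax = 13L)
```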
build random forest classifier
Description
train a random forest model and predict forecast-models for new series
Usage
build_rf(
training_set,
testset = FALSE,
rf_type = c("ru", "rcp"),
ntree,
seed,
import = FALSE,
mtry = 8
)
Arguments
training_set |
data frame of features and class labels |
testset |
features of new time series, default FALSE if a testset is not available |
rf_type |
whether ru(random forest based on unbiased sample) or rcp(random forest based on class priors) |
ntree |
number of trees in the forest |
seed |
a value for seed |
import |
Should importance of predictors be assessed? TRUE or FALSE |
mtry |
number of features to be selected at each node |
Value
a list containing the random forest and forecast-models for new series
Mean Absolute Scaled Error(MASE)
Description
Calculation of mean absolute scaled error
Usage
cal_MASE(training, test, forecast)
Arguments
training |
training period of the time series |
test |
test period of the time series |
forecast |
forecast values of the series |
Value
returns a single value
Author(s)
Thiyanga Talagala
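For readers unfamiliar with MASE, the sketch below shows one common definition in base R: the mean absolute forecast error divided by the in-sample mean absolute error of the seasonal naive method. Scaling at lag equal to frequency(training) is an assumption here, and `mase_sketch` is a hypothetical name, not the package function.

```r
# Hypothetical MASE calculation; scaling lag assumed to be frequency(training).
mase_sketch <- function(training, test, forecast) {
  m <- frequency(training)
  # In-sample mean absolute error of the (seasonal) naive method.
  scale <- mean(abs(diff(as.numeric(training), lag = m)))
  mean(abs(as.numeric(test) - as.numeric(forecast))) / scale
}

train <- ts(c(10, 12, 14, 16, 18))
mase_sketch(train, test = c(20, 22), forecast = c(19, 23))  # 0.5
```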
Weighted Average
Description
Weighted average (WA) calculated from MASE and sMAPE for an individual time series
Usage
cal_WA(training, test, forecast)
Arguments
training |
training period of a time series |
test |
test period of a time series |
forecast |
forecast obtained from a model fitted to the training period |
Value
returns a single value: WA based on MASE and sMAPE
Author(s)
Thiyanga Talagala
Calculate features for new time series instances
Description
Computes relevant time series features before applying them to the model
Usage
cal_features(
tslist,
seasonal = FALSE,
m = 1,
lagmax = 2L,
database,
h,
highfreq
)
Arguments
tslist |
a list of univariate time series |
seasonal |
if FALSE, restricts to features suitable for non-seasonal data |
m |
frequency of the time series or minimum frequency in the case of msts objects |
lagmax |
maximum lag at which to calculate the acf (quarterly series-5L, monthly-13L, weekly-53L, daily-8L, hourly-25L) |
database |
whether the time series is from mcomp or other |
h |
forecast horizon |
highfreq |
whether the time series is weekly, daily or hourly |
Value
dataframe: each column represents a feature and each row represents a time series
Author(s)
Thiyanga Talagala
Mean of MASE and sMAPE
Description
Calculate MASE and sMAPE for an individual time series
Usage
cal_m4measures(training, test, forecast)
Arguments
training |
training period of a time series |
test |
test period of a time series |
forecast |
forecast obtained from a model fitted to the training period |
Value
returns a single value: mean of MASE and sMAPE
Author(s)
Thiyanga Talagala
Examples
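library(Mcomp)     # provides the M3 competition data
library(forecast)  # auto.arima() and forecast()
library(magrittr)  # pipe operator %>%
library(seer)
y <- M3[[1]]$x     # training period of the first M3 series
fcast_arima <- auto.arima(y) %>% forecast(h = 6)
cal_m4measures(M3[[1]]$x, M3[[1]]$xx, fcast_arima$mean)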
scale MASE and sMAPE by median
Description
Given a matrix of MASE and sMAPE values for each forecasting method, scale each measure by its median and take the mean of the median-scaled MASE and the median-scaled sMAPE as the forecast accuracy measure used to identify the class labels
Usage
cal_medianscaled(x)
Arguments
x |
output from the function fcast_accuracy, where the parameter accuracyFun = cal_m4measures |
Value
a list with the accuracy matrix, a vector of ARIMA models and a vector of ETS models. The accuracy for each forecast method is the average of scaled MASE and scaled sMAPE. The medians of MASE and sMAPE are calculated across the forecasts produced by the different models for a given series.
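The median scaling can be sketched in base R as below. The sketch assumes MASE and sMAPE are supplied as two separate matrices (rows: series, columns: forecasting methods), which may not match the structure returned by fcast_accuracy; `median_scaled_sketch` is a hypothetical name.

```r
# Hypothetical helper: average of median-scaled MASE and sMAPE per method.
median_scaled_sketch <- function(mase, smape) {
  # Divide every row (one series) by the median over the methods in that row.
  scale_rows <- function(mat) mat / apply(mat, 1, stats::median)
  (scale_rows(mase) + scale_rows(smape)) / 2
}

methods <- c("ets", "arima", "rw")
mase <- matrix(c(1, 2, 3,
                 2, 4, 6), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, methods))
smape <- matrix(c(10, 20, 30,
                  5, 10, 15), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, methods))
median_scaled_sketch(mase, smape)  # each row becomes 0.5 1.0 1.5
```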
symmetric Mean Absolute Percentage Error (sMAPE)
Description
Calculation of symmetric mean absolute percentage error
Usage
cal_sMAPE(training, test, forecast)
Arguments
training |
training period of the time series |
test |
test period of the time series |
forecast |
forecast values of the series |
Value
returns a single value
Author(s)
Thiyanga Talagala
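A minimal base-R sketch of sMAPE follows, assuming the percentage form used in the M-competitions (a factor of 200 on the absolute error divided by the sum of absolute values). `smape_sketch` is a hypothetical name; the training argument is kept only to mirror the documented signature and is not used.

```r
# Hypothetical sMAPE calculation; the factor 200 is an assumption.
smape_sketch <- function(training, test, forecast) {
  test <- as.numeric(test)
  forecast <- as.numeric(forecast)
  mean(200 * abs(test - forecast) / (abs(test) + abs(forecast)))
}

smape_sketch(training = NULL, test = c(100, 100), forecast = c(100, 50))
```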
Classify labels according to the FFORMS framework
Description
This function further classifies class labels as in the FFORMS framework
Usage
classify_labels(df_final)
Arguments
df_final |
a dataframe: output from split_names function |
Value
a vector of class labels in the FFORMS framework
identify the best forecasting method
Description
identify the best forecasting method according to the forecast accuracy measure
Usage
classlabel(accuracy_mat)
Arguments
accuracy_mat |
matrix of forecast accuracy measures (rows: time series, columns: forecasting method) |
Value
a vector: best forecasting method for each series corresponding to the rows of accuracy_mat
Author(s)
Thiyanga Talagala
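The selection rule can be sketched in base R, assuming that smaller values of the accuracy measure are better (as with MASE or sMAPE). `classlabel_sketch` is a hypothetical name and ties are resolved in favour of the first column.

```r
# Hypothetical helper: pick the method with the smallest error in each row.
classlabel_sketch <- function(accuracy_mat) {
  colnames(accuracy_mat)[apply(accuracy_mat, 1, which.min)]
}

acc <- matrix(c(0.9, 1.2, 1.0,
                1.5, 0.7, 1.1), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("ets", "arima", "rw")))
classlabel_sketch(acc)  # "ets" "arima"
```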
This function is called inside fforms_combination
Description
Given weights and time series in two separate vectors, calculate the combination forecast
Usage
combination_forecast_inside(x, y, h)
Arguments
x |
weights and names of models (output based on fforms.ensemble) |
y |
time series values |
h |
forecast horizon |
Value
list of combination forecasts corresponding to point, lower and upper
Author(s)
Thiyanga Talagala
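A weighted combination of point forecasts can be sketched as follows. The sketch assumes the weights are a named numeric vector and the individual forecasts are already available as a matrix (rows: models, columns: horizons); both the layout and the name `combine_sketch` are assumptions, and the package function fits the models itself from the series.

```r
# Hypothetical helper: weighted average of forecasts from selected models.
combine_sketch <- function(weights, forecasts) {
  w <- weights / sum(weights)  # renormalise weights of the selected models
  as.numeric(w %*% forecasts[names(w), , drop = FALSE])
}

weights <- c(ets = 0.6, arima = 0.2)
forecasts <- rbind(ets = c(10, 11), arima = c(14, 15), rw = c(0, 0))
combine_sketch(weights, forecasts)  # 11 12
```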
Convert multiple frequency time series into msts object
Description
Convert multiple-frequency (daily, hourly, half-hourly, minutes, seconds) time series into an msts object.
Usage
convert_msts(y, category)
Arguments
y |
univariate time series |
category |
frequency at which the data have been collected |
Value
a ts object or msts object
Autocorrelation coefficient at lag 1 of the residuals
Description
Computes the first order autocorrelation of the residual series of the deterministic trend model
Usage
e_acf1(y)
Arguments
y |
a univariate time series |
Value
A numeric value.
Author(s)
Thiyanga Talagala
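As an illustration, the feature can be approximated in base R by fitting a linear time trend and taking the lag-1 autocorrelation of its residuals. Treating the deterministic trend model as a straight line in time is an assumption, and `e_acf1_sketch` is a hypothetical name.

```r
# Hypothetical re-creation: lag-1 ACF of residuals from a linear trend.
e_acf1_sketch <- function(y) {
  time_index <- seq_along(y)
  fit <- stats::lm(as.numeric(y) ~ time_index)
  res <- stats::residuals(fit)
  # Element 1 of $acf is lag 0, so element 2 is the lag-1 autocorrelation.
  stats::acf(res, lag.max = 1L, plot = FALSE)$acf[2]
}

e_acf1_sketch(AirPassengers)
```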
calculate forecast accuracy from different forecasting methods
Description
Calculate forecast accuracy on test set according to a specified criterion
Usage
fcast_accuracy(
tslist,
models = c("ets", "arima", "rw", "rwd", "wn", "theta", "stlar", "nn", "snaive",
"mstlarima", "mstlets", "tbats"),
database,
accuracyFun,
h,
length_out,
fcast_save
)
Arguments
tslist |
a list of time series |
models |
a vector of models to compute |
database |
whether the time series is from mcomp or other |
accuracyFun |
function to calculate the accuracy measure, the arguments for the accuracy function should be training, test and forecast |
h |
forecast horizon |
length_out |
number of measures calculated by a single function |
fcast_save |
if the argument is TRUE, forecasts from each series are saved |
Value
a list with accuracy matrix, vector of arima models and vector of ets models
Author(s)
Thiyanga Talagala
Combination forecast based on fforms
Description
Compute combination forecast based on the vote matrix probabilities
Usage
fforms_combinationforecast(
fforms.ensemble,
tslist,
database,
h,
holdout = TRUE,
parallel = FALSE,
multiprocess = future::multisession
)
Arguments
fforms.ensemble |
a list output from fforms_ensemble function |
tslist |
list of new time series |
database |
whether the time series is from mcomp or other |
h |
length of the forecast horizon |
holdout |
if holdout = TRUE, a holdout sample is taken from the data to calculate the forecast accuracy measure; if FALSE, all of the data are used for forecasting. Default is TRUE |
parallel |
If TRUE, multiple cores (or multiple sessions) will be used. This only speeds things up when there are a large number of time series. |
multiprocess |
The function from the |
Value
a list containing, point forecast, confidence interval, accuracy measure
Author(s)
Thiyanga Talagala
Function to identify models to compute combination forecast using FFORMS algorithm
Description
This function identifies the models to be used in producing the combination forecast
Usage
fforms_ensemble(votematrix, threshold = 0.5)
Arguments
votematrix |
a matrix of vote probabilities from the fforms random forest classifier |
threshold |
threshold value for sum of probabilities of votes, default is 0.5 |
Value
a list containing the names of the forecast models
Author(s)
Thiyanga Talagala
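One plausible reading of the threshold rule is sketched below in base R: for each series, models are taken in decreasing order of vote probability until their cumulative probability reaches the threshold. This interpretation, the name `ensemble_sketch` and the returned structure are assumptions, not a description of the package internals.

```r
# Hypothetical helper: select top-voted models until the threshold is reached.
ensemble_sketch <- function(votematrix, threshold = 0.5) {
  lapply(seq_len(nrow(votematrix)), function(i) {
    p <- sort(votematrix[i, ], decreasing = TRUE)
    k <- which(cumsum(p) >= threshold)[1]
    if (is.na(k)) k <- length(p)  # threshold never reached: keep all models
    p[seq_len(k)]
  })
}

votes <- matrix(c(0.40, 0.35, 0.25,
                  0.70, 0.20, 0.10), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("ets", "arima", "rw")))
selected <- ensemble_sketch(votes, threshold = 0.5)
lapply(selected, names)  # "ets" "arima" for series 1, "ets" for series 2
```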
Parameter estimates of Holt-Winters seasonal method
Description
Estimate the smoothing parameters for the level (alpha), the trend (beta) and the seasonality (gamma)
Usage
holtWinter_parameters(y)
Arguments
y |
a univariate time series |
Value
A vector of 3 values: alpha, beta, gamma
Author(s)
Thiyanga Talagala
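To show what the three values look like, the sketch below estimates them with stats::HoltWinters from base R. This is a swapped-in estimator chosen so the example needs no extra packages; the package may estimate the parameters by another route, so the numbers need not agree. `hw_parameters_sketch` is a hypothetical name.

```r
# Hypothetical helper: Holt-Winters smoothing parameters via stats::HoltWinters.
hw_parameters_sketch <- function(y) {
  fit <- stats::HoltWinters(y)  # additive seasonality by default
  c(
    alpha = unname(fit$alpha),  # level
    beta = unname(fit$beta),    # trend
    gamma = unname(fit$gamma)   # seasonality
  )
}

hw_parameters_sketch(co2)
```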
preparation of training set
Description
Preparation of a training set for random forest training
Usage
prepare_trainingset(accuracy_set, feature_set)
Arguments
accuracy_set |
output from the fcast_accuracy |
feature_set |
output from the cal_features |
Value
dataframe consisting of features and class labels
function to calculate point forecast, 95% confidence intervals, forecast-accuracy for new series
Description
Given the prediction results of random forest calculate point forecast, 95% confidence intervals, forecast-accuracy for the test set
Usage
rf_forecast(
predictions,
tslist,
database,
function_name,
h,
accuracy,
holdout = TRUE
)
Arguments
predictions |
prediction results obtained from random forest classifier |
tslist |
list of new time series |
database |
whether the time series is from mcomp or other |
function_name |
name of the accuracy function (e.g., cal_MASE) used to calculate the accuracy measure; for a user-written function, the arguments should be the training period, test period and forecast |
h |
length of the forecast horizon |
accuracy |
if TRUE, an accuracy measure will be calculated |
holdout |
if holdout = TRUE, a holdout sample is taken from the data to calculate the forecast accuracy measure; if FALSE, all of the data are used for forecasting. Default is TRUE |
Value
a list containing, point forecast, confidence interval, accuracy measure
Author(s)
Thiyanga Talagala
Simulate time series based on ARIMA models
Description
simulate multiple time series for a given series based on ARIMA models
Usage
sim_arimabased(
y,
Nsim,
Combine = TRUE,
M = TRUE,
Future = FALSE,
Length = NA,
extralength = NA
)
Arguments
y |
a time series or M-competition data time series (Mcomp) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, the time series is generated from the training period only. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. If Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
Value
A list of time series.
Author(s)
Thiyanga Talagala
Simulate time series based on ETS models
Description
simulate multiple time series for a given series based on ETS models
Usage
sim_etsbased(
y,
Nsim,
Combine = TRUE,
M = TRUE,
Future = FALSE,
Length = NA,
extralength = NA
)
Arguments
y |
a time series or M-competition data time series (Mcomp) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, the time series is generated from the training period only. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. If Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
Value
A list of time series.
Author(s)
Thiyanga Talagala
Simulate time series based on multiple seasonal decomposition
Description
simulate multiple time series based on a given series using multiple seasonal decomposition
Usage
sim_mstlbased(
y,
Nsim,
Combine = TRUE,
M = TRUE,
Future = FALSE,
Length = NA,
extralength = NA,
mtd = "ets"
)
Arguments
y |
a time series or M-competition data time series (Mcomp object) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, the time series is generated from the training period only. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. If Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
mtd |
method to use for forecasting seasonally adjusted time series |
Value
A list of time series.
Author(s)
Thiyanga Talagala
split the names of ARIMA and ETS models
Description
split the names of ARIMA and ETS models into the model name and the number of parameters of each kind.
Usage
split_names(models)
Arguments
models |
vector of model names |
Value
a dataframe whose columns give the description of the model components
STL-AR method
Description
STL decomposition method applied to the time series, then an AR model is used to forecast seasonally adjusted data, while the seasonal naive method is used to forecast the seasonal component
Usage
stlar(y, h = 10, s.window = 11, robust = FALSE)
Arguments
y |
a univariate time series |
h |
forecast horizon |
s.window |
Either the character string “periodic” or the span (in lags) of the loess window for seasonal extraction |
robust |
logical indicating whether robust fitting should be used in the loess procedure |
Value
return object of class forecast
Author(s)
Thiyanga Talagala
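The description above can be sketched with base R only: decompose with stats::stl, forecast the seasonally adjusted series with an AR model from stats::ar, and forecast the seasonal component by repeating its last cycle. This is an illustrative approximation that returns plain point forecasts rather than an object of class forecast; `stlar_sketch` is a hypothetical name.

```r
# Hypothetical base-R approximation of the STL-AR idea (point forecasts only).
stlar_sketch <- function(y, h = 10, s.window = 11, robust = FALSE) {
  m <- frequency(y)
  fit <- stats::stl(y, s.window = s.window, robust = robust)
  seasonal <- fit$time.series[, "seasonal"]
  seas_adj <- y - seasonal  # seasonally adjusted series
  # AR model (order chosen by AIC) for the seasonally adjusted series.
  ar_fit <- stats::ar(seas_adj)
  adj_fc <- as.numeric(
    predict(ar_fit, newdata = seas_adj, n.ahead = h)$pred
  )
  # Seasonal naive forecast of the seasonal component: repeat the last cycle.
  seas_fc <- rep(as.numeric(tail(seasonal, m)), length.out = h)
  adj_fc + seas_fc
}

stlar_sketch(USAccDeaths, h = 6)
```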
Unit root test statistics
Description
Computes the test statistics based on unit root tests Phillips–Perron test and KPSS test
Usage
unitroot(y)
Arguments
y |
a univariate time series |
Value
A vector of 3 values: test statistic based on PP-test and KPSS-test
Author(s)
Thiyanga Talagala