Title: | The Tidymodels Extension for Time Series Modeling |
Version: | 1.3.1 |
Description: | The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (https://otexts.com/fpp2/). Refer to "Prophet: forecasting at scale" (https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/). |
URL: | https://github.com/business-science/modeltime, https://business-science.github.io/modeltime/ |
BugReports: | https://github.com/business-science/modeltime/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.5.0) |
Imports: | StanHeaders, timetk (≥ 2.8.1), parsnip (≥ 0.2.1), dials, yardstick (≥ 0.0.8), workflows (≥ 1.0.0), hardhat (≥ 1.0.0), rlang (≥ 0.1.2), glue, plotly, reactable, gt, ggplot2, tibble, tidyr, dplyr (≥ 1.1.0), purrr, stringr, forcats, scales, janitor, parallel, parallelly, doParallel, foreach, magrittr, forecast, xgboost (≥ 1.2.0.1), prophet, methods, cli, tidymodels |
Suggests: | rstan, slider, sparklyr, workflowsets, recipes, rsample, tune (≥ 0.2.0), lubridate, testthat, kernlab, glmnet, thief, smooth, greybox, earth, randomForest, trelliscopejs, knitr, rmarkdown (≥ 2.9), webshot, qpdf, TSrepr |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2024-10-22 19:42:04 UTC; mdancho |
Author: | Matt Dancho [aut, cre], Business Science [cph] |
Maintainer: | Matt Dancho <mdancho@business-science.io> |
Repository: | CRAN |
Date/Publication: | 2024-10-22 20:10:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Prepare Recursive Transformations
Description
Prepare Recursive Transformations
Usage
.prepare_transform(.transform)
.prepare_panel_transform(.transform)
Arguments
.transform |
A transformation function |
Value
A function that applies a recursive transformation
Bridge prediction function for ADAM models
Description
Bridge prediction function for ADAM models
Usage
Adam_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level ARIMA function for translating modeltime to forecast
Description
Low-Level ARIMA function for translating modeltime to forecast
Usage
Arima_fit_impl(
  x,
  y,
  period = "auto",
  p = 0,
  d = 0,
  q = 0,
  P = 0,
  D = 0,
  Q = 0,
  ...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
p |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
d |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
q |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
P |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
D |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
Q |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
... |
Additional arguments passed to |
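As a rough illustration of how the pdq and PDQ orders fit together, the sketch below fits a seasonal ARIMA with base R's stats::arima() rather than calling Arima_fit_impl() itself. The AirPassengers data and the chosen orders are assumptions made for the example, not modeltime defaults.

```r
# Base R only: stats::arima() stands in for the forecast engine here.
y <- log(AirPassengers)          # monthly series, frequency = 12

# Non-seasonal (p, d, q) and seasonal (P, D, Q) orders, chosen for illustration
p <- 0; d <- 1; q <- 1
P <- 0; D <- 1; Q <- 1
period <- frequency(y)           # 12 observations per seasonal cycle

fit <- stats::arima(
  y,
  order    = c(p, d, q),
  seasonal = list(order = c(P, D, Q), period = period)
)

# One MA coefficient and one seasonal MA coefficient are estimated
names(coef(fit))
```

With d = 1 and D = 1 the series is differenced once at lag 1 and once at lag 12 before the moving average terms are estimated.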
Bridge prediction function for ARIMA models
Description
Bridge prediction function for ARIMA models
Usage
Arima_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Bridge prediction function for AUTO ADAM models
Description
Bridge prediction function for AUTO ADAM models
Usage
Auto_adam_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level ADAM function for translating modeltime to forecast
Description
Low-Level ADAM function for translating modeltime to forecast
Usage
adam_fit_impl(
  x,
  y,
  period = "auto",
  p = 0,
  d = 0,
  q = 0,
  P = 0,
  D = 0,
  Q = 0,
  model = "ZXZ",
  constant = FALSE,
  regressors = c("use", "select", "adapt"),
  outliers = c("ignore", "use", "select"),
  level = 0.99,
  occurrence = c("none", "auto", "fixed", "general", "odds-ratio", "inverse-odds-ratio",
    "direct"),
  distribution = c("default", "dnorm", "dlaplace", "ds", "dgnorm", "dlnorm", "dinvgauss",
    "dgamma"),
  loss = c("likelihood", "MSE", "MAE", "HAM", "LASSO", "RIDGE", "MSEh", "TMSE", "GTMSE",
    "MSCE"),
  ic = c("AICc", "AIC", "BIC", "BICc"),
  select_order = FALSE,
  ...
)
Arguments
x |
A data.frame of predictors |
y |
A vector with outcome |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
p |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
d |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
q |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
P |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
D |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
Q |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
model |
The type of ETS model. |
constant |
Logical; determines whether a constant is included in the model. |
regressors |
Defines what to do with the provided explanatory variables. |
outliers |
Defines what to do with outliers. |
level |
What confidence level to use for detection of outliers. |
occurrence |
The type of model used in probability estimation. |
distribution |
What density function to assume for the error term. |
loss |
The type of Loss Function used in optimization. |
ic |
The information criterion to use in the model selection / combination procedure. |
select_order |
If TRUE, then the function will select the most appropriate order using a mechanism similar to auto.msarima(), but implemented in auto.adam(). The values list(ar=...,i=...,ma=...) specify the maximum orders to check in this case. |
... |
Additional arguments passed to |
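To make the simpler loss options concrete, here is a base R sketch that evaluates MSE, MAE and HAM on a vector of one-step errors. The formulas are the usual textbook definitions (HAM taken as the mean of the square root of the absolute errors). How smooth computes them internally is not shown in this manual, so treat the definitions and the error values as assumptions.

```r
# Hypothetical one-step-ahead errors (actual - fitted)
errors <- c(1, -4, 9, -16)

mse <- mean(errors^2)           # Mean Squared Error
mae <- mean(abs(errors))        # Mean Absolute Error
ham <- mean(sqrt(abs(errors)))  # Half Absolute Moment

c(MSE = mse, MAE = mae, HAM = ham)
```

Because HAM takes a square root before averaging, large errors weigh less than under MAE, and far less than under MSE.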
Tuning Parameters for ADAM Models
Description
Tuning Parameters for ADAM Models
Usage
ets_model(values = c("ZZZ", "XXX", "YYY", "CCC", "PPP", "FFF"))
loss(
  values = c("likelihood", "MSE", "MAE", "HAM", "LASSO", "RIDGE", "TMSE", "GTMSE",
    "MSEh", "MSCE")
)
use_constant(values = c(FALSE, TRUE))
regressors_treatment(values = c("use", "select", "adapt"))
outliers_treatment(values = c("ignore", "use", "select"))
probability_model(
  values = c("none", "auto", "fixed", "general", "odds-ratio", "inverse-odds-ratio",
    "direct")
)
distribution(
  values = c("default", "dnorm", "dlaplace", "ds", "dgnorm", "dlnorm", "dinvgauss",
    "dgamma")
)
information_criteria(values = c("AICc", "AIC", "BICc", "BIC"))
select_order(values = c(FALSE, TRUE))
Arguments
values |
A character string of possible values. |
Details
The main parameters for ADAM models are:
- ets_model:
  - model="ZZZ" means that the model will be selected based on the chosen information criteria type. The Branch and Bound is used in the process.
  - model="XXX" means that only additive components are tested, using Branch and Bound.
  - model="YYY" implies selecting between multiplicative components.
  - model="CCC" triggers the combination of forecasts of models using information criteria weights (Kolassa, 2011).
  - Combinations between these four and the classical components are also accepted. For example, model="CAY" will combine models with additive trend and either none or multiplicative seasonality.
  - model="PPP" will produce the selection between pure additive and pure multiplicative models. "P" stands for "Pure". This cannot be mixed with other types of components.
  - model="FFF" will select between all the 30 types of models. "F" stands for "Full". This cannot be mixed with other types of components.
  - The parameter model can also be a vector of names of models for a finer tuning (pool of models). For example, model=c("ANN","AAA") will estimate only two models and select the best of them.
- loss:
  - likelihood - the model is estimated via the maximization of the likelihood of the function specified in distribution;
  - MSE (Mean Squared Error),
  - MAE (Mean Absolute Error),
  - HAM (Half Absolute Moment),
  - LASSO - use LASSO to shrink the parameters of the model;
  - RIDGE - use RIDGE to shrink the parameters of the model;
  - TMSE - Trace Mean Squared Error,
  - GTMSE - Geometric Trace Mean Squared Error,
  - MSEh - optimisation using only h-steps ahead error,
  - MSCE - Mean Squared Cumulative Error.
- non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
- non_seasonal_differences: The order of integration for non-seasonal differencing.
- non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
- seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
- seasonal_differences: The order of integration for seasonal differencing.
- seasonal_ma: The order of the seasonal moving average (SMA) terms.
- use_constant: Logical; determines whether a constant is included in the model.
- regressors_treatment: Defines what to do with the provided explanatory variables.
- outliers_treatment: Defines what to do with outliers.
- probability_model: The type of model used in probability estimation.
- distribution: What density function to assume for the error term.
- information_criteria: The information criterion to use in the model selection / combination procedure.
- select_order: If TRUE, then the function will select the most appropriate order.
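The three-letter selection codes described above can be read slot by slot: error, trend, seasonality. The following base R sketch spells that out. describe_ets_code() is a hypothetical helper written for this illustration, not part of modeltime or smooth, and it deliberately ignores damped-trend codes such as "Ad".

```r
# Hypothetical helper: splits a three-letter selection code into its slots.
describe_ets_code <- function(code) {
  meaning <- c(
    Z = "select by information criterion",
    X = "additive components only",
    Y = "multiplicative components only",
    C = "combine forecasts",
    P = "pure additive or pure multiplicative",
    F = "full pool of models",
    N = "none",
    A = "additive",
    M = "multiplicative"
  )
  letters_in_code <- strsplit(code, "")[[1]]
  stopifnot(
    length(letters_in_code) == 3,
    all(letters_in_code %in% names(meaning))
  )
  # Label each letter with the slot it occupies in the code
  setNames(meaning[letters_in_code], c("error", "trend", "seasonal"))
}

describe_ets_code("CAY")
```

For "CAY" the helper reports a combined error slot, an additive trend, and a seasonal slot restricted to the multiplicative family, which matches the description of model="CAY" given in the list.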
Value
A dials parameter
Examples
use_constant()
regressors_treatment()
distribution()
General Interface for ADAM Regression Models
Description
adam_reg() is a way to generate a specification of an ADAM model before fitting and allows the model to be created using different packages. Currently the only package is smooth.
Usage
adam_reg(
  mode = "regression",
  ets_model = NULL,
  non_seasonal_ar = NULL,
  non_seasonal_differences = NULL,
  non_seasonal_ma = NULL,
  seasonal_ar = NULL,
  seasonal_differences = NULL,
  seasonal_ma = NULL,
  use_constant = NULL,
  regressors_treatment = NULL,
  outliers_treatment = NULL,
  outliers_ci = NULL,
  probability_model = NULL,
  distribution = NULL,
  loss = NULL,
  information_criteria = NULL,
  seasonal_period = NULL,
  select_order = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
ets_model |
The type of ETS model. The first letter stands for the type of the error term ("A" or "M"), the second (and sometimes the third as well) is for the trend ("N", "A", "Ad", "M" or "Md"), and the last one is for the type of seasonality ("N", "A" or "M"). |
non_seasonal_ar |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
non_seasonal_differences |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
non_seasonal_ma |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
seasonal_ar |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
seasonal_differences |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
seasonal_ma |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
use_constant |
Logical; determines whether a constant is included in the model. This is mainly needed for the ARIMA part of the model, but can be used for ETS as well. |
regressors_treatment |
Defines what to do with the provided explanatory variables: "use" means that all of the data should be used, "select" means that a selection using ic should be done, and "adapt" triggers the mechanism of time-varying parameters for the explanatory variables. |
outliers_treatment |
Defines what to do with outliers: "ignore" just returns the model, "use" detects outliers based on the specified level and includes dummies for them in the model, and "select" detects outliers and keeps only those that reduce the ic value. |
outliers_ci |
What confidence level to use for detection of outliers. Default is 99%. |
probability_model |
The type of model used in probability estimation. Can be "none" - none, "fixed" - constant probability, "general" - the general Beta model with two parameters, "odds-ratio" - the Odds-ratio model with b=1 in Beta distribution, "inverse-odds-ratio" - the model with a=1 in Beta distribution, "direct" - the TSB-like (Teunter et al., 2011) probability update mechanism a+b=1, "auto" - the automatically selected type of occurrence model. |
distribution |
What density function to assume for the error term. The full name of the distribution should be provided, starting with the letter "d" - "density". |
loss |
The type of Loss Function used in optimization. |
information_criteria |
The information criterion to use in the model selection / combination procedure. |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
select_order |
If |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For adam_reg(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "auto_adam" (default) - Connects to smooth::auto.adam()
- "adam" - Connects to smooth::adam()
Main Arguments
The main arguments (tuning parameters) for the model are:
- seasonal_period: The periodic nature of the seasonality. Uses "auto" by default.
- non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
- non_seasonal_differences: The order of integration for non-seasonal differencing.
- non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
- seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
- seasonal_differences: The order of integration for seasonal differencing.
- seasonal_ma: The order of the seasonal moving average (SMA) terms.
- ets_model: The type of ETS model.
- use_constant: Logical; determines whether a constant is included in the model.
- regressors_treatment: Defines what to do with the provided explanatory variables.
- outliers_treatment: Defines what to do with outliers.
- probability_model: The type of model used in probability estimation.
- distribution: What density function to assume for the error term.
- loss: The type of Loss Function used in optimization.
- information_criteria: The information criterion to use in the model selection / combination procedure.
These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be set using set_engine() (See Engine Details below).
If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
auto_adam (default engine)
The engine uses smooth::auto.adam().
Function Parameters:
#> function (data, model = "ZXZ", lags = c(frequency(data)), orders = list(ar = c(3,
#>     3), i = c(2, 1), ma = c(3, 3), select = TRUE), formula = NULL, regressors = c("use",
#>     "select", "adapt"), occurrence = c("none", "auto", "fixed", "general",
#>     "odds-ratio", "inverse-odds-ratio", "direct"), distribution = c("dnorm",
#>     "dlaplace", "ds", "dgnorm", "dlnorm", "dinvgauss", "dgamma"), outliers = c("ignore",
#>     "use", "select"), level = 0.99, h = 0, holdout = FALSE, persistence = NULL,
#>     phi = NULL, initial = c("optimal", "backcasting", "complete"), arma = NULL,
#>     ic = c("AICc", "AIC", "BIC", "BICc"), bounds = c("usual", "admissible",
#>     "none"), silent = TRUE, parallel = FALSE, ...)
The MAXIMUM nonseasonal ARIMA terms and seasonal ARIMA terms (orders) are provided to smooth::auto.adam() via adam_reg() parameters.
Other options and arguments can be set using set_engine().
Parameter Notes:
- All values of nonseasonal pdq and seasonal PDQ are maximums. The smooth::auto.adam() model will select a value using these as an upper limit.
- xreg - This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
adam
The engine uses smooth::adam().
Function Parameters:
#> function (data, model = "ZXZ", lags = c(frequency(data)), orders = list(ar = c(0),
#>     i = c(0), ma = c(0), select = FALSE), constant = FALSE, formula = NULL,
#>     regressors = c("use", "select", "adapt"), occurrence = c("none", "auto",
#>     "fixed", "general", "odds-ratio", "inverse-odds-ratio", "direct"),
#>     distribution = c("default", "dnorm", "dlaplace", "ds", "dgnorm", "dlnorm",
#>     "dinvgauss", "dgamma"), loss = c("likelihood", "MSE", "MAE", "HAM",
#>     "LASSO", "RIDGE", "MSEh", "TMSE", "GTMSE", "MSCE"), outliers = c("ignore",
#>     "use", "select"), level = 0.99, h = 0, holdout = FALSE, persistence = NULL,
#>     phi = NULL, initial = c("optimal", "backcasting", "complete"), arma = NULL,
#>     ic = c("AICc", "AIC", "BIC", "BICc"), bounds = c("usual", "admissible",
#>     "none"), silent = TRUE, ...)
The nonseasonal ARIMA terms (orders) and seasonal ARIMA terms (orders) are provided to smooth::adam() via adam_reg() parameters.
Other options and arguments can be set using set_engine().
Parameter Notes:
- xreg - This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1 or "none") or yearly seasonal (e.g. for monthly time stamps, seasonal_period = 12, seasonal_period = "12 months", or seasonal_period = "yearly").
There are 3 ways to specify:
- seasonal_period = "auto": A seasonal period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
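To see why a time-based phrase such as "1 year" corresponds to 12 for monthly data, the base R sketch below counts how many time stamps fall inside one window of that length. obs_per_period() is a hypothetical helper for this illustration. The conversion modeltime actually performs is not reproduced here and may differ in detail.

```r
# Monthly time stamps covering three years
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 36)

# Hypothetical helper: number of observations in the first window of the given length
obs_per_period <- function(idx, period = "1 year") {
  window_end <- seq(idx[1], by = period, length.out = 2)[2]
  sum(idx >= idx[1] & idx < window_end)
}

obs_per_period(dates, "1 year")    # 12 for this monthly index
obs_per_period(dates, "3 months")  # 3 for this monthly index
```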
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
The xreg parameter is populated using the fit() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp),
- month.lbl (labeled month as an ordered factor).
The month.lbl is an exogenous regressor that can be passed to adam_reg() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
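The column rules above (numeric and factor columns qualify, date and character columns do not) can be sketched in base R as follows. select_xregs() and the example data frame are hypothetical and only mimic the stated rule. They are not the code modeltime runs internally.

```r
# Hypothetical helper: keep only predictor columns that would qualify as xregs
select_xregs <- function(data) {
  is_xreg <- vapply(
    data,
    function(col) is.numeric(col) || is.factor(col),  # ordered factors are factors too
    logical(1)
  )
  data[, is_xreg, drop = FALSE]
}

# Predictors only (the outcome y is left out of this illustration)
df <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 3),
  month.lbl = factor(month.abb[1:3], levels = month.abb, ordered = TRUE),
  price     = c(1.5, 2.0, 2.5),
  note      = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

names(select_xregs(df))
```

The date column is dropped because Date vectors are not numeric in R, and note is dropped because it is character. Converting note with factor() would make it eligible.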
See Also
fit.model_spec(), set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
library(smooth)
library(modeltime)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- AUTO ADAM ----
# Model Spec
model_spec <- adam_reg() %>%
    set_engine("auto_adam")
# Fit Spec
model_fit <- model_spec %>%
    fit(log(value) ~ date, data = training(splits))
model_fit
# ---- STANDARD ADAM ----
# Model Spec
model_spec <- adam_reg(
        seasonal_period          = 12,
        non_seasonal_ar          = 3,
        non_seasonal_differences = 1,
        non_seasonal_ma          = 3,
        seasonal_ar              = 1,
        seasonal_differences     = 0,
        seasonal_ma              = 1
    ) %>%
    set_engine("adam")
# Fit Spec
model_fit <- model_spec %>%
    fit(log(value) ~ date, data = training(splits))
model_fit
Add a Model into a Modeltime Table
Description
Add a Model into a Modeltime Table
Usage
add_modeltime_model(object, model, location = "bottom")
Arguments
object |
Multiple Modeltime Tables (class |
model |
A model of class |
location |
Where to add the model. Either "top" or "bottom". Default: "bottom". |
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
library(tidymodels)
library(modeltime)
# m750_splits and m750_models are example objects shipped with modeltime
model_fit_ets <- exp_smoothing() %>%
    set_engine("ets") %>%
    fit(value ~ date, training(m750_splits))
m750_models %>%
    add_modeltime_model(model_fit_ets)
General Interface for "Boosted" ARIMA Regression Models
Description
arima_boost() is a way to generate a specification of a time series model that uses boosting to improve modeling errors (residuals) on Exogenous Regressors. It works with both "automated" ARIMA (auto.arima) and standard ARIMA (arima).
The main algorithms are:
- Auto ARIMA + XGBoost Errors (engine = auto_arima_xgboost, default)
- ARIMA + XGBoost Errors (engine = arima_xgboost)
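The idea of modeling ARIMA residuals with a second model can be sketched in base R. This is only an outline of the two-stage approach under stated assumptions: stats::arima() stands in for the forecast engines, lm() is swapped in for XGBoost so that no extra packages are needed, and the AirPassengers data and the chosen orders are arbitrary.

```r
# Data: hold out the last 12 months of a monthly series
y     <- log(AirPassengers)
n     <- length(y)
month <- factor(as.integer(cycle(y)), levels = 1:12)  # exogenous regressor
train <- seq_len(n - 12)
test  <- setdiff(seq_len(n), train)

# Stage 1: ARIMA on the series itself (no xregs)
y_train   <- ts(y[train], start = start(y), frequency = 12)
fit_arima <- stats::arima(y_train, order = c(0, 1, 1))

# Stage 2: model the ARIMA residuals with the regressors.
# lm() is a stand-in for XGBoost in this sketch.
resid_df  <- data.frame(
  resid = as.numeric(fit_arima$residuals),
  month = month[train]
)
fit_resid <- lm(resid ~ month, data = resid_df)

# Forecast = ARIMA forecast + predicted residual correction
arima_fc <- as.numeric(predict(fit_arima, n.ahead = 12)$pred)
resid_fc <- as.numeric(predict(fit_resid, newdata = data.frame(month = month[test])))
combined_fc <- arima_fc + resid_fc
```

The non-seasonal ARIMA deliberately leaves the monthly pattern in its residuals, so the second stage has something to learn from the month feature.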
Usage
arima_boost(
  mode = "regression",
  seasonal_period = NULL,
  non_seasonal_ar = NULL,
  non_seasonal_differences = NULL,
  non_seasonal_ma = NULL,
  seasonal_ar = NULL,
  seasonal_differences = NULL,
  seasonal_ma = NULL,
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  tree_depth = NULL,
  learn_rate = NULL,
  loss_reduction = NULL,
  sample_size = NULL,
  stop_iter = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
non_seasonal_ar |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
non_seasonal_differences |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
non_seasonal_ma |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
seasonal_ar |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
seasonal_differences |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
seasonal_ma |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
mtry |
A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models (specific engines only). |
trees |
An integer for the number of trees contained in the ensemble. |
min_n |
An integer for the minimum number of data points in a node that is required for the node to be split further. |
tree_depth |
An integer for the maximum depth of the tree (i.e. number of splits) (specific engines only). |
learn_rate |
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration (specific engines only). This is sometimes referred to as the shrinkage parameter. |
loss_reduction |
A number for the reduction in the loss function required to split further (specific engines only). |
sample_size |
A number for the number (or proportion) of data that is exposed to the fitting routine. |
stop_iter |
The number of iterations without improvement before
stopping ( |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For arima_boost(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "auto_arima_xgboost" (default) - Connects to forecast::auto.arima() and xgboost::xgb.train
- "arima_xgboost" - Connects to forecast::Arima() and xgboost::xgb.train
Main Arguments
The main arguments (tuning parameters) for the ARIMA model are:
- seasonal_period: The periodic nature of the seasonality. Uses "auto" by default.
- non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
- non_seasonal_differences: The order of integration for non-seasonal differencing.
- non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
- seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
- seasonal_differences: The order of integration for seasonal differencing.
- seasonal_ma: The order of the seasonal moving average (SMA) terms.
The main arguments (tuning parameters) for the XGBoost model are:
- mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
- trees: The number of trees contained in the ensemble.
- min_n: The minimum number of data points in a node that are required for the node to be split further.
- tree_depth: The maximum depth of the tree (i.e. number of splits).
- learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
- loss_reduction: The reduction in the loss function required to split further.
- sample_size: The amount of data exposed to the fitting routine.
- stop_iter: The number of iterations without improvement before stopping.
These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be set using set_engine() (See Engine Details below).
If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
Engine Details
The standardized parameter names in modeltime can be mapped to their original names in each engine:
Model 1: ARIMA:
modeltime | forecast::auto.arima | forecast::Arima |
seasonal_period | ts(frequency) | ts(frequency) |
non_seasonal_ar, non_seasonal_differences, non_seasonal_ma | max.p(5), max.d(2), max.q(5) | order = c(p(0), d(0), q(0)) |
seasonal_ar, seasonal_differences, seasonal_ma | max.P(2), max.D(1), max.Q(2) | seasonal = c(P(0), D(0), Q(0)) |
Model 2: XGBoost:
modeltime | xgboost::xgb.train |
tree_depth | max_depth (6) |
trees | nrounds (15) |
learn_rate | eta (0.3) |
mtry | colsample_bynode (1) |
min_n | min_child_weight (1) |
loss_reduction | gamma (0) |
sample_size | subsample (1) |
stop_iter | early_stop |
Other options can be set using set_engine().
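The XGBoost name mapping in the table above can be expressed as a simple lookup. The base R sketch below is illustrative only: translate_args() is a hypothetical helper, and the real translation is done by parsnip / modeltime at fit time.

```r
# Mapping copied from the XGBoost table (modeltime name -> engine name)
xgb_names <- c(
  tree_depth     = "max_depth",
  trees          = "nrounds",
  learn_rate     = "eta",
  mtry           = "colsample_bynode",
  min_n          = "min_child_weight",
  loss_reduction = "gamma",
  sample_size    = "subsample",
  stop_iter      = "early_stop"
)

# Hypothetical helper: rename a list of modeltime-style arguments
translate_args <- function(args) {
  stopifnot(all(names(args) %in% names(xgb_names)))
  setNames(args, unname(xgb_names[names(args)]))
}

translate_args(list(tree_depth = 6, learn_rate = 0.1))
```

The values pass through unchanged, and only the names are switched to the engine's vocabulary.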
auto_arima_xgboost (default engine)
Model 1: Auto ARIMA (forecast::auto.arima):
#> function (y, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2,
#>     max.order = 5, max.d = 2, max.D = 1, start.p = 2, start.q = 2, start.P = 1,
#>     start.Q = 1, stationary = FALSE, seasonal = TRUE, ic = c("aicc", "aic",
#>     "bic"), stepwise = TRUE, nmodels = 94, trace = FALSE, approximation = (length(x) >
#>     150 | frequency(x) > 12), method = NULL, truncate = NULL, xreg = NULL,
#>     test = c("kpss", "adf", "pp"), test.args = list(), seasonal.test = c("seas",
#>     "ocsb", "hegy", "ch"), seasonal.test.args = list(), allowdrift = TRUE,
#>     allowmean = TRUE, lambda = NULL, biasadj = FALSE, parallel = FALSE,
#>     num.cores = 2, x = y, ...)
Parameter Notes:
- All values of nonseasonal pdq and seasonal PDQ are maximums. The auto.arima will select a value using these as an upper limit.
- xreg - This should not be used since XGBoost will be doing the regression
Model 2: XGBoost (xgboost::xgb.train):
#> function (params = list(), data, nrounds, watchlist = list(), obj = NULL,
#>     feval = NULL, verbose = 1, print_every_n = 1L, early_stopping_rounds = NULL,
#>     maximize = NULL, save_period = NULL, save_name = "xgboost.model", xgb_model = NULL,
#>     callbacks = list(), ...)
Parameter Notes:
- XGBoost uses params = list() to capture its parameters. Parsnip / Modeltime automatically sends any args provided as ... inside of set_engine() to the params = list(...).
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1) or seasonal (e.g. seasonal_period = 12 or seasonal_period = "12 months").
There are 3 ways to specify:
- seasonal_period = "auto": A period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
The xreg parameter is populated using the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp),
- month.lbl (labeled month as an ordered factor).
The month.lbl is an exogenous regressor that can be passed to arima_boost() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
See Also
fit.model_spec(), set_engine()
Examples
library(dplyr)
library(lubridate)
library(parsnip)
library(rsample)
library(timetk)
library(modeltime)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)
# MODEL SPEC ----
# Set engine and boosting parameters
model_spec <- arima_boost(
    # ARIMA args
    seasonal_period          = 12,
    non_seasonal_ar          = 0,
    non_seasonal_differences = 1,
    non_seasonal_ma          = 1,
    seasonal_ar              = 0,
    seasonal_differences     = 1,
    seasonal_ma              = 1,
    # XGBoost Args
    tree_depth = 6,
    learn_rate = 0.1
) %>%
    set_engine(engine = "arima_xgboost")
# FIT ----
# Boosting - Happens by adding numeric date and month features
# model_fit_boosted <- model_spec %>%
#     fit(value ~ date + as.numeric(date) + month(date, label = TRUE),
#         data = training(splits))
# model_fit_boosted
Tuning Parameters for ARIMA Models
Description
Tuning Parameters for ARIMA Models
Usage
non_seasonal_ar(range = c(0L, 5L), trans = NULL)
non_seasonal_differences(range = c(0L, 2L), trans = NULL)
non_seasonal_ma(range = c(0L, 5L), trans = NULL)
seasonal_ar(range = c(0L, 2L), trans = NULL)
seasonal_differences(range = c(0L, 1L), trans = NULL)
seasonal_ma(range = c(0L, 2L), trans = NULL)
Arguments
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Details
The main parameters for ARIMA models are:
- non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
- non_seasonal_differences: The order of integration for non-seasonal differencing.
- non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
- seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
- seasonal_differences: The order of integration for seasonal differencing.
- seasonal_ma: The order of the seasonal moving average (SMA) terms.
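To get a feel for the size of the search space implied by the default ranges in the Usage block, the base R sketch below enumerates every combination with expand.grid(). In a tidymodels workflow a dials grid function would normally build the grid, so this is a plain illustration and not the package's tuning mechanism.

```r
# Default ranges copied from the Usage block
ranges <- list(
  non_seasonal_ar          = 0:5,
  non_seasonal_differences = 0:2,
  non_seasonal_ma          = 0:5,
  seasonal_ar              = 0:2,
  seasonal_differences     = 0:1,
  seasonal_ma              = 0:2
)

# Full factorial grid of candidate orders
grid <- expand.grid(ranges)

nrow(grid)   # 6 * 3 * 6 * 3 * 2 * 3 = 1944 combinations
```

A full grid of this size is rarely fitted exhaustively. It mainly shows why automated selection or a reduced grid is attractive.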
Examples
ets_model()
non_seasonal_ar()
non_seasonal_differences()
non_seasonal_ma()
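Because these functions return dials parameter objects, they can be combined into a tuning grid. The following is a minimal sketch, assuming the dials package is installed; the narrowed ranges are illustrative choices rather than package defaults.

```r
library(modeltime)
library(dials)

# Build a regular grid over the non-seasonal AR and MA orders.
# The ranges are narrowed for illustration; the documented defaults are c(0L, 5L).
arima_grid <- grid_regular(
    non_seasonal_ar(range = c(0L, 2L)),
    non_seasonal_ma(range = c(0L, 2L)),
    levels = 3
)

arima_grid
```

Each row of the resulting tibble is one candidate combination of orders that could be supplied to arima_reg().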
General Interface for ARIMA Regression Models
Description
arima_reg() is a way to generate a specification of an ARIMA model before fitting and allows the model to be created using different packages. Currently the only package is forecast.
Usage
arima_reg(
mode = "regression",
seasonal_period = NULL,
non_seasonal_ar = NULL,
non_seasonal_differences = NULL,
non_seasonal_ma = NULL,
seasonal_ar = NULL,
seasonal_differences = NULL,
seasonal_ma = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
non_seasonal_ar |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
non_seasonal_differences |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
non_seasonal_ma |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
seasonal_ar |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
seasonal_differences |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
seasonal_ma |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For arima_reg(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "auto_arima" (default) - Connects to forecast::auto.arima()
- "arima" - Connects to forecast::Arima()
Main Arguments
The main arguments (tuning parameters) for the model are:
- seasonal_period: The periodic nature of the seasonality. Uses "auto" by default.
- non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
- non_seasonal_differences: The order of integration for non-seasonal differencing.
- non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
- seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
- seasonal_differences: The order of integration for seasonal differencing.
- seasonal_ma: The order of the seasonal moving average (SMA) terms.
These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be set using set_engine() (see Engine Details below).
If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
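As a minimal sketch of modifying an existing specification with update(), assuming modeltime and parsnip are installed (the argument values are arbitrary illustrations):

```r
library(modeltime)
library(parsnip)

# Original specification with a first-order AR term
spec <- arima_reg(
    seasonal_period = 12,
    non_seasonal_ar = 1
) %>%
    set_engine("arima")

# Change only the AR order; the other arguments are carried over
spec_updated <- update(spec, non_seasonal_ar = 2)

spec_updated
```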
Engine Details
The standardized parameter names in modeltime can be mapped to their original names in each engine:
modeltime | forecast::auto.arima | forecast::Arima |
seasonal_period | ts(frequency) | ts(frequency) |
non_seasonal_ar, non_seasonal_differences, non_seasonal_ma | max.p(5), max.d(2), max.q(5) | order = c(p(0), d(0), q(0)) |
seasonal_ar, seasonal_differences, seasonal_ma | max.P(2), max.D(1), max.Q(2) | seasonal = c(P(0), D(0), Q(0)) |
Other options can be set using set_engine().
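One way to inspect this mapping is parsnip's translate(), which prints the fit template for a given engine. The sketch below assumes modeltime and parsnip are installed; the order values are arbitrary, and the exact printed template may differ between package versions.

```r
library(modeltime)
library(parsnip)

# The same modeltime arguments, attached to each of the two engines
spec_auto <- arima_reg(
    seasonal_period          = 12,
    non_seasonal_ar          = 3,
    non_seasonal_differences = 1,
    non_seasonal_ma          = 3
) %>%
    set_engine("auto_arima")

spec_arima <- arima_reg(
    seasonal_period          = 12,
    non_seasonal_ar          = 3,
    non_seasonal_differences = 1,
    non_seasonal_ma          = 3
) %>%
    set_engine("arima")

# translate() shows the engine-level argument names used at fit time
translate(spec_auto)
translate(spec_arima)
```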
auto_arima (default engine)
The engine uses forecast::auto.arima().
Function Parameters:
#> function (y, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2,
#> max.order = 5, max.d = 2, max.D = 1, start.p = 2, start.q = 2, start.P = 1,
#> start.Q = 1, stationary = FALSE, seasonal = TRUE, ic = c("aicc", "aic",
#> "bic"), stepwise = TRUE, nmodels = 94, trace = FALSE, approximation = (length(x) >
#> 150 | frequency(x) > 12), method = NULL, truncate = NULL, xreg = NULL,
#> test = c("kpss", "adf", "pp"), test.args = list(), seasonal.test = c("seas",
#> "ocsb", "hegy", "ch"), seasonal.test.args = list(), allowdrift = TRUE,
#> allowmean = TRUE, lambda = NULL, biasadj = FALSE, parallel = FALSE,
#> num.cores = 2, x = y, ...)
The MAXIMUM nonseasonal ARIMA terms (max.p, max.d, max.q) and seasonal ARIMA terms (max.P, max.D, max.Q) are provided to forecast::auto.arima() via arima_reg() parameters.
Other options and arguments can be set using set_engine().
Parameter Notes:
- All values of nonseasonal pdq and seasonal PDQ are maximums. The forecast::auto.arima() model will select a value using these as an upper limit.
- xreg - This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
arima
The engine uses forecast::Arima().
Function Parameters:
#> function (y, order = c(0, 0, 0), seasonal = c(0, 0, 0), xreg = NULL, include.mean = TRUE,
#> include.drift = FALSE, include.constant, lambda = model$lambda, biasadj = FALSE,
#> method = c("CSS-ML", "ML", "CSS"), model = NULL, x = y, ...)
The nonseasonal ARIMA terms (order) and seasonal ARIMA terms (seasonal) are provided to forecast::Arima() via arima_reg() parameters.
Other options and arguments can be set using set_engine().
Parameter Notes:
- xreg - This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
- method - The default is set to "ML" (Maximum Likelihood). This method is more robust at the expense of speed, and possible selections may fail unit root inversion testing. Alternatively, you can add method = "CSS-ML" to evaluate Conditional Sum of Squares for starting values, then Maximum Likelihood.
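Engine-specific options such as these are passed as extra arguments to set_engine(). The sketch below assumes modeltime and parsnip are installed; the order values and the choice of options are illustrative only.

```r
library(modeltime)
library(parsnip)

# "arima" engine: switch the fitting method from the "ML" default to "CSS-ML"
spec_css_ml <- arima_reg(
    seasonal_period          = 12,
    non_seasonal_ar          = 1,
    non_seasonal_differences = 1,
    non_seasonal_ma          = 1
) %>%
    set_engine("arima", method = "CSS-ML")

# "auto_arima" engine: request an exhaustive (non-stepwise) search.
# stepwise and approximation are forecast::auto.arima() arguments
# listed in the function parameters above.
spec_full_search <- arima_reg() %>%
    set_engine("auto_arima", stepwise = FALSE, approximation = FALSE)
```

A non-stepwise search typically takes longer to fit, so it is best reserved for cases where the extra search is worth the run time.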
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor.
The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1 or "none") or yearly seasonal (e.g. for monthly time stamps, seasonal_period = 12, seasonal_period = "12 months", or seasonal_period = "yearly").
There are 3 ways to specify:
- seasonal_period = "auto": A seasonal period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
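The three forms listed above can be sketched as follows, assuming modeltime and parsnip are installed. The period is only resolved against the date index when fit() is called, so these specifications differ only in the stored argument.

```r
library(modeltime)
library(parsnip)

# 1. Let the period be inferred from the date index at fit time
spec_auto <- arima_reg(seasonal_period = "auto") %>%
    set_engine("auto_arima")

# 2. Supply a numeric frequency directly (12 for monthly data)
spec_numeric <- arima_reg(seasonal_period = 12) %>%
    set_engine("auto_arima")

# 3. Supply a time-based phrase, converted using the date index at fit time
spec_phrase <- arima_reg(seasonal_period = "1 year") %>%
    set_engine("auto_arima")
```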
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
The xreg parameter is populated using the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp),
- month.lbl (labeled month as an ordered factor).
The month.lbl is an exogenous regressor that can be passed to arima_reg() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
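The xreg example above can be sketched end to end as follows. This is an illustration under stated assumptions, not a reference implementation: it assumes modeltime, parsnip, rsample, timetk, dplyr and lubridate are installed, uses the m4_monthly data from timetk, and builds month.lbl with lubridate::month().

```r
library(dplyr)
library(lubridate)
library(parsnip)
library(rsample)
library(timetk)
library(modeltime)

# Data with a labeled month column (an ordered factor)
m750 <- m4_monthly %>%
    filter(id == "M750") %>%
    mutate(month.lbl = month(date, label = TRUE))

splits <- initial_time_split(m750, prop = 0.8)

# date supplies the time index; month.lbl is passed on as an exogenous regressor
model_fit_xreg <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date + month.lbl, data = training(splits))

model_fit_xreg
```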
See Also
fit.model_spec(), set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- AUTO ARIMA ----
# Model Spec
model_spec <- arima_reg() %>%
set_engine("auto_arima")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- STANDARD ARIMA ----
# Model Spec
model_spec <- arima_reg(
seasonal_period = 12,
non_seasonal_ar = 3,
non_seasonal_differences = 1,
non_seasonal_ma = 3,
seasonal_ar = 1,
seasonal_differences = 0,
seasonal_ma = 1
) %>%
set_engine("arima")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
Bridge ARIMA-XGBoost Modeling function
Description
Bridge ARIMA-XGBoost Modeling function
Usage
arima_xgboost_fit_impl(
x,
y,
period = "auto",
p = 0,
d = 0,
q = 0,
P = 0,
D = 0,
Q = 0,
include.mean = TRUE,
include.drift = FALSE,
include.constant,
lambda = model$lambda,
biasadj = FALSE,
method = c("CSS-ML", "ML", "CSS"),
model = NULL,
max_depth = 6,
nrounds = 15,
eta = 0.3,
colsample_bytree = NULL,
colsample_bynode = NULL,
min_child_weight = 1,
gamma = 0,
subsample = 1,
validation = 0,
early_stop = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
p |
The order of the non-seasonal auto-regressive (AR) terms. |
d |
The order of integration for non-seasonal differencing. |
q |
The order of the non-seasonal moving average (MA) terms. |
P |
The order of the seasonal auto-regressive (SAR) terms. |
D |
The order of integration for seasonal differencing. |
Q |
The order of the seasonal moving average (SMA) terms. |
include.mean |
Should the ARIMA model include a mean term? The default
is |
include.drift |
Should the ARIMA model include a linear drift term?
(i.e., a linear regression with ARIMA errors is fitted.) The default is
|
include.constant |
If |
lambda |
Box-Cox transformation parameter. If |
biasadj |
Use adjusted back-transformed mean for Box-Cox transformations. If transformed data is used to produce forecasts and fitted values, a regular back transformation will result in median forecasts. If biasadj is TRUE, an adjustment will be made to produce mean forecasts and fitted values. |
method |
Fitting method: maximum likelihood or minimize conditional sum-of-squares. The default (unless there are missing values) is to use conditional-sum-of-squares to find starting values, then maximum likelihood. |
model |
Output from a previous call to |
max_depth |
An integer for the maximum depth of the tree. |
nrounds |
An integer for the number of boosting iterations. |
eta |
A numeric value between zero and one to control the learning rate. |
colsample_bytree |
Subsampling proportion of columns. |
colsample_bynode |
Subsampling proportion of columns for each node
within each tree. See the |
min_child_weight |
A numeric value for the minimum sum of instance weights needed in a child to continue to split. |
gamma |
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree |
subsample |
Subsampling proportion of rows. |
validation |
A positive number. If on |
early_stop |
An integer or |
... |
Additional arguments passed to |
Bridge prediction Function for ARIMA-XGBoost Models
Description
Bridge prediction Function for ARIMA-XGBoost Models
Usage
arima_xgboost_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level ADAM function for translating modeltime to forecast
Description
Low-Level ADAM function for translating modeltime to forecast
Usage
auto_adam_fit_impl(
x,
y,
period = "auto",
p = 0,
d = 0,
q = 0,
P = 0,
D = 0,
Q = 0,
model = "ZXZ",
constant = FALSE,
regressors = c("use", "select", "adapt"),
outliers = c("ignore", "use", "select"),
level = 0.99,
occurrence = c("none", "auto", "fixed", "general", "odds-ratio", "inverse-odds-ratio",
"direct"),
distribution = c("default", "dnorm", "dlaplace", "ds", "dgnorm", "dlnorm", "dinvgauss",
"dgamma"),
loss = c("likelihood", "MSE", "MAE", "HAM", "LASSO", "RIDGE", "MSEh", "TMSE", "GTMSE",
"MSCE"),
ic = c("AICc", "AIC", "BIC", "BICc"),
select_order = FALSE,
...
)
Arguments
x |
A data.frame of predictors |
y |
A vector with outcome |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
p |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
d |
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation. |
q |
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation. |
P |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
D |
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation. |
Q |
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation. |
model |
The type of ETS model. |
constant |
Logical, determining whether the constant is needed in the model or not. |
regressors |
The variable defines what to do with the provided explanatory variables. |
outliers |
Defines what to do with outliers. |
level |
What confidence level to use for detection of outliers. |
occurrence |
The type of model used in probability estimation. |
distribution |
What density function to assume for the error term. |
loss |
The type of Loss Function used in optimization. |
ic |
The information criterion to use in the model selection / combination procedure. |
select_order |
If TRUE, then the function will select the most appropriate order using a mechanism similar to auto.msarima(), but implemented in auto.adam(). The values list(ar=...,i=...,ma=...) specify the maximum orders to check in this case. |
... |
Additional arguments passed to |
Low-Level ARIMA function for translating modeltime to forecast
Description
Low-Level ARIMA function for translating modeltime to forecast
Usage
auto_arima_fit_impl(
x,
y,
period = "auto",
max.p = 5,
max.d = 2,
max.q = 5,
max.P = 2,
max.D = 1,
max.Q = 2,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
max.p |
The maximum order of the non-seasonal auto-regressive (AR) terms. |
max.d |
The maximum order of integration for non-seasonal differencing. |
max.q |
The maximum order of the non-seasonal moving average (MA) terms. |
max.P |
The maximum order of the seasonal auto-regressive (SAR) terms. |
max.D |
The maximum order of integration for seasonal differencing. |
max.Q |
The maximum order of the seasonal moving average (SMA) terms. |
... |
Additional arguments passed to |
Bridge ARIMA-XGBoost Modeling function
Description
Bridge ARIMA-XGBoost Modeling function
Usage
auto_arima_xgboost_fit_impl(
x,
y,
period = "auto",
max.p = 5,
max.d = 2,
max.q = 5,
max.P = 2,
max.D = 1,
max.Q = 2,
max.order = 5,
d = NA,
D = NA,
start.p = 2,
start.q = 2,
start.P = 1,
start.Q = 1,
stationary = FALSE,
seasonal = TRUE,
ic = c("aicc", "aic", "bic"),
stepwise = TRUE,
nmodels = 94,
trace = FALSE,
approximation = (length(x) > 150 | frequency(x) > 12),
method = NULL,
truncate = NULL,
test = c("kpss", "adf", "pp"),
test.args = list(),
seasonal.test = c("seas", "ocsb", "hegy", "ch"),
seasonal.test.args = list(),
allowdrift = TRUE,
allowmean = TRUE,
lambda = NULL,
biasadj = FALSE,
max_depth = 6,
nrounds = 15,
eta = 0.3,
colsample_bytree = NULL,
colsample_bynode = NULL,
min_child_weight = 1,
gamma = 0,
subsample = 1,
validation = 0,
early_stop = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
max.p |
The maximum order of the non-seasonal auto-regressive (AR) terms. |
max.d |
The maximum order of integration for non-seasonal differencing. |
max.q |
The maximum order of the non-seasonal moving average (MA) terms. |
max.P |
The maximum order of the seasonal auto-regressive (SAR) terms. |
max.D |
The maximum order of integration for seasonal differencing. |
max.Q |
The maximum order of the seasonal moving average (SMA) terms. |
max.order |
Maximum value of p+q+P+Q if model selection is not stepwise. |
d |
Order of first-differencing. If missing, will choose a value based
on |
D |
Order of seasonal-differencing. If missing, will choose a value
based on |
start.p |
Starting value of p in stepwise procedure. |
start.q |
Starting value of q in stepwise procedure. |
start.P |
Starting value of P in stepwise procedure. |
start.Q |
Starting value of Q in stepwise procedure. |
stationary |
If |
seasonal |
If |
ic |
Information criterion to be used in model selection. |
stepwise |
If |
nmodels |
Maximum number of models considered in the stepwise search. |
trace |
If |
approximation |
If |
method |
Fitting method: maximum likelihood or minimize conditional sum-of-squares. The default (unless there are missing values) is to use conditional-sum-of-squares to find starting values, then maximum likelihood. Can be abbreviated. |
truncate |
An integer value indicating how many observations to use in
model selection. The last |
test |
Type of unit root test to use. See |
test.args |
Additional arguments to be passed to the unit root test. |
seasonal.test |
This determines which method is used to select the number of seasonal differences. The default method is to use a measure of seasonal strength computed from an STL decomposition. Other possibilities involve seasonal unit root tests. |
seasonal.test.args |
Additional arguments to be passed to the seasonal
unit root test.
See |
allowdrift |
If |
allowmean |
If |
lambda |
Box-Cox transformation parameter. If |
biasadj |
Use adjusted back-transformed mean for Box-Cox transformations. If transformed data is used to produce forecasts and fitted values, a regular back transformation will result in median forecasts. If biasadj is TRUE, an adjustment will be made to produce mean forecasts and fitted values. |
max_depth |
An integer for the maximum depth of the tree. |
nrounds |
An integer for the number of boosting iterations. |
eta |
A numeric value between zero and one to control the learning rate. |
colsample_bytree |
Subsampling proportion of columns. |
colsample_bynode |
Subsampling proportion of columns for each node
within each tree. See the |
min_child_weight |
A numeric value for the minimum sum of instance weights needed in a child to continue to split. |
gamma |
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree |
subsample |
Subsampling proportion of rows. |
validation |
A positive number. If on |
early_stop |
An integer or |
... |
Additional arguments passed to |
Combine multiple Modeltime Tables into a single Modeltime Table
Description
Combine multiple Modeltime Tables into a single Modeltime Table
Usage
combine_modeltime_tables(...)
Arguments
... |
Multiple Modeltime Tables (class |
Details
This function combines multiple Modeltime Tables.
- The .model_id will automatically be renumbered to ensure each model has a unique ID.
- Only the .model_id, .model, and .model_desc columns will be returned.
Re-Training Models on the Same Datasets
An issue can arise if your models are trained on different datasets.
In that case, you can run modeltime_refit() to train all models on the same data.
Re-Calibrating Models
If your data has been calibrated using modeltime_calibrate(), the .test and .calibration_data columns will be removed.
To re-calibrate, simply run modeltime_calibrate() on the newly combined Modeltime Table.
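A minimal sketch of the combine-then-recalibrate workflow, assuming tidymodels, modeltime and timetk are installed. The ETS model is chosen only as a second example model.

```r
library(tidymodels)
library(modeltime)
library(timetk)
library(dplyr)

m750   <- m4_monthly %>% filter(id == "M750")
splits <- time_series_split(m750, assess = "3 years", cumulative = TRUE)

fit_arima <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, training(splits))

fit_ets <- exp_smoothing() %>%
    set_engine("ets") %>%
    fit(value ~ date, training(splits))

# Two separately calibrated Modeltime Tables
tbl_1 <- modeltime_table(fit_arima) %>%
    modeltime_calibrate(new_data = testing(splits))
tbl_2 <- modeltime_table(fit_ets) %>%
    modeltime_calibrate(new_data = testing(splits))

# Combining drops the calibration columns and renumbers .model_id
combined_tbl <- combine_modeltime_tables(tbl_1, tbl_2)
names(combined_tbl)

# Re-calibrate the combined table
recalibrated_tbl <- combined_tbl %>%
    modeltime_calibrate(new_data = testing(splits))
```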
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
library(tidymodels)
library(timetk)
library(dplyr)
library(lubridate)
# Setup
m750 <- m4_monthly %>% filter(id == "M750")
splits <- time_series_split(m750, assess = "3 years", cumulative = TRUE)
model_fit_arima <- arima_reg() %>%
set_engine("auto_arima") %>%
fit(value ~ date, training(splits))
model_fit_prophet <- prophet_reg() %>%
set_engine("prophet") %>%
fit(value ~ date, training(splits))
# Multiple Modeltime Tables
model_tbl_1 <- modeltime_table(model_fit_arima)
model_tbl_2 <- modeltime_table(model_fit_prophet)
# Combine
combine_modeltime_tables(model_tbl_1, model_tbl_2)
Control aspects of the training process
Description
These functions are matched to the associated training functions:
- control_refit(): Used with modeltime_refit()
- control_fit_workflowset(): Used with modeltime_fit_workflowset()
- control_nested_fit(): Used with modeltime_nested_fit()
- control_nested_refit(): Used with modeltime_nested_refit()
- control_nested_forecast(): Used with modeltime_nested_forecast()
Usage
control_refit(verbose = FALSE, allow_par = FALSE, cores = 1, packages = NULL)
control_fit_workflowset(
verbose = FALSE,
allow_par = FALSE,
cores = 1,
packages = NULL
)
control_nested_fit(
verbose = FALSE,
allow_par = FALSE,
cores = 1,
packages = NULL
)
control_nested_refit(
verbose = FALSE,
allow_par = FALSE,
cores = 1,
packages = NULL
)
control_nested_forecast(
verbose = FALSE,
allow_par = FALSE,
cores = 1,
packages = NULL
)
Arguments
verbose |
Logical to control printing. |
allow_par |
Logical to allow parallel computation. Default: |
cores |
Number of cores for computation. If -1, uses all available physical cores.
Default: |
packages |
An optional character string of additional R package names that should be loaded during parallel processing.
|
Value
A List with the control settings.
See Also
- Setting Up Parallel Processing: parallel_start(), parallel_stop()
- Training Functions: modeltime_refit(), modeltime_fit_workflowset(), modeltime_nested_fit(), modeltime_nested_refit()
Examples
# No parallel processing by default
control_refit()
# Allow parallel processing and use all cores
control_refit(allow_par = TRUE, cores = -1)
# Set verbosity to show additional training information
control_refit(verbose = TRUE)
# Add additional packages used during modeling in parallel processing
# - This is useful if your namespace does not load all needed packages
# to run models.
# - An example is if I use `temporal_hierarchy()`, which depends on the `thief` package
control_refit(allow_par = TRUE, packages = "thief")
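To show where a control object is consumed, here is a hedged sketch that passes control_refit() to modeltime_refit() inside a parallel session. It assumes modeltime, parsnip, rsample, timetk and dplyr are installed; the worker count of 2 is an arbitrary choice, and a single-model table is used only to keep the example small.

```r
library(modeltime)
library(parsnip)
library(rsample)
library(timetk)
library(dplyr)

m750   <- m4_monthly %>% filter(id == "M750")
splits <- time_series_split(m750, assess = "3 years", cumulative = TRUE)

model_tbl <- modeltime_table(
    arima_reg() %>%
        set_engine("auto_arima") %>%
        fit(value ~ date, training(splits))
)

# Start a 2-worker cluster before any parallel training
parallel_start(2)

# Refit on the full dataset, allowing parallel computation
refit_tbl <- model_tbl %>%
    modeltime_refit(
        data    = m750,
        control = control_refit(verbose = TRUE, allow_par = TRUE, cores = 2)
    )

# Shut the cluster down when finished
parallel_stop()
```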
Helper to make parsnip model specs from a dials parameter grid
Description
Helper to make parsnip model specs from a dials parameter grid
Usage
create_model_grid(grid, f_model_spec, engine_name, ..., engine_params = list())
Arguments
grid |
A tibble that forms a grid of parameters to adjust |
f_model_spec |
A function name (quoted or unquoted) that
specifies a |
engine_name |
A name of an engine to use. Gets passed to |
... |
Static parameters that get passed to the f_model_spec |
engine_params |
A |
Details
This is a helper function that combines dials grids with parsnip model specifications. The intent is to make it easier to generate workflowset objects for forecast evaluations with modeltime_fit_workflowset().
The process follows:
- Generate a grid (hyperparameter combination)
- Use create_model_grid() to apply the parameter combinations to a parsnip model spec and engine.
The output contains a ".models" column that can be used as a list of models inside the workflow_set() function.
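The process described above can be sketched as follows, assuming tidymodels (including workflowsets) and modeltime are installed. The recipe is deliberately minimal, and m750 is assumed to be the example dataset that ships with modeltime.

```r
library(tidymodels)
library(workflowsets)
library(modeltime)

# Step 1: generate a grid of hyperparameter combinations
grid_tbl <- grid_regular(learn_rate(), levels = 3)

# Step 2: apply the combinations to a parsnip model spec and engine
model_grid_tbl <- grid_tbl %>%
    create_model_grid(
        f_model_spec = boost_tree,
        engine_name  = "xgboost",
        mode         = "regression"
    )

# A deliberately simple preprocessing recipe for illustration
recipe_spec <- recipe(value ~ date, data = m750)

# Use the .models list column as the models of a workflow set
wfset <- workflow_set(
    preproc = list(base = recipe_spec),
    models  = model_grid_tbl$.models,
    cross   = TRUE
)

wfset
```

The resulting workflow set holds one workflow per grid row and could then be passed to modeltime_fit_workflowset().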
Value
Tibble with a new column named .models
See Also
- dials::grid_regular(): For making parameter grids.
- workflowsets::workflow_set(): For creating a workflowset from the .models list stored in the ".models" column.
- modeltime_fit_workflowset(): For fitting a workflowset to forecast data.
Examples
library(tidymodels)
# Parameters that get optimized
grid_tbl <- grid_regular(
learn_rate(),
levels = 3
)
# Generate model specs
grid_tbl %>%
create_model_grid(
f_model_spec = boost_tree,
engine_name = "xgboost",
# Static boost_tree() args
mode = "regression",
# Static set_engine() args
engine_params = list(
max_depth = 5
)
)
Developer Tools for preparing XREGS (Regressors)
Description
These functions are designed to assist developers in extending the modeltime package. create_xreg_recipe() makes it simple to automate conversion of raw un-encoded features to machine-learning ready features.
Usage
create_xreg_recipe(
data,
prepare = TRUE,
clean_names = TRUE,
dummy_encode = TRUE,
one_hot = FALSE
)
Arguments
data |
A data frame |
prepare |
Whether or not to run |
clean_names |
Uses |
dummy_encode |
Should |
one_hot |
If |
Details
The default recipe contains steps to:
Remove date features
Clean the column names removing spaces and bad characters
Convert ordered factors to regular factors
Convert factors to dummy variables
Remove any variables that have zero variance
Value
A recipe in either prepared or un-prepared format.
Examples
library(dplyr)
library(timetk)
library(recipes)
library(lubridate)
predictors <- m4_monthly %>%
filter(id == "M750") %>%
select(-value) %>%
mutate(month = month(date, label = TRUE))
predictors
# Create default recipe
xreg_recipe_spec <- create_xreg_recipe(predictors, prepare = TRUE)
# Extracts the preprocessed training data from the recipe (used in your fit function)
juice_xreg_recipe(xreg_recipe_spec)
# Applies the prepared recipe to new data (used in your predict function)
bake_xreg_recipe(xreg_recipe_spec, new_data = predictors)
Low-Level Exponential Smoothing function for translating modeltime to forecast
Description
Low-Level Exponential Smoothing function for translating modeltime to forecast
Usage
croston_fit_impl(x, y, alpha = 0.1, ...)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
alpha |
Value of alpha. Default value is 0.1. |
... |
Additional arguments passed to |
Bridge prediction function for CROSTON models
Description
Bridge prediction function for CROSTON models
Usage
croston_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Drop a Model from a Modeltime Table
Description
Drop a Model from a Modeltime Table
Usage
drop_modeltime_model(object, .model_id)
Arguments
object |
A Modeltime Table (class |
.model_id |
A numeric value matching the .model_id that you want to drop |
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
library(tidymodels)
m750_models %>%
drop_modeltime_model(.model_id = c(2,3))
Low-Level Exponential Smoothing function for translating modeltime to forecast
Description
Low-Level Exponential Smoothing function for translating modeltime to forecast
Usage
ets_fit_impl(
x,
y,
period = "auto",
error = "auto",
trend = "auto",
season = "auto",
damping = "auto",
alpha = NULL,
beta = NULL,
gamma = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
error |
The form of the error term: "auto", "additive", or "multiplicative". If the error is multiplicative, the data must be non-negative. |
trend |
The form of the trend term: "auto", "additive", "multiplicative" or "none". |
season |
The form of the seasonal term: "auto", "additive", "multiplicative" or "none". |
damping |
Apply damping to a trend: "auto", "damped", or "none". |
alpha |
Value of alpha. If NULL, it is estimated. |
beta |
Value of beta. If NULL, it is estimated. |
gamma |
Value of gamma. If NULL, it is estimated. |
... |
Additional arguments passed to |
Bridge prediction function for Exponential Smoothing models
Description
Bridge prediction function for Exponential Smoothing models
Usage
ets_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
General Interface for Exponential Smoothing State Space Models
Description
exp_smoothing() is a way to generate a specification of an Exponential Smoothing model before fitting and allows the model to be created using different packages. Currently the supported packages are forecast and smooth. Several algorithms are implemented:
- ETS - Automated Exponential Smoothing
- CROSTON - Croston's forecast is a special case of Exponential Smoothing for intermittent demand
- Theta - A special case of Exponential Smoothing with Drift that performed well in the M3 Competition
Usage
exp_smoothing(
mode = "regression",
seasonal_period = NULL,
error = NULL,
trend = NULL,
season = NULL,
damping = NULL,
smooth_level = NULL,
smooth_trend = NULL,
smooth_seasonal = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
error |
The form of the error term: "auto", "additive", or "multiplicative". If the error is multiplicative, the data must be non-negative. |
trend |
The form of the trend term: "auto", "additive", "multiplicative" or "none". |
season |
The form of the seasonal term: "auto", "additive", "multiplicative" or "none". |
damping |
Apply damping to a trend: "auto", "damped", or "none". |
smooth_level |
This is often called the "alpha" parameter used as the base level smoothing factor for exponential smoothing models. |
smooth_trend |
This is often called the "beta" parameter used as the trend smoothing factor for exponential smoothing models. |
smooth_seasonal |
This is often called the "gamma" parameter used as the seasonal smoothing factor for exponential smoothing models. |
Details
Models can be created using the following engines:
- "ets" (default) - Connects to forecast::ets()
- "croston" - Connects to forecast::croston()
- "theta" - Connects to forecast::thetaf()
- "smooth_es" - Connects to smooth::es()
Engine Details
The standardized parameter names in modeltime can be mapped to their original names in each engine:
modeltime | forecast::ets | forecast::croston() | forecast::thetaf() | smooth::es() |
seasonal_period() | ts(frequency) | ts(frequency) | ts(frequency) | ts(frequency) |
error(), trend(), season() | model ('ZZZ') | NA | NA | model('ZZZ') |
damping() | damped (NULL) | NA | NA | phi |
smooth_level() | alpha (NULL) | alpha (0.1) | NA | persistence(alpha) |
smooth_trend() | beta (NULL) | NA | NA | persistence(beta) |
smooth_seasonal() | gamma (NULL) | NA | NA | persistence(gamma) |
Other options can be set using set_engine().
ets (default engine)
The engine uses forecast::ets()
.
Function Parameters:
#> function (y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL, gamma = NULL, #> phi = NULL, additive.only = FALSE, lambda = NULL, biasadj = FALSE, #> lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), opt.crit = c("lik", #> "amse", "mse", "sigma", "mae"), nmse = 3, bounds = c("both", "usual", #> "admissible"), ic = c("aicc", "aic", "bic"), restrict = TRUE, allow.multiplicative.trend = FALSE, #> use.initial.values = FALSE, na.action = c("na.contiguous", "na.interp", #> "na.fail"), ...)
The main arguments, model and damped, are defined using:
- error(): "auto", "additive", and "multiplicative" are converted to "Z", "A", and "M".
- trend(): "auto", "additive", "multiplicative", and "none" are converted to "Z", "A", "M", and "N".
- season(): "auto", "additive", "multiplicative", and "none" are converted to "Z", "A", "M", and "N".
- damping(): "auto", "damped", and "none" are converted to NULL, TRUE, and FALSE.
- smooth_level(), smooth_trend(), and smooth_seasonal() are automatically determined if not provided. They are mapped to "alpha", "beta", and "gamma", respectively.
By default, all arguments are set to "auto" to perform automated Exponential Smoothing using
in-sample data following the underlying forecast::ets()
automation routine.
Other options and arguments can be set using set_engine().
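The conversion of error(), trend(), and season() into the three-letter model code can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name ets_model_code() is hypothetical, and only the lookup values come from the list above.

```r
# Hypothetical helper for illustration only; not a modeltime function.
ets_model_code <- function(error = "auto", trend = "auto", season = "auto") {
  # Lookup table taken from the conversions listed above
  lookup <- c(auto = "Z", additive = "A", multiplicative = "M", none = "N")

  # error() has no "none" option; trend() and season() do
  error  <- match.arg(error,  c("auto", "additive", "multiplicative"))
  trend  <- match.arg(trend,  names(lookup))
  season <- match.arg(season, names(lookup))

  # Concatenate the three letters in error-trend-season order
  paste0(lookup[[error]], lookup[[trend]], lookup[[season]])
}

ets_model_code()                                                # "ZZZ"
ets_model_code("multiplicative", "additive", "multiplicative")  # "MAM"
```

With every argument left at "auto", the sketch yields "ZZZ", which matches the default value of the model argument shown in the function parameters above.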
Parameter Notes:
-
xreg
- This model is not set up to use exogenous regressors. Only univariate models will be fit.
croston
The engine uses forecast::croston()
.
Function Parameters:
#> function (y, h = 10, alpha = 0.1, x = y)
The main arguments are defined using:
-
smooth_level()
: The "alpha" parameter
Parameter Notes:
-
xreg
- This model is not set up to use exogenous regressors. Only univariate models will be fit.
theta
The engine uses forecast::thetaf()
Parameter Notes:
-
xreg
- This model is not set up to use exogenous regressors. Only univariate models will be fit.
smooth_es
The engine uses smooth::es()
.
Function Parameters:
#> function (y, model = "ZZZ", lags = c(frequency(y)), persistence = NULL, #> phi = NULL, initial = c("optimal", "backcasting", "complete"), initialSeason = NULL, #> ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("likelihood", "MSE", #> "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE, #> bounds = c("usual", "admissible", "none"), silent = TRUE, xreg = NULL, #> regressors = c("use", "select"), initialX = NULL, ...)
The main arguments model
and phi
are defined using:
- error(): "auto", "additive", and "multiplicative" are converted to "Z", "A", and "M".
- trend(): "auto", "additive", "multiplicative", "additive_damped", "multiplicative_damped", and "none" are converted to "Z", "A", "M", "Ad", "Md", and "N".
- season(): "auto", "additive", "multiplicative", and "none" are converted to "Z", "A", "M", and "N".
- damping(): Value of the damping parameter. If NULL, then it is estimated.
- smooth_level(), smooth_trend(), and smooth_seasonal() are automatically determined if not provided. They are mapped to "persistence" ("alpha", "beta", and "gamma", respectively).
By default, all arguments are set to "auto" to perform automated Exponential Smoothing using
in-sample data following the underlying smooth::es()
automation routine.
Other options and arguments can be set using set_engine().
Parameter Notes:
- xreg: This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor.
The fit()
interface accepts date and date-time features and handles them internally.
-
fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1
or "none"
) or seasonal (e.g. seasonal_period = 12
or seasonal_period = "12 months"
).
There are 3 ways to specify:
- seasonal_period = "auto": A period is selected based on the periodicity of the data (e.g. 12 if monthly).
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data.
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
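To see why a phrase such as "1 year" corresponds to 12 for monthly data, the following base R sketch counts how many observations fall inside one phrase-length window. The helper period_from_phrase() is hypothetical and only illustrates the idea; the conversion used internally may differ.

```r
# Hypothetical helper for illustration only; the package's conversion may differ.
period_from_phrase <- function(index, phrase) {
  start <- min(index)
  # seq() on Dates accepts phrases such as "1 year" or "12 months"
  end <- seq(start, by = phrase, length.out = 2)[2]
  # Count the observations in the half-open window [start, end)
  sum(index >= start & index < end)
}

monthly <- seq(as.Date("2015-01-01"), by = "month", length.out = 36)
period_from_phrase(monthly, "1 year")     # 12
period_from_phrase(monthly, "12 months")  # 12
```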
Univariate:
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xregs.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xregs.
Multivariate (xregs, Exogenous Regressors)
Only for the smooth_es engine.
The xreg parameter is populated using the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs.
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target),
- date (time stamp),
- month.lbl (labeled month as an ordered factor).
The month.lbl is an exogenous regressor that can be passed to exp_smoothing() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg
.
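The column rules above (factor, ordered factor, and numeric columns are kept; date, date-time, and character columns are not) can be illustrated with a small base R sketch. The helper select_xregs() and the columns promo and note are hypothetical; this is not how the package extracts regressors internally.

```r
# Hypothetical helper for illustration only; not a modeltime function.
select_xregs <- function(x) {
  # is.numeric() is FALSE for Date and POSIXct columns, so they drop out
  keep <- vapply(x, function(col) is.factor(col) || is.numeric(col), logical(1))
  x[, keep, drop = FALSE]
}

data <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 3),
  month.lbl = factor(month.abb[1:3], levels = month.abb, ordered = TRUE),
  promo     = c(0, 1, 0),        # made-up numeric regressor
  note      = c("a", "b", "c"),  # character: would need converting to factor
  stringsAsFactors = FALSE
)

names(select_xregs(data))  # "month.lbl" "promo"
```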
See Also
fit.model_spec()
, set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
library(smooth)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- AUTO ETS ----
# Model Spec - The default parameters are all set
# to "auto" if none are provided
model_spec <- exp_smoothing() %>%
set_engine("ets")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- STANDARD ETS ----
# Model Spec
model_spec <- exp_smoothing(
seasonal_period = 12,
error = "multiplicative",
trend = "additive",
season = "multiplicative"
) %>%
set_engine("ets")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- CROSTON ----
# Model Spec
model_spec <- exp_smoothing(
smooth_level = 0.2
) %>%
set_engine("croston")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- THETA ----
# Model Spec
model_spec <- exp_smoothing() %>%
set_engine("theta")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- SMOOTH ----
# Model Spec
model_spec <- exp_smoothing(
seasonal_period = 12,
error = "multiplicative",
trend = "additive_damped",
season = "additive"
) %>%
set_engine("smooth_es")
# Fit Spec
model_fit <- model_spec %>%
fit(value ~ date, data = training(splits))
model_fit
Tuning Parameters for Exponential Smoothing Models
Description
Tuning Parameters for Exponential Smoothing Models
Usage
error(values = c("additive", "multiplicative"))
trend(values = c("additive", "multiplicative", "none"))
trend_smooth(
values = c("additive", "multiplicative", "none", "additive_damped",
"multiplicative_damped")
)
season(values = c("additive", "multiplicative", "none"))
damping(values = c("none", "damped"))
damping_smooth(range = c(0, 2), trans = NULL)
smooth_level(range = c(0, 1), trans = NULL)
smooth_trend(range = c(0, 1), trans = NULL)
smooth_seasonal(range = c(0, 1), trans = NULL)
Arguments
values |
A character string of possible values. |
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Details
The main parameters for Exponential Smoothing models are:
- error: The form of the error term: "additive" or "multiplicative". If the error is multiplicative, the data must be non-negative.
- trend: The form of the trend term: "additive", "multiplicative" or "none".
- season: The form of the seasonal term: "additive", "multiplicative" or "none".
- damping: Apply damping to a trend: "damped" or "none".
- smooth_level: This is often called the "alpha" parameter used as the base level smoothing factor for exponential smoothing models.
- smooth_trend: This is often called the "beta" parameter used as the trend smoothing factor for exponential smoothing models.
- smooth_seasonal: This is often called the "gamma" parameter used as the seasonal smoothing factor for exponential smoothing models.
Examples
error()
trend()
season()
Get model descriptions for Arima objects
Description
Get model descriptions for Arima objects
Usage
get_arima_description(object, padding = FALSE)
Arguments
object |
Objects of class |
padding |
Whether or not to include padding |
Source
Forecast R Package,
forecast:::arima.string()
Examples
library(forecast)
arima_fit <- forecast::Arima(1:10)
get_arima_description(arima_fit)
Get model descriptions for parsnip, workflows & modeltime objects
Description
Get model descriptions for parsnip, workflows & modeltime objects
Usage
get_model_description(object, indicate_training = FALSE, upper_case = TRUE)
Arguments
object |
Parsnip or workflow objects |
indicate_training |
Whether or not to indicate if the model has been trained |
upper_case |
Whether to return upper or lower case model descriptions |
Examples
library(dplyr)
library(timetk)
library(parsnip)
# Model Specification ----
arima_spec <- arima_reg() %>%
set_engine("auto_arima")
get_model_description(arima_spec, indicate_training = TRUE)
# Fitted Model ----
m750 <- m4_monthly %>% filter(id == "M750")
arima_fit <- arima_spec %>%
fit(value ~ date, data = m750)
get_model_description(arima_fit, indicate_training = TRUE)
Get model descriptions for TBATS objects
Description
Get model descriptions for TBATS objects
Usage
get_tbats_description(object)
Arguments
object |
Objects of class |
Source
Forecast R Package,
forecast:::as.character.tbats()
Test if a Modeltime Table has been calibrated
Description
This function returns TRUE for objects that contain the columns ".type" and ".calibration_data".
Usage
is_calibrated(object)
Arguments
object |
An object to test for being a calibrated Modeltime Table |
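The check described above amounts to looking for two column names. A minimal base R sketch, assuming a plain data frame stands in for a Modeltime Table; looks_calibrated() is a hypothetical name, and the real is_calibrated() may perform additional checks:

```r
# Hypothetical stand-in for illustration; not the package's implementation.
looks_calibrated <- function(object) {
  is.data.frame(object) &&
    all(c(".type", ".calibration_data") %in% names(object))
}

uncalibrated <- data.frame(.model_id = 1, .model_desc = "PROPHET")
calibrated   <- data.frame(.model_id = 1, .type = "Test", .calibration_data = NA)

looks_calibrated(uncalibrated)  # FALSE
looks_calibrated(calibrated)    # TRUE
```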
Test if object contains a fitted modeltime model
Description
This function returns TRUE
for trained workflows and parsnip objects
that contain modeltime models
Usage
is_modeltime_model(object)
Arguments
object |
An object to test for whether it contains a fitted modeltime model |
Test if object is a Modeltime Table
Description
This function returns TRUE
for objects that contain class mdl_time_tbl
Usage
is_modeltime_table(object)
Arguments
object |
An object to test for being a Modeltime Table |
Test if a table contains residuals.
Description
This function returns TRUE
for objects that contain the column name '.residuals'.
Usage
is_residuals(object)
Arguments
object |
An object to test for whether it was produced by modeltime::modeltime_residuals(). |
These are not intended for use by the general public.
Description
These are not intended for use by the general public.
Usage
load_namespace(x, full_load)
Arguments
x |
A vector |
full_load |
A vector |
Value
Control information
Log Extractor Functions for Modeltime Nested Tables
Description
Extract logged information calculated during the modeltime_nested_fit()
,
modeltime_nested_select_best()
, and modeltime_nested_refit()
processes.
Usage
extract_nested_test_accuracy(object)
extract_nested_test_forecast(object, .include_actual = TRUE, .id_subset = NULL)
extract_nested_error_report(object)
extract_nested_best_model_report(object)
extract_nested_future_forecast(
object,
.include_actual = TRUE,
.id_subset = NULL
)
extract_nested_modeltime_table(object, .row_id = 1)
extract_nested_train_split(object, .row_id = 1)
extract_nested_test_split(object, .row_id = 1)
Arguments
object |
A nested modeltime table |
.include_actual |
Whether or not to include the actual data in the extracted forecast. Default: TRUE. |
.id_subset |
Can supply a vector of IDs to extract forecasts for one or more IDs,
rather than extracting all forecasts. If |
.row_id |
The row number to extract from the nested data. |
The 750th Monthly Time Series used in the M4 Competition
Description
The 750th Monthly Time Series used in the M4 Competition
Usage
m750
Format
A tibble
with 306 rows and 3 variables:
-
id
Factor. Unique series identifier -
date
Date. Timestamp information. Monthly format. -
value
Numeric. Value at the corresponding timestamp.
Source
M4 Competition Website: https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/
Examples
m750
Three (3) Models trained on the M750 Data (Training Set)
Description
Three (3) Models trained on the M750 Data (Training Set)
Usage
m750_models
Format
A Modeltime Table (mdl_time_tbl)
object containing the 3 fitted workflows listed in Details
(ARIMA, Prophet, and glmnet)
Details
m750_models <- modeltime_table(
    wflw_fit_arima,
    wflw_fit_prophet,
    wflw_fit_glmnet
)
Examples
m750_models
The results of train/test splitting the M750 Data
Description
The results of train/test splitting the M750 Data
Usage
m750_splits
Format
An rsplit
object split into approximately 23.5-years of training data
and 2-years of testing data
Details
library(timetk)
m750_splits <- time_series_split(m750, assess = "2 years", cumulative = TRUE)
Examples
library(rsample)
m750_splits
training(m750_splits)
The Time Series Cross Validation Resamples of the M750 Data (Training Set)
Description
The Time Series Cross Validation Resamples of the M750 Data (Training Set)
Usage
m750_training_resamples
Format
A time_series_cv
object with 6 slices of Time Series Cross Validation
resamples made on the training(m750_splits)
Details
library(timetk)
m750_training_resamples <- time_series_cv(
    data = training(m750_splits),
    assess = "2 years",
    skip = "2 years",
    cumulative = TRUE,
    slice_limit = 6
)
Examples
library(rsample)
m750_training_resamples
Mean Arctangent Absolute Percentage Error
Description
Useful when MAPE returns Inf typically due to intermittent data containing zeros.
This is a wrapper to the function of TSrepr::maape()
.
Usage
maape(data, ...)
Arguments
data |
A |
... |
Not currently in use. |
Mean Arctangent Absolute Percentage Error
Description
This is basically a wrapper to the function of TSrepr::maape()
.
Usage
maape_vec(truth, estimate, na_rm = TRUE, ...)
Arguments
truth |
The column identifier for the true results (that is numeric). |
estimate |
The column identifier for the predicted results (that is also numeric). |
na_rm |
Not in use... |
... |
Not currently in use |
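The reason MAAPE stays finite when MAPE does not can be shown with a small base R sketch. It uses the usual definition of MAAPE, the mean of the arctangent of the absolute percentage error. The function names are illustrative, and TSrepr::maape() may scale its result differently (for example, as a percentage).

```r
# Illustrative only; TSrepr::maape() may scale its result differently.
maape_sketch <- function(truth, estimate) {
  # atan() maps an infinite percentage error to pi / 2, so the mean stays finite
  mean(atan(abs((truth - estimate) / truth)))
}

mape_sketch <- function(truth, estimate) {
  mean(abs((truth - estimate) / truth)) * 100
}

truth    <- c(0, 2, 4)  # intermittent series containing a zero
estimate <- c(1, 2, 3)

mape_sketch(truth, estimate)   # Inf, because of the division by zero
maape_sketch(truth, estimate)  # finite
```

Note that when both truth and estimate are zero, the ratio is 0 / 0 and this sketch returns NaN; handling that case is left out here.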
Generate Time Series Train/Test Split Indices
Description
Makes fast train/test split indices for time series.
Usage
make_ts_splits(.data, .length_test, .length_train = NULL)
Arguments
.data |
A data frame containing ordered time series data (ascending) |
.length_test |
The number of rows to include in the test set |
.length_train |
Optional. The number of rows to include in the training set. If NULL, returns all remaining row indices. |
Value
A list containing train_idx and test_idx
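The index arithmetic behind such a split can be sketched in base R. This sketch assumes the test set is the last .length_test rows and the training set is the block of rows immediately before it; that layout, and the helper name ts_split_idx(), are assumptions rather than the documented behavior of make_ts_splits().

```r
# Hypothetical helper for illustration only; layout of the split is assumed.
ts_split_idx <- function(n, length_test, length_train = NULL) {
  stopifnot(length_test >= 1, length_test < n)

  train_end   <- n - length_test
  # NULL means "use every row before the test set"
  train_start <- if (is.null(length_train)) 1 else max(1, train_end - length_train + 1)

  list(
    train_idx = seq(train_start, train_end),
    test_idx  = seq(train_end + 1, n)
  )
}

ts_split_idx(10, length_test = 3)                    # train 1..7, test 8..10
ts_split_idx(10, length_test = 3, length_train = 4)  # train 4..7, test 8..10
```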
Modeltime Forecast Helpers
Description
Used for low-level forecasting of modeltime, parsnip and workflow models. These functions are not intended for direct use by end users.
Usage
mdl_time_forecast(
object,
calibration_data,
new_data = NULL,
h = NULL,
actual_data = NULL,
bind_actual = TRUE,
keep_data = FALSE,
arrange_index = FALSE,
...
)
Arguments
object |
A Modeltime Table |
calibration_data |
Data that has been calibrated from a testing set |
new_data |
A |
h |
The forecast horizon (can be used instead of |
actual_data |
Reference data that is combined with the output tibble and given a |
bind_actual |
Logical. Whether or not to skip rowwise binding of 'actual_data'
keep_data |
Whether or not to keep the |
arrange_index |
Whether or not to sort the index in rowwise chronological order (oldest to newest) or to
keep the original order of the data.
Default: |
... |
Not currently used |
Value
A tibble with forecast features
Modeltime Refit Helpers
Description
Used for low-level refitting of modeltime, parsnip and workflow models. These functions are not intended for direct use by end users.
Usage
mdl_time_refit(object, data, ..., control = NULL)
Arguments
object |
A Modeltime Table |
data |
A |
... |
Additional arguments to control refitting. Ensemble Model Spec ( When making a meta-learner with |
control |
Used to control verbosity and parallel processing.
See |
Value
A tibble with forecast features
Forecast Accuracy Metrics Sets
Description
This is a wrapper for metric_set()
with several common forecast / regression
accuracy metrics included. These are the default time series accuracy
metrics used with modeltime_accuracy()
.
Usage
default_forecast_accuracy_metric_set(...)
extended_forecast_accuracy_metric_set(...)
Arguments
... |
Add additional |
Default Forecast Accuracy Metric Set
The primary purpose is to use the default accuracy metrics to calculate the following
forecast accuracy metrics using modeltime_accuracy()
:
MAE - Mean absolute error,
mae()
MAPE - Mean absolute percentage error,
mape()
MASE - Mean absolute scaled error,
mase()
SMAPE - Symmetric mean absolute percentage error,
smape()
RMSE - Root mean squared error,
rmse()
RSQ - R-squared,
rsq()
Adding additional metrics is possible via ...
.
Extended Forecast Accuracy Metric Set
Extends the default metric set by adding:
MAAPE - Mean Arctangent Absolute Percentage Error,
maape()
MAAPE is designed for intermittent data where MAPE returns Inf.
See Also
- yardstick::metric_tweak(): For modifying yardstick metrics
Examples
library(tibble)
library(dplyr)
library(timetk)
library(yardstick)
fake_data <- tibble(
y = c(1:12, 2*1:12),
yhat = c(1 + 1:12, 2*1:12 - 1)
)
# ---- HOW IT WORKS ----
# Default Forecast Accuracy Metric Specification
default_forecast_accuracy_metric_set()
# Create a metric summarizer function from the metric set
calc_default_metrics <- default_forecast_accuracy_metric_set()
# Apply the metric summarizer to new data
calc_default_metrics(fake_data, y, yhat)
# ---- ADD MORE PARAMETERS ----
# Can create a version of mase() with seasonality = 12 (monthly)
mase12 <- metric_tweak(.name = "mase12", .fn = mase, m = 12)
# Add it to the default metric set
my_metric_set <- default_forecast_accuracy_metric_set(mase12)
my_metric_set
# Apply the newly created metric set
my_metric_set(fake_data, y, yhat)
Calculate Accuracy Metrics
Description
This is a wrapper for yardstick
that simplifies time series regression accuracy metric
calculations from a fitted workflow
(trained workflow) or model_fit
(trained parsnip model).
Usage
modeltime_accuracy(
object,
new_data = NULL,
metric_set = default_forecast_accuracy_metric_set(),
acc_by_id = FALSE,
quiet = TRUE,
...
)
Arguments
object |
A Modeltime Table |
new_data |
A |
metric_set |
A |
acc_by_id |
Should a global or local model accuracy be produced? (Default: FALSE)
|
quiet |
Hide errors ( |
... |
If |
Details
The following accuracy metrics are included by default via default_forecast_accuracy_metric_set()
:
MAE - Mean absolute error,
mae()
MAPE - Mean absolute percentage error,
mape()
MASE - Mean absolute scaled error,
mase()
SMAPE - Symmetric mean absolute percentage error,
smape()
RMSE - Root mean squared error,
rmse()
RSQ - R-squared,
rsq()
Value
A tibble with accuracy estimates.
Examples
library(tidymodels)
library(dplyr)
library(lubridate)
library(timetk)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# --- MODELS ---
# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ date, data = training(splits))
# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
model_fit_prophet
)
# ---- ACCURACY ----
models_tbl %>%
modeltime_calibrate(new_data = testing(splits)) %>%
modeltime_accuracy(
metric_set = metric_set(mae, rmse, rsq)
)
Preparation for forecasting
Description
Calibration sets the stage for accuracy and forecast confidence by computing predictions and residuals from out of sample data.
Usage
modeltime_calibrate(object, new_data, id = NULL, quiet = TRUE, ...)
Arguments
object |
A fitted model object that is either:
|
new_data |
A test data set |
id |
A quoted column name containing an identifier column identifying time series that are grouped. |
quiet |
Hide errors ( |
... |
Additional arguments passed to |
Details
The results of calibration are used for:
- Forecast Confidence Interval Estimation: The out of sample residual data is used to calculate the confidence interval. Refer to modeltime_forecast().
- Accuracy Calculations: The out of sample actual and prediction values are used to calculate performance metrics. Refer to modeltime_accuracy().
The calibration steps include:
If not a Modeltime Table, objects are converted to Modeltime Tables internally
Two Columns are added:
- .type: Indicates the sample type. This is "Test" if predicted, or "Fitted" if residuals were stored during modeling.
- .calibration_data: Contains a tibble with Timestamps, Actual Values, Predictions and Residuals calculated from new_data (Test Data). If id is provided, will contain a 5th column that is the identifier variable.
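The shape of the calibration data described above can be mocked up in base R. The numbers are made up, the column names .actual, .prediction, and .residuals are assumptions chosen to match the description, and the residual is assumed to be actual minus prediction.

```r
# Made-up test data and predictions, for illustration only
test_data <- data.frame(
  date  = seq(as.Date("2015-01-01"), by = "month", length.out = 4),
  value = c(100, 110, 120, 130)
)
predictions <- c(98, 113, 118, 135)

# Timestamps, actual values, and predictions side by side
calibration_data <- data.frame(
  date        = test_data$date,
  .actual     = test_data$value,
  .prediction = predictions
)

# Assumed sign convention: actual minus prediction
calibration_data$.residuals <- calibration_data$.actual - calibration_data$.prediction

calibration_data
```

These out-of-sample residuals are the input that the accuracy and confidence interval calculations work from.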
Value
A Modeltime Table (mdl_time_tbl
) with nested .calibration_data
added
Examples
library(dplyr)
library(lubridate)
library(timetk)
library(parsnip)
library(rsample)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# --- MODELS ---
# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ date, data = training(splits))
# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
model_fit_prophet
)
# ---- CALIBRATE ----
calibration_tbl <- models_tbl %>%
modeltime_calibrate(
new_data = testing(splits)
)
# ---- ACCURACY ----
calibration_tbl %>%
modeltime_accuracy()
# ---- FORECAST ----
calibration_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = m750
)
Fit a workflowset object to one or multiple time series
Description
This is a wrapper for fit()
that takes a
workflowset
object and fits each model on one or multiple
time series either sequentially or in parallel.
Usage
modeltime_fit_workflowset(
object,
data,
...,
control = control_fit_workflowset()
)
Arguments
object |
A workflow_set object, generated with the workflowsets::workflow_set function. |
data |
A |
... |
Not currently used. |
control |
An object used to modify the fitting process. See |
Value
A Modeltime Table containing one or more fitted models.
Examples
library(tidymodels)
library(workflowsets)
library(dplyr)
library(lubridate)
library(timetk)
data_set <- m4_monthly
# SETUP WORKFLOWSETS
rec1 <- recipe(value ~ date + id, data_set) %>%
step_mutate(date_num = as.numeric(date)) %>%
step_mutate(month_lbl = lubridate::month(date, label = TRUE)) %>%
step_dummy(all_nominal(), one_hot = TRUE)
mod1 <- linear_reg() %>% set_engine("lm")
mod2 <- prophet_reg() %>% set_engine("prophet")
wfsets <- workflowsets::workflow_set(
preproc = list(rec1 = rec1),
models = list(
mod1 = mod1,
mod2 = mod2
),
cross = TRUE
)
# FIT WORKFLOWSETS
# - Returns a Modeltime Table with fitted workflowsets
wfsets %>% modeltime_fit_workflowset(data_set)
Forecast future data
Description
The goal of modeltime_forecast()
is to simplify the process of
forecasting future data.
Usage
modeltime_forecast(
object,
new_data = NULL,
h = NULL,
actual_data = NULL,
conf_interval = 0.95,
conf_by_id = FALSE,
conf_method = "conformal_default",
keep_data = FALSE,
arrange_index = FALSE,
...
)
Arguments
object |
A Modeltime Table |
new_data |
A |
h |
The forecast horizon (can be used instead of |
actual_data |
Reference data that is combined with the output tibble and given a |
conf_interval |
An estimated confidence interval based on the calibration data. This is designed to estimate future confidence from out-of-sample prediction error. |
conf_by_id |
Whether or not to produce confidence interval estimates by an ID feature.
|
conf_method |
Algorithm used to produce confidence intervals. All CI's are Conformal Predictions. Choose one of:
|
keep_data |
Whether or not to keep the |
arrange_index |
Whether or not to sort the index in rowwise chronological order (oldest to newest) or to
keep the original order of the data.
Default: |
... |
Not currently used |
Details
The modeltime_forecast()
function prepares a forecast for visualization with
plot_modeltime_forecast()
. The forecast is controlled by new_data
or h
,
which can be combined with existing data (controlled by actual_data
).
Confidence intervals are included if the incoming Modeltime Table has been
calibrated using modeltime_calibrate()
.
Otherwise confidence intervals are not estimated.
New Data
When forecasting you can specify future data using new_data
.
This is a future tibble with a date column extending the trained dates and,
if used, columns for the exogenous regressors (xregs).
- Forecasting Evaluation Data: By default, the .calibration_data is used if new_data is not provided. This is the equivalent of using rsample::testing() for getting test data sets.
- Forecasting Future Data: See timetk::future_frame() for creating future tibbles.
- Xregs: Can be used with this method.
H (Horizon)
When forecasting, you can specify h
. This is a phrase like "1 year",
which extends the .calibration_data
(1st priority) or the actual_data
(2nd priority)
into the future.
- Forecasting Future Data: All forecasts using h are extended after the calibration data or actual_data.
- Extending .calibration_data: Calibration data is given 1st priority, which is desirable after refitting with modeltime_refit(). Internally, a call is made to timetk::future_frame() to expedite creating new data using the date feature.
- Extending actual_data: If h is provided, and the modeltime table has not been calibrated, the "actual_data" will be extended into the future. This is useful in situations where you want to go directly from modeltime_table() to modeltime_forecast() without calibrating or refitting.
- Xregs: Cannot be used because future data must include new xregs. If xregs are desired, build a future data frame and use new_data.
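What "extending into the future" means for the date index can be sketched in base R. This is a rough analogue only: the helper extend_index() is hypothetical, it takes a numeric number of periods rather than a phrase like "1 year", and it does not reproduce timetk::future_frame().

```r
# Hypothetical helper for illustration only; not timetk::future_frame().
extend_index <- function(index, h, by = "month") {
  # Generate h + 1 timestamps starting at the last observed one,
  # then drop the first so only future timestamps remain
  seq(max(index), by = by, length.out = h + 1)[-1]
}

index <- seq(as.Date("2015-01-01"), by = "month", length.out = 6)
extend_index(index, h = 3)  # "2015-07-01" "2015-08-01" "2015-09-01"
```

Only the date column can be generated this way, which is why xregs require a complete new_data frame instead.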
Actual Data
This is reference data that contains the true values of the time-stamp data. It helps in visualizing the performance of the forecast vs the actual data.
When h
is used and the Modeltime Table has not been calibrated, then the
actual data is extended into the future periods that are defined by h
.
Confidence Interval Estimation
Confidence intervals (.conf_lo
, .conf_hi
) are estimated based on the normal estimation of
the testing errors (out of sample) from modeltime_calibrate()
.
The out-of-sample error estimates are then carried through and
applied to any future forecasts.
The confidence interval can be adjusted with the conf_interval
parameter. The algorithm used
to produce confidence intervals can be changed with the conf_method
parameter.
Conformal Default Method:
When conf_method = "conformal_default"
(default), this method uses qnorm()
to produce a 95% confidence interval by default. It estimates a normal (Gaussian distribution)
based on the out-of-sample errors (residuals).
The confidence interval is mean-adjusted, meaning that if the mean of the residuals is non-zero, the confidence interval is adjusted to widen the interval to capture the difference in means.
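The idea of a qnorm()-based interval can be sketched in base R. This is a simplified illustration under stated assumptions: a normal distribution is fitted to made-up out-of-sample residuals and its quantiles are added to each prediction. The helper normal_interval() is hypothetical, and the package's exact mean adjustment is not reproduced here.

```r
# Simplified illustration; the package's mean adjustment is not reproduced.
normal_interval <- function(prediction, resid_oos, conf_interval = 0.95) {
  # Lower and upper tail probabilities, e.g. 0.025 and 0.975 for 95%
  probs   <- 0.5 + c(-1, 1) * conf_interval / 2
  # Quantiles of a normal fitted to the out-of-sample residuals
  offsets <- qnorm(probs, mean = mean(resid_oos), sd = sd(resid_oos))

  data.frame(
    .value   = prediction,
    .conf_lo = prediction + offsets[1],
    .conf_hi = prediction + offsets[2]
  )
}

resid_oos <- c(2, -3, 2, -5, 1, 3)  # made-up out-of-sample residuals
normal_interval(c(140, 150), resid_oos)
```

Lowering conf_interval (for example to 0.80) narrows the band, because the quantiles move closer to the residual mean.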
Conformal Split Method:
When conf_method = "conformal_split", this method uses the split conformal inference method
described by Lei et al (2018). This is also implemented in the probably R package's int_conformal_split() function.
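The core step of split conformal inference is to take a rank-based quantile of the absolute out-of-sample residuals and use it as a symmetric half-width. A minimal base R sketch follows, with made-up residuals and a hypothetical helper name; the package and probably::int_conformal_split() may handle details differently.

```r
# Illustrative sketch of the split conformal half-width; details may differ
# from modeltime and probably::int_conformal_split().
split_conformal_interval <- function(prediction, resid_oos, conf_interval = 0.95) {
  scores <- sort(abs(resid_oos))  # absolute residuals, smallest first
  m <- length(scores)
  # Rank of the score used as half-width: ceiling((m + 1) * coverage)
  k <- ceiling((m + 1) * conf_interval)
  # With too few calibration points the rank exceeds m: interval is unbounded
  d <- if (k > m) Inf else scores[k]

  data.frame(
    .value   = prediction,
    .conf_lo = prediction - d,
    .conf_hi = prediction + d
  )
}

# 19 made-up residuals with absolute values 1, 2, ..., 19
resid_oos <- (1:19) * rep(c(1, -1), length.out = 19)

# k = ceiling(20 * 0.75) = 15, so the half-width is 15
split_conformal_interval(100, resid_oos, conf_interval = 0.75)
```

Unlike the normal-based approach, this makes no distributional assumption about the residuals, at the cost of needing enough calibration points for the requested coverage.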
What happens to the confidence interval after refitting models?
Refitting has no effect on the confidence interval since this is calculated independently of the refitted model. New observations typically improve future accuracy, which in most cases makes the out-of-sample confidence intervals conservative.
Keep Data
Include the new data (and actual data) as extra columns with the results of the model forecasts. This can be helpful when the new data includes information useful to the forecasts. An example is when forecasting Panel Data and the new data contains ID features related to the time series group that the forecast belongs to.
Arrange Index
By default, modeltime_forecast()
keeps the original order of the data.
If desired, the user can sort the output by .key
, .model_id
and .index
.
Value
A tibble with predictions and time-stamp data. For ease of plotting and calculations, the column names are transformed to:
-
.key
: Values labeled either "prediction" or "actual" -
.index
: The timestamp index. -
.value
: The value being forecasted.
Additionally, if the Modeltime Table has been previously calibrated using modeltime_calibrate()
,
you will gain confidence intervals.
-
.conf_lo
: The lower limit of the confidence interval. -
.conf_hi
: The upper limit of the confidence interval.
Additional descriptive columns are included:
-
.model_id
: Model ID from the Modeltime Table -
.model_desc
: Model Description from the Modeltime Table
Unnecessary columns are dropped to save space:
-
.model
-
.calibration_data
References
Lei, Jing, et al. "Distribution-free predictive inference for regression." Journal of the American Statistical Association 113.523 (2018): 1094-1111.
Examples
library(dplyr)
library(timetk)
library(parsnip)
library(rsample)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)
# --- MODELS ---
# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ date, data = training(splits))
# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
model_fit_prophet
)
# ---- CALIBRATE ----
calibration_tbl <- models_tbl %>%
modeltime_calibrate(new_data = testing(splits))
# ---- ACCURACY ----
calibration_tbl %>%
modeltime_accuracy()
# ---- FUTURE FORECAST ----
calibration_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = m750
)
# ---- ALTERNATIVE: FORECAST WITHOUT CONFIDENCE INTERVALS ----
# Skips Calibration Step, No Confidence Intervals
models_tbl %>%
modeltime_forecast(
new_data = testing(splits),
actual_data = m750
)
# ---- KEEP NEW DATA WITH FORECAST ----
# Keeps the new data. Useful if new data has information
# like ID features that should be kept with the forecast data
calibration_tbl %>%
modeltime_forecast(
new_data = testing(splits),
keep_data = TRUE
)
Fit Tidymodels Workflows to Nested Time Series
Description
Fits one or more tidymodels
workflow objects to nested time series data using the following process:
Models are iteratively fit to training splits.
Accuracy is calculated on testing splits and is logged. Accuracy results can be retrieved with
extract_nested_test_accuracy()
Any model that returns an error is logged. Error logs can be retrieved with
extract_nested_error_report()
Forecast is predicted on testing splits and is logged. Forecast results can be retrieved with
extract_nested_test_forecast()
Usage
modeltime_nested_fit(
nested_data,
...,
model_list = NULL,
metric_set = default_forecast_accuracy_metric_set(),
conf_interval = 0.95,
conf_method = "conformal_default",
control = control_nested_fit()
)
Arguments
nested_data |
Nested time series data |
... |
Tidymodels |
model_list |
Optionally, a |
metric_set |
A |
conf_interval |
An estimated confidence interval based on the calibration data. This is designed to estimate future confidence from out-of-sample prediction error. |
conf_method |
Algorithm used to produce confidence intervals. All CI's are Conformal Predictions. Choose one of:
|
control |
Used to control verbosity and parallel processing. See |
Details
Preparing Data for Nested Forecasting
Use extend_timeseries()
, nest_timeseries()
, and split_nested_timeseries()
for preparing
data for Nested Forecasting. The structure must be a nested data frame, which is supplied in
modeltime_nested_fit(nested_data)
.
Fitting Models
Models must be in the form of tidymodels workflow
objects. The models can be provided in two ways:
- Using ... (dots): The workflow objects can be provided as dots.
- Using the model_list parameter: You can supply one or more workflow objects that are wrapped in a list().
Controlling the fitting process
A control
object can be provided during fitting to adjust the verbosity and parallel processing.
See control_nested_fit()
.
Modeltime Nested Forecast
Description
Make a new forecast from a Nested Modeltime Table.
Usage
modeltime_nested_forecast(
object,
h = NULL,
include_actual = TRUE,
conf_interval = 0.95,
conf_method = "conformal_default",
id_subset = NULL,
control = control_nested_forecast()
)
Arguments
object |
A Nested Modeltime Table |
h |
The forecast horizon. Extends the "trained on" data "h" periods into the future. |
include_actual |
Whether or not to include the ".actual_data" as part of the forecast. If FALSE, just returns the forecast predictions. |
conf_interval |
An estimated confidence interval based on the calibration data. This is designed to estimate future confidence from out-of-sample prediction error. |
conf_method |
Algorithm used to produce confidence intervals. All CI's are Conformal Predictions. Choose one of:
|
id_subset |
A sequence of ID's from the modeltime table to subset the forecasting process. This can speed forecasts up. |
control |
Used to control verbosity and parallel processing. See |
Details
This function is designed to help users that want to make new forecasts other than those that are created during the logging process as part of the Nested Modeltime Workflow.
Logged Forecasts
The logged forecasts can be extracted using:
- extract_nested_future_forecast(): Extracts the future forecast created after refitting with modeltime_nested_refit().
- extract_nested_test_forecast(): Extracts the test forecast created after initial fitting with modeltime_nested_fit().
The problem is that these forecasts are static. The user would need to redo the fitting, model selection, and refitting process to obtain new forecasts. This is why modeltime_nested_forecast() exists: it creates a new forecast without retraining any models.
Nested Forecasts
The main argument is h, a horizon that specifies how far into the future to make the new forecast.
- If h = NULL, a logged forecast will be returned.
- If h = 12, a new forecast will be generated that extends each series 12 periods into the future.
- If h = "2 years", a new forecast will be generated that extends each series 2 years into the future.
Use id_subset to filter the Nested Modeltime Table object to just the time series of interest.
Use conf_interval to override the logged confidence interval. Note that this has no effect if h = NULL, because logged forecasts are returned. Be sure to provide h if you want to update the confidence interval.
Use the control argument to apply verbosity during the forecasting process and to run forecasts in parallel. Parallel processing is generally better when many forecasts are being generated.
Refits a Nested Modeltime Table
Description
Refits a Nested Modeltime Table to actual data using the following process:
- Models are iteratively refit to .actual_data.
- Any model that returns an error is logged. Errors can be retrieved with extract_nested_error_report().
- Forecast is predicted on future_data and is logged. Forecast can be retrieved with extract_nested_future_forecast().
Usage
modeltime_nested_refit(object, control = control_nested_refit())
Arguments
object |
A Nested Modeltime Table |
control |
Used to control verbosity and parallel processing. See |
Select the Best Models from Nested Modeltime Table
Description
Finds the best models for each time series group in a Nested Modeltime Table using a metric that the user specifies.
- Logs the best results, which can be accessed with extract_nested_best_model_report().
- If filter_test_forecasts = TRUE, updates the test forecast log, which can be accessed with extract_nested_test_forecast().
Usage
modeltime_nested_select_best(
object,
metric = "rmse",
minimize = TRUE,
filter_test_forecasts = TRUE
)
Arguments
object |
A Nested Modeltime Table |
metric |
A metric to minimize or maximize. By default available metrics are:
|
minimize |
Whether to minimize or maximize. Default: TRUE (minimize). |
filter_test_forecasts |
Whether or not to update the test forecast log to filter only the best forecasts. Default: TRUE. |
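The selection rule can be illustrated on a toy accuracy table. This is a sketch of the idea only, not the internal code of modeltime_nested_select_best(); the accuracy_tbl data frame and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical test accuracy log: two series, two candidate models each
accuracy_tbl <- tibble(
    id        = c("A", "A", "B", "B"),
    .model_id = c(1, 2, 1, 2),
    rmse      = c(10.2, 8.7, 5.1, 6.3)
)

# Equivalent in spirit to metric = "rmse", minimize = TRUE:
# keep the lowest-RMSE model for each series
best_tbl <- accuracy_tbl %>%
    group_by(id) %>%
    slice_min(rmse, n = 1, with_ties = FALSE) %>%
    ungroup()

best_tbl
```

For a metric where larger is better, the same sketch would use slice_max(), which mirrors setting minimize = FALSE.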
Refit one or more trained models to new data
Description
This is a wrapper for fit() that takes a Modeltime Table and retrains each model on new data, reusing the parameters and preprocessing steps used during the training process.
Usage
modeltime_refit(object, data, ..., control = control_refit())
Arguments
object |
A Modeltime Table |
data |
A |
... |
Additional arguments to control refitting. Ensemble Model Spec ( When making a meta-learner with |
control |
Used to control verbosity and parallel processing.
See |
Details
Refitting is an important step prior to forecasting time series models.
The modeltime_refit() function makes it easy to recycle models, retraining on new data.
Recycling Parameters
Parameters are recycled during retraining using the following criteria:
- Automated models (e.g. "auto arima") will have parameters recalculated.
- Non-automated models (e.g. "arima") will have parameters preserved.
- All preprocessing steps will be reused on the data.
Refit
The modeltime_refit() function is used to retrain models trained with fit().
Refit XY
The XY format is not supported at this time.
Value
A Modeltime Table containing one or more re-trained models.
See Also
Examples
library(modeltime)
library(dplyr)
library(lubridate)
library(timetk)
library(parsnip)
library(rsample)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
    model_fit_prophet
)

# ---- CALIBRATE ----
# - Calibrate on testing data set
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(splits))

# ---- REFIT ----
# - Refit on full data set
refit_tbl <- calibration_tbl %>%
    modeltime_refit(m750)
Extract Residuals Information
Description
This is a convenience function to unnest model residuals
Usage
modeltime_residuals(object, new_data = NULL, quiet = TRUE, ...)
Arguments
object |
A Modeltime Table |
new_data |
A |
quiet |
Hide errors ( |
... |
Not currently used. |
Value
A tibble with residuals.
Examples
library(modeltime)
library(dplyr)
library(lubridate)
library(timetk)
library(parsnip)
library(rsample)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
    model_fit_prophet
)

# ---- RESIDUALS ----

# In-Sample
models_tbl %>%
    modeltime_calibrate(new_data = training(splits)) %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.interactive = FALSE)

# Out-of-Sample
models_tbl %>%
    modeltime_calibrate(new_data = testing(splits)) %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.interactive = FALSE)
Apply Statistical Tests to Residuals
Description
This is a convenience function to calculate statistical tests on the residuals of models. Currently, the following statistics are calculated: the Shapiro-Wilk test to check the normality of the residuals, and the Box-Pierce, Ljung-Box, and Durbin-Watson tests to check the autocorrelation of the residuals. In all cases the p-values are returned.
Usage
modeltime_residuals_test(object, new_data = NULL, lag = 1, fitdf = 0, ...)
Arguments
object |
A |
new_data |
A |
lag |
The statistic will be based on lag autocorrelation coefficients. Default: 1 (Applies to Box-Pierce, Ljung-Box, and Durbin-Watson Tests) |
fitdf |
Number of degrees of freedom to be subtracted. Default: 0 (Applies to Box-Pierce and Ljung-Box Tests) |
... |
Not currently used |
Details
Shapiro-Wilk Test
The Shapiro-Wilk test checks the normality of the residuals. The null hypothesis is that the residuals are normally distributed. A p-value below a given significance level indicates the values are NOT normally distributed.
If the p-value > 0.05 (good), the distribution of the residuals is not significantly different from a normal distribution. In other words, we can assume normality.
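The interpretation can be checked directly with stats::shapiro.test() on simulated data. This is a minimal sketch that assumes made-up residual vectors (resid_normal and resid_skewed are hypothetical names) rather than residuals from a fitted model.

```r
set.seed(123)

# Simulated residuals: one roughly normal set, one clearly skewed set
resid_normal <- rnorm(200)
resid_skewed <- rexp(200)

p_normal <- shapiro.test(resid_normal)$p.value
p_skewed <- shapiro.test(resid_skewed)$p.value

p_normal  # typically well above 0.05: no evidence against normality
p_skewed  # far below 0.05: normality is rejected
```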
Box-Pierce and Ljung-Box Tests
The Ljung-Box and Box-Pierce tests are methods that test for the absence of autocorrelation in residuals. A p-value below a given significance level indicates the values are autocorrelated.
If the p-value > 0.05 (good), this implies that the residuals are independent. In other words, we can assume the residuals are not autocorrelated.
For more information about the parameters associated with the Box-Pierce and Ljung-Box tests, see ?Box.test
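Both tests are available through stats::Box.test(), which takes lag and fitdf arguments of the same names. The sketch below assumes simulated residuals (white noise versus an AR(1) series) to show how the p-values behave; it does not reproduce the internals of modeltime_residuals_test().

```r
set.seed(123)

# Simulated residuals: white noise vs. an AR(1) process with strong autocorrelation
resid_white <- rnorm(200)
resid_ar1   <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))

box_pierce <- Box.test(resid_ar1, lag = 1, type = "Box-Pierce", fitdf = 0)
ljung_box  <- Box.test(resid_ar1, lag = 1, type = "Ljung-Box", fitdf = 0)
white      <- Box.test(resid_white, lag = 1, type = "Ljung-Box", fitdf = 0)

box_pierce$p.value  # near zero: autocorrelation detected
ljung_box$p.value   # near zero: autocorrelation detected
white$p.value       # white noise: usually no evidence of autocorrelation
```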
Durbin-Watson Test
The Durbin-Watson test is a method that tests for the absence of autocorrelation in residuals. The Durbin-Watson test reports a test statistic, with a value from 0 to 4, where:
- 2 is no autocorrelation (good)
- From 0 to <2 is positive autocorrelation (common in time series data)
- From >2 to 4 is negative autocorrelation (less common in time series data)
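The 0-to-4 scale follows from the textbook lag-1 definition of the statistic: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The helper durbin_watson() below is a hypothetical name written for illustration with simulated residuals; it is not the function modeltime calls internally.

```r
# Durbin-Watson statistic (lag 1): sum of squared successive differences
# divided by the sum of squared residuals
durbin_watson <- function(e) {
    sum(diff(e)^2) / sum(e^2)
}

set.seed(123)
resid_white <- rnorm(500)
resid_pos   <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))
resid_neg   <- as.numeric(arima.sim(model = list(ar = -0.8), n = 500))

dw_white <- durbin_watson(resid_white)  # close to 2
dw_pos   <- durbin_watson(resid_pos)    # well below 2 (positive autocorrelation)
dw_neg   <- durbin_watson(resid_neg)    # well above 2 (negative autocorrelation)

c(white = dw_white, positive = dw_pos, negative = dw_neg)
```

The statistic is approximately 2 * (1 - r), where r is the lag-1 autocorrelation of the residuals, which is why uncorrelated residuals land near 2.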
Value
A tibble with the p-values of the calculated statistical tests.
See Also
stats::shapiro.test(), stats::Box.test()
Examples
library(modeltime)
library(dplyr)
library(timetk)
library(parsnip)
library(rsample)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
    model_fit_prophet
)

# ---- RESIDUALS ----

# In-Sample
models_tbl %>%
    modeltime_calibrate(new_data = training(splits)) %>%
    modeltime_residuals() %>%
    modeltime_residuals_test()

# Out-of-Sample
models_tbl %>%
    modeltime_calibrate(new_data = testing(splits)) %>%
    modeltime_residuals() %>%
    modeltime_residuals_test()
Scale forecast analysis with a Modeltime Table
Description
Designed to perform forecasts at scale using models created with modeltime, parsnip, workflows, and regression modeling extensions in the tidymodels ecosystem.
Usage
modeltime_table(...)
as_modeltime_table(.l)
Arguments
... |
Fitted |
.l |
A list containing fitted |
Details
modeltime_table():
- Creates a table of models
- Validates that all objects are models (parsnip or workflows objects) and all models have been fitted (trained)
- Provides an ID and Description of the models
as_modeltime_table():
- Converts a list of models to a modeltime table. Useful if programmatically creating Modeltime Tables from models stored in a list.
Examples
library(modeltime)
library(dplyr)
library(timetk)
library(parsnip)
library(rsample)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# ---- MODELTIME TABLE ----

# Make a Modeltime Table
models_tbl <- modeltime_table(
    model_fit_prophet
)

# Can also convert a list of models
list(model_fit_prophet) %>%
    as_modeltime_table()

# ---- CALIBRATE ----
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(splits))

# ---- ACCURACY ----
calibration_tbl %>%
    modeltime_accuracy()

# ---- FORECAST ----
calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = m750
    )
Low-Level NAIVE Forecast
Description
Low-Level NAIVE Forecast
Usage
naive_fit_impl(x, y, id = NULL, seasonal_period = "auto", ...)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
id |
An optional ID feature to identify different time series. Should be a quoted name. |
seasonal_period |
Not used for NAIVE forecast but here for consistency with SNAIVE |
... |
Not currently used |
Bridge prediction function for NAIVE Models
Description
Bridge prediction function for NAIVE Models
Usage
naive_predict_impl(object, new_data)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
General Interface for NAIVE Forecast Models
Description
naive_reg() is a way to generate a specification of a NAIVE or SNAIVE model before fitting and allows the model to be created using different packages.
Usage
naive_reg(mode = "regression", id = NULL, seasonal_period = NULL)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
id |
An optional quoted column name (e.g. "id") for identifying multiple time series (i.e. panel data). |
seasonal_period |
SNAIVE only. A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For naive_reg(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "naive" (default) - Performs a NAIVE forecast
- "snaive" - Performs a Seasonal NAIVE forecast
Engine Details
naive (default engine)
- The engine uses naive_fit_impl()
- The NAIVE implementation uses the last observation and forecasts this value forward.
- The id can be used to distinguish multiple time series contained in the data
- The seasonal_period is not used but provided for consistency with the SNAIVE implementation
snaive
- The engine uses snaive_fit_impl()
- The SNAIVE implementation uses the last seasonal series in the data and forecasts this sequence of observations forward
- The id can be used to distinguish multiple time series contained in the data
- The seasonal_period is used to determine how far back to define the repeated series. This can be a numeric value (e.g. 28) or a period (e.g. "1 month")
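The difference between the two engines can be shown with a few lines of base R. The helpers naive_forecast() and snaive_forecast() are hypothetical names that sketch the forecasting rule for a single series; they are not the package's naive_fit_impl() or snaive_fit_impl().

```r
# NAIVE: repeat the last observed value h times
naive_forecast <- function(y, h) {
    rep(tail(y, 1), h)
}

# SNAIVE: repeat the last full season, cycling until h values are produced
snaive_forecast <- function(y, h, seasonal_period) {
    last_season <- tail(y, seasonal_period)
    rep_len(last_season, h)
}

# Two "years" of quarterly observations
y <- c(10, 20, 30, 40, 12, 22, 32, 42)

naive_forecast(y, h = 3)                        # 42 42 42
snaive_forecast(y, h = 6, seasonal_period = 4)  # 12 22 32 42 12 22
```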
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
ID features (Multiple Time Series, Panel Data)
The id parameter is populated using the fit() or fit_xy() function:
ID Example: Suppose you have 3 features:
- y (target)
- date (time stamp)
- series_id (a unique identifier that identifies each time series in your data)
The series_id can be passed to naive_reg() using fit():
- naive_reg(id = "series_id") specifies that the series_id column should be used to identify each time series.
- fit(y ~ date + series_id) will pass series_id on to the underlying naive or snaive functions.
Seasonal Period Specification (snaive)
The period can be non-seasonal (seasonal_period = 1 or "none") or yearly seasonal (e.g. for monthly time stamps, seasonal_period = 12, seasonal_period = "12 months", or seasonal_period = "yearly"). There are 3 ways to specify:
- seasonal_period = "auto": A seasonal period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
External Regressors (Xregs)
These models are univariate. No xregs are used in the modeling process.
See Also
fit.model_spec(), set_engine()
Examples
library(modeltime)
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)

# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750

# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)

# ---- NAIVE ----

# Model Spec
model_spec <- naive_reg() %>%
    set_engine("naive")

# Fit Spec
model_fit <- model_spec %>%
    fit(log(value) ~ date, data = training(splits))
model_fit

# ---- SEASONAL NAIVE ----

# Model Spec
model_spec <- naive_reg(
    id              = "id",
    seasonal_period = 12
) %>%
    set_engine("snaive")

# Fit Spec
model_fit <- model_spec %>%
    fit(log(value) ~ date + id, data = training(splits))
model_fit
Constructor for creating modeltime models
Description
These functions are used to construct new modeltime bridge functions that connect the tidymodels infrastructure to time-series models containing date or date-time features.
Usage
new_modeltime_bridge(class, models, data, extras = NULL, desc = NULL)
Arguments
class |
A class name that is used for creating custom printing messages |
models |
A list containing one or more models |
data |
A data frame (or tibble) containing 4 columns: (date column with name that matches input data), .actual, .fitted, and .residuals. |
extras |
An optional list that is typically used for transferring preprocessing recipes to the predict method. |
desc |
An optional model description to appear when printing your modeltime objects |
Examples
library(modeltime)
library(dplyr)
library(lubridate)
library(timetk)

lm_model <- lm(value ~ as.numeric(date) + hour(date) + wday(date, label = TRUE),
               data = taylor_30_min)

data <- tibble(
    date       = taylor_30_min$date, # Important - The column name must match the modeled data
    # These are standardized names: .actual, .fitted, .residuals
    .actual    = taylor_30_min$value,
    .fitted    = lm_model$fitted.values %>% as.numeric(),
    .residuals = lm_model$residuals %>% as.numeric()
)

new_modeltime_bridge(
    class  = "lm_time_series_impl",
    models = list(model_1 = lm_model),
    data   = data,
    extras = NULL
)
Low-Level NNETAR function for translating modeltime to forecast
Description
Low-Level NNETAR function for translating modeltime to forecast
Usage
nnetar_fit_impl(
x,
y,
period = "auto",
p = 1,
P = 1,
size = 10,
repeats = 20,
decay = 0,
maxit = 100,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
p |
Embedding dimension for non-seasonal time series. Number of non-seasonal lags used as inputs. For non-seasonal time series, the default is the optimal number of lags (according to the AIC) for a linear AR(p) model. For seasonal time series, the same method is used but applied to seasonally adjusted data (from an stl decomposition). If set to zero to indicate that no non-seasonal lags should be included, then P must be at least 1 and a model with only seasonal lags will be fit. |
P |
Number of seasonal lags used as inputs. |
size |
Number of nodes in the hidden layer. Default is half of the number of input nodes (including external regressors, if given) plus 1. |
repeats |
Number of networks to fit with different random starting weights. These are then averaged when producing forecasts. |
decay |
Parameter for weight decay. Default 0. |
maxit |
Maximum number of iterations. Default 100. |
... |
Additional arguments passed to |
Tuning Parameters for NNETAR Models
Description
Tuning Parameters for NNETAR Models
Usage
num_networks(range = c(1L, 100L), trans = NULL)
Arguments
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Details
The main parameters for NNETAR models are:
- non_seasonal_ar: Number of non-seasonal auto-regressive (AR) lags. Often denoted "p" in pdq-notation.
- seasonal_ar: Number of seasonal auto-regressive (SAR) lags. Often denoted "P" in PDQ-notation.
- hidden_units: An integer for the number of units in the hidden model.
- num_networks: Number of networks to fit with different random starting weights. These are then averaged when producing forecasts.
- penalty: A non-negative numeric value for the amount of weight decay.
- epochs: An integer for the number of training iterations.
See Also
non_seasonal_ar(), seasonal_ar(), dials::hidden_units(), dials::penalty(), dials::epochs()
Examples
num_networks()
Bridge prediction function for NNETAR models
Description
Bridge prediction function for NNETAR models
Usage
nnetar_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
General Interface for NNETAR Regression Models
Description
nnetar_reg() is a way to generate a specification of an NNETAR model before fitting and allows the model to be created using different packages. Currently the only package is forecast.
Usage
nnetar_reg(
mode = "regression",
seasonal_period = NULL,
non_seasonal_ar = NULL,
seasonal_ar = NULL,
hidden_units = NULL,
num_networks = NULL,
penalty = NULL,
epochs = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
non_seasonal_ar |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
seasonal_ar |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
hidden_units |
An integer for the number of units in the hidden model. |
num_networks |
Number of networks to fit with different random starting weights. These are then averaged when producing forecasts. |
penalty |
A non-negative numeric value for the amount of weight decay. |
epochs |
An integer for the number of training iterations. |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For nnetar_reg(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "nnetar" (default) - Connects to forecast::nnetar()
Main Arguments
The main arguments (tuning parameters) for the model are the parameters in the nnetar_reg() function. These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be set using set_engine() (See Engine Details below).
If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
Engine Details
The standardized parameter names in modeltime can be mapped to their original names in each engine:
modeltime | forecast::nnetar |
seasonal_period | ts(frequency) |
non_seasonal_ar | p (1) |
seasonal_ar | P (1) |
hidden_units | size (10) |
num_networks | repeats (20) |
epochs | maxit (100) |
penalty | decay (0) |
Other options can be set using set_engine().
nnetar
The engine uses forecast::nnetar().
Function Parameters:
#> function (y, p, P = 1, size, repeats = 20, xreg = NULL, lambda = NULL,
#>     model = NULL, subset = NULL, scale.inputs = TRUE, x = y, ...)
Parameter Notes:
- xreg - This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
- size - Is set to 10 by default. This differs from the forecast implementation.
- p and P - Are set to 1 by default.
- maxit and decay are nnet::nnet parameters that are exposed in the nnetar_reg() interface. These are key tuning parameters.
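The mapping in the Engine Details table can be made explicit by annotating a model specification. This is a minimal sketch; the parameter values are arbitrary choices for illustration, not recommended settings.

```r
library(modeltime)
library(parsnip)

# Each modeltime argument is annotated with the engine argument it maps to
model_spec <- nnetar_reg(
    seasonal_period = 12,   # ts(frequency)
    non_seasonal_ar = 2,    # p
    seasonal_ar     = 1,    # P
    hidden_units    = 5,    # size
    num_networks    = 10,   # repeats
    epochs          = 50,   # maxit
    penalty         = 0.1   # decay
) %>%
    set_engine("nnetar")

model_spec
```

Nothing is fitted at this point; the translation to the engine's argument names happens when fit() is called on the specification.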
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1 or "none") or yearly seasonal (e.g. for monthly time stamps, seasonal_period = 12, seasonal_period = "12 months", or seasonal_period = "yearly"). There are 3 ways to specify:
- seasonal_period = "auto": A seasonal period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
The xreg parameter is populated using the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp)
- month.lbl (labeled month as an ordered factor)
The month.lbl is an exogenous regressor that can be passed to nnetar_reg() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
See Also
fit.model_spec(), set_engine()
Examples
library(modeltime)
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)

# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750

# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)

# ---- NNETAR ----

# Model Spec
model_spec <- nnetar_reg() %>%
    set_engine("nnetar")

# Fit Spec
set.seed(123)
model_fit <- model_spec %>%
    fit(log(value) ~ date, data = training(splits))
model_fit
Filter the last N rows (Tail) for multiple time series
Description
Filter the last N rows (Tail) for multiple time series
Usage
panel_tail(data, id, n)
Arguments
data |
A data frame |
id |
An "id" feature indicating which column differentiates the time series panels |
n |
The number of rows to filter |
Value
A data frame
See Also
- recursive() - used to generate recursive autoregressive models
Examples
library(modeltime)
library(timetk)

# Get the last 6 observations from each group
m4_monthly %>%
    panel_tail(id = id, n = 6)
Start parallel clusters using parallel package
Description
Start parallel clusters using parallel package
Usage
parallel_start(
...,
.method = c("parallel", "spark"),
.export_vars = NULL,
.packages = NULL
)
parallel_stop()
Arguments
... |
Parameters passed to underlying functions (See Details Section) |
.method |
The method to create the parallel backend. Supports:
|
.export_vars |
Environment variables that can be sent to the workers |
.packages |
Packages that can be sent to the workers |
Parallel (.method = "parallel")
Performs 3 Steps:
- Makes clusters using parallel::makeCluster(...). The parallel_start(...) arguments are passed to parallel::makeCluster(...).
- Registers clusters using doParallel::registerDoParallel().
- Adds .libPaths() using parallel::clusterCall().
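The three steps listed above correspond roughly to the following calls. This is a sketch of the sequence, not the source of parallel_start(); the worker count of 2 is arbitrary, and the registration step is skipped when doParallel is not installed.

```r
library(parallel)

# Step 1: make a cluster (2 workers chosen arbitrarily for illustration)
cl <- makeCluster(2)

# Step 2: register the cluster as the foreach backend
has_doparallel <- requireNamespace("doParallel", quietly = TRUE)
if (has_doparallel) {
    doParallel::registerDoParallel(cl)
}

# Step 3: point every worker at the main session's library paths
main_paths <- .libPaths()
worker_paths <- clusterCall(cl, function(paths) {
    .libPaths(paths)
    .libPaths()
}, main_paths)

# Clean up: return to sequential processing and shut the workers down
if (has_doparallel) {
    foreach::registerDoSEQ()
}
stopCluster(cl)
```

Sharing the library paths matters when packages are installed in a non-default location, since a worker that cannot find a package will fail when it tries to load a model's dependencies.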
Spark (.method = "spark")
- Important, make sure to create a spark connection using sparklyr::spark_connect().
- Pass the connection object as the first argument. For example, parallel_start(sc, .method = "spark").
- The parallel_start(...) arguments are passed to sparklyr::registerDoSpark(...).
Examples
# Starts 2 clusters
parallel_start(2)
# Returns to sequential processing
parallel_stop()
Developer Tools for parsing date and date-time information
Description
These functions are designed to assist developers in extending the modeltime package.
Usage
parse_index_from_data(data)
parse_period_from_index(data, period)
Arguments
data |
A data frame |
period |
A period to calculate from the time index. Numeric values are returned as-is. "auto" guesses a numeric value from the index. A time-based phrase (e.g. "7 days") calculates the number of timestamps that typically occur within the time-based phrase. |
Value
parse_index_from_data(): Returns a tibble containing the date or date-time column.
parse_period_from_index(): Returns the numeric period from a tibble containing the index.
Examples
library(modeltime)
library(dplyr)
library(timetk)

predictors <- m4_monthly %>%
    filter(id == "M750") %>%
    select(-value)

index_tbl <- parse_index_from_data(predictors)
index_tbl

period <- parse_period_from_index(index_tbl, period = "1 year")
period
Interactive Forecast Visualization
Description
This is a wrapper for timetk::plot_time_series() that generates an interactive (plotly) or static (ggplot2) plot with the forecasted data.
Usage
plot_modeltime_forecast(
.data,
.conf_interval_show = TRUE,
.conf_interval_fill = "grey20",
.conf_interval_alpha = 0.2,
.smooth = FALSE,
.legend_show = TRUE,
.legend_max_width = 40,
.facet_ncol = 1,
.facet_nrow = 1,
.facet_scales = "free_y",
.title = "Forecast Plot",
.x_lab = "",
.y_lab = "",
.color_lab = "Legend",
.interactive = TRUE,
.plotly_slider = FALSE,
.trelliscope = FALSE,
.trelliscope_params = list(),
...
)
Arguments
.data |
A |
.conf_interval_show |
Logical. Whether or not to include the confidence interval as a ribbon. |
.conf_interval_fill |
Fill color for the confidence interval |
.conf_interval_alpha |
Fill opacity for the confidence interval. Range (0, 1). |
.smooth |
Logical - Whether or not to include a trendline smoother.
Uses See |
.legend_show |
Logical. Whether or not to show the legend. Can save space with long model descriptions. |
.legend_max_width |
Numeric. The width of truncation to apply to the legend text. |
.facet_ncol |
Number of facet columns. |
.facet_nrow |
Number of facet rows (only used for |
.facet_scales |
Control facet x & y-axis ranges. Options include "fixed", "free", "free_y", "free_x" |
.title |
Title for the plot |
.x_lab |
X-axis label for the plot |
.y_lab |
Y-axis label for the plot |
.color_lab |
Legend label if a |
.interactive |
Returns either a static ( |
.plotly_slider |
If |
.trelliscope |
Returns either a normal plot or a trelliscopejs plot (great for many time series)
Must have |
.trelliscope_params |
Pass parameters to the
|
... |
Additional arguments passed to |
Value
A static ggplot2 plot or an interactive plotly plot containing a forecast
Examples
library(modeltime)
library(dplyr)
library(lubridate)
library(timetk)
library(parsnip)
library(rsample)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
    model_fit_prophet
)

# ---- FORECAST ----
models_tbl %>%
    modeltime_calibrate(new_data = testing(splits)) %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.interactive = FALSE)
Interactive Residuals Visualization
Description
This is a wrapper for examining residuals using:
- Time Plot: timetk::plot_time_series()
- ACF Plot: timetk::plot_acf_diagnostics()
- Seasonality Plot: timetk::plot_seasonal_diagnostics()
Usage
plot_modeltime_residuals(
.data,
.type = c("timeplot", "acf", "seasonality"),
.smooth = FALSE,
.legend_show = TRUE,
.legend_max_width = 40,
.title = "Residuals Plot",
.x_lab = "",
.y_lab = "",
.color_lab = "Legend",
.interactive = TRUE,
...
)
Arguments
.data |
A |
.type |
One of "timeplot", "acf", or "seasonality". The default is "timeplot". |
.smooth |
Logical - Whether or not to include a trendline smoother.
Uses See |
.legend_show |
Logical. Whether or not to show the legend. Can save space with long model descriptions. |
.legend_max_width |
Numeric. The width of truncation to apply to the legend text. |
.title |
Title for the plot |
.x_lab |
X-axis label for the plot |
.y_lab |
Y-axis label for the plot |
.color_lab |
Legend label if a |
.interactive |
Returns either a static ( |
... |
Additional arguments passed to:
|
Value
A static ggplot2 plot or an interactive plotly plot containing residuals vs time
Examples
library(dplyr)
library(timetk)
library(parsnip)
library(rsample)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.9)
# --- MODELS ---
# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ date, data = training(splits))
# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
model_fit_prophet
)
# ---- RESIDUALS ----
residuals_tbl <- models_tbl %>%
modeltime_calibrate(new_data = testing(splits)) %>%
modeltime_residuals()
residuals_tbl %>%
plot_modeltime_residuals(
.type = "timeplot",
.interactive = FALSE
)
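The same residuals table can be passed to the other plot types listed in the Usage. The following is a minimal sketch, assuming the prophet engine is installed and reusing the M750 setup from the example above; the object name acf_plot is illustrative only.

```r
library(dplyr)
library(timetk)
library(parsnip)
library(rsample)
library(modeltime)

# Data and split (same setup as the example above)
m750   <- m4_monthly %>% filter(id == "M750")
splits <- initial_time_split(m750, prop = 0.9)

# Fit a single prophet model
model_fit_prophet <- prophet_reg() %>%
    set_engine(engine = "prophet") %>%
    fit(value ~ date, data = training(splits))

# Calibrate on the test set and collect residuals
residuals_tbl <- modeltime_table(model_fit_prophet) %>%
    modeltime_calibrate(new_data = testing(splits)) %>%
    modeltime_residuals()

# ACF diagnostics of the residuals as a static ggplot2 object
acf_plot <- residuals_tbl %>%
    plot_modeltime_residuals(
        .type        = "acf",
        .interactive = FALSE
    )

acf_plot
```

Setting .type = "seasonality" in the same call would switch to the seasonality diagnostics instead.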
Extract model by model id in a Modeltime Table
Description
The pull_modeltime_model()
and pluck_modeltime_model()
functions are synonyms.
Usage
pluck_modeltime_model(object, .model_id)
## S3 method for class 'mdl_time_tbl'
pluck_modeltime_model(object, .model_id)
pull_modeltime_model(object, .model_id)
Arguments
object |
A Modeltime Table |
.model_id |
A numeric value matching the .model_id that you want to extract |
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
m750_models %>%
pluck_modeltime_model(2)
Prepared Nested Modeltime Data
Description
A set of functions to simplify preparation of nested data for iterative (nested) forecasting with Nested Modeltime Tables.
Usage
extend_timeseries(.data, .id_var, .date_var, .length_future, ...)
nest_timeseries(.data, .id_var, .length_future, .length_actual = NULL)
split_nested_timeseries(.data, .length_test, .length_train = NULL, ...)
Arguments
.data |
A data frame or tibble containing time series data. The data should have:
|
.id_var |
An id column |
.date_var |
A date or datetime column |
.length_future |
Varies based on the function:
|
... |
Additional arguments passed to the helper function. See details. |
.length_actual |
Can be used to slice the |
.length_test |
Defines the length of the test split for evaluation. |
.length_train |
Defines the length of the training split for evaluation. |
Details
Preparation of nested time series follows a 3-Step Process:
Step 1: Extend the Time Series
extend_timeseries(): A wrapper for timetk::future_frame() that extends a time series group-wise into the future.
- The group column is specified by .id_var.
- The date column is specified by .date_var.
- The length into the future is specified with .length_future.
- The ... are additional parameters that can be passed to timetk::future_frame().
Step 2: Nest the Time Series
nest_timeseries(): A helper for nesting your data into .actual_data and .future_data.
- The group column is specified by .id_var.
- The .length_future defines the length of the .future_data.
- The remaining data is converted to the .actual_data.
- The .length_actual can be used to slice the .actual_data to a most recent number of observations.
The result is a "nested data frame".
Step 3: Split the Actual Data into Train/Test Splits
split_nested_timeseries(): A wrapper for timetk::time_series_split() that generates training/testing splits from the .actual_data column.
- The .length_test is the primary argument that identifies the size of the testing sample. This is typically the same size as the .future_data.
- The .length_train is an optional size of the training data.
- The ... (dots) are additional arguments that can be passed to timetk::time_series_split().
Helpers
extract_nested_train_split()
and extract_nested_test_split()
are used to simplify extracting
the training and testing data from the actual data. This can be helpful when making
preprocessing recipes using the recipes
package.
Examples
library(dplyr)
library(timetk)
nested_data_tbl <- walmart_sales_weekly %>%
select(id, date = Date, value = Weekly_Sales) %>%
# Step 1: Extends the time series by id
extend_timeseries(
.id_var = id,
.date_var = date,
.length_future = 52
) %>%
# Step 2: Nests the time series into .actual_data and .future_data
nest_timeseries(
.id_var = id,
.length_future = 52
) %>%
# Step 3: Adds a column .splits that contains training/testing indices
split_nested_timeseries(
.length_test = 52
)
nested_data_tbl
# Helpers: Getting the Train/Test Sets
extract_nested_train_split(nested_data_tbl, .row_id = 1)
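The optional .length_actual argument and the test-split helper are described in the Details but not exercised above. The following is a sketch under the assumption that two years of weekly history (104 rows) per series is wanted; the value 104 and the object names nested_recent_tbl, train_tbl, and test_tbl are illustrative only.

```r
library(dplyr)
library(timetk)
library(modeltime)

nested_recent_tbl <- walmart_sales_weekly %>%
    select(id, date = Date, value = Weekly_Sales) %>%
    # Step 1: extend each series 52 weeks into the future
    extend_timeseries(
        .id_var        = id,
        .date_var      = date,
        .length_future = 52
    ) %>%
    # Step 2: nest, keeping only the most recent 104 actual observations
    nest_timeseries(
        .id_var        = id,
        .length_future = 52,
        .length_actual = 104
    ) %>%
    # Step 3: hold out the last 52 actual observations for testing
    split_nested_timeseries(
        .length_test = 52
    )

# Helpers: pull the train and test sets for the first series
train_tbl <- extract_nested_train_split(nested_recent_tbl, .row_id = 1)
test_tbl  <- extract_nested_test_split(nested_recent_tbl, .row_id = 1)

nrow(train_tbl)
nrow(test_tbl)
```

The two extracted tibbles can then be used to build and check a preprocessing recipe before fitting nested models.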
General Interface for Boosted PROPHET Time Series Models
Description
prophet_boost()
is a way to generate a specification of a Boosted PROPHET model
before fitting and allows the model to be created using
different packages. Currently the only package is prophet
.
Usage
prophet_boost(
mode = "regression",
growth = NULL,
changepoint_num = NULL,
changepoint_range = NULL,
seasonality_yearly = NULL,
seasonality_weekly = NULL,
seasonality_daily = NULL,
season = NULL,
prior_scale_changepoints = NULL,
prior_scale_seasonality = NULL,
prior_scale_holidays = NULL,
logistic_cap = NULL,
logistic_floor = NULL,
mtry = NULL,
trees = NULL,
min_n = NULL,
tree_depth = NULL,
learn_rate = NULL,
loss_reduction = NULL,
sample_size = NULL,
stop_iter = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
growth |
String 'linear' or 'logistic' to specify a linear or logistic trend. |
changepoint_num |
Number of potential changepoints to include for modeling trend. |
changepoint_range |
Adjusts the flexibility of the trend component by limiting to a percentage of data before the end of the time series. 0.80 means that a changepoint cannot exist after the first 80% of the data. |
seasonality_yearly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models year-over-year seasonality. |
seasonality_weekly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models week-over-week seasonality. |
seasonality_daily |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models day-over-day seasonality. |
season |
'additive' (default) or 'multiplicative'. |
prior_scale_changepoints |
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints. |
prior_scale_seasonality |
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality. |
prior_scale_holidays |
Parameter modulating the strength of the holiday components model, unless overridden in the holidays input. |
logistic_cap |
When growth is logistic, the upper-bound for "saturation". |
logistic_floor |
When growth is logistic, the lower-bound for "saturation". |
mtry |
A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models (specific engines only). |
trees |
An integer for the number of trees contained in the ensemble. |
min_n |
An integer for the minimum number of data points in a node that is required for the node to be split further. |
tree_depth |
An integer for the maximum depth of the tree (i.e. number of splits) (specific engines only). |
learn_rate |
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration (specific engines only). This is sometimes referred to as the shrinkage parameter. |
loss_reduction |
A number for the reduction in the loss function required to split further (specific engines only). |
sample_size |
A number for the number (or proportion) of data that is exposed to the fitting routine. |
stop_iter |
The number of iterations without improvement before
stopping ( |
Details
The data given to the function are not saved and are only used
to determine the mode of the model. For prophet_boost()
, the
mode will always be "regression".
The model can be created using the fit()
function using the
following engines:
"prophet_xgboost" (default) - Connects to prophet::prophet() and xgboost::xgb.train()
Main Arguments
The main arguments (tuning parameters) for the PROPHET model are:
- growth: String 'linear' or 'logistic' to specify a linear or logistic trend.
- changepoint_num: Number of potential changepoints to include for modeling trend.
- changepoint_range: The proportion of the series in which changepoints may occur; adjusts how close to the end of the series the last changepoint can be located.
- season: 'additive' (default) or 'multiplicative'.
- prior_scale_changepoints: Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints.
- prior_scale_seasonality: Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.
- prior_scale_holidays: Parameter modulating the strength of the holiday components model, unless overridden in the holidays input.
- logistic_cap: When growth is logistic, the upper-bound for "saturation".
- logistic_floor: When growth is logistic, the lower-bound for "saturation".
The main arguments (tuning parameters) for the XGBoost model are:
- mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
- trees: The number of trees contained in the ensemble.
- min_n: The minimum number of data points in a node that are required for the node to be split further.
- tree_depth: The maximum depth of the tree (i.e. number of splits).
- learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
- loss_reduction: The reduction in the loss function required to split further.
- sample_size: The amount of data exposed to the fitting routine.
- stop_iter: The number of iterations without improvement before stopping.
These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be
set using set_engine()
(See Engine Details below).
If parameters need to be modified, update()
can be used
in lieu of recreating the object from scratch.
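As a minimal sketch of the two steps just described (setting main arguments, then modifying them with update()), the following builds a specification without fitting it. The parameter values are arbitrary illustrations, not recommended settings.

```r
library(modeltime)
library(parsnip)

# Specification mixing PROPHET and XGBoost main arguments
model_spec <- prophet_boost(
    changepoint_num = 25,
    season          = "additive",
    trees           = 100,
    learn_rate      = 0.1
) %>%
    set_engine("prophet_xgboost")

# Change one argument without rebuilding the specification
model_spec_slow <- update(model_spec, learn_rate = 0.01)

model_spec_slow
```

Printing the updated specification should show the new learn_rate alongside the arguments that were left unchanged.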
Engine Details
The standardized parameter names in modeltime
can be mapped to their original
names in each engine:
Model 1: PROPHET:
modeltime | prophet |
growth | growth ('linear') |
changepoint_num | n.changepoints (25) |
changepoint_range | changepoint.range (0.8) |
seasonality_yearly | yearly.seasonality ('auto') |
seasonality_weekly | weekly.seasonality ('auto') |
seasonality_daily | daily.seasonality ('auto') |
season | seasonality.mode ('additive') |
prior_scale_changepoints | changepoint.prior.scale (0.05) |
prior_scale_seasonality | seasonality.prior.scale (10) |
prior_scale_holidays | holidays.prior.scale (10) |
logistic_cap | df$cap (NULL) |
logistic_floor | df$floor (NULL) |
Model 2: XGBoost:
modeltime | xgboost::xgb.train |
tree_depth | max_depth (6) |
trees | nrounds (15) |
learn_rate | eta (0.3) |
mtry | colsample_bynode (1) |
min_n | min_child_weight (1) |
loss_reduction | gamma (0) |
sample_size | subsample (1) |
stop_iter | early_stop |
Other options can be set using set_engine()
.
prophet_xgboost
Model 1: PROPHET (prophet::prophet):
#> function (df = NULL, growth = "linear", changepoints = NULL, n.changepoints = 25,
#>     changepoint.range = 0.8, yearly.seasonality = "auto", weekly.seasonality = "auto",
#>     daily.seasonality = "auto", holidays = NULL, seasonality.mode = "additive",
#>     seasonality.prior.scale = 10, holidays.prior.scale = 10, changepoint.prior.scale = 0.05,
#>     mcmc.samples = 0, interval.width = 0.8, uncertainty.samples = 1000,
#>     fit = TRUE, ...)
Parameter Notes:
- df: This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
- holidays: A data.frame of holidays can be supplied via set_engine()
- uncertainty.samples: The default is set to 0 because the prophet uncertainty intervals are not used as part of the Modeltime Workflow. You can override this setting if you plan to use prophet's uncertainty tools.
Logistic Growth and Saturation Levels:
For growth = "logistic", simply add numeric values for logistic_cap and / or logistic_floor. There is no need to add additional columns for "cap" and "floor" to your data frame.
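A minimal sketch of a logistic-growth specification follows. The cap and floor values are hypothetical placeholders; in practice they should reflect the plausible saturation levels of the series being modeled.

```r
library(modeltime)
library(parsnip)

# Logistic trend with saturation bounds supplied as plain numbers
# (12000 and 0 are hypothetical values chosen only for illustration)
model_spec_logistic <- prophet_boost(
    growth         = "logistic",
    logistic_cap   = 12000,
    logistic_floor = 0
) %>%
    set_engine("prophet_xgboost")

model_spec_logistic
```

The specification is then fit in the usual way with fit(); no "cap" or "floor" columns are added to the training data by hand.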
Limitations:
- prophet::add_seasonality() is not currently implemented. It's used to specify non-standard seasonalities using Fourier series. An alternative is to use step_fourier() and supply custom seasonalities as Extra Regressors.
Model 2: XGBoost (xgboost::xgb.train):
#> function (params = list(), data, nrounds, watchlist = list(), obj = NULL,
#>     feval = NULL, verbose = 1, print_every_n = 1L, early_stopping_rounds = NULL,
#>     maximize = NULL, save_period = NULL, save_name = "xgboost.model", xgb_model = NULL,
#>     callbacks = list(), ...)
Parameter Notes:
XGBoost uses a params = list() to capture its parameters. Parsnip / Modeltime automatically sends any args provided as ... inside of set_engine() to the params = list(...).
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor.
The fit()
interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Univariate (No Extra Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (Extra Regressors)
Extra regressors are supplied through the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs.
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp)
- month.lbl (labeled month as an ordered factor)
The month.lbl is an exogenous regressor that can be passed to prophet_boost() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
See Also
fit.model_spec()
, set_engine()
Examples
library(dplyr)
library(lubridate)
library(parsnip)
library(rsample)
library(timetk)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- PROPHET ----
# Model Spec
model_spec <- prophet_boost(
learn_rate = 0.1
) %>%
set_engine("prophet_xgboost")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date + as.numeric(date) + month(date, label = TRUE),
data = training(splits))
model_fit
Low-Level PROPHET function for translating modeltime to PROPHET
Description
Low-Level PROPHET function for translating modeltime to PROPHET
Usage
prophet_fit_impl(
x,
y,
growth = "linear",
n.changepoints = 25,
changepoint.range = 0.8,
yearly.seasonality = "auto",
weekly.seasonality = "auto",
daily.seasonality = "auto",
seasonality.mode = "additive",
changepoint.prior.scale = 0.05,
seasonality.prior.scale = 10,
holidays.prior.scale = 10,
regressors.prior.scale = 10000,
regressors.standardize = "auto",
regressors.mode = NULL,
logistic_cap = NULL,
logistic_floor = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
growth |
String 'linear', 'logistic', or 'flat' to specify a linear, logistic or flat trend. |
n.changepoints |
Number of potential changepoints to include. Not used if input 'changepoints' is supplied. If 'changepoints' is not supplied, then n.changepoints potential changepoints are selected uniformly from the first 'changepoint.range' proportion of df$ds. |
changepoint.range |
Proportion of history in which trend changepoints will be estimated. Defaults to 0.8 for the first 80%. Not used if 'changepoints' is specified. |
yearly.seasonality |
Fit yearly seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
weekly.seasonality |
Fit weekly seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
daily.seasonality |
Fit daily seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
seasonality.mode |
'additive' (default) or 'multiplicative'. |
changepoint.prior.scale |
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints. |
seasonality.prior.scale |
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality. Can be specified for individual seasonalities using add_seasonality. |
holidays.prior.scale |
Parameter modulating the strength of the holiday components model, unless overridden in the holidays input. |
regressors.prior.scale |
Float scale for the normal prior.
Default is 10,000.
Gets passed to |
regressors.standardize |
Bool, specify whether this regressor will be
standardized prior to fitting.
Can be 'auto' (standardize if not binary), True, or False.
Gets passed to |
regressors.mode |
Optional, 'additive' or 'multiplicative'.
Defaults to |
logistic_cap |
When growth is logistic, the upper-bound for "saturation". |
logistic_floor |
When growth is logistic, the lower-bound for "saturation". |
... |
Additional arguments passed to |
Tuning Parameters for Prophet Models
Description
Tuning Parameters for Prophet Models
Usage
growth(values = c("linear", "logistic"))
changepoint_num(range = c(0L, 50L), trans = NULL)
changepoint_range(range = c(0.6, 0.9), trans = NULL)
seasonality_yearly(values = c(TRUE, FALSE))
seasonality_weekly(values = c(TRUE, FALSE))
seasonality_daily(values = c(TRUE, FALSE))
prior_scale_changepoints(range = c(-3, 2), trans = log10_trans())
prior_scale_seasonality(range = c(-3, 2), trans = log10_trans())
prior_scale_holidays(range = c(-3, 2), trans = log10_trans())
Arguments
values |
A character string of possible values. |
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Details
The main parameters for Prophet models are:
- growth: The form of the trend: "linear", or "logistic".
- changepoint_num: The maximum number of trend changepoints allowed when modeling the trend
- changepoint_range: The range affects how close the changepoints can go to the end of the time series. The larger the value, the more flexible the trend.
- Yearly, Weekly, and Daily Seasonality:
  - Yearly: seasonality_yearly - Useful when seasonal patterns appear year-over-year
  - Weekly: seasonality_weekly - Useful when seasonal patterns appear week-over-week (e.g. daily data)
  - Daily: seasonality_daily - Useful when seasonal patterns appear day-over-day (e.g. hourly data)
- season: The form of the seasonal term: "additive" or "multiplicative". See season().
- "Prior Scale": Controls flexibility of
  - Changepoints: prior_scale_changepoints
  - Seasonality: prior_scale_seasonality
  - Holidays: prior_scale_holidays
  - The log10_trans() converts priors to a scale from 0.001 to 100, which effectively weights lower values more heavily than larger values.
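To illustrate how these parameter objects behave on their ranges, the sketch below builds a small regular grid with dials::grid_regular(). Using the dials package here is an assumption of this example (dials is listed in Imports); the grid size is arbitrary.

```r
library(modeltime)
library(dials)

# Inspect one parameter object: its range is stored on the log10 scale
prior_scale_changepoints()

# 3 levels per parameter gives a 3 x 3 grid; the prior scale values are
# returned on the original (untransformed) scale
grid_tbl <- grid_regular(
    changepoint_num(),
    prior_scale_changepoints(),
    levels = 3
)

grid_tbl
```

Because the prior scale is spaced evenly in log10 units, most grid points fall at the low end of the 0.001 to 100 interval.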
Examples
growth()
changepoint_num()
season()
prior_scale_changepoints()
Bridge prediction function for PROPHET models
Description
Bridge prediction function for PROPHET models
Usage
prophet_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
General Interface for PROPHET Time Series Models
Description
prophet_reg()
is a way to generate a specification of a PROPHET model
before fitting and allows the model to be created using
different packages. Currently the only package is prophet
.
Usage
prophet_reg(
mode = "regression",
growth = NULL,
changepoint_num = NULL,
changepoint_range = NULL,
seasonality_yearly = NULL,
seasonality_weekly = NULL,
seasonality_daily = NULL,
season = NULL,
prior_scale_changepoints = NULL,
prior_scale_seasonality = NULL,
prior_scale_holidays = NULL,
logistic_cap = NULL,
logistic_floor = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
growth |
String 'linear' or 'logistic' to specify a linear or logistic trend. |
changepoint_num |
Number of potential changepoints to include for modeling trend. |
changepoint_range |
Adjusts the flexibility of the trend component by limiting to a percentage of data before the end of the time series. 0.80 means that a changepoint cannot exist after the first 80% of the data. |
seasonality_yearly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models year-over-year seasonality. |
seasonality_weekly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models week-over-week seasonality. |
seasonality_daily |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models day-over-day seasonality. |
season |
'additive' (default) or 'multiplicative'. |
prior_scale_changepoints |
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints. |
prior_scale_seasonality |
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality. |
prior_scale_holidays |
Parameter modulating the strength of the holiday components model, unless overridden in the holidays input. |
logistic_cap |
When growth is logistic, the upper-bound for "saturation". |
logistic_floor |
When growth is logistic, the lower-bound for "saturation". |
Details
The data given to the function are not saved and are only used
to determine the mode of the model. For prophet_reg()
, the
mode will always be "regression".
The model can be created using the fit()
function using the
following engines:
"prophet" (default) - Connects to
prophet::prophet()
Main Arguments
The main arguments (tuning parameters) for the model are:
- growth: String 'linear' or 'logistic' to specify a linear or logistic trend.
- changepoint_num: Number of potential changepoints to include for modeling trend.
- changepoint_range: The proportion of the series in which changepoints may occur; adjusts how close to the end of the series the last changepoint can be located.
- season: 'additive' (default) or 'multiplicative'.
- prior_scale_changepoints: Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints.
- prior_scale_seasonality: Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.
- prior_scale_holidays: Parameter modulating the strength of the holiday components model, unless overridden in the holidays input.
- logistic_cap: When growth is logistic, the upper-bound for "saturation".
- logistic_floor: When growth is logistic, the lower-bound for "saturation".
These arguments are converted to their specific names at the time that the model is fit.
Other options and arguments can be
set using set_engine()
(See Engine Details below).
If parameters need to be modified, update()
can be used
in lieu of recreating the object from scratch.
Engine Details
The standardized parameter names in modeltime
can be mapped to their original
names in each engine:
modeltime | prophet |
growth | growth ('linear') |
changepoint_num | n.changepoints (25) |
changepoint_range | changepoint.range (0.8) |
seasonality_yearly | yearly.seasonality ('auto') |
seasonality_weekly | weekly.seasonality ('auto') |
seasonality_daily | daily.seasonality ('auto') |
season | seasonality.mode ('additive') |
prior_scale_changepoints | changepoint.prior.scale (0.05) |
prior_scale_seasonality | seasonality.prior.scale (10) |
prior_scale_holidays | holidays.prior.scale (10) |
logistic_cap | df$cap (NULL) |
logistic_floor | df$floor (NULL) |
Other options can be set using set_engine()
.
prophet
The engine uses prophet::prophet()
.
Function Parameters:
#> function (df = NULL, growth = "linear", changepoints = NULL, n.changepoints = 25,
#>     changepoint.range = 0.8, yearly.seasonality = "auto", weekly.seasonality = "auto",
#>     daily.seasonality = "auto", holidays = NULL, seasonality.mode = "additive",
#>     seasonality.prior.scale = 10, holidays.prior.scale = 10, changepoint.prior.scale = 0.05,
#>     mcmc.samples = 0, interval.width = 0.8, uncertainty.samples = 1000,
#>     fit = TRUE, ...)
Parameter Notes:
- df: This is supplied via the parsnip / modeltime fit() interface (so don't provide this manually). See Fit Details (below).
- holidays: A data.frame of holidays can be supplied via set_engine()
- uncertainty.samples: The default is set to 0 because the prophet uncertainty intervals are not used as part of the Modeltime Workflow. You can override this setting if you plan to use prophet's uncertainty tools.
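A minimal sketch of supplying holidays through set_engine() follows. The holiday name, dates, and window values in holidays_tbl are made up for illustration; the column layout (holiday, ds, and optional lower_window / upper_window) follows the prophet convention.

```r
library(modeltime)
library(parsnip)

# Hypothetical holiday table: one named holiday observed on three dates,
# with the effect extended one day after each date
holidays_tbl <- data.frame(
    holiday      = "new_year",
    ds           = as.Date(c("2013-01-01", "2014-01-01", "2015-01-01")),
    lower_window = 0,
    upper_window = 1
)

# Engine-specific arguments such as holidays go in set_engine()
model_spec_holidays <- prophet_reg() %>%
    set_engine("prophet", holidays = holidays_tbl)

model_spec_holidays
```

The df argument is still left out; it is filled in from the data passed to fit().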
Regressors:
- Regressors are provided via the fit() or recipes interface, which passes regressors to prophet::add_regressor()
- Parameters can be controlled in set_engine() via: regressors.prior.scale, regressors.standardize, and regressors.mode
- The regressor prior scale implementation default is regressors.prior.scale = 1e4, which deviates from the prophet implementation (defaults to holidays.prior.scale)
Logistic Growth and Saturation Levels:
For growth = "logistic", simply add numeric values for logistic_cap and / or logistic_floor. There is no need to add additional columns for "cap" and "floor" to your data frame.
Limitations:
- prophet::add_seasonality() is not currently implemented. It's used to specify non-standard seasonalities using Fourier series. An alternative is to use step_fourier() and supply custom seasonalities as Extra Regressors.
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor.
The fit()
interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Univariate (No Extra Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (Extra Regressors)
Extra regressors are supplied through the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs.
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp)
- month.lbl (labeled month as an ordered factor)
The month.lbl is an exogenous regressor that can be passed to prophet_reg() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
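The xreg case above can be sketched on the M750 series as follows. This assumes the prophet engine is installed; the column names y and month.lbl are created here only to mirror the wording of the example, and data_tbl is an illustrative object name.

```r
library(dplyr)
library(lubridate)
library(parsnip)
library(timetk)
library(modeltime)

# Build the three features: y (target), date, and month.lbl (ordered factor)
data_tbl <- m4_monthly %>%
    filter(id == "M750") %>%
    select(date, y = value) %>%
    mutate(month.lbl = month(date, label = TRUE))

# Formula interface: date is handled internally, month.lbl becomes an xreg
model_fit_xreg <- prophet_reg() %>%
    set_engine("prophet") %>%
    fit(y ~ date + month.lbl, data = data_tbl)

model_fit_xreg
```

Dropping month.lbl from the formula gives the univariate fit described earlier.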
See Also
fit.model_spec()
, set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- PROPHET ----
# Model Spec
model_spec <- prophet_reg() %>%
set_engine("prophet")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
Low-Level PROPHET function for translating modeltime to Boosted PROPHET
Description
Low-Level PROPHET function for translating modeltime to Boosted PROPHET
Usage
prophet_xgboost_fit_impl(
x,
y,
df = NULL,
growth = "linear",
changepoints = NULL,
n.changepoints = 25,
changepoint.range = 0.8,
yearly.seasonality = "auto",
weekly.seasonality = "auto",
daily.seasonality = "auto",
holidays = NULL,
seasonality.mode = "additive",
seasonality.prior.scale = 10,
holidays.prior.scale = 10,
changepoint.prior.scale = 0.05,
logistic_cap = NULL,
logistic_floor = NULL,
mcmc.samples = 0,
interval.width = 0.8,
uncertainty.samples = 1000,
fit = TRUE,
max_depth = 6,
nrounds = 15,
eta = 0.3,
colsample_bytree = NULL,
colsample_bynode = NULL,
min_child_weight = 1,
gamma = 0,
subsample = 1,
validation = 0,
early_stop = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
df |
(optional) Dataframe containing the history. Must have columns ds (date type) and y, the time series. If growth is logistic, then df must also have a column cap that specifies the capacity at each ds. If not provided, then the model object will be instantiated but not fit; use fit.prophet(m, df) to fit the model. |
growth |
String 'linear', 'logistic', or 'flat' to specify a linear, logistic or flat trend. |
changepoints |
Vector of dates at which to include potential changepoints. If not specified, potential changepoints are selected automatically. |
n.changepoints |
Number of potential changepoints to include. Not used if input 'changepoints' is supplied. If 'changepoints' is not supplied, then n.changepoints potential changepoints are selected uniformly from the first 'changepoint.range' proportion of df$ds. |
changepoint.range |
Proportion of history in which trend changepoints will be estimated. Defaults to 0.8 for the first 80%. Not used if 'changepoints' is specified. |
yearly.seasonality |
Fit yearly seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
weekly.seasonality |
Fit weekly seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
daily.seasonality |
Fit daily seasonality. Can be 'auto', TRUE, FALSE, or a number of Fourier terms to generate. |
holidays |
data frame with columns holiday (character) and ds (date type) and optionally columns lower_window and upper_window which specify a range of days around the date to be included as holidays. lower_window=-2 will include 2 days prior to the date as holidays. Also optionally can have a column prior_scale specifying the prior scale for each holiday. |
seasonality.mode |
'additive' (default) or 'multiplicative'. |
seasonality.prior.scale |
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality. Can be specified for individual seasonalities using add_seasonality. |
holidays.prior.scale |
Parameter modulating the strength of the holiday components model, unless overridden in the holidays input. |
changepoint.prior.scale |
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints. |
logistic_cap |
When growth is logistic, the upper-bound for "saturation". |
logistic_floor |
When growth is logistic, the lower-bound for "saturation". |
mcmc.samples |
Integer, if greater than 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation. |
interval.width |
Numeric, width of the uncertainty intervals provided for the forecast. If mcmc.samples=0, this will be only the uncertainty in the trend using the MAP estimate of the extrapolated generative model. If mcmc.samples>0, this will be integrated over all model parameters, which will include uncertainty in seasonality. |
uncertainty.samples |
Number of simulated draws used to estimate uncertainty intervals. Setting this value to 0 or False will disable uncertainty estimation and speed up the calculation. |
fit |
Boolean, if FALSE the model is initialized but not fit. |
max_depth |
An integer for the maximum depth of the tree. |
nrounds |
An integer for the number of boosting iterations. |
eta |
A numeric value between zero and one to control the learning rate. |
colsample_bytree |
Subsampling proportion of columns. |
colsample_bynode |
Subsampling proportion of columns for each node
within each tree. See the |
min_child_weight |
A numeric value for the minimum sum of instance weights needed in a child to continue to split. |
gamma |
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree. |
subsample |
Subsampling proportion of rows. |
validation |
A positive number. If on |
early_stop |
An integer or |
... |
Additional arguments passed to |
Bridge prediction function for Boosted PROPHET models
Description
Bridge prediction function for Boosted PROPHET models
Usage
prophet_xgboost_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Extracts modeltime residuals data from a Modeltime Model
Description
If a modeltime model contains data with residuals information, this function will extract the data frame.
Usage
pull_modeltime_residuals(object)
Arguments
object |
A fitted |
Value
A tibble containing the model timestamp, actual, fitted, and residuals data.
Pulls the Formula from a Fitted Parsnip Model Object
Description
Pulls the Formula from a Fitted Parsnip Model Object
Usage
pull_parsnip_preprocessor(object)
Arguments
object |
A fitted parsnip model |
Value
A formula using stats::formula()
Developer Tools for processing XREGS (Regressors)
Description
Wrappers for using recipes::bake and recipes::juice to process data, returning data in either data frame or matrix format (common formats needed for machine learning algorithms).
Usage
juice_xreg_recipe(recipe, format = c("tbl", "matrix"))
bake_xreg_recipe(recipe, new_data, format = c("tbl", "matrix"))
Arguments
recipe |
A prepared recipe |
format |
One of:
|
new_data |
Data to be processed by a recipe |
Value
Data in either the tbl (data.frame) or matrix formats
Examples
library(dplyr)
library(timetk)
library(recipes)
library(lubridate)
predictors <- m4_monthly %>%
filter(id == "M750") %>%
select(-value) %>%
mutate(month = month(date, label = TRUE))
predictors
# Create default recipe
xreg_recipe_spec <- create_xreg_recipe(predictors, prepare = TRUE)
# Extracts the preprocessed training data from the recipe (used in your fit function)
juice_xreg_recipe(xreg_recipe_spec)
# Applies the prepared recipe to new data (used in your predict function)
bake_xreg_recipe(xreg_recipe_spec, new_data = predictors)
Create a Recursive Time Series Model from a Parsnip or Workflow Regression Model
Description
Create a Recursive Time Series Model from a Parsnip or Workflow Regression Model
Usage
recursive(object, transform, train_tail, id = NULL, chunk_size = 1, ...)
Arguments
object |
An object of |
transform |
A transformation performed on
|
train_tail |
A tibble with tail of training data set. In most cases it'll be required to create some variables based on dependent variable. |
id |
(Optional) An identifier that can be provided to perform a panel forecast.
A single quoted column name (e.g. |
chunk_size |
The size of the smallest lag used in |
... |
Not currently used. |
Details
What is a Recursive Model?
A recursive model uses predictions to generate new values for independent features. These features are typically lags used in autoregressive models. It's important to understand that a recursive model is only needed when the Lag Size < Forecast Horizon.
Why is Recursive needed for Autoregressive Models with Lag Size < Forecast Horizon?
When the lag length is less than the forecast horizon, a problem exists where missing values (NA) are generated in the future data. A solution that recursive() implements is to iteratively fill these missing values in with values generated from predictions.
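To see where the missing values come from, here is a minimal base R sketch (not modeltime code). It builds a lag-1 feature on a series that has been extended by a 3-step horizon. The helper `lag_vec()` and the toy numbers are assumptions made purely for illustration.

```r
# Hypothetical helper: shift a vector down by k positions (base R only).
lag_vec <- function(x, k = 1) c(rep(NA, k), head(x, -k))

# Five observed values followed by a 3-step forecast horizon (unknown -> NA).
value <- c(10, 12, 11, 13, 14, NA, NA, NA)

future_tbl <- data.frame(
  step       = seq_along(value),
  value      = value,
  value_lag1 = lag_vec(value, 1)
)

# Rows 6-8 are the future. Only row 6 has a usable lag-1 feature;
# rows 7 and 8 depend on values that have not been predicted yet.
future_tbl[6:8, ]
```

With a lag of 1 and a horizon of 3, two of the three future rows have no feature value, which is the gap the recursive approach is meant to fill.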
Recursive Process
When producing a forecast, the following steps are performed:
1. Compute the forecast for the first row of new data. The first row cannot contain NA in any required column.
2. Fill the i-th place of the dependent variable column with the already computed forecast.
3. Compute the missing features for the next step, based on the already calculated prediction. These features are computed on a tibble object made by binding train_tail (i.e. the tail of the training data set) and new_data (which is an argument of the predict function).
4. Return to step 2 and repeat the remaining steps until the for-loop ends.
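The steps above can be sketched in base R. This is only an illustration of the idea under simplifying assumptions (a single lag-1 feature, a plain `lm()` model, hypothetical helpers `lag_vec()` and `add_lag1()`); it is not how recursive() is implemented internally.

```r
# Hypothetical helper: shift a vector down by k positions (base R only).
lag_vec <- function(x, k = 1) c(rep(NA, k), head(x, -k))

# Transform step: recompute the lag feature from the current value column.
add_lag1 <- function(data) {
  data$value_lag1 <- lag_vec(data$value, 1)
  data
}

# Toy training data that follows value[t] = 2 + 0.5 * value[t - 1] exactly.
train <- data.frame(value = c(8, 6, 5, 4.5, 4.25))
fit   <- lm(value ~ value_lag1, data = add_lag1(train))

horizon    <- 3
train_tail <- tail(train, 1)                       # enough rows to build lag 1
new_data   <- data.frame(value = rep(NA_real_, horizon))

# Bind the tail and the future rows, then fill the future one step at a time.
full   <- rbind(train_tail, new_data)
offset <- nrow(train_tail)
for (i in seq_len(horizon)) {
  full <- add_lag1(full)                           # rebuild features from filled values
  row  <- offset + i
  # Predict the next row and write the prediction into the value column.
  full$value[row] <- predict(fit, newdata = full[row, , drop = FALSE])
}

forecast <- full$value[-seq_len(offset)]
forecast
```

Each pass through the loop makes one more lag value available, so the model never sees an NA feature even though the horizon is longer than the lag.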
Recursion for Panel Data
Panel data is time series data with multiple groups identified by an ID column.
The recursive() function can be used for Panel Data with the following modifications:
- Supply an id column as a quoted column name
- Replace tail() with panel_tail() to use tails for each time series group.
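The difference between a plain tail and a per-group tail can be shown with a small base R sketch. The function `tail_by_group()` below is a hypothetical stand-in written for this illustration; in practice panel_tail() is the helper to use.

```r
# Toy panel: two series ("A" and "B") with five monthly observations each.
panel <- data.frame(
  id    = rep(c("A", "B"), each = 5),
  date  = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 5), times = 2),
  value = c(1:5, 11:15)
)

# Hypothetical stand-in for a grouped tail: the last n rows of every id.
tail_by_group <- function(data, id_col, n) {
  pieces <- lapply(split(data, data[[id_col]]), tail, n = n)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# tail(panel, 2) would return rows from series "B" only;
# the grouped version keeps the last 2 rows of each series.
tails <- tail_by_group(panel, "id", n = 2)
tails
```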
Value
An object with added recursive class
See Also
- panel_tail() - Used to generate tails for multiple time series groups.
Examples
# Libraries & Setup ----
library(tidymodels)
library(dplyr)
library(tidyr)
library(timetk)
library(slider)
# ---- SINGLE TIME SERIES (NON-PANEL) -----
m750
FORECAST_HORIZON <- 24
m750_extended <- m750 %>%
group_by(id) %>%
future_frame(
.length_out = FORECAST_HORIZON,
.bind_data = TRUE
) %>%
ungroup()
# TRANSFORM FUNCTION ----
# - Function that runs recursively to update the forecasted dataset
lag_roll_transformer <- function(data){
data %>%
# Lags
tk_augment_lags(value, .lags = 1:12) %>%
# Rolling Features
mutate(rolling_mean_12 = lag(slide_dbl(
value, .f = mean, .before = 12, .complete = FALSE
), 1))
}
# Data Preparation
m750_rolling <- m750_extended %>%
lag_roll_transformer() %>%
select(-id)
train_data <- m750_rolling %>%
drop_na()
future_data <- m750_rolling %>%
filter(is.na(value))
# Modeling
# Straight-Line Forecast
model_fit_lm <- linear_reg() %>%
set_engine("lm") %>%
# Use only date feature as regressor
fit(value ~ date, data = train_data)
# Autoregressive Forecast
model_fit_lm_recursive <- linear_reg() %>%
set_engine("lm") %>%
# Use date plus all lagged features
fit(value ~ ., data = train_data) %>%
# Add recursive() w/ transformer and train_tail
recursive(
transform = lag_roll_transformer,
train_tail = tail(train_data, FORECAST_HORIZON)
)
model_fit_lm_recursive
# Forecasting
modeltime_table(
model_fit_lm,
model_fit_lm_recursive
) %>%
update_model_description(2, "LM - Lag Roll") %>%
modeltime_forecast(
new_data = future_data,
actual_data = m750
) %>%
plot_modeltime_forecast(
.interactive = FALSE,
.conf_interval_show = FALSE
)
# MULTIPLE TIME SERIES (PANEL DATA) -----
m4_monthly
FORECAST_HORIZON <- 24
m4_extended <- m4_monthly %>%
group_by(id) %>%
future_frame(
.length_out = FORECAST_HORIZON,
.bind_data = TRUE
) %>%
ungroup()
# TRANSFORM FUNCTION ----
# - NOTE - We create lags by group
lag_transformer_grouped <- function(data){
data %>%
group_by(id) %>%
tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
ungroup()
}
m4_lags <- m4_extended %>%
lag_transformer_grouped()
train_data <- m4_lags %>%
drop_na()
future_data <- m4_lags %>%
filter(is.na(value))
# Modeling Autoregressive Panel Data
model_fit_lm_recursive <- linear_reg() %>%
set_engine("lm") %>%
fit(value ~ ., data = train_data) %>%
recursive(
id = "id", # We add an id = "id" to specify the groups
transform = lag_transformer_grouped,
# We use panel_tail() to grab tail by groups
train_tail = panel_tail(train_data, id, FORECAST_HORIZON)
)
modeltime_table(
model_fit_lm_recursive
) %>%
modeltime_forecast(
new_data = future_data,
actual_data = m4_monthly,
keep_data = TRUE
) %>%
group_by(id) %>%
plot_modeltime_forecast(
.interactive = FALSE,
.conf_interval_show = FALSE
)
General Interface for Multiple Seasonality Regression Models (TBATS, STLM)
Description
seasonal_reg() is a way to generate a specification of a Seasonal Decomposition model before fitting and allows the model to be created using different packages. Currently the only package is forecast.
Usage
seasonal_reg(
mode = "regression",
seasonal_period_1 = NULL,
seasonal_period_2 = NULL,
seasonal_period_3 = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period_1 |
(required) The primary seasonal frequency.
Uses |
seasonal_period_2 |
(optional) A second seasonal frequency.
Is |
seasonal_period_3 |
(optional) A third seasonal frequency.
Is |
Details
The data given to the function are not saved and are only used to determine the mode of the model. For seasonal_reg(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
- "tbats" - Connects to forecast::tbats()
- "stlm_ets" - Connects to forecast::stlm(), method = "ets"
- "stlm_arima" - Connects to forecast::stlm(), method = "arima"
Engine Details
The standardized parameter names in modeltime
can be mapped to their original
names in each engine:
modeltime | forecast::stlm | forecast::tbats |
seasonal_period_1, seasonal_period_2, seasonal_period_3 | msts(seasonal.periods) | msts(seasonal.periods) |
Other options can be set using set_engine().
The engines use forecast::stlm().
Function Parameters:
#> function (y, s.window = 7 + 4 * seq(6), robust = FALSE, method = c("ets",
#>     "arima"), modelfunction = NULL, model = NULL, etsmodel = "ZZN", lambda = NULL,
#>     biasadj = FALSE, xreg = NULL, allow.multiplicative.trend = FALSE, x = y,
#>     ...)
tbats
- Method: Uses method = "tbats", which by default is auto-TBATS.
- Xregs: Univariate. Cannot accept Exogenous Regressors (xregs). Xregs are ignored.
stlm_ets
- Method: Uses method = "stlm_ets", which by default is auto-ETS.
- Xregs: Univariate. Cannot accept Exogenous Regressors (xregs). Xregs are ignored.
stlm_arima
- Method: Uses method = "stlm_arima", which by default is auto-ARIMA.
- Xregs: Multivariate. Can accept Exogenous Regressors (xregs).
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1 or "none") or yearly seasonal (e.g. for monthly time stamps, seasonal_period = 12, seasonal_period = "12 months", or seasonal_period = "yearly").
There are 3 ways to specify:
- seasonal_period = "auto": A seasonal period is selected based on the periodicity of the data (e.g. 12 if monthly)
- seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data
- seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
- The tbats engine cannot accept Xregs.
- The stlm_ets engine cannot accept Xregs.
- The stlm_arima engine can accept Xregs
The xreg parameter is populated using the fit() or fit_xy() function:
- Only factor, ordered factor, and numeric data will be used as xregs.
- Date and Date-time variables are not used as xregs
- character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
- y (target)
- date (time stamp),
- month.lbl (labeled month as an ordered factor).
The month.lbl is an exogenous regressor that can be passed to seasonal_reg() using fit():
- fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
- fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or date-time class values are excluded from xreg.
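The column rules above can be mimicked with a short base R sketch. It only illustrates which column types would qualify as xregs according to the text; the toy data frame, the `note` column, and the selection logic are assumptions and do not reproduce the package's internal preprocessing.

```r
# Toy data: target, date, an ordered factor, and a character column.
data <- data.frame(
  y         = c(5.1, 4.8, 6.2),
  date      = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01")),
  month.lbl = factor(c("Jan", "Feb", "Mar"), levels = month.abb, ordered = TRUE),
  note      = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

predictors <- data[, setdiff(names(data), "y"), drop = FALSE]

# Keep factor, ordered factor, and numeric columns; drop dates and date-times.
is_xreg <- vapply(
  predictors,
  function(col) {
    (is.factor(col) || is.numeric(col)) && !inherits(col, c("Date", "POSIXt"))
  },
  logical(1)
)

xreg_names <- names(predictors)[is_xreg]
xreg_names
```

The character column `note` is not selected here, which is why the text recommends converting character data to factor before fitting.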
See Also
fit.model_spec(), set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
# Data
taylor_30_min
# Split Data 80/20
splits <- initial_time_split(taylor_30_min, prop = 0.8)
# ---- STLM ETS ----
# Model Spec
model_spec <- seasonal_reg() %>%
set_engine("stlm_ets")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# ---- STLM ARIMA ----
# Model Spec
model_spec <- seasonal_reg() %>%
set_engine("stlm_arima")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
Low-Level Exponential Smoothing function for translating modeltime to forecast
Description
Low-Level Exponential Smoothing function for translating modeltime to forecast
Usage
smooth_fit_impl(
x,
y,
period = "auto",
error = "auto",
trend = "auto",
season = "auto",
damping = NULL,
alpha = NULL,
beta = NULL,
gamma = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
error |
The form of the error term: "auto", "additive", or "multiplicative". If the error is multiplicative, the data must be non-negative. |
trend |
The form of the trend term: "auto", "additive", "multiplicative" or "none". |
season |
The form of the seasonal term: "auto", "additive", "multiplicative" or "none". |
damping |
Apply damping to a trend: "auto", "damped", or "none". |
alpha |
Value of alpha. If NULL, it is estimated. |
beta |
Value of beta. If NULL, it is estimated. |
gamma |
Value of gamma. If NULL, it is estimated. |
... |
Additional arguments passed to |
Bridge prediction function for Exponential Smoothing models
Description
Bridge prediction function for Exponential Smoothing models
Usage
smooth_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level SNAIVE Forecast
Description
Low-Level SNAIVE Forecast
Usage
snaive_fit_impl(x, y, id = NULL, seasonal_period = "auto", ...)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
id |
An optional ID feature to identify different time series. Should be a quoted name. |
seasonal_period |
The seasonal period to forecast into the future |
... |
Not currently used |
Bridge prediction function for SNAIVE Models
Description
Bridge prediction function for SNAIVE Models
Usage
snaive_predict_impl(object, new_data)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
Low-Level stlm function for translating modeltime to forecast
Description
Low-Level stlm function for translating modeltime to forecast
Usage
stlm_arima_fit_impl(
x,
y,
period_1 = "auto",
period_2 = NULL,
period_3 = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period_1 |
(required) First seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_2 |
(optional) Second seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_3 |
(optional) Third seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
... |
Additional arguments passed to |
Bridge prediction function for ARIMA models
Description
Bridge prediction function for ARIMA models
Usage
stlm_arima_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level stlm function for translating modeltime to forecast
Description
Low-Level stlm function for translating modeltime to forecast
Usage
stlm_ets_fit_impl(
x,
y,
period_1 = "auto",
period_2 = NULL,
period_3 = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period_1 |
(required) First seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_2 |
(optional) Second seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_3 |
(optional) Third seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
... |
Additional arguments passed to |
Bridge prediction function for STLM ETS models
Description
Bridge prediction function for STLM ETS models
Usage
stlm_ets_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Summarize Accuracy Metrics
Description
This is an internal function used by modeltime_accuracy()
.
Usage
summarize_accuracy_metrics(data, truth, estimate, metric_set)
Arguments
data |
A |
truth |
The column identifier for the true results (that is numeric). |
estimate |
The column identifier for the predicted results (that is also numeric). |
metric_set |
A |
Examples
library(dplyr)
predictions_tbl <- tibble(
group = c("model 1", "model 1", "model 1",
"model 2", "model 2", "model 2"),
truth = c(1, 2, 3,
1, 2, 3),
estimate = c(1.2, 2.0, 2.5,
0.9, 1.9, 3.3)
)
predictions_tbl %>%
group_by(group) %>%
summarize_accuracy_metrics(
truth, estimate,
metric_set = default_forecast_accuracy_metric_set()
)
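For intuition about what a grouped accuracy summary contains, the same toy predictions can be summarized by hand in base R. This sketch computes only two common metrics (MAE and RMSE) and is an assumption-laden illustration; it is not the implementation behind summarize_accuracy_metrics(), and the default metric set contains more metrics than shown here.

```r
predictions <- data.frame(
  group    = rep(c("model 1", "model 2"), each = 3),
  truth    = c(1, 2, 3, 1, 2, 3),
  estimate = c(1.2, 2.0, 2.5, 0.9, 1.9, 3.3)
)

# One row of metrics per group.
metrics_by_group <- do.call(
  rbind,
  lapply(split(predictions, predictions$group), function(d) {
    err <- d$truth - d$estimate
    data.frame(
      group = d$group[1],
      mae   = mean(abs(err)),      # mean absolute error
      rmse  = sqrt(mean(err^2))    # root mean squared error
    )
  })
)
rownames(metrics_by_group) <- NULL
metrics_by_group
```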
Interactive Accuracy Tables
Description
Converts results from modeltime_accuracy() into either interactive (reactable) or static (gt) tables.
Usage
table_modeltime_accuracy(
.data,
.round_digits = 2,
.sortable = TRUE,
.show_sortable = TRUE,
.searchable = TRUE,
.filterable = FALSE,
.expand_groups = TRUE,
.title = "Accuracy Table",
.interactive = TRUE,
...
)
Arguments
.data |
A |
.round_digits |
Rounds accuracy metrics to a specified number of digits.
If |
.sortable |
Allows sorting by columns.
Only applied to |
.show_sortable |
Shows sorting.
Only applied to |
.searchable |
Adds search input.
Only applied to |
.filterable |
Adds filters to table columns.
Only applied to |
.expand_groups |
Expands groups dropdowns.
Only applied to |
.title |
A title for static ( |
.interactive |
Return interactive or static tables. If |
... |
Additional arguments passed to |
Details
Groups
The function respects dplyr::group_by() groups and thus scales with multiple groups.
Reactable Output
A reactable() table is an interactive format that enables live searching and sorting. When .interactive = TRUE, a call is made to reactable::reactable().
table_modeltime_accuracy() includes several common options like toggles for sorting and searching. Additional arguments can be passed to reactable::reactable() via ... .
GT Output
A gt table is an HTML-based table that is "static" (e.g. non-searchable, non-sortable). It's commonly used in PDF and Word documents that do not support interactive content.
When .interactive = FALSE, a call is made to gt::gt(). Arguments can be passed via ... .
Table customization is implemented using a piping workflow (%>%). For more information, refer to the GT Documentation.
Value
A static gt table or an interactive reactable table containing the accuracy information.
Examples
library(dplyr)
library(lubridate)
library(timetk)
library(parsnip)
library(rsample)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)
# --- MODELS ---
# Model 1: prophet ----
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ date, data = training(splits))
# ---- MODELTIME TABLE ----
models_tbl <- modeltime_table(
model_fit_prophet
)
# ---- ACCURACY ----
models_tbl %>%
modeltime_calibrate(new_data = testing(splits)) %>%
modeltime_accuracy() %>%
table_modeltime_accuracy()
Low-Level tbats function for translating modeltime to forecast
Description
Low-Level tbats function for translating modeltime to forecast
Usage
tbats_fit_impl(
x,
y,
period_1 = "auto",
period_2 = NULL,
period_3 = NULL,
use.parallel = length(y) > 1000,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period_1 |
(required) First seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_2 |
(optional) Second seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
period_3 |
(optional) Third seasonal frequency. NULL by default (see Usage). A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
use.parallel |
|
... |
Additional arguments passed to |
Bridge prediction function for TBATS models
Description
Bridge prediction function for TBATS models
Usage
tbats_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Low-Level Temporal Hierarchical function for translating modeltime to forecast
Description
Low-Level Temporal Hierarchical function for translating modeltime to forecast
Usage
temporal_hier_fit_impl(
x,
y,
period = "auto",
comb = c("struc", "mse", "ols", "bu", "shr", "sam"),
usemodel = c("ets", "arima", "theta", "naive", "snaive"),
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. |
comb |
Combination method of temporal hierarchies |
usemodel |
Model used for forecasting each aggregation level |
... |
Additional arguments passed to |
Bridge prediction function for TEMPORAL HIERARCHICAL models
Description
Bridge prediction function for TEMPORAL HIERARCHICAL models
Usage
temporal_hier_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
General Interface for Temporal Hierarchical Forecasting (THIEF) Models
Description
temporal_hierarchy() is a way to generate a specification of a Temporal Hierarchical Forecasting model before fitting and allows the model to be created using different packages. Currently the only package is thief. Note this function requires the thief package to be installed.
Usage
temporal_hierarchy(
mode = "regression",
seasonal_period = NULL,
combination_method = NULL,
use_model = NULL
)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
combination_method |
Combination method of temporal hierarchies, taking one of the following values:
|
use_model |
Model used for forecasting each aggregation level:
|
Details
Models can be created using the following engines:
- "thief" (default) - Connects to thief::thief()
Engine Details
The standardized parameter names in modeltime
can be mapped to their original
names in each engine:
modeltime | thief::thief() |
combination_method | comb |
use_model | usemodel |
Other options can be set using set_engine().
thief (default engine)
The engine uses thief::thief().
Function Parameters:
#> function (y, m = frequency(y), h = m * 2, comb = c("struc", "mse", "ols",
#>     "bu", "shr", "sam"), usemodel = c("ets", "arima", "theta", "naive",
#>     "snaive"), forecastfunction = NULL, aggregatelist = NULL, ...)
Other options and arguments can be set using set_engine().
Parameter Notes:
- xreg - This model is not set up to use exogenous regressors. Only univariate models will be fit.
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
- fit(y ~ date)
Univariate:
For univariate analysis, you must include a date or date-time feature. Simply use:
- Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
- XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
This model is not set up for use with exogenous regressors.
References
For forecasting with temporal hierarchies see: Athanasopoulos G., Hyndman R.J., Kourentzes N., Petropoulos F. (2017) Forecasting with Temporal Hierarchies. European Journal of Operational Research, 262(1), 60-74.
For combination operators see: Kourentzes N., Barrow B.K., Crone S.F. (2014) Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9), 4235-4244.
See Also
fit.model_spec(), set_engine()
Examples
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
library(thief)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- HIERARCHICAL ----
# Model Spec - The default parameters are all set
# to "auto" if none are provided
model_spec <- temporal_hierarchy() %>%
set_engine("thief")
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
Tuning Parameters for TEMPORAL HIERARCHICAL Models
Description
Tuning Parameters for TEMPORAL HIERARCHICAL Models
Usage
combination_method(values = c("struc", "mse", "ols", "bu", "shr", "sam"))
use_model()
Arguments
values |
A character string of possible values. |
Details
The main parameters for Temporal Hierarchical models are:
- combination_method: Combination method of temporal hierarchies.
- use_model: Model used for forecasting each aggregation level.
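To make the phrase "aggregation level" concrete, the sketch below aggregates a toy monthly series to quarterly and annual totals with base R. It is only meant to illustrate the concept of temporal aggregation; the toy series is an assumption, and the code does not reproduce how thief builds or combines its hierarchy.

```r
# Two years of toy monthly data.
monthly <- ts(1:24, start = c(2020, 1), frequency = 12)

# The same series viewed at coarser temporal levels (non-overlapping sums).
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)
annual    <- aggregate(monthly, nfrequency = 1, FUN = sum)

# Number of observations available at each aggregation level.
levels_list <- list(monthly = monthly, quarterly = quarterly, annual = annual)
vapply(levels_list, length, integer(1))
```

In a temporal hierarchy, a model (the use_model choice) is fit at each such level and the resulting forecasts are reconciled according to the combination_method.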
Examples
combination_method()
use_model()
Low-Level THETA function for translating modeltime to forecast
Description
Low-Level THETA function for translating modeltime to forecast
Usage
theta_fit_impl(x, y, ...)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
... |
Additional arguments passed to |
Bridge prediction function for THETA models
Description
Bridge prediction function for THETA models
Usage
theta_predict_impl(object, new_data, ...)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
... |
Additional arguments passed to |
Tidy eval helpers
Description
- sym() creates a symbol from a string and syms() creates a list of symbols from a character vector.
- enquo() and enquos() delay the execution of one or several function arguments. enquo() returns a single quoted expression, which is like a blueprint for the delayed computation. enquos() returns a list of such quoted expressions.
- expr() quotes a new expression locally. It is mostly useful to build new expressions around arguments captured with enquo() or enquos(): expr(mean(!!enquo(arg), na.rm = TRUE)).
- as_name() transforms a quoted variable name into a string. Supplying something else than a quoted variable name is an error.
  That's unlike as_label() which also returns a single string but supports any kind of R object as input, including quoted function calls and vectors. Its purpose is to summarise that object into a single label. That label is often suitable as a default name.
  If you don't know what a quoted expression contains (for instance expressions captured with enquo() could be a variable name, a call to a function, or an unquoted constant), then use as_label(). If you know you have quoted a simple variable name, or would like to enforce this, use as_name().
To learn more about tidy eval and how to use these tools, visit the Metaprogramming section of Advanced R.
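A short sketch may help tie these helpers together. It assumes the rlang package is installed and attached; the function `col_mean()` and the toy data frame are hypothetical and exist only for this illustration.

```r
library(rlang)

# Hypothetical helper: compute the mean of a column passed unquoted.
col_mean <- function(data, col) {
  col_quo <- enquo(col)                          # capture the argument unevaluated
  call    <- expr(mean(!!col_quo, na.rm = TRUE)) # build a new expression around it
  list(
    name  = as_name(col_quo),                    # string form of the column name
    value = eval_tidy(call, data = data)         # evaluate with data as a mask
  )
}

df  <- data.frame(value = c(1, 2, NA, 3))
res <- col_mean(df, value)

value_sym <- sym("value")                        # symbol built from a string
label     <- as_label(quo(mean(value)))          # label for a quoted function call
```

Here as_name() is appropriate because the argument is a bare column name, whereas as_label() is used for the quoted call `mean(value)`, which as_name() would reject.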
Tuning Parameters for Time Series (ts-class) Models
Description
Tuning Parameters for Time Series (ts-class) Models
Usage
seasonal_period(values = c("none", "daily", "weekly", "yearly"))
Arguments
values |
A time-based phrase |
Details
Time series models (e.g. Arima() and ets()) use stats::ts() or forecast::msts() to apply seasonality. We can do the same process using the following general time series parameter:
- period: The periodic nature of the seasonality.
It's usually best practice to not tune this parameter, but rather set to obvious values based on the seasonality of the data:
- Daily Seasonality: Often used with hourly data (e.g. 24 hourly timestamps per day)
- Weekly Seasonality: Often used with daily data (e.g. 7 daily timestamps per week)
- Yearly Seasonality: Often used with weekly, monthly, and quarterly data (e.g. 12 monthly observations per year).
However, users who want to experiment with period tuning can do so with seasonal_period().
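The link between a seasonal period and the underlying ts object can be seen with base R alone. The sketch below uses stats::ts() with made-up numbers; it illustrates the "obvious values" mentioned above and is not a depiction of how modeltime converts periods internally.

```r
# Two years of monthly observations with yearly seasonality (period = 12).
monthly <- ts(
  rep(c(5, 7, 9, 12, 15, 18, 20, 19, 16, 12, 8, 6), times = 2),
  start = c(2020, 1),
  frequency = 12
)

# One week of hourly observations with daily seasonality (period = 24).
hourly <- ts(seq_len(24 * 7), frequency = 24)

frequency(monthly)   # observations per seasonal cycle for the monthly series
frequency(hourly)    # observations per seasonal cycle for the hourly series

# Position of each observation within its seasonal cycle;
# the 13th monthly observation starts a new cycle.
monthly_cycle <- cycle(monthly)
monthly_cycle[c(1, 12, 13)]
```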
Examples
seasonal_period()
Succinct summary of Modeltime Tables
Description
type_sum controls how objects are shown when inside tibble columns.
Usage
## S3 method for class 'mdl_time_tbl'
type_sum(x)
Arguments
x |
A |
Value
A character value.
Update the model description by model id in a Modeltime Table
Description
The update_model_description() and update_modeltime_description() functions are synonyms.
Usage
update_model_description(object, .model_id, .new_model_desc)
update_modeltime_description(object, .model_id, .new_model_desc)
Arguments
object |
A Modeltime Table |
.model_id |
A numeric value matching the .model_id that you want to update |
.new_model_desc |
Text describing the new model description |
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
m750_models %>%
update_modeltime_description(2, "PROPHET - No Regressors")
Update the model by model id in a Modeltime Table
Description
Update the model by model id in a Modeltime Table
Usage
update_modeltime_model(object, .model_id, .new_model)
Arguments
object |
A Modeltime Table |
.model_id |
A numeric value matching the .model_id that you want to update |
.new_model |
A fitted workflow, model_fit, or mdl_time_ensemble object |
See Also
- combine_modeltime_tables(): Combine 2 or more Modeltime Tables together
- add_modeltime_model(): Adds a new row with a new model to a Modeltime Table
- drop_modeltime_model(): Drop one or more models from a Modeltime Table
- update_modeltime_description(): Updates a description for a model inside a Modeltime Table
- update_modeltime_model(): Updates a model inside a Modeltime Table
- pull_modeltime_model(): Extracts a model from a Modeltime Table
Examples
library(tidymodels)
model_fit_ets <- exp_smoothing() %>%
set_engine("ets") %>%
fit(value ~ date, training(m750_splits))
m750_models %>%
update_modeltime_model(1, model_fit_ets)
Low-Level Window Forecast
Description
Low-Level Window Forecast
Usage
window_function_fit_impl(
x,
y,
id = NULL,
window_size = "all",
window_function = NULL,
...
)
Arguments
x |
A dataframe of xreg (exogenous regressors) |
y |
A numeric vector of values to fit |
id |
An optional ID feature to identify different time series. Should be a quoted name. |
window_size |
The period to apply the window function to |
window_function |
A function to apply to the window. The default is |
... |
Additional arguments for the |
Bridge prediction function for window Models
Description
Bridge prediction function for window Models
Usage
window_function_predict_impl(object, new_data)
Arguments
object |
An object of class |
new_data |
A rectangular data object, such as a data frame. |
General Interface for Window Forecast Models
Description
window_reg()
is a way to generate a specification of a window model
before fitting and allows the model to be created using
different backends.
Usage
window_reg(mode = "regression", id = NULL, window_size = NULL)
Arguments
mode |
A single character string for the type of model. The only possible value for this model is "regression". |
id |
An optional quoted column name (e.g. "id") for identifying multiple time series (i.e. panel data). |
window_size |
A window to apply the window function. By default, the window uses the full data set, which is rarely the best choice. |
Details
A time series window regression is derived using window_reg()
.
The model can be created using the fit()
function using the
following engines:
-
"window_function" (default) - Performs a Window Forecast applying a
window_function
(engine parameter) to a window of size defined by window_size
Engine Details
function (default engine)
The engine uses window_function_fit_impl()
. A time series window function
applies a window_function
to a window of the data (last N observations).
The function can return a scalar (single value) or multiple values that are repeated for each window.
Common use cases:
-
Moving Average Forecasts: Forecast forward a 20-day average
-
Weighted Average Forecasts: Exponentially weighting the most recent observations
-
Median Forecasts: Forecasting forward a 20-day median
-
Repeating Forecasts: Simulating a Seasonal Naive Forecast by broadcasting the last 12 observations of a monthly dataset into the future
The key engine parameter is the window_function, which can be a function or a formula:
If a function, e.g. mean, the function is used with any additional arguments, ..., passed in set_engine().
If a formula, e.g. ~ mean(., na.rm = TRUE), it is converted to a function.
This syntax allows you to create very compact anonymous functions.
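The window behavior described in this section can be illustrated with a small base R sketch. The helper window_forecast_sketch() and its arguments are hypothetical and are not part of modeltime; the sketch only assumes that a window is the last N observations and that the result is repeated over the forecast horizon. The actual engine is window_function_fit_impl().

```r
# Hypothetical helper (not part of modeltime): apply `fn` to the last
# `window_size` values of `y`, then repeat the result until `h` values exist.
window_forecast_sketch <- function(y, window_size, fn, h, ...) {
  window <- tail(y, window_size)
  value  <- fn(window, ...)
  # rep_len() recycles a scalar or a vector to exactly `h` values
  rep_len(value, h)
}

y <- c(10, 12, 11, 13, 15, 14)

# Moving average of the last 3 observations, forecast 4 steps ahead
ma_forecast <- window_forecast_sketch(y, window_size = 3, fn = mean, h = 4)
ma_forecast
# 14 14 14 14

# Repeating forecast: broadcast the last 2 observations 5 steps ahead
rep_forecast <- window_forecast_sketch(
  y, window_size = length(y), fn = function(x) tail(x, 2), h = 5
)
rep_forecast
# 15 14 15 14 15
```

A scalar result gives a flat forecast, while a vector result is recycled, which is how a seasonal-naive-style forecast can be simulated.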
Fit Details
Date and Date-Time Variable
It's a requirement to have a date or date-time variable as a predictor.
The fit()
interface accepts date and date-time features and handles them internally.
-
fit(y ~ date)
ID features (Multiple Time Series, Panel Data)
The id
parameter is populated using the fit()
or fit_xy()
function:
ID Example: Suppose you have 3 features:
-
y
(target) -
date
(time stamp), -
series_id
(a unique identifier that identifies each time series in your data).
The series_id
can be passed to the window_reg()
using
fit()
:
-
window_reg(id = "series_id")
specifies that the series_id
column should be used to identify each time series. -
fit(y ~ date + series_id)
will pass series_id
on to the underlying functions.
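Conceptually, identifying series by an ID column means the window function is applied to each time series separately. The base R sketch below illustrates that idea on made-up data. The data frame panel and its values are hypothetical, and the grouping shown here is an assumption for illustration, not modeltime's internal implementation.

```r
# Hypothetical panel data: two series identified by `series_id`
panel <- data.frame(
  series_id = rep(c("A", "B"), each = 4),
  date      = rep(seq(as.Date("2024-01-01"), by = "month", length.out = 4),
                  times = 2),
  y         = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Apply a 2-observation mean window to each series on its own
window_by_id <- tapply(
  panel$y,
  panel$series_id,
  function(v) mean(tail(v, 2))
)

window_by_id
# A = 3.5, B = 35
```

Without the ID column, the two series would be treated as one and the window would mix their values.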
Window Function Specification (window_function)
You can specify a function / formula using purrr
syntax.
If a function, e.g. mean, the function is used with any additional arguments, ..., passed in set_engine().
If a formula, e.g. ~ mean(., na.rm = TRUE), it is converted to a function.
This syntax allows you to create very compact anonymous functions.
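To see what converting a formula to a function looks like, the sketch below uses rlang::as_function(), which turns a purrr-style formula into an ordinary function. Whether modeltime performs the conversion with this exact call is an assumption. The variable names are illustrative only.

```r
library(rlang)

# A purrr-style formula becomes a function of one argument;
# `.` and `.x` both refer to that argument.
mean_fn <- as_function(~ mean(., na.rm = TRUE))
mean_result <- mean_fn(c(1, 2, NA, 6))
mean_result
# 3

# A weighted average of the last 3 observations
weighted_fn <- as_function(~ sum(tail(.x, 3) * c(0.1, 0.3, 0.6)))
weighted_result <- weighted_fn(c(10, 20, 30, 40))
weighted_result
```

The equivalent plain function for the first case is function(x) mean(x, na.rm = TRUE). The formula form is only a shorthand.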
Window Size Specification (window_size)
The period can be non-seasonal (window_size = 1 or "none"
) or
yearly seasonal (e.g. For monthly time stamps, window_size = 12
, window_size = "12 months"
, or window_size = "yearly"
).
There are 3 ways to specify:
-
window_size = "all"
: The window uses the full data set -
window_size = 12
: A numeric frequency. For example, 12 is common for monthly data -
window_size = "1 year"
: A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
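The choice of window size can change the forecast substantially. The base R sketch below uses a made-up monthly series and assumes, as the Engine Details describe, that the window is the last N observations. It compares a 12-observation window with a window covering the whole series.

```r
# Hypothetical monthly series: a level shift after the first 12 observations
y <- c(rep(100, 12), rep(200, 12))

# Median over the last 12 observations only
median_last_12 <- median(tail(y, 12))
median_last_12
# 200

# Median over the full series
median_full <- median(tail(y, length(y)))
median_full
# 150
```

The shorter window tracks the recent level, while the full-data window averages across the shift.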
External Regressors (Xregs)
These models are univariate. No xregs are used in the modeling process.
See Also
fit.model_spec()
, set_engine()
Examples
library(dplyr)
library(parsnip)
library(modeltime)
library(rsample)
library(timetk)
# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750
# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)
# ---- WINDOW FUNCTION -----
# Used to make:
# - Mean/Median forecasts
# - Simple repeating forecasts
# Median Forecast ----
# Model Spec
model_spec <- window_reg(
window_size = 12
) %>%
# Extra parameters passed as: set_engine(...)
set_engine(
engine = "window_function",
window_function = median,
na.rm = TRUE
)
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date, data = training(splits))
model_fit
# Predict
# - The 12-month median repeats going forward
predict(model_fit, testing(splits))
# ---- PANEL FORECAST - WINDOW FUNCTION ----
# Weighted Average Forecast
model_spec <- window_reg(
# Specify the ID column for Panel Data
id = "id",
window_size = 12
) %>%
set_engine(
engine = "window_function",
# Create a Weighted Average
window_function = ~ sum(tail(.x, 3) * c(0.1, 0.3, 0.6))
)
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date + id, data = training(splits))
model_fit
# Predict: The weighted average (scalar) repeats going forward
predict(model_fit, testing(splits))
# ---- BROADCASTING PANELS (REPEATING) ----
# Simulating a Seasonal Naive Forecast by
# broadcasting the last 12 observations into the future
model_spec <- window_reg(
id = "id",
window_size = Inf
) %>%
set_engine(
engine = "window_function",
window_function = ~ tail(.x, 12)
)
# Fit Spec
model_fit <- model_spec %>%
fit(log(value) ~ date + id, data = training(splits))
model_fit
# Predict: The sequence is broadcast (repeated) during prediction
predict(model_fit, testing(splits))
Wrapper for parsnip::xgb_train
Description
Wrapper for parsnip::xgb_train
Usage
xgboost_impl(
x,
y,
max_depth = 6,
nrounds = 15,
eta = 0.3,
colsample_bynode = NULL,
colsample_bytree = NULL,
min_child_weight = 1,
gamma = 0,
subsample = 1,
validation = 0,
early_stop = NULL,
objective = NULL,
counts = TRUE,
event_level = c("first", "second"),
...
)
Arguments
x |
A data frame or matrix of predictors |
y |
A vector (factor or numeric) or matrix (numeric) of outcome data. |
max_depth |
An integer for the maximum depth of the tree. |
nrounds |
An integer for the number of boosting iterations. |
eta |
A numeric value between zero and one to control the learning rate. |
colsample_bynode |
Subsampling proportion of columns for each node
within each tree. See the |
colsample_bytree |
Subsampling proportion of columns for each tree.
See the |
min_child_weight |
A numeric value for the minimum sum of instance weights needed in a child to continue to split. |
gamma |
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree. |
subsample |
Subsampling proportion of rows. By default, all of the training data are used. |
validation |
A positive number. If on |
early_stop |
An integer or |
counts |
A logical. If |
event_level |
For binary classification, this is a single string of either
|
... |
Other options to pass to |
Wrapper for xgboost::predict
Description
Wrapper for xgboost::predict
Usage
xgboost_predict(object, newdata, ...)
Arguments
object |
a model object for which prediction is desired. |
newdata |
New data to be predicted |
... |
additional arguments affecting the predictions produced. |