Title: Interactive Assessments of Models
Version: 0.1.1
Description: Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
License: MIT + file LICENSE
URL: https://shinymodels.tidymodels.org, https://github.com/tidymodels/shinymodels
BugReports: https://github.com/tidymodels/shinymodels/issues
Depends: ggplot2, R (≥ 2.10)
Imports: dplyr, DT, generics (≥ 0.1.0), glue, htmltools, magrittr, parsnip, plotly, purrr, rlang, scales, shiny, shinydashboard, stats, tidyr, tidyselect, tune, yardstick
Suggests: covr, finetune, knitr, markdown, modeldata, rmarkdown, shinytest, spelling, testthat (≥ 3.0.0), vdiffr, withr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.3.0
NeedsCompilation: no
Packaged: 2024-01-31 14:25:03 UTC; simoncouch
Author: Max Kuhn [aut] (ORCID), Shisham Adhikari [aut], Julia Silge [aut] (ORCID), Simon Couch [aut, cre] (ORCID), Posit Software, PBC [cph, fnd]
Maintainer: Simon Couch <simon.couch@posit.co>
Repository: CRAN
Date/Publication: 2024-01-31 15:10:05 UTC

shinymodels: Interactive Assessments of Models

Description

Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.

Author(s)

Maintainer: Simon Couch simon.couch@posit.co (ORCID)

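Authors:

  • Max Kuhn (ORCID)

  • Shisham Adhikari

  • Julia Silge (ORCID)

Other contributors:

  • Posit Software, PBC [copyright holder, funder]

See Also

Useful links:

  • https://shinymodels.tidymodels.org

  • https://github.com/tidymodels/shinymodels

  • Report bugs at https://github.com/tidymodels/shinymodels/issues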


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).
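
As a small illustration (not taken from the package documentation, and assuming magrittr is installed), the pipe passes the left-hand value as the first argument of the right-hand call, so a chain of pipes is equivalent to nested calls:

```r
library(magrittr)

# Each step receives the previous result as its first argument.
piped <- c(1, 4, 9) %>% sqrt() %>% sum()

# The same computation written as nested calls.
nested <- sum(sqrt(c(1, 4, 9)))

identical(piped, nested)
```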


Iterative optimization of neural network

Description

This object has the results when a neural network was tuned using Bayesian optimization and a validation set.

Details

The code used to produce this object:

  library(tidymodels)
  tidymodels_prefer()

  data(ames)

  ames <-
    ames %>%
    select(Sale_Price, Neighborhood, Longitude, Latitude, Year_Built) %>%
    mutate(Sale_Price = log10(Sale_Price))

  set.seed(1)
  ames_rs <- validation_split(ames)

  ames_rec <-
    recipe(Sale_Price ~ ., data = ames) %>%
    step_dummy(all_nominal_predictors()) %>%
    step_zv(all_predictors()) %>%
    step_normalize(all_predictors())

  mlp_spec <-
    mlp(hidden_units = tune(),
        penalty = tune(),
        epochs = tune()) %>%
    set_mode("regression")

  set.seed(1)
  ames_mlp_itr <-
    mlp_spec %>%
    tune_bayes(
      ames_rec,
      resamples = ames_rs,
      initial = 5,
      iter = 4,
      control = control_bayes(save_pred = TRUE)
    )

Value

An object with primary class iteration_results.


Resampled bagged tree results

Description

This object has the results when a bagged regression tree was resampled using 10-fold cross-validation.

Details

The code used to produce this object:

  library(tidymodels)
  library(baguette)
  tidymodels_prefer()

  # ------------------------------------------------------------------------------

  ctrl_rs <- control_resamples(save_pred = TRUE)

  # ------------------------------------------------------------------------------

  set.seed(1)
  cars_rs <- vfold_cv(mtcars)

  cars_bag_vfld <-
    bag_tree() %>%
    set_engine("rpart", times = 5) %>%
    set_mode("regression") %>%
    fit_resamples(
      mpg ~ .,
      resamples = cars_rs,
      control = ctrl_rs
    )

Value

An object with primary class resample_results.
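
Because the object was created with save_pred = TRUE, the hold-out predictions are stored alongside the metrics. A minimal sketch of inspecting them, assuming the shinymodels and tune packages are installed (the variable name cars_preds is only illustrative):

```r
library(shinymodels)

# Load the packaged resampling results.
data(cars_bag_vfld)

# Extract the saved out-of-sample predictions as a data frame.
cars_preds <- tune::collect_predictions(cars_bag_vfld)
head(cars_preds)
```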


A CART classification tree tuned via racing

Description

This object has the results when a CART classification tree model was tuned over the cost-complexity parameter using racing.

Details

To reduce the object size, a smaller subset of the data was used.

The code used to produce this object:

  library(tidymodels)
  library(finetune)
  tidymodels_prefer()

  ctrl_rc <- control_race(save_pred = TRUE)

  # ------------------------------------------------------------------------------

  data(cells)

  set.seed(1)
  cells <-
    cells %>%
    select(-case) %>%
    sample_n(200)

  # ------------------------------------------------------------------------------

  set.seed(2)
  cell_rs <- vfold_cv(cells)

  # ------------------------------------------------------------------------------

  set.seed(3)
  cell_race <-
    decision_tree(cost_complexity = tune()) %>%
    set_mode("classification") %>%
    tune_race_anova(
      class ~ .,
      resamples = cell_rs,
      grid = tibble(cost_complexity = 10^seq(-2, -1, by = 0.2)),
      control = ctrl_rc
    )

Value

An object with primary class tune_race.


Gets the config and translates it into a sentence with the parameter values

Description

This function takes the result of organize_data(), the predictions across all models, and the names of the tuning parameters, and returns a sentence with the default parameter values.

Usage

display_selected(x, performance, predictions, tuning_param, input)

Arguments

x

The organize_data() result.

performance

The data frame with performance metrics for each candidate model.

predictions

The data frame with predictions across all models.

tuning_param

The names of the tuning parameters.

input

The DT::datatable object.

Value

A sentence.


Explore model results

Description

explore() launches a Shiny application to interact with results from some tidymodels functions.

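To investigate model fit(s), explore() can be used on objects produced by tidymodels resampling and tuning functions. The example objects in this package, for instance, come from tune::fit_resamples(), tune::tune_grid(), tune::tune_bayes(), finetune::tune_race_anova(), and tune::last_fit().

The application starts in a new window and allows users to see how predicted values align with the true, observed data. There are 2-3 tabs in the application, depending on the object.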

To quit the Shiny application, use the Esc key.

Usage

## Default S3 method:
explore(x, ...)

## S3 method for class 'tune_results'
explore(x, hover_cols = NULL, hover_only = FALSE, ...)

Arguments

x

An object with class tune_results.

...

Other parameters not currently used.

hover_cols

The columns to display while hovering in the Shiny app. This argument can be:

  • A dplyr selector (such as dplyr::starts_with()) or a set of selectors enclosed in c().

  • A character vector.

hover_only

A logical to determine if interactive highlighting of points is enabled (the default) or not. This can be helpful for very large data sets.

Details

For resampling methods that produce more than one hold-out prediction per row (e.g. the bootstrap, repeated V-fold cross-validation), the predicted values shown in the plots are averages of the predictions for that specific row.
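
The averaging described above can be sketched in base R. This is only an illustration with made-up numbers; the column names .row and .pred follow tidymodels conventions, and the package's own implementation may differ:

```r
# Toy hold-out predictions: rows 1 and 2 were each predicted twice
# (for example, by two repeats of cross-validation).
preds <- data.frame(
  .row  = c(1, 1, 2, 2, 3),
  .pred = c(10, 12, 20, 22, 30)
)

# Average the predictions within each original row.
avg_preds <- aggregate(preds[".pred"], by = preds[".row"], FUN = mean)
avg_preds
```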

The ggplot2 theme used in the Shiny application corresponds to the current theme in the R session. Run ggplot2::theme_set() to change the theme for the plots in the Shiny application.

For classification models, there is a toggle on the bottom left of the application to choose between "Unscaled (i.e. linear)" and "Logit scaled" probability scaling. The first option plots the raw probabilities, while the logit scaling uses scales::logit_trans() to rescale the axis. This can be helpful when a model with a linear predictor is used (e.g. logistic or multinomial regression) since it can show linear effects from a feature more easily.
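
To make the logit scale concrete, here is a base-R sketch using qlogis() rather than scales::logit_trans(). The clamping step is an assumption about how a small constant such as the prob_eps argument of the plotting functions could be used to keep probabilities of exactly 0 or 1 finite; it is not taken from the package source:

```r
# Predicted probabilities, including the boundary values 0 and 1.
prob <- c(0, 0.1, 0.5, 0.9, 1)

# Hypothetical clamp: keep probabilities inside [eps, 1 - eps] so that the
# logit, log(p / (1 - p)), stays finite.
eps <- 0.001
clamped <- pmin(pmax(prob, eps), 1 - eps)

# qlogis() is the logit function in base R.
logit_prob <- qlogis(clamped)
logit_prob
```

On this scale a probability of 0.5 maps to 0, and values near 0 or 1 are spread out symmetrically around it.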

When using the application, there may be warnings printed in the console about "event tied a source ID ... not registered". These can be ignored.

When racing results are explored, the Shiny application only allows tuning parameter combinations that were fully resampled. As a result, parameter combinations that were discarded during the race cannot be selected.
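
The idea of keeping only fully resampled combinations can be sketched with toy data in base R. The data frame and configuration labels below are invented for illustration, and the package's actual filtering logic may differ:

```r
# Toy racing output: "cfg_2" was dropped after three resamples,
# while "cfg_1" and "cfg_3" were evaluated on all five.
race_preds <- data.frame(
  .config = rep(c("cfg_1", "cfg_2", "cfg_3"), times = c(5, 3, 5)),
  id      = c(paste0("Fold", 1:5), paste0("Fold", 1:3), paste0("Fold", 1:5))
)

# Count the distinct resamples seen by each configuration.
n_resamples <- tapply(
  race_preds$id,
  race_preds$.config,
  function(x) length(unique(x))
)

# Keep only the configurations evaluated on every resample.
complete_configs <- names(n_resamples)[n_resamples == max(n_resamples)]
complete_configs
```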

Value

A shiny application.

Examples

data(ames_mlp_itr)

if (interactive()) {
  explore(ames_mlp_itr, hover_cols = dplyr::contains("tude"))
}

Returns the name of the predictions column for the first level

Description

This function takes the prediction data, the event level, and the outcome name as arguments and returns the name of the predictions column for the first level.

Usage

first_class_prob_name(dat, event_level, y_name)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

event_level

A single character value for the level corresponding to the event.

y_name

The y/response variable for the model.

Value

A symbol.


Returns the first level of a classification model

Description

This function takes the data, event_level, and y_name as arguments and returns the first level in the classification data.

Usage

first_level(dat, event_level = c("first", "second"), y_name)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

event_level

A single character value for the level corresponding to the event.

y_name

The y/response variable for the model.

Value

A string.
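
A rough sketch of how an event level can be resolved from a factor outcome is shown below. The helper pick_event_level() is hypothetical and is not the package's implementation; it only illustrates the "first" versus "second" choice:

```r
# Hypothetical helper (not the package's implementation): pick the factor
# level that is treated as the event.
pick_event_level <- function(outcome, event_level = c("first", "second")) {
  event_level <- match.arg(event_level)
  lvls <- levels(outcome)
  if (event_level == "first") lvls[1] else lvls[2]
}

outcome <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))

first_lvl  <- pick_event_level(outcome)
second_lvl <- pick_event_level(outcome, event_level = "second")
```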


Returns the hover columns to be displayed in interactive plots

Description

This function takes the .hover argument and returns output that can be used as the text aesthetic in a ggplot2::ggplot() object to customize the tooltip.

Usage

format_hover(x, ...)

Arguments

x

A data frame with columns to be displayed in the hover.

...

Arguments passed to format() for the column(s) selected to be shown in the hover/tooltip.

Value

A character vector.
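
One plausible way to build such tooltip strings is sketched below. The helper make_hover_text() and the "column: value" layout are assumptions for illustration only; the output format of format_hover() itself may differ:

```r
# Hypothetical helper (not the package's implementation): build one tooltip
# string per row of the form "column: value", with columns separated by the
# HTML line break used in plotly tooltips.
make_hover_text <- function(x, ...) {
  parts <- lapply(names(x), function(nm) {
    paste0(nm, ": ", format(x[[nm]], ...))
  })
  do.call(paste, c(parts, sep = "<br>"))
}

hover_dat <- data.frame(
  Longitude  = c(-93.62, -93.64),
  Year_Built = c(1960L, 2003L)
)

# Extra arguments such as trim = TRUE are forwarded to format().
hover_text <- make_hover_text(hover_dat, trim = TRUE)
hover_text
```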


Extract data from objects to use in a shiny app

Description

This function joins the result of tune::fit_resamples() to the original dataset to give a list that can be an input for the Shiny app.

Usage

organize_data(x, hover_cols = NULL, ...)

## Default S3 method:
organize_data(x, hover_cols = NULL, ...)

## S3 method for class 'tune_results'
organize_data(x, hover_cols = NULL, ...)

Arguments

x

The tune::fit_resamples() result.

hover_cols

The columns to display while hovering.

...

Other parameters not currently used.

Details

The default configuration is based on the optimal value of the first metric.
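
As a hedged illustration of "optimal value of the first metric", the sketch below uses tune::collect_metrics() and tune::select_best() on one of the packaged objects. It assumes shinymodels and tune are installed and is not a claim about what organize_data() calls internally:

```r
library(shinymodels)

data(ames_mlp_itr)

# Name of the first metric listed in the tuning results.
metrics <- tune::collect_metrics(ames_mlp_itr)
first_metric <- metrics$.metric[1]

# Tuning parameter combination with the best value of that metric.
best_config <- tune::select_best(ames_mlp_itr, metric = first_metric)
best_config
```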

Value

A list whose elements are a data frame and character vectors. The data frame includes an outcome variable .outcome, a prediction variable .pred, a model configuration variable .config, and hovering columns .hover.


Calculate and reformat performance metrics for each candidate model

Description

This function takes the result of organize_data() to calculate and reformat performance metrics for each candidate model.

Usage

performance_object(x)

Arguments

x

The organize_data() result.

Value

A data frame.


Visualizing the confusion matrix for a classification model

Description

This function plots the confusion matrix for a classification model.

Usage

plot_multiclass_conf_mat(dat)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

Value

A plotly::ggplotly() object.
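
For readers unfamiliar with the underlying table, a confusion matrix is a cross-tabulation of predicted and observed classes. The base-R sketch below uses invented class labels and does not reproduce the package's plot:

```r
# Toy observed classes and hard class predictions.
lvls <- c("PS", "WS")
observed  <- factor(c("PS", "PS", "WS", "WS", "WS", "PS"), levels = lvls)
predicted <- factor(c("PS", "WS", "WS", "WS", "PS", "PS"), levels = lvls)

# Cross-tabulate predictions (rows) against the truth (columns).
conf_mat <- table(Prediction = predicted, Truth = observed)
conf_mat

# Overall accuracy is the share of counts on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```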


Visualizing predicted probability vs. true class for a multi-class classification model

Description

This function plots the predicted probabilities against the observed class based on tidymodels results for a multi-class classification model.

Usage

plot_multiclass_obs_pred(dat, y_name, prob_bins = 0.05)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

prob_bins

The desired bin width for the histogram.

Value

A plotly::ggplotly() object.


Visualizing the PR curve for a classification model

Description

This function plots the full precision recall curve.

Usage

plot_multiclass_pr(dat, y_name)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

Value

A plotly::ggplotly() object.


Visualizing the predicted probabilities vs. a factor variable for a classification model

Description

This function plots the predicted probabilities against a factor column based on tidymodels results for a multi-class classification model.

Usage

plot_multiclass_pred_factorcol(
  dat,
  y_name,
  factorcol,
  alpha = 1,
  size = 1,
  prob_scaling = FALSE,
  prob_eps = 0.001,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

factorcol

The factor column to plot against the predicted probabilities.

alpha

The opacity for the geom points.

size

The size for the geom points.

prob_scaling

The boolean to turn on or off the logit scale for probability.

prob_eps

A small numerical constant to prevent division by zero.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing the predicted probabilities vs. a numeric column for a classification model

Description

This function plots the predicted probabilities against a numeric column based on tidymodels results for a multi-class classification model.

Usage

plot_multiclass_pred_numcol(
  dat,
  y_name,
  numcol,
  alpha = 1,
  size = 1,
  prob_scaling = FALSE,
  prob_eps = 0.001,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

numcol

The numerical column to plot against the predicted probabilities.

alpha

The opacity for the geom points.

size

The size for the geom points.

prob_scaling

The boolean to turn on or off the logit scale for probability.

prob_eps

A small numerical constant to prevent division by zero.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing the ROC curve for a classification model

Description

This function plots the ROC curve for a classification model.

Usage

plot_multiclass_roc(dat, y_name)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

Value

A plotly::ggplotly() object.


Visualizing observed vs. predicted values for a regression model

Description

This function plots the predicted values against the observed values based on tidymodels results for a regression model.

Usage

plot_numeric_obs_pred(dat, y_name, alpha = 1, size = 1, source = NULL)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

alpha

The opacity for the geom points.

size

The size for the geom points.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.
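
The general pattern of turning a ggplot2 scatter plot into a plotly widget with a custom tooltip can be sketched as follows. The toy data, axis assignment, and reference line are assumptions for illustration; this is not the body of plot_numeric_obs_pred(). It assumes ggplot2 and plotly are installed:

```r
library(ggplot2)

# Toy data using the column names listed for `dat` above.
toy <- data.frame(
  .outcome = c(21.0, 22.8, 18.7, 14.3),
  .pred    = c(20.1, 24.0, 17.9, 15.5),
  .hover   = c("row 1", "row 2", "row 3", "row 4")
)

# `text` is not a standard point aesthetic; plotly picks it up for tooltips.
p <- ggplot(toy, aes(x = .outcome, y = .pred, text = .hover)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  geom_point(alpha = 1, size = 1) +
  labs(x = "Observed", y = "Predicted")

# Convert the static plot to an interactive widget that shows only `text`
# when hovering over a point.
widget <- plotly::ggplotly(p, tooltip = "text")
```

Points close to the dashed identity line are well predicted; points far from it are candidates for closer inspection.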


Visualizing residuals vs. a factor column for a regression model

Description

This function plots the residuals against a factor column based on tidymodels results for a regression model.

Usage

plot_numeric_res_factorcol(
  dat,
  y_name,
  factorcol,
  alpha = 1,
  size = 1,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

factorcol

The factor column to plot against the residuals.

alpha

The opacity for the geom points.

size

The size for the geom points.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing residuals vs. a numeric column for a regression model

Description

This function plots the residuals against a numeric column based on tidymodels results for a regression model.

Usage

plot_numeric_res_numcol(
  dat,
  y_name,
  numcol,
  alpha = 1,
  size = 1,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

numcol

The numerical column to plot against the residuals.

alpha

The opacity for the geom points.

size

The size for the geom points.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing residuals vs. predicted values for a regression model

Description

This function plots the predicted values against the residuals based on tidymodels results for a regression model.

Usage

plot_numeric_res_pred(dat, y_name, size = 1, source = NULL)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

size

The size for the geom points.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing the confusion matrix for a classification model

Description

This function plots the confusion matrix for a classification model.

Usage

plot_twoclass_conf_mat(dat)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

Value

A plotly::ggplotly() object.


Visualizing predicted probability vs. true class for a two-class classification model

Description

This function plots the predicted probabilities against the observed class based on tidymodels results for a two-class classification model.

Usage

plot_twoclass_obs_pred(dat, y_name, event_level = "first", prob_bins = 0.05)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

event_level

A single character value for the level corresponding to the event.

prob_bins

The desired bin width for the histogram.

Value

A plotly::ggplotly() object.


Visualizing the PR curve for a classification model

Description

This function plots the full precision recall curve.

Usage

plot_twoclass_pr(dat, y_name, event_level = "first")

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

event_level

A single character value for the level corresponding to the event.

Value

A plotly::ggplotly() object.


Visualizing the predicted probabilities vs. a factor variable for a classification model

Description

This function plots the predicted probabilities against a factor column based on tidymodels results for a two-class classification model.

Usage

plot_twoclass_pred_factorcol(
  dat,
  y_name,
  factorcol,
  alpha = 1,
  size = 1,
  prob_scaling = FALSE,
  event_level = "first",
  prob_eps = 0.001,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

factorcol

The factor column to plot against the predicted probabilities.

alpha

The opacity for the geom points.

size

The size for the geom points.

prob_scaling

The boolean to turn on or off the logit scale for probability.

event_level

A single character value for the level corresponding to the event.

prob_eps

A small numerical constant to prevent division by zero.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing the predicted probabilities vs. a numeric column for a classification model

Description

This function plots the predicted probabilities against a numeric column based on tidymodels results for a two-class classification model.

Usage

plot_twoclass_pred_numcol(
  dat,
  y_name,
  numcol,
  alpha = 1,
  size = 1,
  prob_scaling = FALSE,
  event_level = "first",
  prob_eps = 0.001,
  source = NULL
)

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

numcol

The numerical column to plot against the predicted probabilities.

alpha

The opacity for the geom points.

size

The size for the geom points.

prob_scaling

The boolean to turn on or off the logit scale for probability.

event_level

A single character value for the level corresponding to the event.

prob_eps

A small numerical constant to prevent division by zero.

source

A character string of length 1 that matches the source argument in event_data().

Value

A plotly::ggplotly() object.


Visualizing the ROC curve for a classification model

Description

This function plots the ROC curve for a classification model.

Usage

plot_twoclass_roc(dat, y_name, event_level = "first")

Arguments

dat

The predictions data frame in the organize_data() result. The following variables are required: .outcome, .pred, .color, and .hover.

y_name

The y/response variable for the model.

event_level

A single character value for the level corresponding to the event.

Value

A plotly::ggplotly() object.
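
The points behind a two-class ROC curve can be computed with yardstick, which this package imports. The sketch below uses yardstick's own two_class_example data rather than an organize_data() result, so it illustrates the event_level idea and is not the plotting function itself:

```r
library(yardstick)

# two_class_example ships with yardstick: `truth` is the observed class and
# `Class1` holds the predicted probability of the first factor level.
data(two_class_example)

# Sensitivity and specificity at each probability threshold, treating the
# first level as the event.
roc_dat <- roc_curve(two_class_example, truth, Class1, event_level = "first")
head(roc_dat)

# Area under the curve for the same predictions.
roc_auc(two_class_example, truth, Class1, event_level = "first")
```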


Returns the class, app type, y name, and the number of rows of an object of shiny_data class

Description

This is a print method for the shiny_data class.

Usage

## S3 method for class 'shiny_data'
print(x, ...)

Arguments

x

an object of class shiny_data

...

Other parameters not currently used.

Value

x invisibly.
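
Returning the object invisibly is the usual convention for S3 print methods. A minimal sketch with a hypothetical class toy_data (not the package's shiny_data class, whose printed fields may differ):

```r
# Hypothetical print method for an illustrative class "toy_data".
print.toy_data <- function(x, ...) {
  cat("<toy_data> outcome:", x$y_name, "| rows:", nrow(x$predictions), "\n")
  # Return the object invisibly so it can be assigned or piped without
  # being printed a second time.
  invisible(x)
}

obj <- structure(
  list(y_name = "mpg", predictions = data.frame(.pred = c(20.1, 24.0))),
  class = "toy_data"
)

# Prints one summary line and hands back the unchanged object.
returned <- print(obj)
```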


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

generics

explore


Tuned flexible discriminant analysis results

Description

This object has the results when a flexible discriminant analysis model was tuned over the interaction degree parameters.

Details

To reduce the object size, five bootstraps were used for resampling and missing data were removed.

The code used to produce this object:

  library(tidymodels)
  library(discrim)
  tidymodels_prefer()

  # ------------------------------------------------------------------------------

  ctrl_gr <- control_grid(save_pred = TRUE)

  # ------------------------------------------------------------------------------

  data(scat)
  scat <- scat[complete.cases(scat), ]

  # ------------------------------------------------------------------------------

  set.seed(1)
  scat_rs <- bootstraps(scat, times = 5)

  scat_fda_bt <-
    discrim_flexible(prod_degree = tune()) %>%
    tune_grid(
      Species ~ .,
      resamples = scat_rs,
      control = ctrl_gr
    )

Value

An object with primary class tune_results.


Internal function to run shiny application on an object of shiny_data class

Description

This function takes the organize_data() result and launches a Shiny app.

Usage

shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)

## Default S3 method:
shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)

## S3 method for class 'multi_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)

## S3 method for class 'reg_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)

## S3 method for class 'two_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)

Arguments

x

The organize_data() result.

hover_cols

The columns to display while hovering in the Shiny app. This argument can be:

  • A dplyr selector (such as dplyr::starts_with()) or a set of selectors enclosed in c().

  • A character vector.

hover_only

A logical to determine if interactive highlighting of points is enabled (the default) or not. This can be helpful for very large data sets.

...

Other parameters not currently used.

Value

A shiny application.
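
The usage above relies on S3 dispatch: the class of x decides which method runs. The sketch below shows that mechanism with hypothetical names (launch(), reg_toy_data, two_cls_toy_data); the behavior of the package's default method is not documented here and is assumed to signal an error:

```r
# Hypothetical generic and methods (names are illustrative only).
launch <- function(x, ...) UseMethod("launch")

# Assumed fallback: refuse objects without a dedicated method.
launch.default <- function(x, ...) {
  stop("No method for objects of class ", class(x)[1], call. = FALSE)
}

launch.reg_toy_data <- function(x, ...) "regression app"
launch.two_cls_toy_data <- function(x, ...) "two-class app"

reg_obj <- structure(list(), class = c("reg_toy_data", "toy_data"))
cls_obj <- structure(list(), class = c("two_cls_toy_data", "toy_data"))

# The first matching class in the class vector selects the method.
reg_result <- launch(reg_obj)
cls_result <- launch(cls_obj)
```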


Test set results for logistic regression

Description

This object has the results when a logistic regression model is fit to the training set and is evaluated on the test set.

Details

The code used to produce this object:

  library(tidymodels)
  tidymodels_prefer()

  # ------------------------------------------------------------------------------

  set.seed(1)
  data(two_class_dat)

  # ------------------------------------------------------------------------------

  two_class_split <- initial_split(two_class_dat)

  # ------------------------------------------------------------------------------

  glm_spec <- logistic_reg()

  two_class_final <-
    glm_spec %>%
    last_fit(
      Class ~ .,
      split = two_class_split
    )

Value

An object with primary class last_fit.
