Type: | Package |
Title: | Heatmap-Integrated Decision Tree Visualizations |
Version: | 0.2.1 |
Maintainer: | Trang Le <grixor@gmail.com> |
Description: | Creates interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. 'treeheatr' utilizes the customizable 'ggparty' package for drawing decision trees. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
Depends: | R (≥ 3.5.0) |
Imports: | ggparty, ggplot2, partykit, dplyr, ggnewscale, gtable, stats, tidyr, cluster, grid, yardstick, seriation |
Suggests: | forcats, knitr, rmarkdown, rpart, testthat |
URL: | https://trang1618.github.io/treeheatr/index.html, https://trang1618.github.io/treeheatr-manuscript/ |
BugReports: | https://github.com/trang1618/treeheatr/issues |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2020-11-19 20:45:18 UTC; ttle |
Author: | Trang Le [aut, cre] (https://trang.page/), Jason Moore [aut] (http://www.epistasisblog.org/), University of Pennsylvania [cph] |
Repository: | CRAN |
Date/Publication: | 2020-11-19 21:00:03 UTC |
Align decision tree and heatmap:
Description
Align decision tree and heatmap:
Usage
align_plots(
dheat,
dtree,
heat_rel_height,
show = c("heat-tree", "heat-only", "tree-only")
)
Arguments
dheat |
ggplot2 grob object of the heatmap. |
dtree |
ggplot2 grob object of the decision tree |
heat_rel_height |
Relative height of heatmap compared to whole figure (with tree). |
show |
Character string indicating which components of the decision tree-heatmap should be drawn. Can be 'heat-tree', 'heat-only' or 'tree-only'. |
Value
A gtable/grob object of the decision tree (top) and heatmap (bottom).
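How 'heat_rel_height' and 'show' interact can be illustrated with a minimal sketch. The helper panel_heights() below is hypothetical and not part of treeheatr. It only assumes that the heatmap takes the fraction 'heat_rel_height' of the whole figure and the tree takes the remainder.

```r
# Hypothetical helper (not part of treeheatr): splits the figure height
# between the tree (top) and the heatmap (bottom).
panel_heights <- function(heat_rel_height,
                          show = c("heat-tree", "heat-only", "tree-only")) {
  show <- match.arg(show)
  stopifnot(heat_rel_height >= 0, heat_rel_height <= 1)
  switch(show,
    "heat-tree" = c(tree = 1 - heat_rel_height, heat = heat_rel_height),
    "heat-only" = c(tree = 0, heat = 1),
    "tree-only" = c(tree = 1, heat = 0)
  )
}

panel_heights(0.2)
panel_heights(0.2, show = "tree-only")
```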
Performs clustering of features.
Description
Performs clustering of features.
Usage
clust_feat_func(dat, clust_vec, clust_feats = TRUE)
Arguments
dat |
Dataframe of the original dataset. Samples may be reordered. |
clust_vec |
Character vector of the names of the variables to cluster on. Can include class labels. |
clust_feats |
Logical. If TRUE, clusters the displayed features (passed through 'clust_vec') using the Gower metric based on the values of all samples and returns the ordered features. When 'clust_samps = FALSE' and 'clust_feats = FALSE', no clustering is performed. |
Value
Character vector of reordered features when 'clust_feats == TRUE'.
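As a rough illustration of ordering features by Gower dissimilarity, the sketch below uses cluster::daisy() and stats::hclust() on the built-in iris data. The helper order_feats_gower() is hypothetical, and the choice of hierarchical clustering to derive the order is an assumption; it is not necessarily how clust_feat_func() is implemented.

```r
library(cluster)  # provides daisy(); shipped with R as a recommended package

# Hypothetical helper (not part of treeheatr): returns the feature names in
# 'clust_vec' reordered by hierarchical clustering on Gower dissimilarities.
order_feats_gower <- function(dat, clust_vec) {
  # Standardize each feature, then transpose so that features become rows
  # and distances are computed between features across all samples.
  feat_mat <- as.data.frame(t(scale(dat[, clust_vec, drop = FALSE])))
  d <- daisy(feat_mat, metric = "gower")
  hc <- hclust(as.dist(d))
  clust_vec[hc$order]
}

num_feats <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
ordered_feats <- order_feats_gower(iris, num_feats)
print(ordered_feats)
```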
Performs clustering of samples.
Description
Performs clustering of samples.
Usage
clust_samp_func(leaf_node = NULL, dat, clust_vec, clust_samps = TRUE)
Arguments
leaf_node |
Integer value indicating terminal node id. |
dat |
Dataframe of the original dataset. Samples may be reordered. |
clust_vec |
Character vector of the names of the variables to cluster on. Can include class labels. |
clust_samps |
Logical. If TRUE, hierarchical clustering is performed among samples within each leaf node. |
Value
Dataframe of reordered original dataset when clust_samps == TRUE.
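Reordering samples within each leaf node can be sketched as a per-group hierarchical clustering. In the example below, order_within_node() is a hypothetical helper and iris$Species merely stands in for a leaf node id; the Gower metric and the default hclust() linkage are assumptions, not a description of clust_samp_func() internals.

```r
library(cluster)  # provides daisy(); shipped with R as a recommended package

# Hypothetical helper (not part of treeheatr): reorders rows within each
# group of 'node_col' by hierarchical clustering on the 'clust_vec' columns.
order_within_node <- function(dat, node_col, clust_vec) {
  pieces <- lapply(split(dat, dat[[node_col]]), function(d) {
    if (nrow(d) < 3) return(d)  # too few samples to reorder meaningfully
    dist_mat <- daisy(d[, clust_vec, drop = FALSE], metric = "gower")
    d[hclust(as.dist(dist_mat))$order, , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# iris$Species plays the role of the terminal node id in this sketch.
reordered <- order_within_node(iris, "Species", c("Sepal.Length", "Petal.Length"))
head(reordered)
```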
Compute decision tree from data set
Description
Compute decision tree from data set
Usage
compute_tree(
x,
data_test = NULL,
target_lab = NULL,
task = c("classification", "regression"),
feat_types = NULL,
label_map = NULL,
clust_samps = TRUE,
clust_target = TRUE,
custom_layout = NULL,
lev_fac = 1.3,
panel_space = 0.001
)
Arguments
x |
Dataframe or a 'party' or 'partynode' object representing a custom tree. If a dataframe is supplied, a conditional inference tree is computed. If a custom tree is supplied, it must follow the partykit syntax: https://cran.r-project.org/web/packages/partykit/vignettes/partykit.pdf |
data_test |
Tidy test dataset. Required if 'x' is a 'partynode' object. If NULL, heatmap displays (training) data 'x'. |
target_lab |
Name of the column in data that contains target/label information. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
feat_types |
Named vector indicating the type of each feature, e.g., c(sex = 'factor', age = 'numeric'). If feature types are not supplied, they are inferred from the column types. |
label_map |
Named vector of the meaning of the target values, e.g., c('0' = 'Edible', '1' = 'Poisonous'). |
clust_samps |
Logical. If TRUE, hierarchical clustering is performed among samples within each leaf node. |
clust_target |
Logical. If TRUE, target/label is included in hierarchical clustering of samples within each leaf node and might yield a more interpretable heatmap. |
custom_layout |
Dataframe with 3 columns: id, x and y for manually input custom layout. |
lev_fac |
Relative weight of child node positions according to their levels, commonly ranging from 1 to 1.5. A value of 1 places each parent node exactly in the middle of its child nodes. |
panel_space |
Spacing between facets relative to viewport, recommended to range from 0.001 to 0.01. |
Value
A list of results from 'partykit::ctree' or provided custom tree, including fit, estimates, smart layout and terminal data.
Examples
fit_tree <- compute_tree(penguins, target_lab = 'species')
fit_tree$fit
fit_tree$layout
dplyr::select(fit_tree$term_dat, - contains('nodedata'))
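When 'x' is a custom tree, it has to be built with partykit. The sketch below constructs a one-split 'partynode' on the built-in iris data and checks which node each sample falls into. The split variable and the cut point 2.5 are arbitrary choices for illustration; the resulting object is the kind of input that would be passed as 'x' together with 'data_test' and 'target_lab'.

```r
library(partykit)

dat <- iris

# A single split on Petal.Length at 2.5 (arbitrary illustrative cut point).
split_petal <- partysplit(
  varid = which(names(dat) == "Petal.Length"),
  breaks = 2.5
)

# Root node (id 1) with two terminal children (ids 2 and 3).
custom_tree <- partynode(
  id = 1L,
  split = split_petal,
  kids = list(partynode(2L), partynode(3L))
)

# Terminal node id for every sample in 'dat'.
node_ids <- fitted_node(custom_tree, data = dat)
table(node_ids)
```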
Diabetes patient records.
Description
http://archive.ics.uci.edu/ml/datasets/diabetes https://www.kaggle.com/uciml/pima-indians-diabetes-database
Usage
diabetes
Format
A data frame with 768 observations and 9 variables: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age and Outcome.
Draws the heatmap.
Description
Draws the heatmap to be placed below the decision tree.
Usage
draw_heat(
dat,
fit,
feat_types = NULL,
target_cols = NULL,
target_lab_disp = fit$target_lab,
trans_type = c("percentize", "normalize", "scale", "none"),
clust_feats = TRUE,
feats = NULL,
show_all_feats = FALSE,
p_thres = 0.05,
cont_legend = FALSE,
cate_legend = FALSE,
cont_cols = ggplot2::scale_fill_viridis_c,
cate_cols = ggplot2::scale_fill_viridis_d,
panel_space = 0.001,
target_space = 0.05,
target_pos = "top"
)
Arguments
dat |
Dataframe with samples from original dataset ordered according to the clustering within each leaf node. |
fit |
party object, e.g., as output from partykit::ctree() |
feat_types |
Named vector indicating the type of each feature, e.g., c(sex = 'factor', age = 'numeric'). If feature types are not supplied, they are inferred from the column types. |
target_cols |
Character vectors representing the hex values of different level colors for targets, defaults to viridis option B. |
target_lab_disp |
Character string to display as the label of the target. If not provided, use 'target_lab'. |
trans_type |
Character string of 'percentize', 'normalize', 'scale' or 'none'. If 'percentize', replace each value with its empirical percentile. If 'scale', subtract the mean and divide by the standard deviation. If 'normalize', i.e., max-min normalize, subtract the min and divide by the range (max minus min). If 'none', no transformation is applied. More information on which transformation to choose can be found here: https://cran.rstudio.com/package=heatmaply/vignettes/heatmaply.html#data-transformation-scaling-normalize-and-percentize |
clust_feats |
Logical. If TRUE, performs clustering on the features. |
feats |
Character vector of feature names to be displayed in the heatmap. If NULL, display features whose p-values are less than 'p_thres'. |
show_all_feats |
Logical. If TRUE, show all features regardless of 'p_thres'. |
p_thres |
Numeric value indicating the p-value threshold of feature importance. Features with p-values computed from the decision tree below this value will be displayed on the heatmap. |
cont_legend |
Function determining the options for legend of continuous variables, defaults to FALSE. If TRUE, use 'guide_colorbar(barwidth = 10, barheight = 0.5, title = NULL)'. Any other ['guides()'](https://ggplot2.tidyverse.org/reference/guides.html) functions would also work. |
cate_legend |
Function determining the options for legend of categorical variables, defaults to FALSE. If TRUE, use 'guide_legend(title = NULL)'. Any other ['guides()'](https://ggplot2.tidyverse.org/reference/guides.html) functions would also work. |
cont_cols |
Function determining color scale for continuous variable, defaults to 'scale_fill_viridis_c(guide = cont_legend)'. |
cate_cols |
Function determining color scale for nominal categorical variable, defaults to 'scale_fill_viridis_d(begin = 0.3, end = 0.9)'. |
panel_space |
Spacing between facets relative to viewport, recommended to range from 0.001 to 0.01. |
target_space |
Numeric value indicating spacing between the target label and the rest of the features |
target_pos |
Character string specifying the position of the target label on heatmap, can be 'top', 'bottom' or 'none'. |
Value
A ggplot2 grob object of the heatmap.
Examples
x <- compute_tree(penguins, target_lab = 'species')
draw_heat(x$dat, x$fit)
Draws the conditional decision tree.
Description
Draws the conditional decision tree output from partykit::ctree(), utilizing ggparty geoms: geom_edge, geom_edge_label, geom_node_label.
Usage
draw_tree(
dat,
fit,
term_dat,
layout,
target_cols = NULL,
title = NULL,
tree_space_top = 0.05,
tree_space_bottom = 0.05,
print_eval = FALSE,
metrics = NULL,
x_eval = 0,
y_eval = 0.9,
task = c("classification", "regression"),
par_node_vars = list(label.size = 0, label.padding = unit(0.15, "lines"), line_list =
list(aes(label = splitvar)), line_gpar = list(list(size = 9)), ids = "inner"),
terminal_vars = list(label.padding = unit(0.25, "lines"), size = 3, col = "white"),
edge_vars = list(color = "grey70", size = 0.5),
edge_text_vars = list(color = "grey30", size = 3, mapping = aes(label =
paste(breaks_label, "*NA")))
)
Arguments
dat |
Dataframe with samples from original dataset ordered according to the clustering within each leaf node. |
fit |
party object, e.g., as output from partykit::ctree() |
term_dat |
Dataframe for terminal nodes, must include these columns: id, x, y and y_hat. |
layout |
Dataframe of layout of all nodes, must include these columns: id, x, y and y_hat. |
target_cols |
Character vectors representing the hex values of different level colors for targets, defaults to viridis option B. |
title |
Character string for plot title. |
tree_space_top |
Numeric value to pass to expand for top margin of tree. |
tree_space_bottom |
Numeric value to pass to expand for bottom margin of tree. |
print_eval |
Logical. If TRUE, print evaluation of the tree performance. |
metrics |
A set of metric functions to evaluate decision tree, defaults to common metrics for classification/regression problems. Can be defined with 'yardstick::metric_set'. |
x_eval |
Numeric value indicating x position to print performance statistics. |
y_eval |
Numeric value indicating y position to print performance statistics. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
par_node_vars |
Named list containing arguments to be passed to the 'geom_node_label()' call for non-terminal nodes. |
terminal_vars |
Named list containing arguments to be passed to the 'geom_node_label()' call for terminal nodes. |
edge_vars |
Named list containing arguments to be passed to the 'geom_edge()' call for tree edges. |
edge_text_vars |
Named list containing arguments to be passed to the 'geom_edge_label()' call for tree edge annotations. |
Value
A ggplot2 grob object of the decision tree.
Examples
x <- compute_tree(penguins, target_lab = 'species')
draw_tree(x$dat, x$fit, x$term_dat, x$layout)
Print decision tree performance according to different metrics.
Description
Print decision tree performance according to different metrics.
Usage
eval_tree(
dat,
target_lab = colnames(dat)[1],
task = c("classification", "regression"),
metrics = NULL
)
Arguments
dat |
Dataframe with truths (column 'target_lab') and estimates (column 'y_hat') of samples from original dataset. |
target_lab |
Name of the column in data that contains target/label information. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
metrics |
A set of metric functions to evaluate decision tree, defaults to common metrics for classification/regression problems. Can be defined with 'yardstick::metric_set'. |
Value
Character string of the decision tree evaluation.
Examples
eval_tree(compute_tree(penguins, target_lab = 'species')$dat)
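A custom 'metrics' argument can be built with yardstick::metric_set(). The sketch below defines a small metric set and applies it directly to a toy data frame whose column names ('truth', 'y_hat') are made up for illustration; the same 'cls_metrics' object is the kind of value that could be supplied as 'metrics'.

```r
library(yardstick)

# A metric set combining accuracy and Cohen's kappa.
cls_metrics <- metric_set(accuracy, kap)

# Toy truths and estimates (illustrative data only); both columns are
# factors with the same levels, as yardstick class metrics require.
toy <- data.frame(
  truth = factor(c("a", "b", "a", "b"), levels = c("a", "b")),
  y_hat = factor(c("a", "b", "b", "b"), levels = c("a", "b"))
)

res <- cls_metrics(toy, truth = truth, estimate = y_hat)
print(res)
```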
Galaxy dataset for regression.
Description
Fetched from PMLB.
Usage
galaxy
Format
An object of class data.frame with 323 rows and 5 columns.
Details
A data frame with 323 observations and 5 variables: eastwest, northsouth, angle, radialposition and target (velocity).
https://www.openml.org/d/690
Get color functions from character vectors
Description
Get color functions from character vectors
Usage
get_cols(my_cols, task, guide = FALSE)
Arguments
my_cols |
Character vectors of different hex values |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
guide |
A function used to create a guide or its name. Inherit from ['ggplot2::guides()'](https://ggplot2.tidyverse.org/reference/guides.html). |
Select the important features to be displayed.
Description
Select features with p-value (computed from decision tree) < 'p_thres' or all features if 'show_all_feats == TRUE'.
Usage
get_disp_feats(fit, feat_names, show_all_feats, p_thres)
Arguments
fit |
constparty object of the decision tree. |
feat_names |
Character vector specifying the feature names in dat. |
show_all_feats |
Logical. If TRUE, show all features regardless of 'p_thres'. |
p_thres |
Numeric value indicating the p-value threshold of feature importance. Features with p-values computed from the decision tree below this value will be displayed on the heatmap. |
Value
A character vector of feature names.
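The selection rule described above can be sketched in a few lines of base R. The helper select_feats() and the p-values below are hypothetical; how the p-values are extracted from the fitted tree is not shown here.

```r
# Hypothetical helper (not part of treeheatr): keeps features whose p-value
# is below 'p_thres', or all features when 'show_all_feats' is TRUE.
select_feats <- function(p_vals, show_all_feats = FALSE, p_thres = 0.05) {
  if (show_all_feats) {
    return(names(p_vals))
  }
  names(p_vals)[!is.na(p_vals) & p_vals < p_thres]
}

# Made-up p-values for illustration only.
p_vals <- c(flipper_length_mm = 0.001, body_mass_g = 0.2, sex = 0.04)

select_feats(p_vals)
select_feats(p_vals, show_all_feats = TRUE)
```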
Get the fitted tree depending on the input 'x'.
Description
If 'x' is a data.frame object, computes a conditional inference tree with partykit::ctree(). If 'x' is a partynode object specifying the customized tree, fits 'x' on 'data_test'. If 'x' is a party (or constparty) object specifying the precomputed tree, simply coerces 'x' to have class constparty.
Usage
get_fit(x, ...)
## Default S3 method:
get_fit(x, ...)
## S3 method for class 'partynode'
get_fit(x, data_test, target_lab, ...)
## S3 method for class 'party'
get_fit(x, data_test, target_lab, task, ...)
## S3 method for class 'data.frame'
get_fit(x, data_test, target_lab, ...)
Arguments
x |
Dataframe or a 'party' or 'partynode' object representing a custom tree. If a dataframe is supplied, a conditional inference tree is computed. If a custom tree is supplied, it must follow the partykit syntax: https://cran.r-project.org/web/packages/partykit/vignettes/partykit.pdf |
... |
Further arguments passed to each method. |
data_test |
Tidy test dataset. Required if 'x' is a 'partynode' object. If NULL, heatmap displays (training) data 'x'. |
target_lab |
Name of the column in data that contains target/label information. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
Value
Fitted object as a list with prepped 'data_test' if available.
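The method list above relies on S3 dispatch on the class of 'x'. The following sketch shows the mechanism with a hypothetical generic describe_input(); it only reports which branch would be taken and does not fit any tree.

```r
# Hypothetical generic (not part of treeheatr) mirroring the dispatch
# structure of get_fit(): one method per supported input class.
describe_input <- function(x, ...) UseMethod("describe_input")

describe_input.default <- function(x, ...) {
  stop("unsupported input of class '", class(x)[1], "'")
}

describe_input.data.frame <- function(x, ...) {
  "data.frame: compute a conditional inference tree"
}

describe_input.partynode <- function(x, ...) {
  "partynode: fit the custom tree on 'data_test'"
}

describe_input.party <- function(x, ...) {
  "party: use the precomputed tree"
}

describe_input(iris)
describe_input(structure(list(), class = "partynode"))
# A constparty object also inherits from 'party', so the party method is used.
describe_input(structure(list(), class = c("constparty", "party")))
```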
Draws and aligns decision tree and heatmap.
Description
Draws and aligns decision tree and heatmap. treeheatr() is an alias of heat_tree().
Usage
heat_tree(
x,
target_lab = NULL,
data_test = NULL,
task = c("classification", "regression"),
feat_types = NULL,
label_map = NULL,
target_cols = NULL,
target_legend = FALSE,
clust_samps = TRUE,
clust_target = TRUE,
custom_layout = NULL,
show = "heat-tree",
heat_rel_height = 0.2,
lev_fac = 1.3,
panel_space = 0.001,
print_eval = (!is.null(data_test)),
...
)
treeheatr(
x,
target_lab = NULL,
data_test = NULL,
task = c("classification", "regression"),
feat_types = NULL,
label_map = NULL,
target_cols = NULL,
target_legend = FALSE,
clust_samps = TRUE,
clust_target = TRUE,
custom_layout = NULL,
show = "heat-tree",
heat_rel_height = 0.2,
lev_fac = 1.3,
panel_space = 0.001,
print_eval = (!is.null(data_test)),
...
)
Arguments
x |
Dataframe or a 'party' or 'partynode' object representing a custom tree. If a dataframe is supplied, a conditional inference tree is computed. If a custom tree is supplied, it must follow the partykit syntax: https://cran.r-project.org/web/packages/partykit/vignettes/partykit.pdf |
target_lab |
Name of the column in data that contains target/label information. |
data_test |
Tidy test dataset. Required if 'x' is a 'partynode' object. If NULL, heatmap displays (training) data 'x'. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
feat_types |
Named vector indicating the type of each feature, e.g., c(sex = 'factor', age = 'numeric'). If feature types are not supplied, they are inferred from the column types. |
label_map |
Named vector of the meaning of the target values, e.g., c('0' = 'Edible', '1' = 'Poisonous'). |
target_cols |
Character vectors representing the hex values of different level colors for targets, defaults to viridis option B. |
target_legend |
Logical. If TRUE, target legend is drawn. |
clust_samps |
Logical. If TRUE, hierarchical clustering is performed among samples within each leaf node. |
clust_target |
Logical. If TRUE, target/label is included in hierarchical clustering of samples within each leaf node and might yield a more interpretable heatmap. |
custom_layout |
Dataframe with 3 columns: id, x and y for manually input custom layout. |
show |
Character string indicating which components of the decision tree-heatmap should be drawn. Can be 'heat-tree', 'heat-only' or 'tree-only'. |
heat_rel_height |
Relative height of heatmap compared to whole figure (with tree). |
lev_fac |
Relative weight of child node positions according to their levels, commonly ranging from 1 to 1.5. A value of 1 places each parent node exactly in the middle of its child nodes. |
panel_space |
Spacing between facets relative to viewport, recommended to range from 0.001 to 0.01. |
print_eval |
Logical. If TRUE, print evaluation of the tree performance. Defaults to TRUE when 'data_test' is supplied. |
... |
Further arguments passed to 'draw_tree()' and/or 'draw_heat()'. |
Value
A gtable/grob object of the decision tree (top) and heatmap (bottom).
Examples
heat_tree(penguins, target_lab = 'species')
heat_tree(
x = galaxy[1:100, ],
target_lab = 'target',
task = 'regression',
terminal_vars = NULL,
tree_space_bottom = 0)
treeheatr(penguins, target_lab = 'species')
treeheatr(
x = galaxy[1:100, ],
target_lab = 'target',
task = 'regression',
terminal_vars = NULL,
tree_space_bottom = 0)
Data of three different species of penguins.
Description
Collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.
Usage
penguins
Format
A data frame with 344 observations and 7 variables: species, island, culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g and sex.
Gorman KB, Williams TD, Fraser WR (2014). Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081
Details
Fetched from https://github.com/allisonhorst/penguins.
Creates smart node layout.
Description
Create node layout using a bottom-up approach (literally) and overwrites ggparty-precomputed positions in plot_data.
Usage
position_nodes(plot_data, terminal_data, custom_layout, lev_fac, panel_space)
Arguments
plot_data |
Dataframe output of 'ggparty:::get_plot_data()'. |
terminal_data |
Dataframe of terminal node information including id and raw terminal node size. |
custom_layout |
Dataframe with 3 columns: id, x and y for manually input custom layout. |
lev_fac |
Relative weight of child node positions according to their levels, commonly ranging from 1 to 1.5. A value of 1 places each parent node exactly in the middle of its child nodes. |
panel_space |
Spacing between facets relative to viewport, recommended to range from 0.001 to 0.01. |
Value
Dataframe with 3 columns: id, x and y of smart layout combined with custom_layout.
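A 'custom_layout' is a plain data frame with columns id, x and y. The sketch below shows one way a computed layout could be combined with such a data frame; combine_layout() is hypothetical, and the rule that custom positions take precedence for matching ids is an assumption made for illustration.

```r
# Hypothetical helper (not part of treeheatr): rows of 'custom_layout'
# replace rows of 'smart_layout' that share the same node id.
combine_layout <- function(smart_layout, custom_layout = NULL) {
  if (is.null(custom_layout)) {
    return(smart_layout)
  }
  keep <- !(smart_layout$id %in% custom_layout$id)
  out <- rbind(
    smart_layout[keep, c("id", "x", "y")],
    custom_layout[, c("id", "x", "y")]
  )
  out[order(out$id), ]
}

# Made-up positions for a three-node tree.
smart_layout <- data.frame(id = 1:3, x = c(0.5, 0.25, 0.75), y = c(1, 0, 0))
custom_layout <- data.frame(id = 1, x = 0.4, y = 0.9)

final_layout <- combine_layout(smart_layout, custom_layout)
print(final_layout)
```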
Apply the predicted tree on either new test data or training data.
Description
Apply the predicted tree on either new test data or training data.
Usage
prediction_df(fit, task, clust_samps, clust_target)
Arguments
fit |
constparty object of the decision tree. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
clust_samps |
Logical. If TRUE, hierarchical clustering is performed among samples within each leaf node. |
clust_target |
Logical. If TRUE, target/label is included in hierarchical clustering of samples within each leaf node and might yield a more interpretable heatmap. |
Value
A dataframe of prediction values with scaled columns and clustered samples.
Prepare dataset
Description
Prepare dataset
Usage
prep_data(data, target_lab, task, feat_types = NULL)
Arguments
data |
Original data frame with features to be converted to correct types. |
target_lab |
Name of the column in data that contains target/label information. |
task |
Character string indicating the type of problem, either 'classification' (categorical outcome) or 'regression' (continuous outcome). |
feat_types |
Named vector indicating the type of each feature, e.g., c(sex = 'factor', age = 'numeric'). If feature types are not supplied, they are inferred from the column types. |
Value
List of dataframes (training + test) with proper feature types and target name.
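The effect of a 'feat_types' vector can be sketched as a column-wise type conversion. The helper apply_feat_types() below is hypothetical and handles only the two types named in the argument description ('factor' and 'numeric'); columns not listed keep their original type.

```r
# Hypothetical helper (not part of treeheatr): converts the columns named
# in 'feat_types' to the requested type and leaves other columns untouched.
apply_feat_types <- function(data, feat_types = NULL) {
  for (nm in names(feat_types)) {
    data[[nm]] <- switch(feat_types[[nm]],
      factor = as.factor(data[[nm]]),
      numeric = as.numeric(data[[nm]]),
      stop("unsupported feature type: ", feat_types[[nm]])
    )
  }
  data
}

# Made-up data where both columns were read in as character.
raw <- data.frame(
  sex = c("male", "female", "female"),
  age = c("31", "45", "27"),
  stringsAsFactors = FALSE
)

typed <- apply_feat_types(raw, c(sex = "factor", age = "numeric"))
str(typed)
```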
Prepares the feature dataframes for tiles.
Description
If R does not recognize a categorical feature (input from user) as factor, converts to factor.
Usage
prepare_feats(dat, disp_feats, feat_types, clust_feats, trans_type)
Arguments
dat |
Dataframe with samples from original dataset ordered according to the clustering within each leaf node. |
disp_feats |
Character vector specifying features to be displayed. |
feat_types |
Named vector indicating the type of each feature, e.g., c(sex = 'factor', age = 'numeric'). If feature types are not supplied, they are inferred from the column types. |
clust_feats |
Logical. If TRUE, performs clustering on the features. |
trans_type |
Character string of 'percentize', 'normalize', 'scale' or 'none'. If 'percentize', replace each value with its empirical percentile. If 'scale', subtract the mean and divide by the standard deviation. If 'normalize', i.e., max-min normalize, subtract the min and divide by the range (max minus min). If 'none', no transformation is applied. More information on which transformation to choose can be found here: https://cran.rstudio.com/package=heatmaply/vignettes/heatmaply.html#data-transformation-scaling-normalize-and-percentize |
Value
A list of two dataframes (continuous and categorical) from the original dataset.
Print a ggHeatTree object. Adopted from https://github.com/daattali/ggExtra/blob/master/R/ggMarginal.R#L207-L244.
Description
ggHeatTree objects are created from heat_tree(). This is the S3 print method that draws the resulting decision tree and heatmap.
Usage
## S3 method for class 'ggHeatTree'
print(x, newpage = is.null(vp), vp = NULL, ...)
Arguments
x |
ggHeatTree (gtable grob) object. |
newpage |
Should a new page (i.e., an empty page) be drawn before the ggHeatTree is drawn? |
vp |
Viewport to draw the plot in. |
... |
ignored |
Performs transformation on continuous variables.
Description
Performs transformation on continuous variables for the heatmap color scales.
Usage
scale_norm(x, trans_type = c("percentize", "normalize", "scale", "none"))
Arguments
x |
Numeric vector. |
trans_type |
Character string of 'percentize', 'normalize', 'scale' or 'none'. If 'percentize', replace each value with its empirical percentile. If 'scale', subtract the mean and divide by the standard deviation. If 'normalize', i.e., max-min normalize, subtract the min and divide by the range (max minus min). If 'none', no transformation is applied. More information on which transformation to choose can be found here: https://cran.rstudio.com/package=heatmaply/vignettes/heatmaply.html#data-transformation-scaling-normalize-and-percentize |
Value
Numeric vector of the transformed 'x'.
Examples
scale_norm(1:5)
scale_norm(1:5, 'normalize')
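For reference, the four transformation types can be written out in base R. The function transform_sketch() below is a hypothetical re-implementation for illustration, assuming 'percentize' means the empirical percentile and 'normalize' means division by the range; it is not the package source.

```r
# Hypothetical re-implementation (not the package source) of the four
# transformation types for a numeric vector.
transform_sketch <- function(x,
                             trans_type = c("percentize", "normalize", "scale", "none")) {
  trans_type <- match.arg(trans_type)
  switch(trans_type,
    # Empirical percentile of each value.
    percentize = stats::ecdf(x)(x),
    # Max-min normalization to the interval [0, 1].
    normalize = (x - min(x, na.rm = TRUE)) /
      (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)),
    # Centering and scaling to unit standard deviation.
    scale = (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE),
    none = x
  )
}

transform_sketch(1:5)
transform_sketch(1:5, "normalize")
transform_sketch(1:5, "scale")
```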
Determines terminal node position.
Description
Create node layout using a bottom-up approach (literally) and overwrites ggparty-precomputed positions in plot_data.
Usage
term_node_pos(plot_data, dat)
Arguments
plot_data |
Dataframe output of 'ggparty:::get_plot_data()'. |
dat |
Dataframe of prediction values with scaled columns and clustered samples. |
Value
Dataframe with terminal node information.
External test dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18.
Description
External test dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18.
Usage
test_covid
Format
A data frame with 110 observations and 7 XGBoost-selected variables: PATIENT_ID, Lactate dehydrogenase, High sensitivity C-reactive protein, (%)lymphocyte, Admission time, Discharge time and outcome.
An interpretable mortality prediction model for COVID-19 patients. Yan et al. https://doi.org/10.1038/s42256-020-0180-7 https://github.com/HAIRLAB/Pre_Surv_COVID_19
Training dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18. Containing NAs.
Description
Training dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18. Containing NAs.
Usage
train_covid
Format
A data frame with 375 observations and 77 variables.
An interpretable mortality prediction model for COVID-19 patients. Yan et al. https://doi.org/10.1038/s42256-020-0180-7 https://github.com/HAIRLAB/Pre_Surv_COVID_19
Results of a chemical analysis of wines grown in a specific area of Italy.
Description
Three types of wine are represented in the 178 samples, with the results of 13 chemical analyses recorded for each sample.
Usage
wine
Format
A data frame with 178 observations and 14 variables: Alcohol, Malic, Ash, Alcalinity, Magnesium, Phenols, Flavanoids, Nonflavanoids, Proanthocyanins, Color, Hue, Dilution, Proline and Type (target).
Details
Import with data(wine, package = 'rattle'). Dependent variable: Type. https://rdrr.io/cran/rattle.data/man/wine.html http://archive.ics.uci.edu/ml/datasets/wine
Red variant of the Portuguese "Vinho Verde" wine.
Description
Fetched from PMLB. Physicochemical and quality of wine.
Usage
wine_quality_red
Format
A data frame with 1599 observations and 12 variables: fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol and target (quality).
http://archive.ics.uci.edu/ml/datasets/Wine+Quality
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.