library(lessR)
#>
#> lessR 4.4.5 feedback: gerbing@pdx.edu
#> --------------------------------------------------------------
#> > d <- Read("") Read data file, many formats available, e.g., Excel
#> d is default data frame, data= in analysis routines optional
#>
#> Many examples of reading, writing, and manipulating data,
#> graphics, testing means and proportions, regression, factor analysis,
#> customization, forecasting, and aggregation from pivot tables
#> Enter: browseVignettes("lessR")
#>
#> View lessR updates, now including time series forecasting
#> Enter: news(package="lessR")
#>
#> Interactive data analysis
#> Enter: interact()
The vignette examples of using lessR became so extensive that they exceeded the maximum R package installation size. A limited number of examples follow below. Find many more vignette examples at:
more examples of reading and writing data files
Many of the following examples analyze data in the Employee data set, included with lessR. To read an internal lessR data set, just pass the name of the data set to the lessR function Read(). Read the Employee data into the data frame d. For data sets other than those provided by lessR, enter the path name or URL between the quotes, or leave the quotes empty to browse for the data file on your computer system. See the Read and Write vignette for more details.
d <- Read("Employee")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> integer: Numeric data values, integers only
#> double: Numeric data values with decimal digits
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Years integer 36 1 16 7 NA 7 ... 1 2 10
#> 2 Gender character 37 0 2 M M W ... W W M
#> 3 Dept character 36 1 5 ADMN SALE FINC ... MKTG SALE FINC
#> 4 Salary double 37 0 37 63788.26 104494.58 ... 66508.32 67562.36
#> 5 JobSat character 35 2 3 med low high ... high low high
#> 6 Plan integer 37 0 3 1 1 2 ... 2 2 1
#> 7 Pre integer 37 0 27 82 62 90 ... 83 59 80
#> 8 Post integer 37 0 22 92 74 86 ... 90 71 87
#> ------------------------------------------------------------------------------------------
d is the default name of the data frame for the lessR data analysis functions. Explicitly access the data frame with the data parameter in the analysis functions.
As an option, also read the table of variable labels. Create the table formatted as two columns: the first column is the variable name and the second column is the corresponding variable label. Not all variables need be entered into the table. The table can be a csv file or an Excel file.
Read the file of variable labels into the l data frame, currently the only permitted name. The labels will be displayed on both the text and visualization output. Each displayed label is the variable name juxtaposed with the corresponding label, as shown in the following output.
l <- rd("Employee_lbl")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 label character 8 0 8 Time of Company Employment ... Test score on legal issues after instruction
#> ------------------------------------------------------------------------------------------
l
#> label
#> Years Time of Company Employment
#> Gender Man or Woman
#> Dept Department Employed
#> Salary Annual Salary (USD)
#> JobSat Satisfaction with Work Environment
#> Plan 1=GoodHealth, 2=GetWell, 3=BestCare
#> Pre Test score on legal issues before instruction
#> Post Test score on legal issues after instruction
more examples of bar charts and pie charts
Consider the categorical variable Dept in the Employee data table. Use BarChart() to tabulate and display the visualization of the number of employees in each department, here relying upon the default data frame (table) named d. Otherwise add the data= option for a data frame with another name.
Bar chart of tabulated counts of employees in each department.
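The chart and the output below correspond to the default call:

BarChart(Dept)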
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Specify a single fill color with the fill parameter, and the edge color of the bars with color. Set the transparency level with transparency. Against a lighter background, display the value for each bar with a darker color using the labels_color parameter. To specify a color, use color names, specify a color with either its rgb() or hcl() color space coordinates, or use the lessR custom color palette function getColors().
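For example, a call along these lines sets those parameters; the particular color and transparency values here are illustrative, not recovered from the original figure:

BarChart(Dept, fill="steelblue", color="black", transparency=.4, labels_color="darkblue")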
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Use the theme parameter to change the entire color theme: "colors", "lightbronze", "dodgerblue", "slatered", "darkred", "gray", "gold", "darkgreen", "blue", "red", "rose", "green", "purple", "sienna", "brown", "orange", "white", and "light". In this example, changing the full theme accomplishes the same as changing the fill color. Turn off the displayed value on each bar with the parameter labels set to "off". Specify a horizontal bar chart with the base R parameter horiz.
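For example, where the specific theme name is illustrative:

BarChart(Dept, theme="gray", labels="off", horiz=TRUE)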
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Consider the continuous variable Salary in the Employee data table. Use Histogram() to tabulate and display the number of employees within each bin of Salary, here relying upon the default data frame (table) named d, so the data= parameter is not needed.
Histogram of tabulated counts for the bins of Salary.
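The histogram and the summaries below come from the default call:

Histogram(Salary)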
#> >>> Suggestions
#> bin_width: set the width of each bin
#> bin_start: set the start of the first bin
#> bin_end: set the end of the last bin
#> Histogram(Salary, density=TRUE) # smoothed curve + histogram
#> Plot(Salary) # Violin/Box/Scatterplot (VBS) plot
#>
#> --- Salary ---
#>
#> n miss mean sd min mdn max
#> 37 0 83795.557 21799.533 56124.970 79547.600 144419.230
#>
#>
#>
#> --- Outliers --- from the box plot: 1
#>
#> Small Large
#> ----- -----
#> 144419.2
#>
#>
#> Bin Width: 10000
#> Number of Bins: 10
#>
#> Bin Midpnt Count Prop Cumul.c Cumul.p
#> ---------------------------------------------------------
#> 50000 > 60000 55000 4 0.11 4 0.11
#> 60000 > 70000 65000 8 0.22 12 0.32
#> 70000 > 80000 75000 8 0.22 20 0.54
#> 80000 > 90000 85000 5 0.14 25 0.68
#> 90000 > 100000 95000 3 0.08 28 0.76
#> 100000 > 110000 105000 5 0.14 33 0.89
#> 110000 > 120000 115000 1 0.03 34 0.92
#> 120000 > 130000 125000 1 0.03 35 0.95
#> 130000 > 140000 135000 1 0.03 36 0.97
#> 140000 > 150000 145000 1 0.03 37 1.00
#>
By default, the Histogram() function colors the bars according to the current, active color theme. The function also provides the corresponding frequency distribution, summary statistics, and the table of bin counts from which the histogram is constructed, as well as an outlier analysis based on Tukey's outlier detection rules for box plots.
Use the parameters bin_start, bin_width, and bin_end to customize the histogram. It is easy to change the color, either by changing the color theme with style(), or by just changing the fill color with fill. Refer to standard R colors, as shown with the lessR function showColors(), or implicitly invoke the lessR color palette generating function getColors(). Each 30 degrees of the color wheel is named, such as "greens", "rusts", etc., and implements a sequential color palette.
Customized histogram.
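Given the bin width of 14000 and the first bin beginning at 35000 shown below, the call was presumably of this form, perhaps with a fill color also specified as just described:

Histogram(Salary, bin_start=35000, bin_width=14000)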
#> >>> Suggestions
#> bin_end: set the end of the last bin
#> Histogram(Salary, density=TRUE) # smoothed curve + histogram
#> Plot(Salary) # Violin/Box/Scatterplot (VBS) plot
#>
#> --- Salary ---
#>
#> n miss mean sd min mdn max
#> 37 0 83795.557 21799.533 56124.970 79547.600 144419.230
#>
#>
#>
#> --- Outliers --- from the box plot: 1
#>
#> Small Large
#> ----- -----
#> 144419.2
#>
#>
#> Bin Width: 14000
#> Number of Bins: 8
#>
#> Bin Midpnt Count Prop Cumul.c Cumul.p
#> ---------------------------------------------------------
#> 35000 > 49000 42000 0 0.00 0 0.00
#> 49000 > 63000 56000 5 0.14 5 0.14
#> 63000 > 77000 70000 12 0.32 17 0.46
#> 77000 > 91000 84000 8 0.22 25 0.68
#> 91000 > 105000 98000 6 0.16 31 0.84
#> 105000 > 119000 112000 3 0.08 34 0.92
#> 119000 > 133000 126000 2 0.05 36 0.97
#> 133000 > 147000 140000 1 0.03 37 1.00
#>
more examples of scatter plots and related
Specify an X and a Y variable with the Plot() function to obtain a scatterplot. For two variables, both variables can be any combination of continuous or categorical. One variable can also be specified by itself. A scatterplot of two categorical variables yields a bubble plot. Below is a scatterplot of two continuous variables.
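Here, plot Years and Salary, which yields the correlation output below:

Plot(Years, Salary)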
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Years, Salary, enhance=TRUE) # many options
#> Plot(Years, Salary, color="red") # exterior edge color of points
#> Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Years, Salary, out_cut=.10) # label top 10% from center as outliers
#>
#>
#> >>> Pearson's product-moment correlation
#>
#> Years: Time of Company Employment
#> Salary: Annual Salary (USD)
#>
#> Number of paired values with neither missing, n = 36
#> Sample Correlation of Years and Salary: r = 0.852
#>
#> Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
#> 95% Confidence Interval for Correlation: 0.727 to 0.923
#>
Enhance the default scatterplot with the parameter enhance. The visualization includes the mean of each variable indicated by the respective line through the scatterplot, the 95% confidence ellipse, labeled outliers, the least-squares regression line with its 95% confidence interval, and the corresponding regression line with the outliers removed.
Plot(Years, Salary, enhance=TRUE)
#> [Ellipse with Murdoch and Chow's function ellipse from their ellipse package]
#>
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Years, Salary, fill="skyblue") # interior fill color of points
#> Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Years, Salary, MD_cut=6) # Mahalanobis distance from center > 6 is an outlier
#>
#> >>> Outlier analysis with Mahalanobis Distance
#>
#> MD ID
#> ----- -----
#> 8.14 Correll, Trevon
#> 7.84 Capelle, Adam
#>
#> 5.63 Korhalkar, Jessica
#> 5.58 James, Leslie
#> 3.75 Hoang, Binh
#> ... ...
#>
#>
#> >>> Pearson's product-moment correlation
#>
#> Years: Time of Company Employment
#> Salary: Annual Salary (USD)
#>
#> Number of paired values with neither missing, n = 36
#> Sample Correlation of Years and Salary: r = 0.852
#>
#> Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
#> 95% Confidence Interval for Correlation: 0.727 to 0.923
#>
The default plot for a single continuous variable includes not only the scatterplot, but also the superimposed violin plot and box plot, with outliers identified. Call this plot the VBS plot.
Plot(Salary)
#> [Violin/Box/Scatterplot graphics from Deepayan Sarkar's lattice package]
#>
#> >>> Suggestions
#> Plot(Salary, out_cut=2, fences=TRUE, vbs_mean=TRUE) # Label two outliers ...
#> Plot(Salary, box_adj=TRUE) # Adjust boxplot whiskers for asymmetry
Following is a scatterplot in the form of a bubble plot for two categorical variables.
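The output below corresponds to the default call for these two categorical variables:

Plot(JobSat, Gender)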
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(JobSat, Gender, size_cut=FALSE)
#> Plot(JobSat, Gender, trans=.8, bg="off", grid="off")
#> SummaryStats(JobSat, Gender) # or ss
#>
#> Joint and Marginal Frequencies
#> ------------------------------
#>
#> JobSat
#> Gender high low med Sum
#> M 3 11 4 18
#> W 8 2 7 17
#> Sum 11 13 11 35
#>
#> Cramer's V: 0.515
#>
#> Chi-square Test of Independence:
#> Chisq = 9.301, df = 2, p-value = 0.010
#>
#> Some Parameter values (can be manually set)
#> -------------------------------------------------------
#> radius: 0.22 size of largest bubble
#> power: 0.50 relative bubble sizes
more examples of t-tests and ANOVA
For the independent-groups t-test, specify the response variable to the left of the tilde, ~, and the categorical variable with two groups, the grouping variable, to the right of the tilde.
ttest(Salary ~ Gender)
#>
#> Compare Salary across Gender with levels M and W
#> Grouping Variable: Gender, Man or Woman
#> Response Variable: Salary, Annual Salary (USD)
#>
#>
#> ------ Describe ------
#>
#> Salary for Gender M: n.miss = 0, n = 18, mean = 91147.458, sd = 23128.436
#> Salary for Gender W: n.miss = 0, n = 19, mean = 76830.598, sd = 18438.456
#>
#> Mean Difference of Salary: 14316.860
#>
#> Weighted Average Standard Deviation: 20848.636
#>
#>
#> ------ Assumptions ------
#>
#> Note: These hypothesis tests can perform poorly, and the
#> t-test is typically robust to violations of assumptions.
#> Use as heuristic guides instead of interpreting literally.
#>
#> Null hypothesis, for each group, is a normal distribution of Salary.
#> Group M Shapiro-Wilk normality test: W = 0.962, p-value = 0.647
#> Group W Shapiro-Wilk normality test: W = 0.828, p-value = 0.003
#>
#> Null hypothesis is equal variances of Salary, homogeneous.
#> Variance Ratio test: F = 534924536.348/339976675.129 = 1.573, df = 17;18, p-value = 0.349
#> Levene's test, Brown-Forsythe: t = 1.302, df = 35, p-value = 0.201
#>
#>
#> ------ Infer ------
#>
#> --- Assume equal population variances of Salary for each Gender
#>
#> t-cutoff for 95% range of variation: tcut = 2.030
#> Standard Error of Mean Difference: SE = 6857.494
#>
#> Hypothesis Test of 0 Mean Diff: t-value = 2.088, df = 35, p-value = 0.044
#>
#> Margin of Error for 95% Confidence Level: 13921.454
#> 95% Confidence Interval for Mean Difference: 395.406 to 28238.314
#>
#>
#> --- Do not assume equal population variances of Salary for each Gender
#>
#> t-cutoff: tcut = 2.036
#> Standard Error of Mean Difference: SE = 6900.112
#>
#> Hypothesis Test of 0 Mean Diff: t = 2.075, df = 32.505, p-value = 0.046
#>
#> Margin of Error for 95% Confidence Level: 14046.505
#> 95% Confidence Interval for Mean Difference: 270.355 to 28363.365
#>
#>
#> ------ Effect Size ------
#>
#> --- Assume equal population variances of Salary for each Gender
#>
#> Standardized Mean Difference of Salary, Cohen's d: 0.687
#>
#>
#> ------ Practical Importance ------
#>
#> Minimum Mean Difference of practical importance: mmd
#> Minimum Standardized Mean Difference of practical importance: msmd
#> Neither value specified, so no analysis
#>
#>
#> ------ Graphics Smoothing Parameter ------
#>
#> Density bandwidth for Gender M: 14777.680
#> Density bandwidth for Gender W: 11630.912
Next, to analyze the operational efficiency of a weaving device, do the two-way independent-groups ANOVA analyzing the variable breaks across the levels of tension and wool. Specify the second independent variable preceded by a * sign, as in the sketch below.
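A sketch of the call, assuming the data are R's built-in warpbreaks data placed into the default data frame d:

d <- warpbreaks
ANOVA(breaks ~ tension * wool)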
#>
#> BACKGROUND
#>
#> Response Variable: breaks
#>
#> Factor Variable 1: tension
#> Levels: L M H
#>
#> Factor Variable 2: wool
#> Levels: A B
#>
#> Number of cases (rows) of data: 54
#> Number of cases retained for analysis: 54
#>
#> Two-way Between Groups ANOVA
#>
#>
#> DESCRIPTIVE STATISTICS
#>
#>
#> Equal cell sizes, so balanced design
#>
#>
#> tension
#> wool L M H
#> A 9 9 9
#> B 9 9 9
#>
#>
#> tension
#> wool L M H
#> A 44.56 24.00 24.56
#> B 28.22 28.78 18.78
#>
#> tension
#>
#> L M H
#> 1 36.39 26.39 21.67
#>
#> wool
#>
#> A B
#> 1 31.04 25.26
#>
#> NA
#>
#>
#>
#> tension
#> wool L M H
#> A 18.10 8.66 10.27
#> B 9.86 9.43 4.89
#>
#>
#> ANOVA
#>
#>
#> df Sum Sq Mean Sq F-value p-value
#> tension 2 2034.26 1017.13 8.50 0.0007
#> wool 1 450.67 450.67 3.77 0.0582
#> tension:wool 2 1002.78 501.39 4.19 0.0210
#> Residuals 48 5745.11 119.69
#>
#> Partial Omega Squared for tension: 0.217
#> Partial Omega Squared for wool: 0.049
#> Partial Omega Squared for tension & wool: 0.106
#>
#> Cohen's f for tension: 0.527
#> Cohen's f for wool: 0.226
#> Cohen's f for tension_&_wool: 0.344
#>
#>
#> TUKEY MULTIPLE COMPARISONS OF MEANS
#>
#> Family-wise Confidence Level: 0.95
#>
#> Factor: tension
#> -------------------------------
#> diff lwr upr p adj
#> M-L -10.00 -18.82 -1.18 0.02
#> H-L -14.72 -23.54 -5.90 0.00
#> H-M -4.72 -13.54 4.10 0.40
#>
#> Factor: wool
#> -----------------------------
#> diff lwr upr p adj
#> B-A -5.78 -11.76 0.21 0.06
#>
#> Cell Means
#> ------------------------------------
#> diff lwr upr p adj
#> M:A-L:A -20.56 -35.86 -5.25 0.00
#> H:A-L:A -20.00 -35.31 -4.69 0.00
#> L:B-L:A -16.33 -31.64 -1.03 0.03
#> M:B-L:A -15.78 -31.08 -0.47 0.04
#> H:B-L:A -25.78 -41.08 -10.47 0.00
#> H:A-M:A 0.56 -14.75 15.86 1.00
#> L:B-M:A 4.22 -11.08 19.53 0.96
#> M:B-M:A 4.78 -10.53 20.08 0.94
#> H:B-M:A -5.22 -20.53 10.08 0.91
#> L:B-H:A 3.67 -11.64 18.97 0.98
#> M:B-H:A 4.22 -11.08 19.53 0.96
#> H:B-H:A -5.78 -21.08 9.53 0.87
#> M:B-L:B 0.56 -14.75 15.86 1.00
#> H:B-L:B -9.44 -24.75 5.86 0.46
#> H:B-M:B -10.00 -25.31 5.31 0.39
#>
#>
#> RESIDUALS
#>
#> Fitted Values, Residuals, Standardized Residuals
#> [sorted by Standardized Residuals, ignoring + or - sign]
#> [res_rows = 20, out of 54 cases (rows) of data, or res_rows="all"]
#> ------------------------------------------------
#> tension wool breaks fitted residual z-resid
#> 5 L A 70.00 44.56 25.44 2.47
#> 9 L A 67.00 44.56 22.44 2.18
#> 4 L A 25.00 44.56 -19.56 -1.90
#> 8 L A 26.00 44.56 -18.56 -1.80
#> 1 L A 26.00 44.56 -18.56 -1.80
#> 24 H A 43.00 24.56 18.44 1.79
#> 36 L B 44.00 28.22 15.78 1.53
#> 2 L A 30.00 44.56 -14.56 -1.41
#> 23 H A 10.00 24.56 -14.56 -1.41
#> 29 L B 14.00 28.22 -14.22 -1.38
#> 37 M B 42.00 28.78 13.22 1.28
#> 34 L B 41.00 28.22 12.78 1.24
#> 40 M B 16.00 28.78 -12.78 -1.24
#> 14 M A 12.00 24.00 -12.00 -1.16
#> 18 M A 36.00 24.00 12.00 1.16
#> 19 H A 36.00 24.56 11.44 1.11
#> 16 M A 35.00 24.00 11.00 1.07
#> 41 M B 39.00 28.78 10.22 0.99
#> 44 M B 39.00 28.78 10.22 0.99
#> 39 M B 19.00 28.78 -9.78 -0.95
For a one-way ANOVA, just include one independent variable. A randomized block design is also available.
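For example, with the same data:

ANOVA(breaks ~ tension)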
more examples of analyzing proportions
The analysis of proportions is of two primary types. Here, just analyze the chi-square test of independence, which applies to two categorical variables. The first categorical variable listed in this example is the value of the parameter variable, the first parameter in the function definition, so it does not need the parameter name. The second categorical variable listed must include the parameter name by.
The question for the analysis is whether the observed frequencies of Jacket thickness and Bike ownership differ sufficiently from the frequencies expected under the null hypothesis to conclude that the variables are related.
d <- Read("Jackets")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Bike character 1025 0 2 BMW Honda Honda ... Honda Honda BMW
#> 2 Jacket character 1025 0 3 Lite Lite Lite ... Lite Med Lite
#> ------------------------------------------------------------------------------------------
Prop_test(Jacket, by=Bike)
#> variable: Jacket
#> by: Bike
#>
#> <<< Pearson's Chi-squared test
#>
#> --- Description
#>
#> Jacket
#> Bike Lite Med Thick Sum
#> BMW 89 135 194 418
#> Honda 283 207 117 607
#> Sum 372 342 311 1025
#>
#> Cramer's V: 0.319
#>
#> Row Col Observed Expected Residual Stnd Res
#> 1 1 89 151.703 -62.703 -8.288
#> 1 2 135 139.469 -4.469 -0.602
#> 1 3 194 126.827 67.173 9.287
#> 2 1 283 220.297 62.703 8.288
#> 2 2 207 202.531 4.469 0.602
#> 2 3 117 184.173 -67.173 -9.287
#>
#> --- Inference
#>
#> Chi-square statistic: 104.083
#> Degrees of freedom: 2
#> Hypothesis test of equal population proportions: p-value = 0.000
more examples of regression and logistic regression
The full output is extensive: Summary of the analysis, estimated model, fit indices, ANOVA, correlation matrix, collinearity analysis, best subset regression, residuals and influence statistics, and prediction intervals. The motivation is to provide virtually all of the information needed for a proper regression analysis.
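The analysis below apparently comes from a call of this form, with the Employee data again read into the default data frame d:

d <- Read("Employee", quiet=TRUE)
reg(Salary ~ Years + Pre)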
#> >>> Suggestion
#> # Create an R markdown file for interpretative output with Rmd = "file_name"
#> reg(Salary ~ Years + Pre, Rmd="eg")
#>
#>
#> BACKGROUND
#>
#> Data Frame: d
#>
#> Response Variable: Salary
#> Predictor Variable 1: Years
#> Predictor Variable 2: Pre
#>
#> Number of cases (rows) of data: 37
#> Number of cases retained for analysis: 36
#>
#>
#> BASIC ANALYSIS
#>
#> Estimate Std Err t-value p-value Lower 95% Upper 95%
#> (Intercept) 54140.971 13666.115 3.962 0.000 26337.052 81944.891
#> Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
#> Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
#>
#> Standard deviation of Salary: 21,822.372
#>
#> Standard deviation of residuals: 11,753.478 for df=33
#> 95% range of residuals: 47,825.260 = 2 * (2.035 * 11,753.478)
#>
#> R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
#>
#> Null hypothesis of all 0 population slope coefficients:
#> F-statistic: 43.827 df: 2 and 33 p-value: 0.000
#>
#> -- Analysis of Variance
#>
#> df Sum Sq Mean Sq F-value p-value
#> Years 1 12107157290.292 12107157290.292 87.641 0.000
#> Pre 1 1639658.444 1639658.444 0.012 0.914
#>
#> Model 2 12108796948.736 6054398474.368 43.827 0.000
#> Residuals 33 4558759843.773 138144237.690
#> Salary 35 16667556792.508 476215908.357
#>
#>
#> K-FOLD CROSS-VALIDATION
#>
#>
#> RELATIONS AMONG THE VARIABLES
#>
#> Salary Years Pre
#> Salary 1.00 0.85 0.03
#> Years 0.85 1.00 0.05
#> Pre 0.03 0.05 1.00
#>
#> Tolerance VIF
#> Years 0.998 1.002
#> Pre 0.998 1.002
#>
#> Years Pre R2adj X's
#> 1 0 0.718 1
#> 1 1 0.710 2
#> 0 1 -0.028 1
#>
#> [based on Thomas Lumley's leaps function from the leaps package]
#>
#>
#> RESIDUALS AND INFLUENCE
#>
#> -- Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
#> [sorted by Cook's Distance]
#> [n_res_rows = 20, out of 36 rows of data, or do n_res_rows="all"]
#> -----------------------------------------------------------------------------------------
#> Years Pre Salary fitted resid rstdnt dffits cooks
#> Correll, Trevon 21 97 144419.230 120648.843 23770.387 2.424 1.217 0.430
#> James, Leslie 18 70 132563.380 111387.773 21175.607 1.998 0.714 0.156
#> Capelle, Adam 24 83 118138.430 130658.778 -12520.348 -1.211 -0.634 0.132
#> Hoang, Binh 15 96 121074.860 101158.659 19916.201 1.860 0.649 0.131
#> Korhalkar, Jessica 2 74 82502.500 59292.181 23210.319 2.171 0.638 0.122
#> Billing, Susan 4 91 82675.260 65484.493 17190.767 1.561 0.472 0.071
#> Singh, Niral 2 59 71055.440 59566.155 11489.285 1.064 0.452 0.068
#> Skrotzki, Sara 18 63 101352.330 111515.627 -10163.297 -0.937 -0.397 0.053
#> Saechao, Suzanne 8 98 65545.250 78362.271 -12817.021 -1.157 -0.390 0.050
#> Kralik, Laura 10 74 102681.190 85303.447 17377.743 1.535 0.287 0.026
#> Anastasiou, Crystal 2 59 66508.320 59566.155 6942.165 0.636 0.270 0.025
#> Langston, Matthew 5 94 59188.960 68681.106 -9492.146 -0.844 -0.268 0.024
#> Afshari, Anbar 6 100 79441.930 71822.925 7619.005 0.689 0.264 0.024
#> Cassinelli, Anastis 10 80 67562.360 85193.857 -17631.497 -1.554 -0.265 0.022
#> Osterman, Pascal 5 69 59704.790 69137.730 -9432.940 -0.826 -0.216 0.016
#> Bellingar, Samantha 10 67 76337.830 85431.301 -9093.471 -0.793 -0.198 0.013
#> LaRoe, Maria 10 80 71961.290 85193.857 -13232.567 -1.148 -0.195 0.013
#> Ritchie, Darnell 7 82 63788.260 75403.102 -11614.842 -1.006 -0.190 0.012
#> Sheppard, Cory 14 66 105027.550 98455.199 6572.351 0.579 0.176 0.011
#> Downs, Deborah 7 90 67139.900 75256.982 -8117.082 -0.706 -0.174 0.010
#>
#>
#> PREDICTION ERROR
#>
#> -- Data, Predicted, Standard Error of Prediction, 95% Prediction Intervals
#> [sorted by lower bound of prediction interval]
#> [to see all intervals add n_pred_rows="all"]
#> ----------------------------------------------
#>
#> Years Pre Salary pred s_pred pi.lwr pi.upr width
#> Hamide, Bita 1 83 61036.850 55876.388 12290.483 30871.211 80881.564 50010.352
#> Singh, Niral 2 59 71055.440 59566.155 12619.291 33892.014 85240.296 51348.281
#> Anastasiou, Crystal 2 59 66508.320 59566.155 12619.291 33892.014 85240.296 51348.281
#> ...
#> Link, Thomas 10 83 76312.890 85139.062 11933.518 60860.137 109417.987 48557.849
#> LaRoe, Maria 10 80 71961.290 85193.857 11918.048 60946.405 109441.308 48494.903
#> Cassinelli, Anastis 10 80 67562.360 85193.857 11918.048 60946.405 109441.308 48494.903
#> ...
#> Correll, Trevon 21 97 144419.230 120648.843 12881.876 94440.470 146857.217 52416.747
#> Capelle, Adam 24 83 118138.430 130658.778 12955.608 104300.394 157017.161 52716.767
#>
#> ----------------------------------
#> Plot 1: Distribution of Residuals
#> Plot 2: Residuals vs Fitted Values
#> ----------------------------------
As with several other lessR functions, save the output to an object with the name of your choosing, such as r, and then reference desired pieces of the output. View the names of those pieces from the manual, here obtained with ?reg, or use the R names() function, as in the following example.
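For example, first save the output of the regression above to r:

r <- reg(Salary ~ Years + Pre)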
names(r)
#> [1] "out_suggest" "call" "formula" "vars"
#> [5] "out_title_bck" "out_background" "out_title_basic" "out_estimates"
#> [9] "out_fit" "out_anova" "out_title_mod" "out_mod"
#> [13] "out_mdls" "out_title_kfold" "out_kfold" "out_title_rel"
#> [17] "out_cor" "out_collinear" "out_subsets" "out_title_res"
#> [21] "out_residuals" "out_title_pred" "out_predict" "out_ref"
#> [25] "out_Rmd" "out_Word" "out_pdf" "out_odt"
#> [29] "out_rtf" "out_plots" "n.vars" "n.obs"
#> [33] "n.keep" "coefficients" "sterrs" "tvalues"
#> [37] "pvalues" "cilb" "ciub" "anova_model"
#> [41] "anova_residual" "anova_total" "se" "resid_range"
#> [45] "Rsq" "Rsqadj" "PRESS" "RsqPRESS"
#> [49] "m_se" "m_MSE" "m_Rsq" "cor"
#> [53] "tolerances" "vif" "resid.max" "pred_min_max"
#> [57] "residuals" "fitted" "cooks.distance" "model"
#> [61] "terms"
View any piece of output with the name of the output file, a dollar sign, and the specific name of that piece. Here, examine the fit indices.
r$out_fit
#> Standard deviation of Salary: 21,822.372
#>
#> Standard deviation of residuals: 11,753.478 for df=33
#> 95% range of residuals: 47,825.260 = 2 * (2.035 * 11,753.478)
#>
#> R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
#>
#> Null hypothesis of all 0 population slope coefficients:
#> F-statistic: 43.827 df: 2 and 33 p-value: 0.000
These expressions could also be included in a markdown document that systematically reviews each desired piece of the output.
more examples of run charts, time series charts, and forecasting
The time series plot, plotting the values of a variable across time, is a special case of a scatterplot, potentially with points of size 0 and adjacent points connected by a line segment. Indicate a time series by specifying the x-variable, the first variable listed, as a variable of type Date. Unlike base R functions, Plot() automatically converts data values to Date values when the dates are specified in a digital format, such as 18/8/2024, or in related formats such as 2024 Q3 or 2024 Aug. Otherwise, explicitly use the R function as.Date() to convert to this format before calling Plot(), or pass the date format directly with the ts_format parameter.
Plot() implements time series forecasting based on trend and seasonality with either exponential smoothing or regression analysis, including the accompanying visualization. Time series parameters include:

- ts_method: Set to "es" for exponential smoothing, the default, or "lm" for linear model regression.
- ts_unit: The time unit, either the naturally occurring interval between dates in the data, the default, or aggregated to a wider time interval.
- ts_ahead: The number of time units to forecast into the future.
- ts_agg: If aggregating the time unit, aggregate as the "sum", the default, or as the "mean".
- ts_PIlevel: The confidence level of the prediction intervals, with 0.95 the default.
- ts_format: Provides a specific format for the date variable if not detected correctly by default.
- ts_seasons: Set to FALSE to turn off seasonality in the estimated model.
- ts_trend: Set to FALSE to turn off trend in the estimated model.
- ts_type: Applies to exponential smoothing to specify additive or multiplicative seasonality, with additive the default.

In this StockPrice data file, the date conversion has already been done.
d <- Read("StockPrice")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> Date: Date with year, month and day
#> double: Numeric data values with decimal digits
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Month Date 1461 0 487 1985-01-01 ... 2025-07-01
#> 2 Company character 1461 0 3 Apple Apple ... Intel Intel
#> 3 Price double 1461 0 1440 0.0955960601568222 ... 22.8500003814697
#> 4 Volume double 1461 0 1459 175302400 137737600 ... 67885500 79094900
#> ------------------------------------------------------------------------------------------
head(d)
#> Month Company Price Volume
#> 1 1985-01-01 Apple 0.09559606 175302400
#> 23 1985-02-01 Apple 0.09816800 137737600
#> 42 1985-03-01 Apple 0.08530761 247430400
#> 63 1985-04-01 Apple 0.07416180 114060800
#> 84 1985-05-01 Apple 0.07158987 57344000
#> 106 1985-06-01 Apple 0.05487159 576016000
We have the date as Month, and also have variables Company and stock Price.
d <- Read("StockPrice")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> Date: Date with year, month and day
#> double: Numeric data values with decimal digits
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Month Date 1461 0 487 1985-01-01 ... 2025-07-01
#> 2 Company character 1461 0 3 Apple Apple ... Intel Intel
#> 3 Price double 1461 0 1440 0.0955960601568222 ... 22.8500003814697
#> 4 Volume double 1461 0 1459 175302400 137737600 ... 67885500 79094900
#> ------------------------------------------------------------------------------------------
Plot(Month, Price, filter=(Company=="Apple"), ts_area_fill="on")
#>
#> filter: (Company == "Apple")
#> -----
#> Rows of data before filtering: 1461
#> Rows of data after filtering: 487
#>
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, enhance=TRUE) # many options
#> Plot(Month, Price, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Month, Price, MD_cut=6) # Mahalanobis distance from center > 6 is an outlier
With the by parameter, plot all three companies on the same panel.
Plot(Month, Price, by=Company)
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, enhance=TRUE) # many options
#> Plot(Month, Price, fill="skyblue") # interior fill color of points
#> Plot(Month, Price, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Month, Price, MD_cut=6) # Mahalanobis distance from center > 6 is an outlier
Here, aggregate the mean by time, from months to quarters.
Plot(Month, Price, ts_unit="quarters", ts_agg="mean")
#> >>> Warning
#> The Date variable is not sorted in Increasing Order.
#>
#> For a data frame named d, enter:
#> d <- order_by(d, Month)
#> Maybe you have a by variable with repeating Date values?
#> Enter ?sort_by for more information and examples.
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, enhance=TRUE) # many options
#> Plot(Month, Price, color="red") # exterior edge color of points
#> Plot(Month, Price, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Month, Price, out_cut=.10) # label top 10% from center as outliers
Plot() implements exponential smoothing or linear regression with seasonality forecasting, with accompanying visualization. Parameters include ts_ahead for the number of ts_units to forecast into the future, and ts_format to provide a specific format for the date variable if not detected correctly by default. Parameter ts_method defaults to "es" for exponential smoothing, or set to "lm" for linear regression. Control aspects of the exponential smoothing estimation and prediction algorithms with the parameters ts_level (alpha), ts_trend (beta), ts_seasons (gamma), ts_type for additive or multiplicative seasonality, and ts_PIlevel for the level of the prediction intervals.
To forecast Apple's stock price, focus here on the last several years of the data, beginning with Row 400 through Row 473, the last row of data for Apple. In this example, forecast ahead 24 months. Here, rely upon the default exponential smoothing estimation procedure from the fpp3 ecosystem package fable.
d <- d[400:473,]
Plot(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24)
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#> Registered S3 method overwritten by 'tsibble':
#> method from
#> as_tibble.grouped_df dplyr
#>
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, union
#> Loading required package: fabletools
#>
#> Attaching package: 'fabletools'
#> The following object is masked from 'package:lessR':
#>
#> model
#> [with functions from Hyndman and Athanasopoulos's, fpp3 packages]
#> -- standard reference: https://otexts.com/fpp3/
#>
#> Specified model
#> ---------------
#> Price [with no specifications]
#> The specified model is only suggested, and may differ from the estimated model.
#>
#> Estimated model
#> ---------------
#> Price ~ error("M") + trend("A")
#>
#>
#> Model analysis
#> --------------
#> Series: Price
#> Model: ETS(M,A,N)
#> Smoothing parameters:
#> alpha = 0.9771807
#> beta = 0.000100023
#>
#> Initial states:
#> l[0] b[0]
#> 40.85085 2.197282
#>
#> sigma^2: 0.0087
#>
#> AIC AICc BIC
#> 657.4638 658.3462 668.9841
#>
#> Mean squared error of fit to data: 100.392
#> Forecast
#> --------
#> Month predicted upper lower width
#> 1 Jun 2024 170.5502 139.34620 201.7542 62.40802
#> 2 Jul 2024 172.7439 128.73308 216.7547 88.02166
#> 3 Aug 2024 174.9376 120.78206 229.0932 108.31110
#> 4 Sep 2024 177.1313 114.18778 240.0748 125.88705
#> 5 Oct 2024 179.3250 108.43963 250.2104 141.77074
#> 6 Nov 2024 181.5187 103.27572 259.7617 156.48595
#> 7 Dec 2024 183.7124 98.54096 268.8838 170.34285
#> 8 Jan 2025 185.9061 94.13481 277.6774 183.54255
#> 9 Feb 2025 188.0998 89.98780 286.2118 196.22395
#> 10 Mar 2025 190.2935 86.04969 294.5373 208.48757
#> 11 Apr 2025 192.4872 82.28278 302.6916 220.40879
#> 12 May 2025 194.6809 78.65797 310.7038 232.04580
#> 13 Jun 2025 196.8746 75.15229 318.5968 243.44455
#> 14 Jul 2025 199.0683 71.74723 326.3893 254.64204
#> 15 Aug 2025 201.2620 68.42766 334.0962 265.66858
#> 16 Sep 2025 203.4556 65.18100 341.7303 276.54930
#> 17 Oct 2025 205.6493 61.99669 349.3020 287.30531
#> 18 Nov 2025 207.8430 58.86576 356.8203 297.95456
#> 19 Dec 2025 210.0367 55.78053 364.2929 308.51241
#> 20 Jan 2026 212.2304 52.73436 371.7265 318.99214
#> 21 Feb 2026 214.4241 49.72147 379.1268 329.40532
#> 22 Mar 2026 216.6178 46.73678 386.4989 339.76208
#> 23 Apr 2026 218.8115 43.77583 393.8472 350.07137
#> 24 May 2026 221.0052 40.83466 401.1758 360.34110
Next, implement the classic Holt-Winters exponential smoothing method from the base R function HoltWinters().
Plot(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24,
ts_source="classic")
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#> Smoothing Parameters
#> alpha: 0.7688 gamma: 1.000
#>
#> Mean squared error of fit to data: 161.98548
#>
#> Coefficients for Linear Trend
#> b0: 167.6257
#> s1: 1.10394 s2: 3.97746 s3: 9.5661 s4: 5.7359 s5: -2.83534 s6: 4.88025
#> s7: 12.19702 s8: 3.2572 s9: 4.12003 s10: -3.73196 s11: -1.51245 s12: 0.65792
#>
#> Forecast
#> --------
#> Month predicted upper lower width
#> 1 Jun 2024 168.7297 144.53356 192.9258 48.39223
#> 2 Jul 2024 171.6032 141.08261 202.1238 61.04119
#> 3 Aug 2024 177.1918 141.44884 212.9348 71.48599
#> 4 Sep 2024 173.3616 133.06751 213.6558 80.58825
#> 5 Oct 2024 164.7904 120.40942 209.1714 88.76196
#> 6 Nov 2024 172.5060 124.38400 220.6280 96.24398
#> 7 Dec 2024 179.8228 128.23031 231.4152 103.18490
#> 8 Jan 2025 170.8829 116.03921 225.7267 109.68748
#> 9 Feb 2025 171.7458 113.83299 229.6586 115.82557
#> 10 Mar 2025 163.8938 103.06660 224.7210 121.65435
#> 11 Apr 2025 166.1133 102.50512 229.7215 127.21636
#> 12 May 2025 168.2837 102.01108 234.5562 132.54516
#> 13 Jun 2025 168.7297 98.17823 239.2811 141.10290
#> 14 Jul 2025 171.6032 98.64046 244.5659 145.92549
#> 15 Aug 2025 177.1918 101.89498 252.4887 150.59372
#> 16 Sep 2025 173.3616 95.80088 250.9224 155.12152
#> 17 Oct 2025 164.7904 85.02997 244.5508 159.52086
#> 18 Nov 2025 172.5060 90.60495 254.4070 163.80209
#> 19 Dec 2025 179.8228 95.83564 263.8099 167.97424
#> 20 Jan 2026 170.8829 84.86033 256.9056 172.04524
#> 21 Feb 2026 171.7458 83.73472 259.7568 176.02211
#> 22 Mar 2026 163.8938 73.93823 253.8493 179.91110
#> 23 Apr 2026 166.1133 74.25441 257.9722 183.71778
#> 24 May 2026 168.2837 74.56008 262.0072 187.44717
more examples of exploratory and confirmatory factor analysis
Access the lessR data set Mach4 for the analysis of the responses of 351 people to the Mach IV scale. Read the optional variable labels. Including the item contents as variable labels means that the output of the confirmatory factor analysis contains the item content grouped by factor.
d <- Read("Mach4", quiet=TRUE)
l <- Read("Mach4_lbl", var_labels=TRUE)
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 label character 20 0 20 Never tell anyone the real reason you did something unless it is useful to do so ... Most people forget more easily the death of a parent than the loss of their property
#> ------------------------------------------------------------------------------------------
Calculate the correlations and store in R.
R <- cr(m01:m20)
#>
#> >>> No missing data
#>
#>
#> Note: To provide more color separation for off-diagonal
#> elements, the diagonal elements of the matrix for
#> computing the heat map are set to 0.
The correlation matrix for the analysis is named R. This item (observed variable) correlation matrix is the numerical input into the factor analysis.
Here, do a four-factor solution with the default "promax" rotation. The default name of the input correlation matrix is mycor, so pass R explicitly. The abbreviation for corEFA() is efa().
efa(R, n_factors=4)
#> EXPLORATORY FACTOR ANALYSIS
#>
#> Loadings (except -0.2 to 0.2)
#> -------------------------------------
#> Factor1 Factor2 Factor3 Factor4
#> m06 0.828 -0.290
#> m07 0.712
#> m10 0.539
#> m03 0.422 0.318
#> m09 0.323
#> m05 0.649
#> m18 0.555 0.253
#> m13 0.543 0.226
#> m01 0.490
#> m12 0.434 -0.230
#> m08 0.236 -0.202
#> m14 0.402 0.991 -0.401
#> m04 0.426
#> m20 0.237 -0.282
#> m17 0.267
#> m19
#> m11 -0.299 0.309 -0.609
#> m16 0.274 -0.455
#> m02 -0.319
#> m15 -0.207 0.203 -0.214
#>
#> Sum of Squares
#> ------------------------------------------------
#> Factor1 Factor2 Factor3 Factor4
#> SS loadings 1.933 2.038 1.825 1.099
#> Proportion Var 0.097 0.102 0.091 0.055
#> Cumulative Var 0.097 0.199 0.290 0.345
#>
#> CONFIRMATORY FACTOR ANALYSIS CODE
#>
#> MeasModel <-
#> " F1 =~ m01 + m02 + m03 + m04 + m05
#> F2 =~ m06 + m07 + m08 + m09 + m10 + m11
#> F3 =~ m12 + m13 + m14 + m15
#> F4 =~ m17 + m18 + m19 + m20
#> "
#>
#> fit <- lessR::cfa(MeasModel)
#>
#> library(lavaan)
#> fit <- lavaan::cfa(MeasModel, data=d)
#> summary(fit, fit.measures=TRUE, standardized=TRUE)
#>
#> Deletion threshold: min_loading = 0.2
#> Deleted items: m16
The confirmatory factor analysis is of multiple-indicator measurement scales; that is, each item (observed variable) is assigned to only one factor. The solution method is centroid factor analysis.
Specify the measurement model for the analysis in Lavaan notation. Define four factors: Deceit, Trust, Cynicism, and Flattery.
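A sketch of such a specification, reusing the item-to-factor assignments that efa() generated above; the pairing of the four factor names with these particular item sets is illustrative, not the published scale definition:

MeasModel <-
" Deceit =~ m01 + m02 + m03 + m04 + m05
  Trust =~ m06 + m07 + m08 + m09 + m10 + m11
  Cynicism =~ m12 + m13 + m14 + m15
  Flattery =~ m17 + m18 + m19 + m20
"
fit <- lessR::cfa(MeasModel)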
Aggregate with pivot(). Any function that processes a single vector of data, such as a column of data values for a variable in a data frame, and outputs a single computed value, the statistic, can be passed to pivot(). Functions can be user-defined or built-in.
Here, compute the mean and standard deviation of the stock Price for each company in the StockPrice data set, included with lessR.
d <- Read("StockPrice", quiet=TRUE)
pivot(d, c(mean, sd), Price, by=Company)
#> Company n na Price_mean Price_sd
#> 1 Apple 487 0 28.426 55.821
#> 2 IBM 487 0 62.352 50.028
#> 3 Intel 487 0 16.824 14.495
Interpret this call to pivot() as: for the data frame d, compute the mean and the standard deviation of Price for each level of Company.
Select any two of the three possibilities for multiple parameter values: multiple compute functions, multiple variables over which to compute, and multiple categorical variables by which to define the groups for aggregation, as in the sketch below.
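For example, an assumed variation that computes one statistic for two variables within each group:

pivot(d, mean, c(Price, Volume), by=Company)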
Generate color scales with getColors(). The default output of getColors() is a color spectrum of 12 hcl colors presented in the order in which they are assigned to discrete levels of a categorical variable. For clarity in the following function call, the default value of the pal, or palette, parameter is explicitly set to its name, "hues".
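The output below presumably follows from that call:

getColors(pal="hues")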
#>
#> h hex r g b
#> -------------------------------
#> 1 240 #4398D0 67 152 208
#> 2 60 #B28B2A 178 139 42
#> 3 120 #5FA140 95 161 64
#> 4 0 #D57388 213 115 136
#> 5 275 #9A84D6 154 132 214
#> 6 180 #00A898 0 168 152
#> 7 30 #C97E5B 201 126 91
#> 8 90 #909711 144 151 17
#> 9 210 #00A3BA 0 163 186
#> 10 330 #D26FAF 210 111 175
#> 11 150 #00A76F 0 167 111
#> 12 300 #BD76CB 189 118 203
lessR provides pre-defined sequential color scales across the range of hues around the color wheel in 30 degree increments: "reds", "rusts", "browns", "olives", "greens", "emeralds", "turquoises", "aquas", "blues", "purples", "violets", "magentas", and "grays".
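For example, the sequential blue palette shown below presumably comes from:

getColors("blues")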
#>
#> h hex r g b
#> -------------------------------
#> 1 240 #CCECFFFF 204 236 255
#> 2 240 #B4D8FCFF 180 216 252
#> 3 240 #9DC5EBFF 157 197 235
#> 4 240 #84B2DBFF 132 178 219
#> 5 240 #6B9FCCFF 107 159 204
#> 6 240 #4F8DBCFF 79 141 188
#> 7 240 #2D7CAEFF 45 124 174
#> 8 240 #006BA0FF 0 107 160
#> 9 240 #005B93FF 0 91 147
#> 10 240 #004C8AFF 0 76 138
#> 11 240 #004087FF 0 64 135
#> 12 240 #0040A9FF 0 64 169
To create a divergent color palette, specify a beginning and an ending color palette, which provide the values for the parameters pal and end_pal, where pal abbreviates palette. Here, generate colors from rust to blue.
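Presumably a call of this form, with the rust palette as pal and the blue palette as end_pal:

getColors("rusts", end_pal="blues")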
#>
#> color r g b
#> ----------------------
#> #70370FFF 112 55 15
#> #7D4A32FF 125 74 50
#> #8B5F4DFF 139 95 77
#> #997568FF 153 117 104
#> #A88E86FF 168 142 134
#> #B9ADAAFF 185 173 170
#> #AAB0B8FF 170 176 184
#> #8595A7FF 133 149 167
#> #658099FF 101 128 153
#> #466D8DFF 70 109 141
#> #1D5C83FF 29 92 131
#> #004D7AFF 0 77 122
lessR provides several utility functions for recoding, reshaping, and rescaling data.