Help for package datana

Title:

Datasets and Functions to Accompany Analisis De Datos Con R

Version:

1.1.1

Description:

Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.

License:

GPL (≥ 3)

Encoding:

UTF-8

RoxygenNote:

7.3.2

Depends:

R (≥ 3.5)

LazyData:

true

LazyDataCompression:

Imports:

ggplot2, graphics, Hmisc, methods, scales, stats, utils

Suggests:

lattice, testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-07-21 21:21:46 UTC; christian

Author:

Christian Salas-Eljatib [aut, cre], Campos Nicolás [ctb], Pino Nicolas [ctb] (up to 2020), Riquelme Joaquin [ctb] (up to 2020)

Maintainer:

Christian Salas-Eljatib <cseljatib@gmail.com>

Repository:

CRAN

Date/Publication:

2025-07-21 22:40:02 UTC

datana: Datasets and Functions to Accompany Analisis De Datos Con R

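Description

Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.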

Author(s)

Maintainer: Christian Salas-Eljatib cseljatib@gmail.com (ORCID)

Other contributors:

Campos Nicolás (ORCID) [contributor]
Pino Nicolas (up to 2020) [contributor]
Riquelme Joaquin (up to 2020) [contributor]

About the R-Squared statistics: the Anscombe quartet dataset

Description

A dataset containing four pairs of columns that share the same descriptive statistics but look clearly different when the points are plotted.

Usage

data(aboutrsq)

Format

The data frame contains six variables, as follows:

X1: Integer values giving the X-axis coordinate for the Y1, Y2 and Y3 columns
Y1: Floating-point values giving the Y-axis coordinate paired with the X1 column
Y2: Floating-point values giving the Y-axis coordinate paired with the X1 column
Y3: Floating-point values giving the Y-axis coordinate paired with the X1 column
X2: Integer values giving the X-axis coordinate for the Y4 column
Y4: Floating-point values giving the Y-axis coordinate paired with the X2 column

Source

Data were assembled by Dr Christian Salas-Eljatib (Santiago, Chile).

References

Anscombe FJ. 1973. Graphs in statistical analysis. The American Statistician 27:17-21. doi:10.2307/2682899

Examples

data(aboutrsq)    
head(aboutrsq)
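
The point of the quartet can be checked numerically. The sketch below is only an illustration: it uses the copy of Anscombe's data that ships with base R (datasets::anscombe, columns x1..x4 and y1..y4) rather than aboutrsq, whose column layout differs, and the object names xy.pairs and r2 are hypothetical.

```r
# Fit a simple linear regression to each of the four (x, y) pairs
# in base R's 'anscombe' data frame and collect the R-squared values.
xy.pairs <- list(c("x1", "y1"), c("x2", "y2"), c("x3", "y3"), c("x4", "y4"))
r2 <- sapply(xy.pairs, function(p) {
  fit <- lm(anscombe[[p[2]]] ~ anscombe[[p[1]]])
  summary(fit)$r.squared
})
# The four values are nearly identical, although the scatterplots differ.
round(r2, 2)
```

Plotting each pair, for instance with plot(anscombe$x1, anscombe$y1), makes the contrast with the nearly identical R-squared values visible.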

About the R-squared statistic: the Anscombe quartet data

Description

A dataset containing four pairs of columns that share the same descriptive statistics but look clearly different when the points are plotted.

Usage

data(aboutrsq2)

Format

The variables are described below:

X1: Integer values giving the X-axis coordinate for the Y1, Y2 and Y3 columns
Y1: Floating-point values giving the Y-axis coordinate paired with the X1 column
Y2: Floating-point values giving the Y-axis coordinate paired with the X1 column
Y3: Floating-point values giving the Y-axis coordinate paired with the X1 column
X2: Integer values giving the X-axis coordinate for the Y4 column
Y4: Floating-point values giving the Y-axis coordinate paired with the X2 column

Source

Data were contributed by Prof. Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

References

Anscombe FJ. 1973. Graphs in statistical analysis. The American Statistician 27:17-21. doi:10.2307/2682899

Examples

data(aboutrsq2)    
head(aboutrsq2)

Air quality data in New York City.

Description

Daily air quality measurements in New York, May to September 1973.

Usage

data(airnyc)

Format

Contains 6 variables, as follows:

ozone: numeric Ozone (ppb).
solar: numeric Solar radiation (Langleys).
wind: numeric Wind (mph).
temp: numeric Temperature (degrees F).
month: numeric Month (1–12).
day: numeric Day of month (1–31).

Source

The data were obtained from the R package 'datasets'.

References

Chambers J, Cleveland W, Kleiner B, Tukey P. 1983. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

Examples

data(airnyc)    
head(airnyc)
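
Since some measurements are missing in this kind of data, a monthly summary needs explicit handling of NA values. The sketch below assumes the base R table datasets::airquality (columns Ozone and Month) as a stand-in; with airnyc the lower-case names ozone and month listed under Format would be used instead, which is not tested here.

```r
# Mean ozone concentration per month, ignoring missing values.
# 'airquality' is the base R data frame; 'ozone.by.month' is a hypothetical name.
ozone.by.month <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
round(ozone.by.month, 1)
```

Without na.rm = TRUE, any month containing a missing ozone value would return NA.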

Air quality in New York City.

Description

Daily air quality measured in New York, May to September 1973.

Usage

data(airnyc2)

Format

Contains 6 variables:

ozone: Ozone (ppb).
solar: Solar radiation (Langleys).
wind: Wind (mph).
temp: Temperature (degrees F).
month: Month of the year (1–12).
day: Day of the month (1–31).

Source

The data were obtained from the R package 'datasets'.

References

Chambers J, Cleveland W, Kleiner B, Tukey P. 1983. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

Examples

data(airnyc2)    
head(airnyc2)

Time series of annual precipitation in cities of Chile.

Description

Annual precipitation in six cities in Chile (Santiago, Talca, Chillán, Temuco, Valdivia, and Puerto Montt) for different years.

Usage

data(annualppCities)

Format

The dataframe contains three variables as follows:

city: Name of city.
year: Year of the record.
annual: Annual precipitation in the given year (mm).

Source

The data were obtained from https://explorador.cr2.cl/.

Examples

data(annualppCities)    
head(annualppCities)

Time series of annual precipitation in Chile.

Description

Annual precipitation in six cities in Chile (Santiago, Talca, Chillán, Temuco, Valdivia, and Puerto Montt) for different years.

Usage

data(annualppCities2)

Format

The dataframe contains three variables as follows:

ciudad: Name of the city.
anho: Year of the record.
pp.anual: Annual precipitation in the given year (mm).

Source

The data were obtained from https://explorador.cr2.cl/.

Examples

data(annualppCities2)    
head(annualppCities2)

Age and physical measurement data for wild bears

Description

Wild bears were anesthetized, and their bodies were measured and weighed. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements. Notice that there are missing values for some of the variables.

Usage

data(bears)

Format

Contains individual-level variables, as follows:

id: Bear identifier.
age: Age in total number of months.
month: Month number within a given year.
sex: 1 = male, 2 = female.
headL: Length of head, in cm.
headW: Width of head, in cm.
neckG: Girth of neck, in cm.
length: Body length, in cm.
chestG: Girth of chest, in cm.
weight: Body weight, in kg.
obs: Temporal observation number for bear.
name: Name given to bear.

Source

According to Prof. Timothy Gregoire at Yale University (New Haven, CT, USA), the data set was supplied by Gary Alt.

References

Entertaining references are in Reader's Digest April, 1979, and Sports Afield September, 1981.

Examples

data(bears)    
head(bears) 
table(bears$sex)
boxplot(headL~sex, data=bears)

Age and biometric characteristics of wild bears

Description

Wild bears were anesthetized and their bodies were measured. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements. Note that some of the variables have missing values.

Usage

data(bears2)

Format

Contains individual-level variables, as described below:

id: Bear identifier.
edad: Age in months.
mes: Month number within the year.
sexo: 1 = male, 2 = female.
cabezaL: Length of head, in cm.
cabezaA: Width of head, in cm.
cuelloP: Girth of neck, in cm.
largo: Body length, in cm.
pechoG: Girth of chest, in cm.
peso: Body weight, in kg.
obs: Temporal observation number for the bear.
nombre: Name given to the bear.

Source

According to Prof. Timothy Gregoire of Yale University (New Haven, CT, USA), the data were supplied by Gary Alt. Minitab, Inc. The description of the data was given by him.

References

Some general references are in Reader's Digest, April 1979, and Sports Afield, September 1981.

Examples

data(bears2)    
head(bears2) 
table(bears2$sexo)
boxplot(cabezaL~sexo, data=bears2)

Age and physical measurement data for wild bears (without missing values)

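Description

Wild bears were anesthetized, and their bodies were measured and weighed. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements. This data frame is the same as 'bears' but without missing values.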

Usage

data(bearsdepu)

Format

Individual-level variables, as follows:

id: Bear identifier.
age: Age in total number of months.
month: Month number within a given year.
sex: Sex code: 1 = male, 2 = female.
headL: Length of head, in cm.
headW: Width of head, in cm.
neckG: Girth of neck, in cm.
length: Body length, in cm.
chestG: Girth of chest, in cm.
weight: Body weight, in kg.
obs: Temporal observation number for bear.
name: Name given to bear.

Source

According to Prof. Timothy Gregoire at Yale University (New Haven, CT, USA), the data set was supplied by Gary Alt.

References

Entertaining references are in Reader's Digest April, 1979, and Sports Afield September, 1981.

Examples

data(bearsdepu)    
head(bearsdepu)
table(bearsdepu$sex)
boxplot(headL~sex, data=bearsdepu)

Age and biometric characteristics of wild bears (without missing data)

Description

Wild bears were anesthetized and their bodies were measured. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements. This data frame is the same as 'bears' but without missing values.

Usage

data(bearsdepu2)

Format

Contains individual-level variables, as described below:

id: Bear identifier.
edad: Age in months.
mes: Month number within the year.
sexo: 1 = male, 2 = female.
cabezaL: Length of head, in cm.
cabezaA: Width of head, in cm.
cuelloP: Girth of neck, in cm.
largo: Body length, in cm.
pechoG: Girth of chest, in cm.
peso: Body weight, in kg.
obs: Temporal observation number for the bear.
nombre: Name given to the bear.

Source

According to Prof. Timothy Gregoire of Yale University (New Haven, CT, USA), the data were supplied by Gary Alt. Minitab, Inc. The description of the data was given by him.

References

Some general references are in Reader's Digest, April 1979, and Sports Afield, September 1981.

Examples

data(bearsdepu2)    
head(bearsdepu2)
table(bearsdepu2$sexo)
boxplot(cabezaL~sexo, data=bearsdepu2)

Population density growth of beetles

Description

Temporal measurements of density of beetles (Tribolium confusum) growing in different controlled environments.

Usage

beetles

Format

days: Number of days.
diet: Quantity of flour (in grams) in the environment where the beetles were growing; a factor with six levels.
type: Life stage of the beetles, i.e., eggs, larvae, pupae, and adults.
density: The number of insects per environment.

Source

Data from Table No. 1, page 116, of Chapman (1928). Series of experiments under controlled conditions in which flour beetles (Tribolium confusum) are kept in environments of known size. The period from egg to adult is approximately forty days at 27 degrees Celsius. The data were entered by Miss Yamara Arancibia, a former student of Prof. Christian Salas-Eljatib.

References

Chapman RN. 1928. The quantitative analysis of environmental factors. Ecology 9(2):111-122. doi:10.2307/1929348

Examples

data(beetles)
table(beetles$type)
name.diet<-unique(beetles$diet)
num.diet<-length(name.diet)
##Time series plot
#first, some computation
alys<-with(beetles,tapply(density,list(as.factor(days),as.factor(diet)),sum))
out<-as.data.frame(alys)
out$time<-row.names(out)
head(out)
#Figure 1 of the paper
matplot(out[,"time"], out[,1:num.diet], las=1, type=c("b"),pch=1,
        xlab="Time in days",ylab="Total individuals")
legend("topleft", legend = name.diet, title = "Diet (gr)",
       col = 1:6, lty = 1:6, pch = 1)

Population growth of beetles

Description

Temporal measurements of density of beetles (Tribolium confusum) growing in different controlled environments.

Usage

beetles2

Format

dias: Number of days.
dieta: Quantity of flour (in grams) in the environment where the beetles grow; a factor with six levels.
tipo: Developmental stage of the beetles, i.e., eggs, larvae, pupae, and adults.
densidad: Total number of individuals per growing environment.

Source

Data from Table No. 1, page 116, of Chapman (1928). Series of experiments under controlled conditions in which flour beetles (Tribolium confusum) are kept in environments of known size. The period from egg to adult is approximately forty days at 27 degrees Celsius. The data were entered by Miss Yamara Arancibia, a student of Prof. Christian Salas-Eljatib.

References

Chapman RN. 1928. The quantitative analysis of environmental factors. Ecology 9(2):111-122. doi:10.2307/1929348

Examples

data(beetles2)    
table(beetles2$tipo)
nom.dieta<-unique(beetles2$dieta)
num.dieta<-length(nom.dieta)
##Time series plot
#first, some computations
alys<-with(beetles2,tapply(
          densidad,list(as.factor(dias),as.factor(dieta)),sum)
          )
out<-as.data.frame(alys)
out$tiempo<-row.names(out)
head(out)
##Figure 1 of the paper
matplot(out[,"tiempo"], out[,1:num.dieta], las=1, type=c("b"),pch=1,
        xlab="Time in days",ylab="Density of individuals")
legend("topleft", legend = nom.dieta, title = "Diet (g)",
       col = 1:6, lty = 1:6, pch = 1)

Camera trap data on mammals in Ruaha National Park, southern Tanzania.

Description

The dataset contains 14604 observations. Sampling was carried out for two months during the dry season of 2013 and two months during the wet season of 2014. Each camera station is associated with a randomly placed camera and a trail-based camera, with the aim of comparing the communities resulting from the two camera trap placement strategies.

Usage

data(cameratrap)

Format

Contains 6 variables, as follows:

reference: Observation number within the dataset.
placement: Type of camera placement at each station (random or trail).
season: Season in which the sampling was carried out.
station: Station where the data were collected.
specie: Species name (medium to large terrestrial mammals).
date.time: Date and time of each photographic event.

Source

The data were provided by Dr Jeremy Cusack.

References

Cusack J, Dickman A, Rowcliffe M, Carbone C, Macdonald D, Coulson T. 2015. Random versus game trail-based camera trap placement strategy for monitoring terrestrial mammal communities. PLoS ONE 10(5): e0126373.

Examples

data(cameratrap)    
head(cameratrap)

Camera traps for mammals in Ruaha National Park, southern Tanzania

Description

Camera trap data on medium to large terrestrial mammals collected at 54 camera stations in Ruaha National Park, southern Tanzania. The dataset contains 14604 observations. Sampling was carried out for two months during the dry season of 2013 and two months during the wet season of 2014. Each camera station is associated with a randomly placed camera and a trail-based camera, with the aim of comparing the communities resulting from the two camera trap placement strategies.

Usage

data(cameratrap2)

Format

Contains 6 variables, as follows:

referencia: Observation number within the dataset.
posicion: Type of camera placement at each station (random or trail).
temporada: Season in which the sampling was carried out.
estacion: Station where the data were collected.
especie: Species name (medium to large terrestrial mammals).
fecha.hora: Date and time of each photographic event.

Source

The data were provided by Dr Jeremy Cusack.

References

Cusack J, Dickman A, Rowcliffe M, Carbone C, Macdonald D, Coulson T. 2015. Random versus game trail-based camera trap placement strategy for monitoring terrestrial mammal communities. PLoS ONE 10(5): e0126373.

Examples

data(cameratrap2)    
head(cameratrap2)

Driver status after car accidents in Greece.

Description

A data frame recording seat belt use and driver status after car accidents in Greece.

Usage

data(carAccidents)

Format

Contains the factor variables:

record: factor representing the driver status.
seatBelt: factor indicating whether the driver wore a seat belt.

Source

R package 'gginference'

Examples

data(carAccidents)    
head(carAccidents)
table(carAccidents)

Caribou survival

Description

Caribou survival

Usage

caribou

Format

Data frame with 91 rows and 3 columns:

herd: Herd identifier.
wolf.density: Wolf density for the herd, in wolves per 100 km².
alive: Caribou survival: 1 = survived, 0 = did not survive.

Examples

data(caribou)
table(caribou$alive, caribou$herd)

Caribou survival

Description

Caribou survival

Usage

caribou2

Format

Data frame with 91 rows and 3 columns:

herd: Herd identifier.
wolf.density: Wolf density, in wolves per 100 km².
alive: Survival of a caribou: 1 = survived, 0 = did not survive.

Examples

data(caribou2)
table(caribou2$alive, caribou2$herd)

CASEN 2022 survey data

Description

The National Socioeconomic Characterization Survey (Encuesta de Caracterización Socioeconómica Nacional, CASEN) of Chile is conducted by the Ministry of Social Development and Family (Ministerio de Desarrollo Social y Familia) to provide information on the situation of households and of the population. These data correspond to the CASEN 2022 survey.

Usage

data(casen)

Format

This dataset contains the following columns:

id.vivienda: Dwelling identifier.
id.persona: Person identifier.
region: Administrative region of Chile.
comuna: Commune (comuna).
edad: Age of the person, in years.
sexo: Sex of the person.
esc: Years of schooling (age >= 15).
educ: Classification of education received.
personas.hogar: Number of people living in the household.
tipohogar: Household type according to the survey.
activ: Current activity status of the person according to the survey.
ytot: Total income.
ytoth: Total household income.
ypch: Total per capita household income.
ytotcor: Corrected total income.
ytotcorh: Corrected total household income.
ypc: Corrected total per capita household income.
mayor.nivel.edu: Educational level the person currently attends, or the highest level attended.
area.edu.cinef: Field of education under the International Standard Classification of Education (CINE-F).
subarea.edu.cinef: Sub-field of education under the International Standard Classification of Education (CINE-F).
previ.salud: Health insurance system.

Source

The data were obtained from the website https://observatorio.ministeriodesarrollosocial.gob.cl/encuesta-casen. Note that only some columns are used here, and that the names of some columns were slightly changed.

Examples

data(casen)    
head(casen) 
table(casen$region)
table(casen$region,casen$sexo)
tapply(casen$ytotcor,casen$sexo,sum)

Function to compute the cumulative distribution of a variable

Description

Builds the cumulative distribution of a vector, using a fixed proportion of the data ('step') as the interval between successive points.

Usage

cdf(y = y, step = 0.05)

Arguments

y

a vector of a random variable

step

a numeric proportion of the data used as the increment interval for building the cdf of the random variable. The default value for 'step' is 0.05, that is, 5%.

Details

By default, the cumulative distribution is built using 5% of the data as the interval, that is, from 0.05 (5%) to 0.95 (95%).

Value

Returns a data frame with two columns: the first contains the values of the random variable and the second the cumulative distribution for the variable.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples

y.var <- rnorm(10)
cdf(y.var)
cdf(y.var, step=0.1)
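
To make the role of 'step' concrete, the following is a minimal base R sketch of the idea described in Details, assuming the cumulative distribution is evaluated at probabilities step, 2*step, ..., 1 - step. The helper cdf_sketch and its column names are hypothetical; this is not the implementation of cdf() in the package, whose internals may differ.

```r
# Hypothetical helper (not datana::cdf): quantiles of y at fixed probability steps.
cdf_sketch <- function(y, step = 0.05) {
  probs <- seq(step, 1 - step, by = step)            # e.g. 0.05, 0.10, ..., 0.95
  data.frame(y = as.numeric(quantile(y, probs = probs, na.rm = TRUE)),
             cum.prob = probs)
}
set.seed(1)                                           # reproducible simulated data
out <- cdf_sketch(rnorm(100), step = 0.1)
out
```

With step = 0.1 the table has nine rows (10% to 90%); a smaller step gives a finer description of the distribution.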

Chicken growth data.

Description

The body weights of the chicks were measured at birth and every second day thereafter until day 20. They were also measured on day 21. There were four groups of chicks on different protein diets.

Usage

data(chicksw)

Format

Contains four variables, as follows:

chick: An ordered factor whose levels give a unique identifier for each chick. The ordering of the levels groups chicks on the same diet together and orders them according to their final weight (lightest to heaviest) within diet.
diet: A factor with levels 1, 2, 3 and 4 indicating which experimental diet the chick received.
time: A numeric vector giving the number of days since birth when the measurement was made.
weight: A numeric vector giving the body weight of the chick (g).

Source

The data were obtained from the R package 'alr4'.

References

Crowder M, Hand D. 1990. Analysis of Repeated Measures. Chapman and Hall.

Examples

data(chicksw)    
head(chicksw)
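
A typical first summary of repeated-measures data like these is the mean final weight per diet. The sketch below assumes the base R data frame datasets::ChickWeight (columns weight, Time, Chick, Diet) as a stand-in; with chicksw the lower-case names listed under Format would be used, which is not tested here.

```r
# Keep only the last measurement day and average the weight within each diet.
# 'final' and 'mean.w' are hypothetical object names.
final <- subset(ChickWeight, Time == 21)
mean.w <- tapply(final$weight, final$Diet, mean)
round(mean.w, 1)
```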

Chick growth.

Description

The weights of the chicks were measured at birth and every second day until day 20. They were also measured on day 21. There were four groups of chicks on different protein diets.

Usage

data(chicksw2)

Format

Contains four variables, as follows:

pollo: A unique identifier for each chick. The numbering is ordered according to final weight within each diet.
dieta: A factor with four levels (1, 2, 3 and 4) indicating which diet the chick received.
tiempo: Number of days since birth.
peso: Weight of the chick (g).

Source

The data were obtained from the R package 'alr4'.

References

Crowder M, Hand D. 1990. Analysis of Repeated Measures. Chapman and Hall.

Examples

data(chicksw2)    
head(chicksw2)

CO2 emissions and temperature at the country level.

Description

Data obtained from the hockeystick package, which retrieves annual global carbon dioxide emissions since 1750, as well as other climate-related variables, from the Our World in Data repository https://github.com/owid/co2-data.

Usage

data(co2temp)

Format

The data contain 75 variables; the full description can be found in the references provided here.

country: Country.
year: Calendar year.
iso_code: TBA.
population: Population size, in number of people.
gdp: Gross domestic product, a measure of the value added created through the production of goods and services in a country.
cement_co2: TBA.
cement_co2_per_capita: TBA.
co2: TBA.
co2_growth_abs: TBA.
co2_growth_prct: TBA.
co2_including_luc: TBA.
co2_including_luc_growth_abs: TBA.
co2_including_luc_growth_prct: TBA.
co2_including_luc_per_capita: TBA.
co2_including_luc_per_gdp: TBA.
co2_including_luc_per_unit_energy: TBA.
co2_per_capita: TBA.
co2_per_gdp: TBA.
co2_per_unit_energy: TBA.
coal_co2: TBA.
coal_co2_per_capita: TBA.
consumption_co2: TBA.
consumption_co2_per_capita: TBA.
consumption_co2_per_gdp: TBA.
cumulative_cement_co2: TBA.
cumulative_co2: TBA.
cumulative_co2_including_luc: TBA.
cumulative_coal_co2: TBA.
cumulative_flaring_co2: TBA.
cumulative_gas_co2: TBA.
cumulative_luc_co2: TBA.
cumulative_oil_co2: TBA.
cumulative_other_co2: TBA.
energy_per_capita: TBA.
energy_per_gdp: TBA.
flaring_co2: TBA.
flaring_co2_per_capita: TBA.
gas_co2: TBA.
gas_co2_per_capita: TBA.
ghg_excluding_lucf_per_capita: TBA.
ghg_per_capita: TBA.
land_use_change_co2: TBA.
land_use_change_co2_per_capita: TBA.
methane: TBA.
methane_per_capita: TBA.
nitrous_oxide: TBA.
nitrous_oxide_per_capita: TBA.
oil_co2: TBA.
oil_co2_per_capita: TBA.
primary_energy_consumption: TBA.
share_global_cement_co2: TBA.
share_global_co2: TBA.
share_global_co2_including_luc: TBA.
share_global_coal_co2: TBA.
share_global_cumulative_cement_co2: TBA.
share_global_cumulative_co2: TBA.
share_global_cumulative_co2_including_luc: TBA.
share_global_cumulative_coal_co2: TBA.
share_global_cumulative_flaring_co2: TBA.
share_global_cumulative_gas_co2: TBA.
share_global_cumulative_luc_co2: TBA.
share_global_cumulative_oil_co2: TBA.
share_global_cumulative_other_co2: TBA.
share_global_flaring_co2: TBA.
share_global_gas_co2: TBA.
share_global_luc_co2: TBA.
share_global_oil_co2: TBA.
share_global_other_co2: TBA.
share_of_temperature_change_from_ghg: TBA.
temperature_change_from_ch4: TBA.
temperature_change_from_co2: TBA.
temperature_change_from_ghg: TBA.
temperature_change_from_n2o: TBA.
total_ghg: TBA.
total_ghg_excluding_lucf: TBA.
trade_co2: TBA.
trade_co2_share: TBA.

Source

The data were obtained from the R package 'hockeystick'. Note that only a subset of countries has been kept in the data frame.

References

https://www.globalcarbonproject.org/carbonbudget/
Friedlingstein P. et al. 2020. Global Carbon Budget 2020, Earth System Science Data 12:3269-3340 doi:10.5194/essd-12-3269-2020

Examples

data(co2temp)
names(co2temp)
table(co2temp$country)
lattice::xyplot(co2~year|country,data=co2temp,type="l",as.table=TRUE)

Function to compute the needed statistics for a given contrast

Description

The function computes the statistics for inference in a given contrast, subject to a given significance level. Those statistics are as follows: estimated contrast, standard error of the contrast, and the confidence interval of the contrast.

Usage

contrast(
  model = model,
  coef.cont = coef.cont,
  grp.m = grp.m,
  grp.n = grp.n,
  alpha = 0.05,
  full = TRUE
)

Arguments

model

object containing the fitted model

coef.cont

vector with the coefficients to establish the contrasts

grp.m

a vector with the sample mean for each group, or level of the factor under study.

grp.n

a vector with the sample size for each group, or level of the factor under study.

alpha

the significance level for building the confidence intervals. The default is 0.05, which corresponds to a 95% confidence level.

full

logical; FALSE for a short output, TRUE for a longer, more detailed output. The default is TRUE.

Details

The contrast is established from an already fitted statistical model that describes the relationship among variables. The significance level ('alpha') is defined by the user; by default it is set to 0.05, that is, 95% statistical confidence.

Value

This function returns the above described statistics for a given contrast.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib C. 2025. datana: Datasets and Functions to Accompany Análisis de Datos con R. R package version 1.0.7, doi:10.32614/CRAN.package.datana, https://CRAN.R-project.org/package=datana

Examples


data(fertiliza)
table(fertiliza$treat)
means.trt <- tapply(fertiliza$volume,fertiliza$treat,mean);means.trt
sds.trt <- tapply(fertiliza$volume,fertiliza$treat,sd);sds.trt
ns.trt <- tapply(fertiliza$volume,fertiliza$treat,length);ns.trt
m1 <- lm(volume ~ treat, data=fertiliza)
anova(m1)
## Coefficients to be used in the contrast
#C1: (treatments A1 and A2) minus (treatments A3 and A4); the first level gets weight 0
C1.coeff <- c(0,1,1,-1,-1)
contrast(model=m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.1,full=TRUE)
contrast(model=m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.1,full=FALSE)
contrast(m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.05,full=TRUE)
contrast(m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt)
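
For readers who want to see where the three statistics come from, the sketch below applies the textbook formulas for a contrast among group means in a one-way model: estimate = sum(c_i * mean_i), standard error = sqrt(MSE * sum(c_i^2 / n_i)), and a t-based confidence interval. It is an assumption that contrast() follows these formulas; the example uses the base R dataset PlantGrowth rather than fertiliza, and all object names are hypothetical.

```r
# One-way model on base R's PlantGrowth (groups: ctrl, trt1, trt2).
fit <- lm(weight ~ group, data = PlantGrowth)
grp.m <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)    # group means
grp.n <- tapply(PlantGrowth$weight, PlantGrowth$group, length)  # group sizes
coef.cont <- c(-2, 1, 1)         # both treatments versus the control
alpha <- 0.05

est <- sum(coef.cont * grp.m)                        # estimated contrast
mse <- sum(residuals(fit)^2) / df.residual(fit)      # residual mean square
se <- sqrt(mse * sum(coef.cont^2 / grp.n))           # standard error
t.crit <- qt(1 - alpha / 2, df.residual(fit))        # two-sided t quantile
ci <- c(lower = est - t.crit * se, upper = est + t.crit * se)
list(estimate = est, std.error = se, conf.int = ci)
```

The coefficients of a contrast must sum to zero, as they do here; a smaller alpha widens the interval.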

Tree-level cork biomass data for cork oak trees in Portugal

Description

Measurements of cork weight in Quercus suber (cork oak) trees in Portugal.

Usage

corkoak

Format

tree: Sequential number for each sample tree.
csc: Tree circumference at 1.3 m, outside bark, in cm.
cbc: Tree circumference at 1.3 m, under bark, in cm.
bt: Bark thickness, in cm.
hdeb: Debarking height, in m.
hblc: Height to base of live crown, in m.
nb: Number of branches debarked.
cr.diam: Crown diameter, in m.
w: Total green weight of the stripped cork, in kg.
stratum: Stratum.

Source

Data supplied electronically to Prof. Timothy Gregoire (Yale University) by the authors, accompanied by a note which said "After the article was published we discovered a problem with 2 of the observations so Teresa and I decided it was best just to delete them."

References

Fonseca TJ, Parresol BR. 2001. A new model for cork weight estimation in northern Portugal with methodology for construction of confidence intervals. Forest Ecology and Management 152(1):131–139.

Examples

data(corkoak)    
head(corkoak)

Cork biomass data for cork oak trees in Portugal

Description

Measurements of cork weight in sample trees of Quercus suber in Portugal.

Usage

corkoak2

Format

arbol: Sequential number for each sample tree.
perimetro.cc: Tree circumference at 1.3 m, outside bark, in cm.
perimetro.sc: Tree circumference at 1.3 m, under bark, in cm.
e.corteza: Bark thickness, in cm.
h.desc: Debarking height, in m.
hcc: Height to base of live crown, in m.
num.ram: Number of branches debarked.
diam.copa: Crown diameter, in m.
biomasa: Total green weight of the stripped cork, in kg.
estrato: Stratum.

Source

Data provided by Prof. Timothy Gregoire (Yale University); the original authors noted "After the article was published we discovered a problem with 2 of the observations so Teresa and I decided it was best just to delete them."

References

Fonseca TJ, Parresol BR. 2001. A new model for cork weight estimation in northern Portugal with methodology for construction of confidence intervals. Forest Ecology and Management 152(1):131–139.

Examples

data(corkoak2)    
head(corkoak2)

Deletes the first n-characters of a string

Description

Function to delete the first n characters of a string, counting from the left-hand side.

Usage

deleteLeft(fac, n)

Arguments

fac

an object of class character or factor.

n

the number of characters to be deleted from the string given in 'fac'.

Details

It is especially intended for tidying data vectors in alphanumeric format.

Value

This function returns an object with n fewer characters on the left-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


plot.id <- c("BNE1","BNE2","PLE1")
deleteLeft(plot.id,1)
deleteLeft(plot.id,2)
deleteLeft(plot.id,3)
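
The same operation can be sketched in base R with substring(), which may help when reading the output above. The helper drop_left is hypothetical and is not the package's implementation; it only assumes that factors are first converted to character.

```r
# Hypothetical base R equivalent: keep everything after the first n characters.
drop_left <- function(x, n) {
  x <- as.character(x)      # also accepts factors
  substring(x, n + 1)
}
res.left <- drop_left(c("BNE1", "BNE2", "PLE1"), 2)
res.left
```

Here the result is "E1", "E2" and "E1", that is, each identifier without its first two characters.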

Deletes the last n-characters of a string

Description

Function to delete the last n characters of a string, counting from the right-hand side.

Usage

deleteRight(fac, n)

Arguments

fac

an object of class character or factor.

n

the number of characters to be deleted from the string given in 'fac'.

Details

It is especially intended for tidying data vectors in alphanumeric format.

Value

This function returns an object with n fewer characters on the right-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


last.names.id <- c("Stage-1924","Gregoire-1958","Robinson-1967")
deleteRight(last.names.id,5)
deleteRight(last.names.id,4)
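
A base R sketch of the same idea uses substr() together with nchar(). The helper drop_right is hypothetical and is not the package's implementation; it only assumes that factors are first converted to character.

```r
# Hypothetical base R equivalent: keep everything except the last n characters.
drop_right <- function(x, n) {
  x <- as.character(x)      # also accepts factors
  substr(x, 1, nchar(x) - n)
}
res.right <- drop_right(c("Stage-1924", "Gregoire-1958", "Robinson-1967"), 5)
res.right
```

Removing five characters drops the hyphen and the year, leaving "Stage", "Gregoire" and "Robinson".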

Creates a descriptive statistics table for continuous variables.

Description

Function to create a descriptive statistics table for continuous variables from a dataframe.

Usage

descstat(data = data, decnum = 3, eng = TRUE, full = FALSE)

Arguments

data

a dataframe containing numeric variables as columns.

decnum

the number of decimals to be used in the output. The default is set to 3.

eng

logical; if TRUE (the default), the statistics are labelled in English; if FALSE, in Spanish.

full

logical; if TRUE, the output includes some extra descriptive statistics. The default is FALSE.

Details

The resulting table reports the main measures of central tendency and dispersion.

Value

This function gathers descriptive statistics into a summary table with the following statistics: sample size, minimum, maximum, mean, median, SD, and coefficient of variation. If the "full" option is set to TRUE, the following statistics are added to the table: 25th and 75th percentiles, the interquartile range, skewness, and kurtosis.

Author(s)

Christian Salas-Eljatib and Tomas Cayul.

References

Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. https://eljatib.com

Examples


df <- datana::idahohd
head(df)
df.h<-df[,c("dbh","height")]
## using the function
descstat(data=df.h)
descstat(data=df.h,decnum=1,eng=FALSE)
descstat(df.h,2)
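
The basic (non-full) table can be approximated in base R as sketched below. The helper desc_sketch is hypothetical and not the package's implementation; it assumes that missing values are dropped and that the coefficient of variation is expressed in percent, neither of which is confirmed by the help page. The base R dataset trees is used so the example does not depend on idahohd.

```r
# Hypothetical helper: one column of statistics per numeric variable.
desc_sketch <- function(data, decnum = 3) {
  stats <- sapply(data, function(v) {
    v <- v[!is.na(v)]                                  # assumption: drop NAs
    c(n = length(v), min = min(v), max = max(v), mean = mean(v),
      median = median(v), sd = sd(v),
      cv = 100 * sd(v) / mean(v))                      # assumption: CV in percent
  })
  round(stats, decnum)
}
tab <- desc_sketch(trees[, c("Girth", "Height")], decnum = 2)
tab
```

The result is a matrix with one row per statistic and one column per variable; t(tab) gives the transposed layout if variables are preferred as rows.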

Presidential election data of Florida (USA) in 2000.

Description

County-by-county vote for president in Florida in 2000 for Bush, Gore and Buchanan.

Usage

data(election)

Format

Contains three variables, as follows:

gore: Vote for Gore.
bush: Vote for Bush.
buchanan: Vote for Pat Buchanan.

Source

The data were obtained from the alr4 library.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(election)    
head(election)

Presidential election in the state of Florida (USA) in 2000.

Description

County-level vote count in the state of Florida, year 2000.

Usage

data(election2)

Format

Contains the following three columns:

gore: Number of votes for Al Gore.
bush: Number of votes for George W. Bush.
buchanan: Number of votes for Pat Buchanan.

Source

The data were obtained from the alr4 package of R.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(election2)    
head(election2)

ENDFID 2021 score by degree program

Description

Average score by degree program in the Evaluación Nacional Diagnóstica de la Formación Inicial Docente (ENDFID, national diagnostic assessment of initial teacher training), focused on mathematics. There are 79 observations. Two binary variables are included: cuech (1 if the university belongs to CUECH, 0 otherwise) and pace (1 if the program has PACE places, 0 otherwise).

Usage

data(endfid2)

Format

The variables are described below:

programa: Name of the degree program offered
universidad: University offering the program
zona: Location of the program's campus
region: Region of the program's campus
tipo.programa: Type of program (1 Pedagogy in Mathematics, 2 General Basic Education, 3 Pedagogical training program)
cuech: University belongs to the Consejo de Universidades del Estado (1 yes, 0 no)
pace: Program includes PACE places (1 yes, 0 no)
end.pcpg: Average program score in the General Pedagogical Knowledge Test (Prueba de Conocimientos Pedagógicos Generales)
end.pcdd: Average program score in the Disciplinary and Didactic Knowledge Test (Prueba de Conocimientos Disciplinarios y Didácticos)
matricula: Number of students enrolled in the program in 2022

Source

Data obtained from the Centro de Perfeccionamiento, Experimentación e Investigaciones Pedagógicas (CPEIP) of Mineduc and from the respective website of each university. The data were entered by Diego Fernández, a student of Prof. Christian Salas-Eljatib.

Examples

data(endfid2)
head(endfid2)

Leaf measurements for Eucalyptus nitens trees in Tasmania, Australia.

Description

The length, width, and area of Eucalyptus nitens leaves were measured.

Usage

data(eucaleaf)

Format

Contains leaf-level variables, as follows:

time: Time factor, in two levels: early or Late.
tree: Sample tree identifier code.
shoot: Shoot description factor, in three levels.
l: Length of the leaf, in mm.
w: Width of the leaf, in mm.
la: Leaf area, in cm^{2}.

Source

Although the original source of the measurements is the doctoral dissertation of Dr Candy (1999), the data file used here was provided courtesy of Prof. Timothy Gregoire at Yale University (New Haven, CT, USA). These data were also used by Gregoire and Salas (2009).

References

Candy SG. 1999. Predictive models for integrated pest management of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.

Gregoire TG, and Salas C. 2009. Ratio estimation with measurement error in the auxiliary variate. Biometrics 65(2):590-598 doi:10.1111/j.1541-0420.2008.01110.x

Examples

data(eucaleaf)    
head(eucaleaf)

Leaf measurements for Eucalyptus nitens trees in Tasmania, Australia.

Description

Measurements of the length, width, and area of Eucalyptus nitens leaves.

Usage

data(eucaleaf2)

Format

Contains leaf-level variables, as follows:

tiempo: Factor with two levels: Temprano (early) or Tardío (late).
arbol: Sample tree identifier.
meristema: Shoot description factor, with three levels.
largo: Length of the leaf, in mm.
ancho: Width of the leaf, in mm.
area: Leaf area, in cm^{2}.

Source

Although the original source of these measurements is the dissertation of Dr Candy (1999), the data file was provided courtesy of Prof. Timothy Gregoire of Yale University (New Haven, CT, USA). These data were also used in the study by Gregoire and Salas (2009).

References

Candy SG. 1999. Predictive models for integrated pest management of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.

Gregoire TG, and Salas C. 2009. Ratio estimation with measurement error in the auxiliary variate. Biometrics 65(2):590-598 doi:10.1111/j.1541-0420.2008.01110.x

Examples

data(eucaleaf2)    
head(eucaleaf2)

Leaf measurements (all, n=744) for Eucalyptus nitens trees in Tasmania, Australia.

Description

The length, width, and area of Eucalyptus nitens leaves were measured for all the samples of Candy (1999).

Usage

data(eucaleafAll)

Format

Contains leaf-level variables, as follows:

time: Time factor, in two levels: early or Late.
tree: Sample tree identifier code.
shoot: Shoot description factor, in three levels.
l: Length of the leaf, in mm.
w: Width of the leaf, in mm.
la: Leaf area, in cm^{2}.

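Source

Although the original source of the measurements is the dissertation of Dr Candy (1999), the data file used here was provided courtesy of Prof. Timothy Gregoire at Yale University (New Haven, CT, USA).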

References

Candy SG. 1999. Predictive models for integrated pest management of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.

Examples

data(eucaleafAll)    
head(eucaleafAll)

Leaf measurements (all, n=744) for Eucalyptus nitens trees in Tasmania, Australia.

Description

Measurements of the length, width, and area of Eucalyptus nitens leaves for the whole sample of Candy (1999).

Usage

data(eucaleafAll2)

Format

Contains leaf-level variables, as follows:

tiempo: Factor with two levels: Temprano (early) or Tardío (late).
arbol: Sample tree identifier.
meristema: Shoot description factor, with three levels.
largo: Length of the leaf, in mm.
ancho: Width of the leaf, in mm.
area: Leaf area, in cm^{2}.

Source

Although the original source of these measurements is the dissertation of Dr Candy (1999), the data file was provided courtesy of Prof. Timothy Gregoire of Yale University (New Haven, CT, USA).

References

Candy SG. 1999. Predictive models for integrated pest management of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.

Examples

data(eucaleafAll2)    
head(eucaleafAll2)

Extracts the first n characters of a string

Description

Function to extract the first n characters of a string from the left-hand side.

Usage

extractLeft(fac, n)

Arguments

fac

an object of class character or factor.

n

the number of characters to be extracted from the string given in 'fac'.

Details

It is especially suited to handling data vectors in alphanumeric format.

Value

This function returns an object having the first n-characters from the left-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


plot.id <- c("BNE1","BNE2","PLE1")
extractLeft(plot.id,1)
extractLeft(plot.id,2)
extractLeft(plot.id,3)
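
A rough base R equivalent relies on substr(). The helper name take_left is hypothetical and not part of datana; it is shown only to illustrate the operation, under the assumption that the input can be coerced to character.

```r
# Hypothetical helper (not part of datana): keep the first n characters.
take_left <- function(x, n) substr(as.character(x), 1, n)

plot.id <- c("BNE1", "BNE2", "PLE1")
take_left(plot.id, 2)            # "BN" "BN" "PL"
```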

Extracts the last n-characters of a string

Description

Function to extract the last n-characters of a string from the right-hand side.

Usage

extractRight(fac, n)

Arguments

fac

an object of class character or factor.

n

the number of characters to be extracted from the string given in 'fac'.

Details

It is especially suited to handling data vectors in alphanumeric format.

Value

This function returns an object having the last n characters from the right-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


last.names.id <- c("Stage-1924","Gregoire-1958","Robinson-1967")
extractRight(last.names.id,4)
extractRight(last.names.id,2)
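
A typical use of this kind of extraction is to recover a numeric field embedded at the end of an identifier. The sketch below does so in base R; the helper name take_right is hypothetical and not part of datana, and it is not claimed to match the internals of extractRight().

```r
# Hypothetical helper (not part of datana): keep the last n characters.
take_right <- function(x, n) {
  x <- as.character(x)
  substr(x, nchar(x) - n + 1, nchar(x))
}

last.names.id <- c("Stage-1924", "Gregoire-1958", "Robinson-1967")
year <- as.integer(take_right(last.names.id, 4))   # text to integer
year                                               # 1924 1958 1967
```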

Foliar damage by Ozone

Description

Foliar damage by Ozone

Usage

fdamage

Format

Data frame with 52 rows and 2 columns:

damage: Foliar discoloration (1 with discoloration, 0 without discoloration).
ozone: Maximum load of ozone concentration.

Examples

data(fdamage)
table(fdamage$damage)

Foliar damage by ozone

Description

Foliar damage by ozone

Usage

fdamage2

Format

Data frame with 52 rows and 2 columns:

damage: Foliar discoloration (1 with discoloration, 0 without discoloration).
ozone: Maximum load of ozone concentration.

Examples

data(fdamage2)
table(fdamage2$damage)

Fertilization experiment data.

Description

Plot-level volume data from a fertilization experiment.

Usage

data(fertiliza)

Format

Contains two variables, as follows:

treat: Treatment level.
volume: Plot-level volume, in m^{3}.

Source

The data were provided by Dr Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

Examples

data(fertiliza)
head(fertiliza)
class(fertiliza$treat)
unique(fertiliza$treat)
means.g <- tapply(fertiliza$volume,fertiliza$treat,mean);means.g
sds.g <- tapply(fertiliza$volume,fertiliza$treat,sd);sds.g
ns.g <- tapply(fertiliza$volume,fertiliza$treat,length);ns.g

Fertilization experiment

Description

Plot-level data from a fertilization experiment with treatments and replicates.

Usage

data(fertiliza2)

Format

Contains two columns, as follows:

tmo: Treatment. Factor measured at different levels.
vol: Wood volume in the experimental plot, in m^{3}.

Source

Data provided by Prof. Christian Salas.

Examples

data(fertiliza2)
head(fertiliza2)
class(fertiliza2$tmo)
unique(fertiliza2$tmo)
media.g <- tapply(fertiliza2$vol,fertiliza2$tmo,mean);media.g
desvst.g <- tapply(fertiliza2$vol,fertiliza2$tmo,sd);desvst.g
n.g <- tapply(fertiliza2$vol,fertiliza2$tmo,length);n.g

Diameter growth of trees

Description

The 'ficdiamgr' is a fictitious dataframe built to show the structure of longitudinal data. The dataframe has records of tree diameter growth of five sample trees, spanning three species.

Usage

data(ficdiamgr)

Format

A time-series dataset containing the following columns:

tree.id: an ordered factor indicating the tree on which the measurement is made. The ordering is according to increasing maximum diameter.
time: a numeric vector giving the number of days since establishment.
dbh: a numeric vector of diameter at breast height, in cm.
site: a factor variable, representing site conditions with two levels.
spp: a factor variable, representing tree species with three levels.

Source

This dataframe was built from the 'Orange' data of the datasets package, by Christian Salas-Eljatib.

Examples

data(ficdiamgr)

coplot(dbh ~ time | tree, data = ficdiamgr, show.given = FALSE)

Diameter growth of trees

Description

The 'ficdiamgr2' data are fictitious and were built to show the structure of longitudinal data. The data contain growth records for five sample trees, representing three species.

Usage

data(ficdiamgr2)

Format

A time series containing the following columns:

arbol: tree identifier.
tiempo: number of days since the start of the measurements.
dap: diameter at breast height, in cm.
sitio: a factor representing site conditions, with two levels.
espe: a factor representing tree species, with three levels.

Source

These data were modified from the 'Orange' dataframe of the 'datasets' package, by Christian Salas-Eljatib.

Examples

data(ficdiamgr2)

coplot(dap ~ tiempo | arbol, data = ficdiamgr2, show.given = FALSE)

Finds the position of a specific variable.

Description

Sometimes in data manipulation we face the task of locating the position of a specific variable within a dataframe. The function finds the position in which a column name is within an object.

Usage

findColumn.byname(data = data, col.name = col.name)

Arguments

data

is a dataframe

col.name

is a string specifying the name of the variable

Details

Although the function is meant to find the position of a single variable, it can also be used for more than one variable.

Value

This function returns the position (column number) of the specified column name.

Note

It can be used for a vector of specified column-names as well.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- data.frame(varX=1:5, varY=letters[1:5], varZ=rep("a",5),
varK=rep("b",5))
df
#using the function
findColumn.byname(df, c("varY","varZ"))
findColumn.byname(df, "varK")
#Creating an example vector
vector <- letters
vector
findColumn.byname(vector, c("h","z"))
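
For comparison, column positions can also be looked up with base R alone. The lines below are a minimal sketch using match() and which() on names(); they are not presented as the implementation of findColumn.byname().

```r
df <- data.frame(varX = 1:5, varY = letters[1:5], varZ = rep("a", 5),
                 varK = rep("b", 5))

# Positions of several columns, returned in the order requested
match(c("varY", "varZ"), names(df))   # 2 3

# Position of a single column
which(names(df) == "varK")            # 4
```

match() returns NA for a name that is absent, which makes a misspelled column name easy to detect.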

Fish growth variables.

Description

Variables of smallmouth bass (a fish species) collected in West Bearskin Lake, Minnesota, in 1991.

Usage

data(fishgrowth)

Format

Contains three variables, as follows:

age: Age at capture (years).
length: Length at capture (mm).
scale: Radius of a key scale (mm).

Source

The data were obtained from the alr4 library of R, specifically from the dataframe wblake that includes only fish of ages 8 or younger.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(fishgrowth)    
head(fishgrowth) 
plot(length~age, data=fishgrowth)

Fish growth

Description

Fish growth variables from West Bearskin Lake in the state of Minnesota, in 1991.

Usage

data(fishgrowth2)

Format

Contains three variables, as follows:

edad: Age at capture, in years.
largo: Length at capture, in mm.
escala: Radius of a key scale, in mm.

Source

Data obtained from the alr4 package of R, from the dataframe wblake, which includes fish up to 8 years old.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(fishgrowth2)    
head(fishgrowth2)
plot(largo~edad,data=fishgrowth2)

Forest fire occurrence in central Chile

Description

Data on forest fire occurrence in central Chile, with 7210 observations: 890 cases of fire occurrence and 6320 cases of non-occurrence. The binary variable (Y) is the occurrence of forest fire, where Y=1 denotes occurrence and Y=0 otherwise.

Usage

data(forestfire)

Format

The data frame contains the following variables:

fire: Occurrence of forest fire (1 yes, 0 no)
xcoord: Geographic coordinate x.utm
ycoord: Geographic coordinate y.utm
aspect: Exposure (degrees from north)
eleva: Elevation (m)
slope: Slope (degrees)
distr: Distance to dirt roads
distcity: Distance to cities
distriver: Distance to paved roads
covera: Land use classifications according to a polygon
coverb: Land use classifications according to a polygon
tempe: Minimum temperature of the coldest month
ppan: Annual precipitation
ndii: Normalized difference infrared index
nvdi: Normalized difference vegetation index
tempe2: Minimum temperature of the warmest month
ppan2: Precipitation of the driest month
frec.fire: Frequency of fires
perc.fire: Percentage of fire frequency
fireClass: Fire frequency class
asp.class: Aspect (exposure) class
eleva.class: Elevation class
slope.class: Slope class
ndii.class: Normalized difference infrared index class
nvdi.class: Normalized difference vegetation index class

Source

Data were provided by Dr Adison Altamirano at the Universidad de La Frontera (Temuco, Chile).

References

Salas-Eljatib C, Fuentes-Ramírez A, Gregoire TG, Altamirano A, Yaitul V. 2018. A study on the effects of unbalanced data when fitting logistic regression models in ecology. Ecological Indicators 85:502-508. doi:10.1016/j.ecolind.2017.10.030

Altamirano A, Salas C, Yaitul V, Smith-Ramirez C, Avila A. 2013. Influencia de la heterogeneidad del paisaje en la ocurrencia de incendios forestales en Chile Central. Revista de Geografia del Norte Grande, 55:157-170.

Examples

data(forestfire)
head(forestfire)
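
The counts quoted in the Description imply a clearly unbalanced response. The short sketch below works only from those two published counts (890 and 6320) to show the class shares; it does not read the dataset, and the object name counts is illustrative.

```r
# Counts taken from the Description above (not computed from the data)
counts <- c(fire = 890, no.fire = 6320)

sum(counts)                            # total number of observations
round(100 * counts / sum(counts), 2)   # percentage of each class
```

With the data loaded, table(forestfire$fire) would be the natural way to check these figures.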

Forest fire occurrence

Description

Data on forest fire occurrence in central Chile. There are 7210 observations, of which 890 are cases of fire occurrence and 6320 are cases of non-occurrence. The binary variable (Y) is the occurrence of a forest fire, where Y=1 denotes occurrence and Y=0 otherwise.

Usage

data(forestfire2)

Format

The variables are described below:

fire: Presence of forest fire (1 yes, 0 no)
xcoord: Geographic coordinate x.utm
ycoord: Geographic coordinate y.utm
aspect: Exposure (degrees from north)
eleva: Elevation (m)
slope: Slope (degrees)
distr: Distance to dirt roads
distcity: Distance to cities
distriver: Distance to paved roads
covera: Land use classifications according to a polygon
coverb: Land use classifications according to a polygon
tempe: Minimum temperature of the coldest month
ppan: Annual precipitation
ndii: Normalized difference infrared index
nvdi: Normalized difference vegetation index
tempe2: Minimum temperature of the warmest month
ppan2: Precipitation of the driest month
frec.fire: Frequency of fires
perc.fire: Percentage of fire frequency
fireClass: Fire frequency class
asp.class: Aspect (exposure) class
eleva.class: Elevation class
slope.class: Slope class
ndii.class: Normalized difference infrared index class
nvdi.class: Normalized difference vegetation index class

Source

Data were provided by Dr Adison Altamirano, Universidad de La Frontera, Temuco, Chile.

References

Altamirano A, Salas C, Yaitul V, Smith-Ramirez C, Avila A. 2013. Influencia de la heterogeneidad del paisaje en la ocurrencia de incendios forestales en Chile Central. Revista de Geografia del Norte Grande, 55:157-170.

Examples

data(forestfire2)
head(forestfire2)

Prices of gasoline and crude oil

Description

Prices of gasoline and crude oil

Usage

gasoline

Format

Data frame of 14 rows and 3 columns:

year: Year of data
gasoline: Price of gasoline for the year, in cents/gallon
crude.oil: Price of crude oil for the year, in $/bbl

Source

McClave JT, Benson PG. 1991. Statistics for Business and Economics. 5th edition. Dellen and Macmillan.

References

Statistical Abstract of the United States: 1989, pp. 476, 480.

Examples

data(gasoline)
plot(gasoline~year, data = gasoline, type = "b",
     ylab = "Gasoline price (cents/gallon)",
     xlab = "Year")

Prices of gasoline and crude oil

Description

Prices of gasoline and crude oil

Usage

gasoline2

Format

Data frame containing 14 rows and 3 columns:

año: Year of the price
gasolina: Price of gasoline for the year, in cents/gallon
petroleo: Price of crude oil for the year, in $/bbl

Source

McClave JT, Benson PG. 1991. Statistics for Business and Economics. 5th edition. Dellen and Macmillan.

References

Statistical Abstract of the United States: 1989, pp. 476, 480.

Examples

data(gasoline2)
plot(gasolina~año, data = gasoline2, type = "b",
     ylab = "Precio de la gasolina (centavos/galón)",
     xlab = "Año")

GDP per capita data

Description

Data on gross domestic product (GDP) per capita, by country.

Usage

data(gdpcap)

Format

This dataset contains the following columns:

pais: Country name.
pais.cod: Country code.
gdp.pc: GDP per capita, in US dollars.
y: GDP per capita, in thousands of US dollars.

Source

The data were obtained from the website https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Examples

data(gdpcap)    
head(gdpcap) 
unique(gdpcap$pais)
hist(gdpcap$y, breaks=20,xlab='PIB per capita (miles de US$)', col='orange', las=1)

Function to compute the geometric mean of a numeric vector

Description

Computes the geometric mean of a numeric vector. It is the n-th root of the product of n numbers, as follows:

y_g = \left(\prod_{i=1}^{n} y_i\right)^{1/n}

for y_i > 0. The geometric mean can be used as a measure of central tendency of a random variable.

Usage

gmean(v)

Arguments

v

is a numeric vector

Details

Notice that it can only be computed for positive values. There are alternatives for negative values, but they are not covered here.

Value

This function returns the geometric mean, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


y.var <- runif(10, min=10, max=45)
gmean(y.var)
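
The formula in the Description can be checked by hand in base R. The sketch below uses the equivalent log form, exp(mean(log(v))); the helper name geo_mean is hypothetical and is not claimed to be how gmean() is coded.

```r
# Hypothetical helper (not part of datana): geometric mean via logarithms.
geo_mean <- function(v) {
  stopifnot(is.numeric(v), all(v > 0))   # defined here for positive values only
  exp(mean(log(v)))
}

y <- c(1, 4, 16)
geo_mean(y)                 # cube root of 1 * 4 * 16 = 64, i.e. 4
prod(y)^(1 / length(y))     # direct form of the formula, same value
```

The log form is usually preferred for long vectors because the direct product can overflow or underflow.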

Tree height growth of Douglas-fir sample trees in the Northwest of the United States

Description

The data contain 148 observations on the height growth of dominant trees of Pseudotsuga menziesii in the Northwest of the United States.

Usage

data(hgrdfir)

Format

The data frame contains seven variables as follows:

natfor.id: Code identifier.
plot.code: Plot number identification.
tree.code: Tree number identification.
dbh: Diameter at breast height at sampling, in inches.
toth: Total height at sampling, in ft.
age: Age of tree, in years.
height: Height at a given age, in ft.

Source

The data were provided by Dr Christian Salas.

References

Monserud RA. 1984. Height growth and site index curves for Inland Douglas-fir based on stem analysis data and forest habitat type. Forest Science 30(4):943-965.

Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

data(hgrdfir)
head(hgrdfir)
unique(hgrdfir$tree.code)
table(hgrdfir$plot.code,hgrdfir$tree.code)
tapply(hgrdfir$dbh, hgrdfir$tree.code, mean) #dbh of each sample tree
tapply(hgrdfir$toth, hgrdfir$tree.code, mean) #toth of each sample tree

Height growth of a sample of trees in the United States

Description

The data contain 148 observations on the height growth of dominant trees of Pseudotsuga menziesii in the Northwest of the United States.

Usage

data(hgrdfir2)

Format

The data frame contains seven variables:

bosque.id: Forest identifier code.
parcela: Plot identifier code.
arbol: Tree identification number.
dap: Diameter at breast height, in inches.
atot: Total height, in feet.
edad: Age, in years.
altura: Height at each age of the tree, in feet.

Source

The data were provided by Dr Christian Salas-Eljatib.

References

Monserud RA. 1984. Height growth and site index curves for Inland Douglas-fir based on stem analysis data and forest habitat type. Forest Science 30(4):943-965.

Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

data(hgrdfir2)
head(hgrdfir2)
unique(hgrdfir2$arbol.id)
table(hgrdfir2$parcela,hgrdfir2$arbol.id)
tapply(hgrdfir2$dap, hgrdfir2$arbol.id, mean) #dap of each sample tree
tapply(hgrdfir2$atot, hgrdfir2$arbol.id, mean) #atot of each sample tree

Function for building a figure having both a histogram and a boxplot for a single random variable

Description

The function creates a figure having both a histogram and a boxplot for a random variable, as a way to help understand its distribution.

Usage

histbxp(
  y = y,
  freqlab = "Frequency",
  varlab = "Variable",
  eng = TRUE,
  refval = NA,
  print.refval = FALSE,
  col.hist = "gray",
  col.bxp = "gray",
  portrait = TRUE,
  oma = c(3, 0.5, 2, 0),
  mar = c(1, 4, 0.2, 1),
  cex.varlab = 1.2,
  refval.symbol = expression(bar(y)),
  col.refval = "blue",
  varlim = NA,
  freqlim = NA
)

Arguments

y

A numeric vector representing the random variable.

freqlab

(optional) A string specifying the frequency label. The default is set to "Frequency".

varlab

(optional) A string specifying the random variable label. The default is set to "Variable".

eng

logical; if "TRUE" (the default), some default text is written in English; if "FALSE", in Spanish.

refval

A numeric value to be printed as a reference for the random variable. The default is NA; if print.refval is TRUE and no value is supplied, the mean of y is used.

print.refval

logical; defines whether a reference value should be printed. If set to TRUE and refval is not given, the mean of the y vector is plotted. The default is FALSE.

col.hist

A string specifying the histogram color. The default is "gray".

col.bxp

A string specifying the boxplot color. The default is "gray".

portrait

logical; if set to TRUE, the boxplot is located under the histogram (2 rows, 1 column). If set to FALSE, the boxplot is located next to the histogram (1 row, 2 columns). The default is TRUE.

oma

The outer margins, as in par(). The default is c(3, 0.5, 2, 0).

mar

The figure margins, as in par(). The default is c(1, 4, 0.2, 1).

cex.varlab

A numeric value for the cex (character expansion) of the varlab label. The default is 1.2.

refval.symbol

An expression giving the symbol to be printed for refval, if print.refval is set to TRUE. The default is expression(bar(y)).

col.refval

A string specifying the color of refval.symbol, if print.refval is set to TRUE. The default is "blue".

varlim

(optional) A numeric vector with the minimum and maximum, respectively, of the random variable axis.

freqlim

(optional) A numeric vector with the minimum and maximum, respectively, of the frequency axis.

Details

The variable must be numeric.

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. 170 p. https://eljatib.com

Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

df <- datana::fishgrowth
histbxp(y=df$length)

### distribution of 'length'
## with mean refval
histbxp(y=df$length, print.refval = TRUE)

## with given refval
histbxp(y=df$length, print.refval = TRUE, refval = 250)

## changing labels
histbxp(y=df$length, print.refval = TRUE, refval = 250,
        freqlab = "FREQ", varlab = "LENGTH")

## changing colors
histbxp(y=df$length, print.refval = TRUE, refval = 250,
        freqlab = "FREQ", varlab = "LENGTH",
        col.hist = "blue",
        col.bxp = "green",
        col.refval = "red")


### distribution of 'scale'
## with mean refval
histbxp(y=df$scale, print.refval = TRUE)

## landscape mode
histbxp(y=df$scale, print.refval = TRUE, portrait = FALSE)

## with limits
histbxp(y=df$scale, print.refval = TRUE, portrait = FALSE,
        freqlim = c(0,100),
        varlim = c(0, max(df$scale)))
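
For readers who want to see how such a two-panel figure can be assembled, here is a minimal base R sketch of the portrait layout (histogram above, boxplot below, sharing one axis). The function name hist_box_sketch, the panel heights and the margins are all assumptions made for illustration; this is not the code of histbxp().

```r
# Hypothetical sketch (not datana's implementation): histogram over boxplot.
hist_box_sketch <- function(y, varlab = "Variable", freqlab = "Frequency") {
  old.par <- par(no.readonly = TRUE)     # save current graphical settings
  on.exit(par(old.par))                  # restore them on exit
  h <- hist(y, plot = FALSE)             # compute the bins without drawing
  xlim <- range(h$breaks)                # common axis limits for both panels
  layout(matrix(1:2, nrow = 2), heights = c(3, 1))
  par(mar = c(1, 4, 0.5, 1))
  plot(h, main = "", xlab = "", ylab = freqlab, col = "gray", xlim = xlim)
  par(mar = c(4, 4, 0.5, 1))
  boxplot(y, horizontal = TRUE, col = "gray", ylim = xlim, xlab = varlab)
  invisible(h)                           # return the histogram object
}

set.seed(1)
hist_box_sketch(rnorm(200, mean = 250, sd = 40), varlab = "Length (mm)")
```

Computing the bins first and reusing range(h$breaks) for both panels keeps the boxplot aligned with the histogram bars.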

Function to compute the harmonic mean of a numeric vector

Description

Computes the harmonic mean of a numeric vector. It is the inverse of the mean of the reciprocals of n numbers, as follows:

y_h = \frac{n}{\left(\sum_{i=1}^{n} \frac{1}{y_i}\right)}

for y_i \neq 0. The harmonic mean can be used as a measure of central tendency of a random variable.

Usage

hmean(v)

Arguments

v

is a numeric vector

Details

Notice that it can only be computed for values different from zero.

Value

This function returns the harmonic mean, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


y.var <- runif(10, min=10, max=45)
hmean(y.var)
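
The formula in the Description translates directly into base R. The helper name harm_mean below is hypothetical; it is a sketch of the definition and not necessarily how hmean() is written.

```r
# Hypothetical helper (not part of datana): harmonic mean from its definition.
harm_mean <- function(v) {
  stopifnot(is.numeric(v), all(v != 0))  # reciprocals require non-zero values
  length(v) / sum(1 / v)
}

y <- c(1, 2, 4)
harm_mean(y)       # 3 / (1 + 0.5 + 0.25) = 3 / 1.75
1 / mean(1 / y)    # inverse of the mean of the reciprocals, same value
```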

Tree height-diameter data from Idaho (USA)

Description

These data are forest inventory measures from the Upper Flat Creek stand of the University of Idaho Experimental Forest, dated 1991.

Usage

data(idahohd)

Format

Contains five variables, as follows:

plot: Plot number.
tree: Tree within plot.
species: A factor with levels DF = Douglas-fir, GF = Grand fir, SF = Subalpine fir, WL = Western larch, WC = Western red cedar, WP = White pine.
dbh: Diameter at breast height (137 cm above the ground), measured perpendicular to the bole, in cm.
height: Height of the tree, in m.

Source

The data were assembled from the 'ufc' dataframe from the alr4 library.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(idahohd)    
head(idahohd) 
plot(height~dbh, data=idahohd)

Tree height-diameter data from the state of Idaho (USA)

Description

These data come from a sample in the University of Idaho Experimental Forest, at Upper Flat Creek, Idaho, USA. Measured in 1991.

Usage

data(idahohd2)

Format

Contains five variables, detailed below:

parce: Sample plot number.
arbol: Tree number within the plot.
spp: Tree species, a factor with levels DF = Douglas-fir, GF = Grand fir, SF = Subalpine fir, WL = Western larch, WC = Western red cedar, WP = White pine.
dap: Stem diameter at 1.3 m above the ground, in cm.
atot: Height of the tree, in m.

Source

The data were obtained from the 'ufc' dataframe of the alr4 package.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(idahohd2)    
head(idahohd2)
plot(atot~dap, data=idahohd2)

Monthly Index of Economic Activity (IMACEC)

Description

Dataset with the Índice Mensual de Actividad Económica (IMACEC, monthly index of economic activity) of Chile, with information from January 1997 onwards. The dataset has 340 observations, each representing a month, and includes several sectoral breakdowns. The main variable is the monthly IMACEC, which represents an estimate of the evolution of the country's economic activity relative to the same month of the previous year.

Usage

data(imacec2)

Format

The variables are described below:

fecha: Date of the observation (Date format, first day of the month)
anho: Year of the observation
mes: Month of the observation
imacec: Total monthly index of economic activity
crec.prod: Growth of the goods production sector
crec.min: Growth of the mining sector
crec.ind: Growth of the industrial sector
crec.rest: Growth of the remaining goods, neither mining nor industrial
crec.com: Growth of the commerce sector
crec.serv: Growth of the services sector
imacec.fac: IMACEC adjusted to factor cost
crec.imp: Growth of taxes on products
imacec.nomin: Index of economic activity excluding mining

Source

Banco Central de Chile. Data extracted from the historical series of monthly indicators. The data were entered by Saúl Ketterer, a student of Prof. Christian Salas-Eljatib.

References

Banco Central de Chile. “Serie IMACEC”, available at https://si3.bcentral.cl/siete

Examples

data(imacec2)
head(imacec2)

Computes the sample kurtosis of a distribution

Description

Kurtosis describes the tailedness, or the degree of heaviness of the tails, of a frequency distribution. The function computes an estimator of the kurtosis.

Usage

kurto(x, na.rm = TRUE)

Arguments

x

a numeric vector of a random variable.

na.rm

logical; whether NA values should be removed. The default is set to TRUE.

Details

The kurtosis of a random variable is the fourth moment of the standardized variable. There are several ways of parameterizing a kurtosis estimator, such as depending on the fourth moment and the standard deviation of the random variable.

Value

An estimator of the kurtosis.

Author(s)

Christian Salas-Eljatib

Examples

y.var<-rnorm(100);x.var<-rbeta(100,.2,2)
kurto(y.var)
kurto(x.var)

Land-cover, environmental and sociodemographic data for the 34 municipalities composing the Greater Santiago area, Santiago, Chile.

Description

The dataset contains 476 observations, 34 categorical and 442 numerical. Land-cover data were generated through remote sensing classification techniques using Sentinel-2 satellite images from 2016. Temperatures were obtained from TIRS band 10 of Landsat 8 satellite images. Particulate matter concentrations were estimated using spatial modelling techniques from 10 pollution stations distributed across the city. Altitude was derived from a Digital Elevation Model. Population and poverty data were gathered from the Casen 2017 survey.

Usage

data(landcover)

Format

The data frame contains the following variables:

county: Name of Municipality
built.p: Percentage of surface covered by built-up area
vegeta.p: Percentage of surface covered by vegetation
naked.p: Percentage of surface covered by bare soil
grass.p: Percentage of surface covered by grass
p.Deciduo: Percentage of surface covered by deciduous vegetation
p.Siempreverde: Percentage of surface covered by evergreen vegetation
temp.winter: Land surface temperature, in degrees Celsius, at 2 pm on a winter day with 0% cloud cover
temp.summer: Land surface temperature, in degrees Celsius, at 2 pm on a summer day with 0% cloud cover
pm10.winter: Average concentration of 10-micron particulate matter during the winter months
pm10.summer: Average concentration of 10-micron particulate matter during the summer months
poor.p: Percentage of people under the poverty line in 2017
eleva: Average altitude of the municipal area
pop: Total population of municipality

Source

Data were provided by Dr Ignacio Fernandez at Universidad Adolfo Ibañez (Santiago, Chile).

References

Not yet

Examples

data(landcover)
head(landcover)

Land cover, environmental and sociodemographic data for the 34 municipalities composing the Greater Santiago area, Santiago, Chile.

Description

The dataset contains 476 observations, 34 categorical and 442 numerical. Land-cover data were generated through remote sensing classification techniques using Sentinel-2 satellite images from 2016. Temperatures were obtained from TIRS band 10 of Landsat 8 satellite images. Particulate matter concentrations were estimated using spatial modelling techniques from 10 pollution stations distributed across the city. Altitude was derived from a digital elevation model. Population and poverty data were obtained from the Casen 2017 survey.

Usage

data(landcover2)

Format

The variables are described below:

comuna: Name of the municipality
const.p: Percentage of surface covered by built-up area
vegeta.p: Percentage of surface covered by vegetation
desnu.p: Percentage of surface covered by bare soil
pasto.p: Percentage of surface covered by grass
deci.p: Percentage of surface covered by deciduous vegetation
sverde.p: Percentage of surface covered by evergreen vegetation
temp.inv: Land surface temperature, in degrees Celsius, at 2 pm on a winter day with 0% cloud cover
temp.ver: Land surface temperature, in degrees Celsius, at 2 pm on a summer day with 0% cloud cover
pm10.inv: Average concentration of 10-micron particulate matter during the winter months
pm10.ver: Average concentration of 10-micron particulate matter during the summer months
pobreza.p: Percentage of people under the poverty line in 2017
altitud: Average altitude of the municipal area
pob: Total population of the municipality

Source

Data were provided by Dr Ignacio Fernandez at Universidad Adolfo Ibañez (Santiago, Chile).

References

Not yet

Examples

data(landcover2)
head(landcover2)

Large trees in forests near Tolga, in Eastern Norway.

Description

The study area is situated in the municipality of Tolga, located in Hedmark County, Eastern Norway. Field plots 32 m × 32 m in size were established in forests. A total of 1109 plots were sampled. In each plot, Scots pine (Pinus sylvestris L.) trees with a stem diameter larger than 35 cm were measured and counted.

Usage

data(largetrees)

Format

Contains two variables, as follows:

plot: Plot code.
y: Number of large-diameter trees in a given sample plot.

Source

Although Christian Salas was part of the study, the data here were reproduced to mimic the distribution of the random variable of interest, as shown in the study of Korhonen et al. (2016).

References

Korhonen L, Salas C, Ostgard T, Lien V, Gobakken T, Naesset E. 2016. Predicting the occurrence of large-diameter trees using airborne laser scanning. Canadian Journal of Forest Research 46:461–469. doi:10.1139/cjfr-2015-0384

Examples

data(largetrees)
head(largetrees)
hist(largetrees$y)

Large trees in forests near Tolga, in Eastern Norway.

Description

The study area is located in the municipality of Tolga, in Hedmark County, Eastern Norway. A total of 1109 sample plots of 32 m × 32 m were established in forests. In each plot, Scots pine (Pinus sylvestris L.) trees with a diameter larger than 35 cm were measured and counted.

Usage

data(largetrees2)

Format

The data have the following two columns:

parc: Identifier of the sample plot.
y: Number of large-diameter trees found in a sample plot.

Source

Although Prof. Christian Salas was part of the study, the data here were reproduced to mimic the distribution of the random variable of interest, as shown in the study of Korhonen et al. (2016).

References

Korhonen L, Salas C, Ostgard T, Lien V, Gobakken T, Naesset E. 2016. Predicting the occurrence of large-diameter trees using airborne laser scanning. Canadian Journal of Forest Research 46:461–469. doi:10.1139/cjfr-2015-0384

Examples

data(largetrees2)
head(largetrees2)
hist(largetrees2$y)

Life expectancy of countries

Description

The Global Health Observatory (GHO) data repository of the World Health Organization (WHO) keeps a record of the health status, as well as other related factors, for all countries. The datasets are published so that they can be analyzed. The life expectancy dataset was compiled together with economic data from the United Nations.

Usage

data(lifexpect)

Format

This dataset contains 22 columns:

country: Country of origin
year: Year
status: Country category (developed/developing)
life.expectancy: Life expectancy, in years
adult.mortality: Adult mortality, expressed as the probability of dying between 15 and 60 years of age per 1000 inhabitants
infant.deaths: Infant deaths per 1000 inhabitants
alcohol: Alcohol consumption per capita among people older than 15 years
percentage.expenditure: Vaccination percentage
hepatitis.b: Percentage of vaccination against hepatitis B
measles: Measles cases per 1000 inhabitants
bmi: Average body mass index (BMI)
under.five.deaths: Deaths of children under 5 years per 1000 inhabitants
polio: Percentage of vaccination against polio
total.expenditure: Health expenditure as a percentage of GDP per capita
diphtheria: Percentage of vaccination against diphtheria
hiv.aids: Percentage of HIV/STD cases
gdp: GDP per capita, in USD
population: Total population
thinness10.19: Undernutrition between 10 and 19 years of age
thinness5.9: Undernutrition between 5 and 9 years of age
icr: Human development index in terms of income composition
schooling: Average years of schooling

Source

The data were obtained from https://rpubs.com/Alvian2022/LifeExpectancy. Note that only the data for the year 2014 are used here.

Examples

data(lifexpect)    
head(lifexpect) 
table(lifexpect$status)
tapply(lifexpect$life.expectancy, lifexpect$status,mean)

Tree locations for a sample plot in the Llancahue experimental forest

Description

The Cartesian position, species, and diameter of trees within a plot were measured. The sample plot is a rectangle of 130 m by 70 m. Further details can be reviewed in the reference.

Usage

data(llancahue)

Format

Contains tree-level variables, as follows:

tree.code: Tree identifier
spp: Species abbreviation, as follows: AP=Aextoxicon punctatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna
dbh: diameter at breast height, in cm.
x.coord: Cartesian position in the X-axis, in m.
y.coord: Cartesian position in the Y-axis, in m.

Source

The data are provided courtesy of Prof. Daniel Soto at Universidad de Aysen (Coyhaique, Chile).

References

Soto DP, Salas C, Donoso PJ, Uteau D. 2010. Heterogeneidad estructural y espacial de un bosque mixto dominado por Nothofagus dombeyi después de un disturbio parcial. Revista Chilena de Historia Natural 83(3): 335-347.

Examples

data(llancahue)    
head(llancahue) 
descstat(llancahue$dbh)
boxplot(dbh~spp, data=llancahue)

Cartesian location of trees in the Llancahue forest

Description

The Cartesian position, species, and diameter of trees in a sample plot in the Llancahue forest, near Valdivia, Chile. The plot is rectangular, with dimensions of 130 m by 70 m. Further details are given in the references.

Usage

data(llancahue2)

Format

Contains tree-level variables, as follows:

arb.id: Tree identifier.
spp: Species code, as follows: AP=Aextoxicon punctatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna.
dap: Diameter at breast height, in cm.
coord.x: Cartesian position on the X-axis, in m.
coord.y: Cartesian position on the Y-axis, in m.

Source

The data are provided courtesy of Prof. Daniel Soto at Universidad de Aysen (Coyhaique, Chile).

References

Soto DP, Salas C, Donoso PJ, Uteau D. 2010. Heterogeneidad estructural y espacial de un bosque mixto dominado por Nothofagus dombeyi después de un disturbio parcial. Revista Chilena de Historia Natural 83(3): 335-347.

Examples

data(llancahue2)    
head(llancahue2) 
descstat(llancahue2$dap)
boxplot(dap~spp, data=llancahue2)

Performs a likelihood ratio test between two models being fitted by maximum likelihood.

Description

Performs a likelihood ratio test (LRT) between a reduced model (modA) and a more complex model (modB), provided both models were fitted by maximum likelihood. The user must supply the values needed to carry out the LRT.

Usage

lrt(
  llma = llma,
  llmb = llmb,
  qa = qa,
  qb = qb,
  nfit = nfit,
  modA = "modA",
  modB = "modB",
  alpha = 0.05
)

Arguments

llma

maximized log-likelihood of the reduced model (or modA).

llmb

maximized log-likelihood of the more-complex model (or modB).

qa

the number of parameters of the reduced model.

qb

the number of parameters of the more-complex model.

nfit

the sample size used to fit both models.

modA

a character string with the name to be assigned to modA.

modB

a character string with the name to be assigned to modB.

alpha

the significance level used to compute, as a reference only, the tabulated value of the respective Chi-squared statistic. The default is 0.05.

Details

The resulting output offers statistical inference estimates of the LRT, as well as other maximum likelihood-based statistics. Note that the function only works if the number of parameters of modA is smaller than that of modB.

Value

This function returns two outputs: (i) a table with the AIC, BIC and AICc goodness-of-fit statistics for both models, and (ii) the result of the likelihood ratio test, namely the value of the test statistic, its p-value, and the tabulated value of the statistic at the chosen significance level alpha.
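The quantities described above can be sketched by hand in base R. This is an illustration under stated assumptions, not the source of lrt(): the information criteria use their standard textbook definitions, the object names (ic, tab, lr_stat, and so on) are hypothetical, and the numeric inputs are the ones used in the Examples section of this help page.

```r
# Inputs taken from the Examples section of lrt()
ll_a <- -39.86337;  q_a <- 1   # reduced model (modA)
ll_b <- -33.823003; q_b <- 3   # more complex model (modB)
n <- 26                        # sample size used to fit both models

# Information criteria, standard definitions (assumed, not read from lrt())
ic <- function(ll, q, n) {
  aic <- -2 * ll + 2 * q
  c(AIC  = aic,
    BIC  = -2 * ll + q * log(n),
    AICc = aic + 2 * q * (q + 1) / (n - q - 1))
}
tab <- rbind(modA = ic(ll_a, q_a, n), modB = ic(ll_b, q_b, n))
tab

# Likelihood ratio statistic, degrees of freedom and p-value
lr_stat <- 2 * (ll_b - ll_a)
lr_df <- q_b - q_a
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
chi_crit <- qchisq(1 - 0.05, df = lr_df)  # tabulated value at alpha = 0.05
c(statistic = lr_stat, df = lr_df, p.value = p_value, critical = chi_crit)
```

The reduced model is rejected in favour of the more complex one when the statistic exceeds the tabulated value, or equivalently when the p-value is below alpha.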

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2025. Estadística Aplicada e Inferencial. Borrador de libro, Universidad de Chile, Santiago, Chile. https://eljatib.com/rlibro

Examples


#Maximized values for two probability mass functions
max.ll.pois<- -39.86337; max.ll.bneg<--33.823003
c(max.ll.pois,max.ll.bneg)
sample.size<-26
#Number of parameters
num.para.pois<- 1; num.para.bneg<- 3
c(num.para.pois, num.para.bneg)
#Names to be used for each model
 modA="Poisson"; modB="hiper"
outall<-lrt(llma=max.ll.pois,llmb=max.ll.bneg,qa=num.para.pois,
qb=num.para.bneg,nfit = sample.size,modA = "Poisson",
modB = "Hipergeometrico")
#Output1: A comparative table 
tab.out<-outall$tab.models
tab.out
#Output2: the results of the LRT
out<-outall$lrt.out
out$r.tab
out$Ldif

Computes a likelihood ratio test between a reduced model and a full model. Both models must already be fitted using an R function.

Description

Computes a likelihood ratio test between a reduced model (modr) and a full model (modf). Both models must have been previously fitted by maximum likelihood using an R function such as nlme() or a function for generalized linear models.

Usage

lrt.glm(modr, modf)

Arguments

modr

the object containing a previously fitted reduced model, using a glm-type function, having fewer parameters than modf.

modf

the object containing a previously fitted full model, using a glm-type function, having more parameters than modr.

Details

Double-check the order of the reduced and full models before using the function.

Value

This function returns an object with the following elements: "loglik.Modr", the maximized log-likelihood of modr; "loglik.Modf", the maximized log-likelihood of modf; "dif.loglik", the difference in log-likelihood between both models; "dif.df", the difference in degrees of freedom between both models; and "p-value", the p-value of the LRT.
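The elements listed above can be reproduced by hand for two nested models, as in the following sketch. It is an assumption-laden illustration rather than the implementation of lrt.glm(): it uses two Poisson GLMs fitted with stats::glm() on the built-in warpbreaks data, and the object names are hypothetical.

```r
# Two nested Poisson GLMs on the built-in 'warpbreaks' data (base R)
mod_r <- glm(breaks ~ wool, family = poisson, data = warpbreaks)
mod_f <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Maximized log-likelihoods and their difference
ll_r <- logLik(mod_r)
ll_f <- logLik(mod_f)
dif_loglik <- as.numeric(ll_f) - as.numeric(ll_r)

# Difference in the number of estimated parameters
dif_df <- attr(ll_f, "df") - attr(ll_r, "df")

# p-value of the likelihood ratio test
p_value <- pchisq(2 * dif_loglik, df = dif_df, lower.tail = FALSE)
c(dif.loglik = dif_loglik, dif.df = dif_df, p.value = p_value)

# Cross-check against the analysis-of-deviance test from 'stats'
anova(mod_r, mod_f, test = "Chisq")
```

Swapping the two models would give a negative statistic, which is why the order of the reduced and full models matters.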

Author(s)

Christian Salas-Eljatib.

References

Pinheiro JC, and Bates DM. 2000. Mixed-effects models in S and S-PLUS. Springer-Verlag, New York, NY. 528 p.

Examples


#not yet implemented

Computes the mode

Description

Computes the mode of a random variable.

Usage

moda(y = y)

Arguments

y

is a numeric vector.

Details

The mode is a statistic representing the most frequent value of the random variable, as a measure of central position.
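One common way to obtain the most frequent value is through a frequency table, as sketched below. The helper mode_table is hypothetical and is not necessarily how moda() works internally; in particular, for continuous data where every value is unique, a frequency-based mode is not informative and a different approach would be needed.

```r
# Frequency-table mode sketch (hypothetical helper, not the moda() source).
# All tied values are returned when there is more than one mode.
mode_table <- function(y) {
  freq <- table(y)
  as.numeric(names(freq)[freq == max(freq)])
}

mode_table(c(2, 3, 3, 5, 5, 5, 7))  # 5 is the most frequent value
mode_table(c(1, 1, 2, 2))           # two modes: 1 and 2
```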

Value

The function returns the mode, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

Examples


set.seed(1234)
variable <- rnorm(10, mean=45,sd=6)
#using the function
moda(y=variable)
moda(variable)

Scientific productivity of graduate students

Description

Data from a study carried out at Indiana University on the number of papers published by graduates of doctoral programs in biochemistry, 3 years after graduation.

Usage

data(papersdocstu)

Format

This dataset contains the following columns:

papers: Number of scientific articles published 3 years after graduation.
genero: Gender (man/woman).
est.civil: Marital status of the graduate.
nin.men5: Number of children younger than 6 years who depend on the graduate.
prog.prest: Score assigned to the prestige of the graduate program.
papers.guia: Number of papers published by the graduate's advisor over the same period.

Source

The data were obtained from the 'AER' package.

References

Long, J.S. (1997). The Origin of Sex Differences in Science.

Examples

data(papersdocstu)
df<-papersdocstu    
head(df)
barplot(table(df$papers),xlab="Numero de papers publicados",
   ylab="Frecuencia (num. de estudiantes)")
table(df$genero)
table(df$est.civil,df$genero)
tapply(df$papers,df$est.civil,summary)

Leaf weight

Description

Leaf weight

Usage

pesohojas

Format

Data frame with 64 rows and 2 columns:

peso: leaf weight, in grams (g)
area: leaf area, in square centimeters (cm²)

Examples

data(pesohojas)
plot(peso~area, data = pesohojas)

Presence or absence of sea ice from logbook records of annual cruises

Description

Data containing 52717 observations on the presence of sea ice, from logbook records of annual cruises to the B-C-B, in an unbroken record from 1850 to 1910.

Usage

data(presenceIce)

Format

The dataframe contains the following columns:

ship.id: The code number for ships.
move.type: Type of movement of ships. 0 indicates a sail-powered vessel and 1 indicates an auxiliary-powered vessel.
year: Year of registry.
month: Month of registry.
day: Day of registry.
lat.dec: Decimal latitude.
long.dec: Decimal longitude.
e.w: East or west of the Prime Meridian.
ice.cov: Sea ice observed. 0 indicates no sea ice registered and 1 indicates presence of sea ice registered.

Source

The data were provided by the Sea Ice Group at the Geophysical Institute.

References

Mahoney A, Bockstoce J, Botkin D, Eicken H, Nisbet R. 2011. Sea-Ice Distribution in the Bering and Chukchi Seas: Information from Historical Whaleships' Logbooks and Journals. Arctic 64(4): 465-477.

Examples

data(presenceIce)
head(presenceIce)

Presidential election of 2021 in Chile.

Description

Polling-table data for the 2021 presidential election in Chile. The election was held on 19 December 2021.

Usage

data(president)

Format

The data contain the following columns:

region.no: Number of the administrative region of Chile.
region: Name of the administrative region of Chile.
provincia: Province.
circu.senatorial: Senatorial constituency.
distrito: District.
comuna: County.
circu.elec: Electoral constituency.
local: Polling place, usually a school.
no.mesa: Polling-table number.
tipo.mesa: Type of polling table.
mesas.fusionadas: Merged polling tables.
electores: Voters.
nro.en.voto: .
candidato: Candidate, either Gabriel Boric or Jose A. Kast.
votos.tricel: Total number of votes according to the TRICEL (Tribunal Calificador de Elecciones).

Source

The data were obtained from the website of the Chilean Electoral Service (SERVEL) at https://www.servel.cl. The data file, downloaded on 24 October 2022, was named Resultados mesa presidencial TRICEL 2v 2021-1.xlsx.

Examples

data(president)
head(president)

Primary election for the presidency of Chile

Description

Polling-table-level voting data from the 2021 primary elections for President of Chile.

Usage

data(primarias)

Format

This dataset contains the following columns:

region.no: Administrative region of Chile.
region: Name of the region.
provincia: Province.
distrito: District.
comuna: County.
circu.elec: Electoral constituency.
local: Polling place.
tipo.mesa: Type of polling table.
mesa: Identifier code of the polling table.
mesas.fusionadas: Merged polling tables.
nro.voto: .
lista: Political list of the candidate.
pacto: Political pact of the candidate.
partido: Political party of the candidate.
candidato: Name of the candidate.
votos: Total number of votes.

Source

The data were obtained from the Chilean Electoral Service (SERVEL) website at https://www.servel.cl. The file was named Resultados Primarias Presidenciales 2021 CHILE.xlsx and was downloaded on 4 October 2022. The data were tidied, and only rows with information in the 'votos' column are part of the data frame.

Examples

data(primarias)
head(primarias)
table(primarias$region)
table(primarias$region,primarias$candidato)
tapply(primarias$votos,primarias$candidato,sum)

Cartesian location of trees in the Llancahue forest, for use in the book.

Description

Usage

data(pspLlancahue)

Format

Contains tree-level variables, as follows:

arb.id: Tree identifier.
spp: Species code, as follows: AP=Aextoxicon punctatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna.
dap: Diameter at breast height, in cm.
coord.x: Cartesian position on the X-axis, in m.
coord.y: Cartesian position on the Y-axis, in m.

Source

The data are provided courtesy of Prof. Daniel Soto at Universidad de Aysen (Coyhaique, Chile).

References

Soto DP, Salas C, Donoso PJ, Uteau D. 2010. Heterogeneidad estructural y espacial de un bosque mixto dominado por Nothofagus dombeyi después de un disturbio parcial. Revista Chilena de Historia Natural 83(3): 335-347.

Examples

data(pspLlancahue)    
head(pspLlancahue) 
descstat(pspLlancahue$dap)
boxplot(dap~spp, data=pspLlancahue)

Tree spatial coordinates in the Rucamanque forest

Description

Tree-level variables and spatial coordinates in a permanent sample plot of 1 ha (100 m x 100 m) in the Rucamanque experimental forest, near Temuco, Chile.

Usage

data(pspruca)

Format

The data frame contains the following variables for the standing live trees:

tree.no: tree number
species: Species name, "N. obliqua" is Nothofagus obliqua, "Ap" is Aextoxicon punctatum, etc.
crown.class: Crown class (1: superior, 2: intermediate, 3: inferior)
dbh: diameter at breast-height, in cm
x.coord: Cartesian position at the X-axis, in m
y.coord: Cartesian position at the Y-axis, in m

Source

Data were provided by Dr Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

References

Salas C, LeMay V, Nunez P, Pacheco P, and Espinosa A. 2006. Spatial patterns in an old-growth Nothofagus obliqua forest in south-central Chile. Forest Ecology and Management 231(1-3): 38-46. doi:10.1016/j.foreco.2006.04.037

Examples

data(pspruca)
head(pspruca)
table(pspruca$species)

Spatial location of trees in the Rucamanque forest

Description

Tree-level measurements and spatial coordinates in a permanent sample plot of 1 ha (100 m x 100 m) in the Rucamanque forest, near Temuco, Chile. Further details are given in the references.

Usage

data(pspruca2)

Format

The columns describe characteristics of the standing live trees, as follows:

arbol: Tree number
especie: Species name, "N. obliqua" is Nothofagus obliqua, "Ap" is Aextoxicon punctatum, etc.
clase.copa: Crown class (1: superior, 2: intermediate, 3: inferior)
dap: Diameter at breast height, in cm
coord.x: Cartesian position on the X-axis, in m
coord.y: Cartesian position on the Y-axis, in m

Source

Data were provided by Dr Christian Salas-Eljatib (Santiago, Chile).

References

Examples

data(pspruca2)
table(pspruca2$especie)

Height growth of Pinus taeda (Loblolly pine) trees

Description

The ptaeda data frame has 84 rows and three columns recording the height growth of loblolly pine trees. It is a slight modification of the original "Loblolly" data frame from the datasets R package.

Usage

data(ptaeda, package="datana")

Format

A dataframe containing the following columns:

seed.id: an ordered factor indicating the seed source for the tree. The ordering is according to increasing maximum height.
age: a numeric vector of tree ages, in yr.
toth: a numeric vector of tree heights, in m.

Source

Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS. Springer.

Examples


data(ptaeda, package="datana")
head(ptaeda)
plot(toth ~ age, data = subset(ptaeda, seed.id == 329),
     xlab = "Age (yr)", las = 1,
     ylab = "Height (m)")

Height growth of Pinus taeda

Description

This data frame contains 84 rows and three columns on the height growth of Pinus taeda (loblolly pine) trees. It is a modification of the "Loblolly" data frame from the 'datasets' R package.

Usage

data(ptaeda2)

Format

The data contain the following columns:

semilla.id: A factor indicating the seed source of the tree.
edad: Tree age, in years.
atot: Total height, in m.

Source

Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS. Springer.

Examples


data(ptaeda2, package="datana")
head(ptaeda2)
plot(atot ~ edad, data = subset(ptaeda2, semilla.id == 329),
     xlab = "Edad (años)", las = 1,
     ylab = "Altura (m)")

Obtain the P-value for a Standard t-distributed random variable

Description

Function to compute the P-value for a Standard t-distributed random variable.

Usage

pvalt(t.value, df, decnum = 14)

Arguments

t.value

A numeric value of a random variable following a Student's t distribution.

df

degrees of freedom of the Student's t distribution followed by the random variable.

decnum

the number of decimals to be used in the output. The default is set to 14.

Details

It is suited to compute the P-value for any random variable following a Standard t probability density function (pdf), for instance to obtain the p-value of a t-test.
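The following sketch shows how such a p-value can be obtained directly with stats::pt(). Whether pvalt() reports the one-sided or the two-sided probability is not stated here, so both are computed; the simulated data and the object names are illustrative assumptions.

```r
# Simulated sample (illustrative data, not from the package)
set.seed(42)
x <- rnorm(15, mean = 0.5)

# One-sample t-test from 'stats' (H0: mean = 0)
tt <- stats::t.test(x)
t_v <- as.numeric(tt$statistic)
deg_f <- as.numeric(tt$parameter)

# Upper-tail and two-sided p-values from the t distribution
p_upper <- pt(t_v, df = deg_f, lower.tail = FALSE)
p_two <- 2 * pt(abs(t_v), df = deg_f, lower.tail = FALSE)
round(c(upper = p_upper, two.sided = p_two), 5)

# The two-sided value agrees with the one reported by t.test()
all.equal(p_two, tt$p.value)
```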

Value

The function returns the P-value or probability of getting a value as large as t.value.

Author(s)

Christian Salas-Eljatib

Examples

# Load dataset
 df <- datana::fertiliza2
 head(df)
 ## Computes the t-test statistics (from the 'stats' package)
t.value <- stats::t.test(df$vol)
t.value
 t.v <- as.numeric(t.value$statistic);t.v
 deg.f <- as.numeric(t.value$parameter);deg.f

 ## Obtaining the p-value
 pvalt(t.v, deg.f)

Obtain the P-value for a Standard Gaussian random variable

Description

Function to compute the P-value for a Standard Gaussian random variable.

Usage

pvalz(zval, decnum = 5)

Arguments

zval

A numeric random variable following a Standard Gaussian distribution.

decnum

the number of decimals to be used in the output. The default is set to 5.

Details

It is suited to compute the P-value for any random variable following a Standard Gaussian probability density function.
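For reference, the same kind of probability can be computed with stats::pnorm(), as sketched below. Whether pvalz() returns the one-sided or the two-sided probability is not stated here, so both are shown; the object names are hypothetical.

```r
z <- 1.96

# Upper-tail probability P(Z > z) and two-sided probability P(|Z| > |z|)
p_upper <- pnorm(z, lower.tail = FALSE)
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(c(upper = p_upper, two.sided = p_two), 5)
```

For z = 1.96 the upper-tail probability is about 0.025 and the two-sided probability about 0.05, which is why 1.96 is the usual reference value for a 5% two-sided test.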

Value

This function returns the P-value or probability of getting a value as large as 'zval'.

Author(s)

Christian Salas-Eljatib

Examples


pvalz(1.96)

Precipitation data in California

Description

Precipitation data measured at different locations in California, with the coordinates of the points and their distance to the coast.

Usage

data(rainfallCA)

Format

This dataset contains the following columns:

saple.id: Identifier of the sampling point.
easting: Easting coordinate of the point.
northing: Northing coordinate of the point.
pp: Precipitation, in inches.
ele: Elevation, in feet.
lat: Latitude of the point.
d.coast: Distance to the coast, in miles.

Source

The data come from measurements made in California.

Examples

data(rainfallCA)    
head(rainfallCA) 
plot(pp~ele, data=rainfallCA)
hist(rainfallCA$pp)

Height growth of Nothofagus alpina trees in Chile.

Description

Time series data of height for rauli (Nothofagus alpina) trees in south-central Chile. These sampled trees are part of the ones used in Salas-Eljatib (2021, Ecological Applications). The full citation is provided below.

Usage

data(raulihg)

Format

The data frame contains four variables as follows:

tree.code: tree id code
spp: species common name
bha.t: breast-height age, in yrs.
h.t: total height, in m.

Source

Data were provided by Dr Christian Salas-Eljatib (Santiago, Chile).

References

Salas-Eljatib C. 2021. An approach to quantify climate-productivity relationships: an example from a widespread Nothofagus forest. Ecological Applications 31(4): e02285. doi:10.1002/eap.2285
Salas-Eljatib, C. 2021. Time series height-data for Nothofagus alpina trees. doi:10.6084/m9.figshare.13521602.v5

Examples

data(raulihg)
head(raulihg)

Height growth of Nothofagus alpina trees.

Description

Time series data of height for sampled Nothofagus alpina (raulí) trees in south-central Chile. These trees are part of the ones used in Salas-Eljatib (2021, Ecological Applications). The full citation is given in the references.

Usage

data(raulihg2)

Format

Contains individual-level variables, as described below:

tree.code: Tree code
spp: Species common name
bha.t: Breast-height age, in years.
h.t: Total height, in m.

Source

Data provided by Prof. Christian Salas-Eljatib.

References

Salas-Eljatib C. 2021. An approach to quantify climate-productivity relationships: an example from a widespread Nothofagus forest. Ecological Applications 31(4): e02285. doi:10.1002/eap.2285
Salas-Eljatib C. 2021. Time series height-data for Nothofagus alpina trees. doi:10.6084/m9.figshare.13521602.v5

Examples

data(raulihg2)
head(raulihg2)

School performance by student in Chile, 2024

Description

Dataset with anonymized school performance information by student for the year 2024. It contains 687033 observations of students in Enseñanza Media Humanístico Científica (scientific-humanistic secondary education), youth modality, from municipal, subsidized private, and paid private schools. Each row represents a student and their basic characteristics, including overall grade average, attendance, and final status for the school year.

Usage

data(rendesc2)

Format

The variables are described below:

region: Region of Chile of the record
comuna: County of the corresponding region
mrun: Anonymous identifier of the student
cod.depe: Code of the administrative dependency of the school (1 = municipal, 2 = subsidized private, 3 = paid private)
gen.alu: Gender of the student (1 = man, 2 = woman)
edad.alu: Age of the student
prom.gral: Overall grade average (scale from 1.0 to 7.0)
asistencia: Annual attendance percentage of the student
sit.fin: Final status of the student (P = promoted, R = failed)

Source

Ministerio de Educación de Chile (MINEDUC), open data portal: https://datosabiertos.mineduc.cl/. The data were entered by Saúl Ketterer, a student of Prof. Christian Salas-Eljatib.

References

MINEDUC (2024). Datos de rendimiento por estudiante. Subsecretaria de Educación.

Examples

data(rendesc2)
head(rendesc2)

SIMCE 2023 mathematics score for 4th grade (4to Básico), by RBD

Description

Average score per school in the SIMCE 2023 mathematics test for 4th grade of primary education (4to Básico). There are 6534 observations. The binary variable (Y) is the presence of a PIE agreement at the school, where Y=1 denotes presence and Y=0 absence.

Usage

data(simce2)

Format

The variables are described below:

rbd: Rol Base de Datos (RBD) of the school
region: Region of the school
comuna: County of the school
dependencia: Administrative dependency of the school
prom.mate4b: Average score of the school in the SIMCE mathematics test for 4th grade (4to Básico) in 2023
mat.total: Number of students enrolled in the school
convenio.pie: Whether the school has a PIE agreement (1 yes, 0 no)

Source

Data obtained from the Agencia de Calidad de la Educación of the Mineduc and from the Mineduc DatosAbiertos portal (datosabiertos.mineduc.cl). The data were entered by Diego Fernández, a student of Prof. Christian Salas-Eljatib.

Examples

data(simce2)
head(simce2)

Computes the skewness of a numeric vector

Description

Skewness measures the departure from symmetry of a frequency distribution, that is, its asymmetry. One way to assess the asymmetry of a random variable is to compute a statistic representing its skewness. The function computes a dimensionless skewness statistic for a given vector.

Usage

skewn(x, na.rm = TRUE)

Arguments

x

A numeric vector representing a random variable.

na.rm

Logical value to remove NA values. The default is set to TRUE.

Details

The skewness of a random variable is the third moment of the standardized variable. A skewness estimator can be parameterized in several ways, for instance in terms of the third moment and the standard deviation of the random variable.
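The moment-based definition above can be illustrated with a short base R sketch. The helper skew_moment is hypothetical and is not necessarily the estimator implemented in skewn(); it divides the third central moment by the second central moment raised to the power 3/2.

```r
# Moment-based skewness sketch (hypothetical helper, not the skewn() source).
skew_moment <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  dev <- x - mean(x)
  m2 <- sum(dev^2) / n   # second central moment
  m3 <- sum(dev^3) / n   # third central moment
  m3 / m2^(3 / 2)        # dimensionless; 0 for a symmetric sample
}

skew_moment(c(1, 2, 3, 4, 5))    # symmetric sample, skewness of 0
skew_moment(c(1, 2, 3, 4, 10))   # long right tail, positive skewness
```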

Value

The value of the skewness of the given vector.

Author(s)

Christian Salas-Eljatib.

Examples

y.var<-rnorm(100);x.var<-rbeta(100,.2,2)
skewn(y.var)
skewn(x.var)

Sludge data from different cities, with zinc concentration values.

Description

The dataset contains 36 observations.

Usage

data(sludge)

Format

Contains four variables, as follows:

city: Name of city.
rate: Concentration rate of sludge.
zinc: Zinc concentration (in ppm).
trt.comb: Combination between city and rate factors.

Source

The data were provided from.. still remember.

References

not yet

Examples

data(sludge)
table(sludge$city,sludge$rate) 
levels(sludge$city)
tapply(sludge$zinc, list(sludge$city,sludge$rate), mean)

Sludge data from different cities, with zinc concentration values.

Description

Zinc content data in sludge treatment.

Usage

data(sludge2)

Format

Contains the following four variables:

ciudad: Name of the city.
tasa: Sludge concentration rate.
zinc: Zinc concentration, in ppm.
trt.comb: Identifier of the combination of levels between the factors ciudad and tasa.

Source

The data were provided from.. still remember.

References

not yet

Examples

data(sludge2)    
table(sludge2$ciudad,sludge2$tasa) 
levels(sludge2$ciudad)
tapply(sludge2$zinc, list(sludge2$ciudad,sludge2$tasa), mean)

On the National System of State Protected Wild Areas (SNASPE) of Chile.

Description

Units of the National System of State Protected Wild Areas (SNASPE).

Usage

data(snaspe)

Format

Contains the following variables:

unit.id: Number for the unit.
unit: Name of the protected area.
category: Category of the unit. It can be either a National Park, a National Reserve or a Natural Monument.
county: Name of the county where the unit is located.
province: Province where the unit is located.
region: Region where the unit is located.
perim.km: Perimeter, in km.
area.ha: Area, in hectares.
area.m2: Area, in m^{2}.

Source

These data are freely available at https://ide.minagri.gob.cl

References

The Chilean SNASPE is under the direction of the Chilean Forest Service (CONAF). Further information and documentation can be found at https://www.conaf.cl

Examples

data(snaspe)
head(snaspe)
table(snaspe$category)
tapply(snaspe$area.ha,snaspe$category,mean)

National System of State Protected Wild Areas (SNASPE) of Chile

Description

Contains general variables of the units of the National System of State Protected Wild Areas (SNASPE) of Chile.

Usage

data(snaspe2)

Format

Contains the following variables for each SNASPE unit:

uni.id: Identifier number of the unit.
unidad: Name of the unit.
categoria: Category of the unit. It can be a National Park, a National Reserve, or a Natural Monument.
comuna: Name of the commune (county) where the unit is located.
province: Name of the province where the unit is located.
region: Name of the region.
perim.km: Perimeter, in km.
area.ha: Area, in hectares.
area.m2: Area, in m^{2}.

Source

These data were obtained from https://ide.minagri.gob.cl

References

The SNASPE is administered by the Corporación Nacional Forestal (CONAF) of Chile. Further information can be found at https://www.conaf.cl

Examples

data(snaspe2)
head(snaspe2)
table(snaspe2$categoria)
tapply(snaspe2$area.ha,snaspe2$categoria,mean)

Soil treatment experiment in tree seedlings

Description

A test was made of the effect of three soil treatments on the height growth of 2-year-old seedlings. Treatments were assigned at random to the three plots within each of 11 blocks. Each plot was made up of 50 seedlings. Average 5-year height growth was the criterion for evaluating treatments.
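The layout described above is a randomized complete block design. As a rough sketch of how such data could be analyzed, the following code simulates a hypothetical dataset with the same structure (11 blocks, 3 treatments, one plot-level mean per cell) and fits an additive block plus treatment model. All values, effect sizes, and object names are invented for illustration and are not taken from Freese (1967).

```r
# Simulated (hypothetical) data mimicking 11 blocks x 3 soil treatments.
set.seed(11)
rcb <- expand.grid(block = factor(1:11), treat = factor(c("A", "B", "C")))
block.eff <- rnorm(11, sd = 0.1)             # assumed block effects
treat.eff <- c(A = 0, B = 0.2, C = 0.4)      # assumed treatment effects
rcb$inc.h <- 1.5 + block.eff[as.integer(rcb$block)] +
  unname(treat.eff[as.character(rcb$treat)]) +
  rnorm(nrow(rcb), sd = 0.1)                 # plot-level noise

# Additive model: blocks absorb site variation, treatments are compared
fit <- aov(inc.h ~ block + treat, data = rcb)
summary(fit)
```

With one observation per block and treatment combination, the residual degrees of freedom are (11 - 1) x (3 - 1) = 20.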

Usage

data(soiltreat)

Format

Contains the following four columns, at the plot level:

block: Block unit.
treat: Treatment level.
ini.h: Initial height, in m.
inc.h: Height increment over 5 years, in m.

Source

Table on page 71 of Freese (1967). The data were entered by Miss Nayeli Ramirez, a former student of Prof. Christian Salas-Eljatib.

References

Freese, F. 1967. Elementary statistical methods for foresters. Agriculture Handbook 317, USDA Forest Service.

Examples

data(soiltreat)
head(soiltreat)
tapply(soiltreat$inc.h,soiltreat$treat,summary)
tapply(soiltreat$inc.h,soiltreat$treat,sd)

Soil treatments in the growth of seedlings.

Description

An experiment on the effect of three soil treatments on the height growth of 2-year-old seedlings. Treatments were assigned at random to three plots within each of 11 blocks. Each plot is made up of up to 50 seedlings. The average height increment over the last 5 years was the variable of interest for evaluating the treatments.

Usage

data(soiltreat2)

Format

The data, at the plot level, have the following columns:

bloque: Block of the experiment.
tmo: Treatment factor, with three levels.
alt.ini: Initial height, in m.
alt.inc: Height increment during the last five years, in m.

Source

Table on page 71 of Freese (1967). The data were entered by Miss Nayeli Ramirez, a student of Prof. Christian Salas-Eljatib.

References

Freese, F. 1967. Elementary statistical methods for foresters. Agriculture Handbook 317, USDA Forest Service.

Examples

data(soiltreat2)
head(soiltreat2)
tapply(soiltreat2$alt.inc,soiltreat2$tmo,summary)
tapply(soiltreat2$alt.inc,soiltreat2$tmo,sd)

Tree locations for several plots of Norway spruce (Picea abies) in Austria

Description

The Austrian Research Center for Forests established a spacing experiment with Norway spruce (Picea abies) in the Vienna Woods. In the 'Hauersteig' experiment, several tree-level variables were measured within four sample plots over time. The current dataframe has only the measurements carried out in 1944.

Usage

data(spataustria)

Format

Contains cartesian position of trees, and covariates, in sample plots, as follows:

plot: Plot number.
tree: Tree number.
species: Species code as follows: PCAB=Picea abies, LADC=Larix decidua, PNSY=Pinus sylvestris, FASY=Fagus sylvatica, QCPE=Quercus petraea, BTPE=Betula pendula.
x.coord: Cartesian position in the X-axis, in m.
y.coord: Cartesian position in the Y-axis, in m.
year: Measurement year.
dbh: Diameter at breast height, in cm.

References

Kindermann G, Kristofel F, Neumann M, Rossler G, Ledermann T & Schueler. 109 years of forest growth measurements from individual Norway spruce trees. Sci. Data 5:180077. doi:10.1038/sdata.2018.77

Examples

data(spataustria)
head(spataustria)
df<-spataustria
oldpar<-par(mar=c(4,4,0,0))
bord<-data.frame(
 x=c(min(df$x.coord),max(df$x.coord),min(df$x.coord),max(df$x.coord)),
 y=c(min(df$y.coord),min(df$y.coord),max(df$y.coord),max(df$y.coord))
 )
plot(bord,type="n", xlab="x (m)", ylab="y (m)", asp=1, bty='n')
points(df$x.coord,df$y.coord,col=df$plot,cex=0.5)
par(oldpar)

Creates a LaTeX file having an ANOVA table for a previously fitted linear regression model

Description

Function to create a LaTeX file of an ANOVA table.

Function to create a LaTeX file for a table with the main fitting statistics from a fitted regression model.

Usage

tabtexanova(
  mod = mod,
  nametab = nametab,
  cap = cap,
  save.file = FALSE,
  filename = "tabregre.tex",
  eng = TRUE,
  rowlab = "Source of variation",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

tabtexregre(
  mod = mod,
  nametab = nametab,
  cap = cap,
  save.file = FALSE,
  filename = "tabregre.tex",
  eng = TRUE,
  rowlab = "Parameter",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

Arguments

mod

an object containing the fitted model by using the lm() function.

nametab

a string having a brief name to be used in both the label of the table and the file name. For instance, if nametab="mod1", the table can be referred to in your LaTeX document by using \ref{tab:mod1}.

cap

a string having the caption of the LaTeX table.

save.file

The default is set to FALSE. If set to TRUE, then the option filename must be provided.

filename

A string having the name of the resulting LaTeX file containing the table. The default is set to "tabregre.tex".

eng

A logical value for the language to be used in the output. English is the default (eng=TRUE); if eng=FALSE, Spanish is used.

rowlab

a character with the name to be used as label for the column where the variables will be printed. The default is "Source of variation" for tabtexanova and "Parameter" for tabtexregre.

decnum

the number of decimals to be used in the output. The default is set to 3.

font.size.tab

The default is set to "normalsize". You could also try "footnotesize".

font.type.tab

The default is set to "normalfont".

Details

The resulting file is a LaTeX table that can be added to your main LaTeX document by using \input{filename}.
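To make the idea of a table file that is later pulled in with \input{} more concrete, here is a minimal base-R sketch that writes a small LaTeX table of parameter estimates from a fitted lm() object. It is not the code used by tabtexanova() or tabtexregre(); the simulated data, the table layout, and the object names (d, fit, tex, out) are assumptions made for illustration.

```r
# Sketch only: build a tiny LaTeX table by hand from a fitted model.
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 + 0.5 * d$x + rnorm(20)          # simulated response
fit <- lm(y ~ x, data = d)

# Estimates and standard errors, rounded to 3 decimals
co <- round(summary(fit)$coefficients[, 1:2], 3)
rows <- paste0(rownames(co), " & ", co[, 1], " & ", co[, 2], " \\\\")

tex <- c("\\begin{table}",
         "\\caption{Parameter estimates}",
         "\\label{tab:mod1}",
         "\\begin{tabular}{lrr}",
         "Parameter & Estimate & Std. Error \\\\",
         rows,
         "\\end{tabular}",
         "\\end{table}")

# Written to a temporary directory so nothing is left in the workspace
out <- file.path(tempdir(), "tabregre.tex")
writeLines(tex, out)
```

The file written here could then be included in a main document with \input{} and cited with \ref{tab:mod1}, which mirrors the workflow described for the package functions.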

Value

This function creates a LaTeX file having an ANOVA table, from a fitted regression model.

This function creates a LaTeX file having the main fitting statistics of a linear regression model.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- datana::fishgrowth2
head(df)
descstat(df[,c("largo","edad")])
plot(largo ~ edad, data=df)
mod1<-lm(largo ~ edad, data=df)
##example 1
tabtexanova(mod=mod1,nametab="anovatab",
cap="ANOVA-style table of the fitted regression model")
##example 2
tabtexanova(mod=mod1,nametab="anovatab",
cap="Cuadro estilo ANOVA para modelo de regresion ajustado",
eng=FALSE)

df <- datana::fishgrowth2
head(df)
datana::descstat(df[,c("largo","edad")])
graphics::plot(largo ~ edad, data=df)
mod1<-stats::lm(largo ~ edad, data=df)
##example 1
tabtexregre(mod=mod1,nametab="basicmodel",
cap="Parameter estimates of the fitted regression model")
##example 2
tabtexregre(mod=mod1,nametab="basicmodel",
cap="Cuadro con parametros estimados del modelo de regresion",
eng=FALSE)

Creates a LaTeX file having a descriptive statistics table for continuous variables

Description

Function to create a LaTeX file for a table of descriptive statistics of continuous variables from a dataframe.

Usage

tabtexdescstat(
  data = data,
  colnames = colnames,
  varnames = varnames,
  cap = cap,
  nametab = nametab,
  save.file = FALSE,
  filename = "tabdescdata.tex",
  eng = TRUE,
  rowlab = "Variable",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

Arguments

data

a dataframe containing numeric variables as columns.

colnames

a character vector with the names of the dataframe columns for which the descriptive statistics will be computed.

varnames

a character vector with the name of each of the variables, as it will be shown in the LaTeX table.

cap

a string having the caption of the LaTeX table.

nametab

a string having a brief name to be used in both the label of the table and the file name. For instance, if nametab="descdata", the table can be referred to in your LaTeX document by using \ref{tab:descdata}.

save.file

The default is set to FALSE. If set to TRUE, then the option filename must be provided.

filename

A string having the name of the resulting LaTeX file having the table. The default is set to "tabdescdata.tex".

eng

A logical value for the language to be used in the output. English is the default (eng=TRUE); if eng=FALSE, Spanish is used.

rowlab

a character with the name to be used as label for the column where the variables will be printed. The default is set to "Variable".

decnum

the number of decimals to be used in the output. The default is set to 3.

font.size.tab

The default is set to "normalsize". You could also try "footnotesize".

font.type.tab

The default is set to "normalfont".

Details

The resulting file is a LaTeX table that can be added to your main LaTeX document by using \input{filename}.

Value

This function creates a LaTeX file having the following descriptive statistics: sample size, minimum, maximum, mean, median, SD, and coefficient of variation. If the full option is set to TRUE, the following statistics are added to the table: 25th and 75th percentiles, the interquartile range, skewness, and kurtosis.
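As a rough illustration of the basic statistics listed above, the following base-R sketch computes them for two numeric columns. The helper desc_basic and the small dataframe are hypothetical, and the coefficient of variation is assumed here to be expressed as a percentage (100 x SD / mean); the package may define or format these statistics differently.

```r
# Hypothetical helper: basic descriptive statistics for one numeric vector.
desc_basic <- function(v) {
  v <- v[!is.na(v)]
  c(n = length(v), min = min(v), max = max(v), mean = mean(v),
    median = median(v), sd = sd(v),
    cv = 100 * sd(v) / mean(v))   # assumed: CV as a percentage
}

# Invented values, only to show the shape of the output
d <- data.frame(dbh = c(12.1, 18.4, 25.0, 31.7),
                height = c(9.5, 14.2, 18.8, 22.1))

# One row per variable, one column per statistic
res <- t(sapply(d[, c("dbh", "height")], desc_basic))
round(res, 3)
```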

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- datana::idahohd
head(df)
##example 1
tabtexdescstat(data=df,nametab="idaho",
cap="Descriptive statistics table",
colnames=c("dbh","height"),varnames = c("Diameter","Height"))
##example 2
tabtexdescstat(data=df,nametab="idaho",
cap="Cuadro con estadistica descriptiva",
 colnames=c("dbh","height"),varnames = c("Diametro","Altura"),
 eng=FALSE)

Produces a time series plot

Description

Produces a time series plot, of variable 'y' as a function of 'x' by an observational unit factor.

Usage

timeserplot(
  data = data,
  y = y,
  x = x,
  obs.unit = obs.unit,
  factor1 = NA,
  factor2 = NA,
  only.lines = FALSE,
  ylab = NA,
  xlab = NA,
  linetype.lab = NA,
  factor2.line = TRUE,
  factor2.col = FALSE,
  col.lines = "black",
  max.y.all = NA,
  levels.i.want = FALSE,
  col.lev.i.want = FALSE
)

Arguments

data

a dataframe with at least three columns representing the response variable ("y"), the main predictor variable ("x"), and a variable indicating the observational unit ("obs.unit").

y

a character giving the column name of the response variable or variable of interest.

x

a character giving the column name of the main predictor variable. Generally this variable is time.

obs.unit

a character giving the column name containing the info of the observational unit.

factor1

an optional character having the name of a column having a factor variable (e.g., treatment). The default value is set to NA.

factor2

an optional character having the name of a column having another factor variable (e.g., species). The default value is set to NA.

only.lines

a logical value indicating whether only lines, without dots, are drawn in the plot. The default value is set to FALSE.

ylab

Label for the Y-axis

xlab

Label for the X-axis

linetype.lab

is an optional string to be used as the title of the factor being represented by lines. It is only needed if factor1 and factor2 are defined. See example.

factor2.line

a logical value indicating whether the second factor, factor2, is distinguished by line type. The default value is set to TRUE.

factor2.col

a logical value indicating whether the second factor, factor2, is distinguished only by the color of the lines. The default value is set to FALSE.

col.lines

A string specifying the single color to be used for the lines of the timeseries

max.y.all

A number representing the maximum level of Y-axis for all classes

levels.i.want

A vector having the levels for the factor under study

col.lev.i.want

A vector having the colors to be used for the factor under study

Details

Both 'y' and 'x' must be numeric variables, and the column representing the observational unit must be a factor. This factor identifies the longitudinal structure of the data, for instance, a student being measured over time. In addition, two more factors can be added to the plot in order to represent the potential variability among their levels.
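The data layout these requirements imply is a "long" dataframe with one row per observational unit and time. The sketch below builds a small hypothetical example of that layout and draws one line per unit with base graphics; it does not call timeserplot(), and the dataframe and its values are invented for illustration.

```r
# Hypothetical longitudinal data: 3 trees, each measured at 4 times.
long <- data.frame(
  tree = factor(rep(c("t1", "t2", "t3"), each = 4)),   # observational unit
  time = rep(c(0, 5, 10, 15), times = 3),              # numeric x
  dbh  = c(10, 12, 15, 17,  8, 9, 11, 14,  12, 15, 18, 20)  # numeric y
)

# The requirements stated above: numeric y and x, factor for the unit
stopifnot(is.numeric(long$dbh), is.numeric(long$time), is.factor(long$tree))

# One trajectory per observational unit
plot(dbh ~ time, data = long, type = "n", xlab = "Time", ylab = "dbh")
by.unit <- split(long, long$tree)
for (u in by.unit) lines(u$time, u$dbh, type = "b")
```

A dataframe organized this way could be passed to the function with y="dbh", x="time" and obs.unit="tree", as in the examples of this help page.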

Value

This function returns a time series plot

Note

Please use the function with caution, and run the examples first to understand it better.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples

data(ficdiamgr, package="datana")
df <- ficdiamgr
head(df)
str(df)
df$site<-as.factor(df$site)
df$species<-as.factor(df$species)
table(df$tree,df$species)
table(df$species,df$site)
#
timeserplot(df, y="dbh", x="time", obs.unit = "tree")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", only.lines = TRUE)
#
## Other examples of using the function
timeserplot(df, y="dbh", x="time", obs.unit = "tree", col.lines = "blue",
only.lines = TRUE)
timeserplot(df, y="dbh", x="time", obs.unit = "tree", only.lines = FALSE)
#
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site",
factor2= "species")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site",
 factor2= "species", factor2.col = TRUE, only.lines = TRUE)

Diameter, height and volume for Black Cherry Trees

Description

This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees. The records are a slight modification to the original dataframe "trees" from the datasets R package.

Usage

data(treevol)

Format

A data frame with 31 observations and three variables

dbh: Diameter at breast height, in cm.
toth: Total height, in m.
vtot: Timber volume, in cubic meters.

Source

Ryan TA, Joiner BL, and Ryan BF. 1976. The Minitab Student Handbook. Duxbury Press.

Examples

pairs(treevol, panel = panel.smooth, main = "treevol dataframe")
plot(vtot ~ dbh, data = treevol, log = "xy")
coplot(log(vtot) ~ log(dbh) | toth, data = treevol,
       panel = panel.smooth)
summary(m1 <- lm(log(vtot) ~ log(dbh), data = treevol))
summary(m2 <- update(m1, ~ . + log(toth), data = treevol))
anova(m1,m2)

Volume, height, and diameter for Black Cherry trees

Description

These data come from measurements of volume, height, and diameter in 31 felled black cherry (Prunus serotina) trees. They are a modification of the dataframe 'trees' from the datasets R package.

Usage

data(treevol2)

Format

Data with 31 observations and three variables:

dap: Diameter at breast height, in cm.
atot: Total height, in m.
vtot: Total volume, in m^{3}.

Source

Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.

Examples

pairs(treevol2, panel = panel.smooth, main = "treevol dataframe")
plot(vtot ~ dap, data = treevol2, log = "xy")
coplot(log(vtot) ~ log(dap) | atot, data = treevol2,
       panel = panel.smooth)
summary(m1 <- lm(log(vtot) ~ log(dap), data = treevol2))
summary(m2 <- update(m1, ~ . + log(atot), data = treevol2))
anova(m1,m2)

Tree volume of roble (Nothofagus obliqua) in the Rucamanque forest

Description

These are tree-level measurement data of sample trees in the Rucamanque experimental forest, near Temuco, in the Araucania region in south-central Chile, measured in 1999. The data are the same as in the dataframe "treevolruca", but only having observations for the species Nothofagus obliqua (roble).

Usage

data(treevolroble)

Format

Contains tree-level variables, as follows:

tree.no: Tree id
dbh: Diameter at breast height, in cm
toth: Total height, in m.
d6: Upper-stem diameter at 6 m, in cm
totv: Tree gross volume, in m^{3} with bark.

Source

The data are provided courtesy of Dr Christian Salas at the Universidad de Chile (Santiago, Chile).

References

Salas C. 2002. Ajuste y validación de ecuaciones de volumen para un relicto del bosque de Roble-Laurel-Lingue. Bosque 23(2): 81-92. doi:10.4067/S0717-92002002000200009 https://eljatib.com/publication/2002-07-01_ajuste_y_validacion_/

Examples

data(treevolroble)    
head(treevolroble)

Tree-level volume for roble (Nothofagus obliqua) in the Rucamanque forest

Description

Volume, height, and diameter, among other variables, for sample trees of Nothofagus obliqua (roble) in the Rucamanque forest, near Temuco, in the Araucania region, in southern Chile.

Usage

data(treevolroble2)

Format

The dataframe contains the following columns:

arbol: Tree number.
especie: Species.
dap: Diameter at breast height, in cm.
atot: Total height, in m.
d6: Upper-stem diameter at 6 m, in cm.
vtot: Total gross volume, in m^{3} with bark.

Source

The data are provided by Prof. Christian Salas (Universidad de Chile).

References

Salas C. 2002. Ajuste y validación de ecuaciones de volumen para un relicto del bosque de Roble-Laurel-Lingue. Bosque 23(2): 81-92. doi:10.4067/S0717-92002002000200009 https://eljatib.com/publication/2002-07-01_ajuste_y_validacion_/

Examples

data(treevolroble2)    
head(treevolroble2)

Converts the first n characters of a string to upper-case letters.

Description

Function to upper-case the first n-characters of a string from the left-hand side.

Usage

upperleft(fac, n = 1)

Arguments

fac

an object of class character or factor.

n

the number of characters of the string given in fac to be converted.

Details

It is especially suited to tidying data vectors stored in alphanumeric (i.e., letters) format.
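The conversion can be sketched in base R with substr() and toupper(). The helper upper_first below is hypothetical and is not the package's upperleft(); it only illustrates the idea of upper-casing the first n characters from the left-hand side.

```r
# Hypothetical helper: upper-case the first n characters of each element.
upper_first <- function(s, n = 1) {
  s <- as.character(s)                       # also accepts factors
  paste0(toupper(substr(s, 1, n)),           # left part, upper-cased
         substr(s, n + 1, nchar(s)))         # remainder, unchanged
}

upper_first("willkommen")                        # "Willkommen"
upper_first(c("welcome", "bem-vindo"), n = 2)    # "WElcome" "BEm-vindo"
```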

Value

This function returns an object having the first n-characters from the left-hand side in upper-case.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


fac.x<-"willkommen"
upperleft(fac.x)
upperleft(fac.x,n = 2)
upperleft(fac.x,2)
upperleft(fac.x,3)
#A longer vector of characters
fac.x<-c("willkommen","welcome","bem-vindo","bienvenido")
upperleft(fac.x,1)

Function to compute prediction statistics based on observed values

Description

Computes three prediction statistics as a way to compare observed versus predicted values of a response variable of interest. The statistics are: the aggregated difference (AD), the root mean square difference (RMSD), and the aggregated absolute value difference (AAD). All of them are based on

r_i = y_i - \widehat{y}_i

where y_i and \widehat{y}_i are the observed and the predicted value of the response variable y for the i-th observation, respectively. Both the observed and predicted values must be expressed in the same units.

Usage

valesta(y.obs = y.obs, y.pred = y.pred)

Arguments

y.obs

observed values of the variable of interest

y.pred

predicted values of the variable of interest

Details

The function computes the three aforementioned statistics expressed in both (a) the units of the response variable and (b) the percentage. Notice that to represent each statistic in percentual terms, we divided them by the mean observed value of the response variable.
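A minimal sketch of these computations follows. It assumes that AD is the mean of the differences r_i, RMSD is the square root of the mean of r_i^2, and AAD is the mean of |r_i|, and that each percentage version is 100 times the statistic divided by the mean observed value. The helper name pred_stats is hypothetical, and the exact formulas used by valesta() may differ.

```r
# Hypothetical helper, under the assumptions stated above.
pred_stats <- function(y.obs, y.pred) {
  r    <- y.obs - y.pred            # r_i = y_i - yhat_i
  ad   <- mean(r)                   # aggregated difference
  rmsd <- sqrt(mean(r^2))           # root mean square difference
  aad  <- mean(abs(r))              # aggregated absolute difference
  pct  <- function(s) 100 * s / mean(y.obs)   # relative to mean observed
  c(RMSD = rmsd, RMSD.p = pct(rmsd),
    AD = ad, AD.p = pct(ad),
    AAD = aad, AAD.p = pct(aad))
}

# Differences are -2, 2, -3; the mean observed value is 20
pred_stats(y.obs = c(10, 20, 30), y.pred = c(12, 18, 33))
```

In this small example AD is -1 (that is, -5 percent of the mean observed value), which shows how positive and negative differences can cancel out in AD but not in RMSD or AAD.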

Value

The main output is a vector with the following six prediction statistics: (RMSD, RMSD.p, AD, AD.p, AAD, AAD.p); where RMSD.p stands for RMSD expressed as a percentage, and the same applies to AD.p and AAD.p.

Author(s)

Christian Salas-Eljatib.

References

Salas C, Ene L, Gregoire TG, Nasset E, Gobakken T. 2010. Modelling tree diameter from airborne laser scanning derived variables: a comparison of spatial statistical models. Remote Sensing of Environment 114(6):1277-1285. doi:10.1016/j.rse.2010.01.020
Salas C. 2002. Ajuste y validación de ecuaciones de volumen para un relicto del bosque de roble-laurel-lingue. Bosque 23(2):81–92. doi:10.4067/S0717-92002002000200009.

Examples


#Creates a fake dataframe
set.seed(1234)
df <- as.data.frame(cbind(Y=rnorm(30, 30,9), X=rnorm(30, 450,133)))
#fitting a candidate model
mod1 <- lm(Y~X, data=df)
#Using the valesta function
valesta(y.obs=df$Y,y.pred=fitted(mod1))

Function for building a scatterplot with superposing boxplots

Description

The function creates a scatterplot with superposing boxplots for the Y-axis variable segregated by classes (i.e., groups) of the X-axis variable. For a scatterplot between a response variable Y and a predictor variable X, this function superposes boxplots of the response by groups of the predictor variable. The main aim of the above described graph is to get a sense of the distribution of the response variable depending upon the predictor variable.

Usage

xyboxplot(
  x = x,
  y = y,
  col.dots = "blue",
  transp.dots = 0.1,
  xlab = NULL,
  ylab = NULL,
  num.classes = 10,
  segre.type = "percentile",
  limi.classes = NA,
  x.category = FALSE,
  pch.dots = 19,
  col.box = "red",
  transp.boxp = 0.07,
  xlim = NA,
  ylim = NA,
  class.ticks.lwd = 1,
  class.ticks.col = "red",
  class.marks.col = "black",
  cex.dots = 0.7,
  class.marks = FALSE,
  class.ticks = TRUE
)

Arguments

x

A numeric vector representing the X-axis variable.

y

A numeric vector representing the Y-axis variable (response).

col.dots

A string specifying the dot colors. The default value is "blue".

transp.dots

A numeric value to be used as transparency for the dots of the figure to be produced. The default is set to 0.1.

xlab

(optional) A string specifying X-axis label.

ylab

(optional) A string specifying Y-axis label.

num.classes

The number of classes into which the X-axis variable is segregated for drawing the boxplots. The default is set to 10.

segre.type

A string specifying the type of segregation used to build the classes. The types are: (a) percentile, which assigns the same number of observations, or close to it, to each of the num.classes classes; and (b) user.defined, which requires the user to provide the num.classes-1 class limits. The default is set to percentile. Notice that if user.defined is specified, the option limi.classes must be provided.

limi.classes

A vector of size num.classes-1 containing the limits to be used for defining the classes.

x.category

A logical statement, if set to TRUE, the X-axis variable will be treated as categorical for the drawing of the boxplots. The default is set to FALSE.

pch.dots

A numeric value specifying the plotting symbol (shape) of the dots. The default is 19.

col.box

A string specifying the boxplot color. The default is "red"

transp.boxp

A numeric value to be used as transparency for the boxplots of the figure to be produced. The default is set to 0.07.

xlim

(optional) A numeric vector having the minimum and maximum, respectively for the X-axis variable.

ylim

(optional) A numeric vector having the minimum and maximum, respectively for the Y-axis variable.

class.ticks.lwd

The numeric width of the tick line for each of the X-axis variable classes. The default is 1.

class.ticks.col

A string with the color of the tick line for each of the X-axis variable classes. The default is "red".

class.marks.col

A string with the color of the mark value for each of the X-axis variable classes. The default is "black".

cex.dots

A numeric factor altering the size of the dots. The default value is 0.7.

class.marks

A logical value (TRUE or FALSE) indicating whether the numeric value of each of the X-axis variable classes should be printed. The default is FALSE.

class.ticks

A logical value (TRUE or FALSE) indicating whether the tick of each of the X-axis variable classes should be drawn. The default is TRUE.

Details

Notice that, by default, the superposing boxplots for the Y-axis variable are computed by grouping the X-axis variable into 10 classes. Those classes are set by computing the 0.1, 0.2, ..., 0.9 percentiles of the X-axis variable; therefore, each group has approximately the same number of observations. The width of each boxplot represents the extent of the respective X-axis class used for drawing it.
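The percentile-based grouping described above can be sketched in base R with quantile() and cut(). The simulated vectors and object names (brk, cls) are hypothetical, and this is only an approximation of the idea; the internal code of xyboxplot() may build its classes differently.

```r
# Simulated (hypothetical) data
set.seed(42)
x <- runif(200, 0, 100)
y <- 2 + 0.1 * x + rnorm(200)

num.classes <- 10
# Class limits at the 0, 0.1, ..., 1 quantiles of x
brk <- quantile(x, probs = seq(0, 1, length.out = num.classes + 1))
cls <- cut(x, breaks = brk, include.lowest = TRUE)

table(cls)               # roughly equal counts per class
diff(brk)                # class widths differ: they follow the spread of x
tapply(y, cls, median)   # a summary of the response within each class
```

Because the limits are quantiles, classes are narrow where x values are dense and wide where they are sparse, which is what the varying boxplot widths in the figure convey.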

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. 170 p. https://eljatib.com

Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

df <- datana::fishgrowth
xyboxplot(x=df$length,y=df$scale)
xyboxplot(x=df$length,y=df$scale,col.dots = "red",
xlab="Variable X")
xyboxplot(x=df$length,y=df$scale,xlab="Variable X")

## dots with alpha channel
xyboxplot(x=df$length,y=df$scale,xlab="Variable X",
transp.dots = 0.4)

## with categorical x
xyboxplot(x=df$age,y=df$length,x.category = TRUE)

## fixed x axis limits
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10))

## x marks width to .5
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = .5)

## x marks red and width 2
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = 2, class.ticks.col = "red")

## larger dots
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          cex.dots = 1.5)

## print classes ticks
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.marks = FALSE, class.ticks.col = "green")

### the x-variable is not treated as a categorical variable
df <- datana::fishgrowth
## print classes ticks, by default with red color
xyboxplot(x=df$length, y=df$scale)

## don't print ticks
xyboxplot(x=df$length, y=df$scale, class.ticks=FALSE)

## print classes marks values
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE)

## print classes marks values without ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE, class.ticks=FALSE)

## change class marks and ticks colors
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue")

## bigger ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue", class.ticks.lwd=3)

## Changing the number of the X-variable classes
xyboxplot(x=df$length,y=df$scale,num.classes=5)

## Defining the classes not by percentiles, but by fixed values
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),
ylim=c(0,20),num.classes=4,
segre.type="fixed",limi.classes=c(140,195,250))

## Note that the limits must be in agreement with the num.classes
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),ylim=c(0,20),
num.classes=5,segre.type="fixed",limi.classes=c(100,160,200,250))

A scatterplot with marginal histograms

Description

The function produces a scatterplot between the 'y'-axis variable and the 'x'-axis variable, but also adding the marginal histograms for both variables.

Usage

xyhist(
  x = x,
  y = y,
  col.x = "blue",
  col.y = "red",
  xlab = NULL,
  ylab = NULL,
  x.lim = NULL,
  y.lim = NULL
)

Arguments

x

A numeric vector representing the X-axis variable

y

A numeric vector representing the Y-axis variable

col.x

(optional) A string specifying the color of the histogram of the X-variable. Default is "blue".

col.y

(optional) A string specifying the color of the histogram of the Y-variable. Default is "red".

xlab

(optional) A string specifying X-axis label. Default is "xvar".

ylab

(optional) A string specifying Y-axis label. Default is "yvar".

x.lim

(optional) A vector of two elements with the limits of the X-axis. Default is the range of the X-variable.

y.lim

(optional) A vector of two elements with the limits of the Y-axis. Default is the range of the Y-variable.

Details

Both the response variable (Y-axis) and the predictor variable (X-axis) must be numeric.
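For readers curious about how such a figure can be assembled, the following base-graphics sketch combines a scatterplot with two marginal histograms using layout(). It is not the implementation of xyhist(); the simulated data and the panel proportions are assumptions, and the histogram bars are only roughly aligned with the scatterplot axes.

```r
# Simulated (hypothetical) numeric variables
set.seed(7)
x <- rnorm(100, 30, 8)
y <- 5 + 0.6 * x + rnorm(100, 0, 3)

# Histogram counts computed without plotting
hx <- hist(x, plot = FALSE)
hy <- hist(y, plot = FALSE)

oldpar <- par(no.readonly = TRUE)
# Panel 1: scatterplot (bottom-left); 2: top margin; 3: right margin
layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
       widths = c(3, 1), heights = c(1, 3))
par(mar = c(4, 4, 1, 1))
plot(x, y, xlim = range(hx$breaks), ylim = range(hy$breaks))
par(mar = c(0, 4, 1, 1))
barplot(hx$counts, space = 0, axes = FALSE, col = "blue")
par(mar = c(4, 0, 1, 1))
barplot(hy$counts, space = 0, axes = FALSE, horiz = TRUE, col = "red")
par(oldpar)   # restore the previous graphical settings
```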

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. https://eljatib.com

Examples

data(treevolroble)
df <- datana::treevolroble
head(df)
xyhist(x=df$dbh,y=df$toth)
xyhist(x=df$dbh,y=df$toth, xlab="Variable X",  ylab="Variable Y")
xyhist(x=df$dbh,y=df$toth, xlab="Variable X", ylab="Variable Y",
  col.x = "gray",col.y="white")

Figure of a matrix of scatterplots and histograms for several variables.

Description

The function produces a panel of multiple scatterplots and histograms, showing the correlation coefficient among all pairs of variables. Notice that the data must contain only numeric variables.

Usage

xymultiplot(
  x,
  smooth = TRUE,
  scale = FALSE,
  density = TRUE,
  digits = 2,
  method = "pearson",
  pch = 20,
  lm = FALSE,
  cor = TRUE,
  jiggle = FALSE,
  factor = 2,
  col.hist = "cyan",
  col.densi.curve = "black",
  show.points = TRUE,
  col.points = "gray",
  smoother = FALSE,
  col.smooth = "red",
  ellipses = FALSE,
  col.ellip = "blue",
  col.cent.point = "green",
  rug = TRUE,
  breaks = "Sturges",
  cex.cor = 1,
  ci = FALSE,
  alpha = 0.05,
  ...
)

Arguments

x

is a dataframe containing all the numeric variables to be used for drawing the panel plot

smooth

a logical value for drawing smooth curves. The default is set to TRUE.

scale

scales the correlation font by the size of the absolute correlation. The default is set to FALSE.

density

a logical value for drawing a density curve. The default is set to TRUE.

digits

an optional numeric value for the number of digits to be used for drawing the correlation coefficient in the panel. Default is set to 2.

method

a string giving the method to be used for computing the correlation coefficient. Default is set to "pearson".

pch

The plotting character. The default is 20 (a small solid dot).

lm

Plot the linear fit rather than the LOESS smoothed fits. The default is FALSE.

cor

If plotting regressions, should correlations be reported? The default is TRUE.

jiggle

Should the points be jittered before plotting? The default is FALSE.

factor

the jittering factor (1-5); only needed if "jiggle" is set to TRUE. The default is 2.

col.hist

a string giving the color to be used for the histograms of the panel. Default is set to "cyan".

col.densi.curve

a string with the name of the color to be used for the density curve. The default is set to "black".

show.points

a logical value for drawing the points in the scatter-plots. Default is set to TRUE.

col.points

a string giving the color to be used for the data points. Default is set to "gray".

smoother

If TRUE, the data points are drawn as a smoothed scatterplot; this is slow, but useful with many observations. The default is FALSE.

col.smooth

a string giving the color to be used for the smoothed curve of the scatterplot. Default is set to "red".

ellipses

an optional logical value for drawing an ellipse for the scatter-plots. The default is set to FALSE.

col.ellip

a string giving the color to be used for the ellipse of the scatterplot. The default is set to "blue".

col.cent.point

a string giving the color to be used for the centroid point of the ellipse of the scatterplot. The default is set to "green".

rug

a logical value for drawing the rugs in the histograms. Default is set to TRUE.

breaks

a string giving the method to be used for obtaining the breaks of the histogram. Default is set to "Sturges".

cex.cor

The size of the text used for the correlations. The default is 1. If only cex is specified, it changes the character size. If cex.cor is also specified, cex changes only the point size, so the text and the points can be sized independently.

ci

Should confidence intervals be drawn for the linear model or for the loess fit? The default is FALSE. If confidence intervals are not drawn, the fitting function is lowess.

alpha

an optional numeric value for the significance level. Default is set to 0.05.

...

other graphical parameters (see par and section ‘Details’ below).
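The sketch below uses base R only to illustrate what the method, digits and breaks arguments refer to. It assumes, without confirmation from this page, that the function relies on the standard cor() and Sturges computations, as the original pairs.panels does. The built-in mtcars data stand in for any all-numeric data frame.

```r
# Built-in dataset used only for illustration.
df <- mtcars[, c("mpg", "wt", "hp")]

# Correlation matrices for two values of `method`, rounded to
# two digits as `digits = 2` would display them.
r.pearson  <- round(cor(df, method = "pearson"), 2)
r.spearman <- round(cor(df, method = "spearman"), 2)
print(r.pearson)

# Number of histogram classes suggested by Sturges' rule for one
# variable, which is what breaks = "Sturges" refers to.
n.classes <- nclass.Sturges(df$mpg)
```

Comparing r.pearson with r.spearman before plotting gives a quick indication of whether the choice of method would change the coefficients shown in the panel.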

Details

Generates a multipanel (matrix) of scatterplots and histograms to explore potential relationships among variables.
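Because the data must contain only numeric variables, a data frame with mixed column types needs to be filtered first. A minimal sketch using the built-in iris data (chosen only for illustration); the object names is.num and df.num are arbitrary.

```r
# iris has four numeric columns and one factor column (Species).
is.num <- vapply(iris, is.numeric, logical(1))

# Keep only the numeric columns; drop = FALSE preserves the data frame
# even if a single column remains.
df.num <- iris[, is.num, drop = FALSE]
names(df.num)
```

The resulting df.num could then be passed as the x argument.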

Value

This function returns a multipanel (matrix) of scatterplots and histograms.

Author(s)

A modification by Christian Salas-Eljatib of the function pairs.panels from the package psych.

References

Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. https://eljatib.com

Examples

##First example
data(bears2)
head(bears2)
df <- bears2[,c('peso','edad','cabezaL','cabezaA','largo','pechoP')]
descstat(df)
xymultiplot(df)
xymultiplot(df,ellipses=TRUE)
xymultiplot(df,ellipses=TRUE,col.cent.point = "yellow",
 col.densi.curve = "dark green",col.hist = "white")