Title: Datasets and Functions to Accompany Analisis De Datos Con R
Version: 1.1.1
Description: Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.3.2
Depends: R (≥ 3.5)
LazyData: true
LazyDataCompression: xz
Imports: ggplot2, graphics, Hmisc, methods, scales, stats, utils
Suggests: lattice, testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-07-21 21:21:46 UTC; christian
Author: Christian Salas-Eljatib ORCID iD [aut, cre], Campos Nicolás ORCID iD [ctb], Pino Nicolas [ctb] (up to 2020), Riquelme Joaquin [ctb] (up to 2020)
Maintainer: Christian Salas-Eljatib <cseljatib@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-21 22:40:02 UTC

datana: Datasets and Functions to Accompany Analisis De Datos Con R

Description

Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.

Author(s)

Maintainer: Christian Salas-Eljatib cseljatib@gmail.com (ORCID)

Other contributors:


About the R-Squared statistics: the Anscombe quartet dataset

Description

A dataset that contains four pairs of columns with the same descriptive statistics; however, there is a difference when representing the points through a graph.

Usage

data(aboutrsq)

Format

The data frame contains four variables as follows:

X1

Integers values that represent X-axis for Y1, Y2 and Y3 column

Y1

Float values that represent Y-axis for X1 column

Y2

Float values that represent Y-axis for X1 column

Y3

Float values that represent Y-axis for X1 column

X2

Integers values that represent X-axis for Y4 column

Y4

Float values that represent Y-axis for X2 column

Source

Data were assembled by Dr Christian Salas-Eljatib (Santiago, Chile).

References

Anscombe FJ. 1973. Graphs in statistical analysis. The American Statistician 27:17-21. doi:10.2307/2682899

Examples

data(aboutrsq)    
head(aboutrsq) 

Sobre el estadístico R2: los datos del cuarteto de Anscombe

Description

Dataset que contiene cuatro pares de columnas con la mismos estadísticos descriptivos, sin embargo, si existe diferencia al representar los puntos mediante un gráfico.

Usage

data(aboutrsq2)

Format

Variables se describen a continuación::

X1

Valores enteros que representan el eje X para las columnas Y1, Y2 e Y3

Y1

Valores flotantes que representan el eje Y para la columna X1

Y2

Valores flotantes que representan el eje Y para la columna X1

Y3

Valores flotantes que representan el eje Y para la columna X1

X2

Valores enteros que representan el eje X para las columnas Y4

Y4

Valores flotantes que representan el eje Y para la columna X2

Source

Datos fueron contribuidos por el Prof. Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

References

Anscombe FJ. 1973. Graphs in statistical analysis. The American Statistician 27:17-21. doi:10.2307/2682899

Examples

data(aboutrsq2)    
head(aboutrsq2) 

Airquality data in New York city.

Description

Daily air quality measurements in New York, May to September 1973.

Usage

data(airnyc)

Format

Contains 6 variables, as follows:

ozone

numeric Ozone (ppb).

solar

numeric Solar R (lang).

wind

numeric Wind (mph).

temp

numeric Temperature (degrees F).

month

numeric Month (1–12).

day

numeric Day of month (1–31).

Source

The data were obtained from the library datasets.

References

Chambers J, Cleveland W, Kleiner B, Tukey P. 1983. Graphical Methods for Data Analysis. Belmont. CA: Wadsworth.

Examples

data(airnyc)    
head(airnyc) 

Calidad del aire en la ciudad de Nueva York.

Description

Calidad del aire diario medido en New York, de Mayo a Septiembre de 1973.

Usage

data(airnyc2)

Format

Contiene 6 variables:

ozone

Ozono (ppb).

solar

Solar R (largo).

wind

Viento (mph).

temp

Temperatura (grados F).

month

Mes del año (1–12).

day

Dia del mes (1–31).

Source

Los datos fueron obtenidos desde la librería 'datasets'.

References

Chambers J, Cleveland W, Kleiner B, Tukey P. 1983. Graphical Methods for Data Analysis. Belmont. CA: Wadsworth.

Examples

data(airnyc2)    
head(airnyc2) 

Time series of annual precipitations in cities of Chile.

Description

Data contains annual precipitations in six cities in Chile (Santiago, Talca, Chillán, Temuco, Valdivia, and Puerto Montt) at different years.

Usage

data(annualppCities)

Format

The dataframe contains three variables as follows:

city

Name of city.

year

Year of registry.

annual

Value of the annual precipitation of a given year (mm).

Source

The data were obtained from https://explorador.cr2.cl/.

Examples

data(annualppCities)    
head(annualppCities) 

Serie de tiempo de precipitaciones anuales en Chile.

Description

Data contains annual precipitations in six cities in Chile (Santiago, Talca, Chillan, Temuco, Valdivia, and Puerto Montt) at different years.

Usage

data(annualppCities2)

Format

The dataframe contains three variables as follows:

ciudad

Name of city.

anho

Year of registry.

pp.anual

Value of the annual precipitation of a given year (mm).

Source

Los datos fueron obtenidos desde https://explorador.cr2.cl/.

Examples

data(annualppCities2)    
head(annualppCities2) 

Age and physical measurement data for wild bears

Description

Wild bears were anesthetized, and their bodies were measured and weighed. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements. Notice that there are missing values for some of the variables.

Usage

data(bears)

Format

Contains individual-level variables, as follows:

id

Bear id

age

Age in total number of months.

month

Month number within a given year.

sex

1 =male, 2 = female.

headL

Length of head, in cm.

headW

Width of head, in cm.

neckG

Girth of neck, in cm.

length

Body length, in cm.

chestG

Girth of chest, in cm.

weight

body weight, in kg.

obs

Temporal observation number for bear.

name

Name given to bear.

Source

According to Prof. Timothy Gregoire at Yale University (New Haven, CT, USA), the data set was supplied by Gary Alt.

References

Entertaining references are in Reader's Digest April, 1979, and Sports Afield September, 1981.

Examples

data(bears)    
head(bears) 
table(bears$sex)
boxplot(headL~sex, data=bears)

Edad y características biométricas de osos salvajes

Description

Los osos salvajes fueron anestesiados y sus cuerpos medidos. Uno de los objetivos del estudio fue hacer una tabla (o quizas un conjunto de tablas) para las personas interesadas en estimar el peso de un oso basandose en otras medidas. Observe que faltan valores para algunas de las variables.

Usage

data(bears2)

Format

Contiene variables de nivel individual, como se describen a continuación:

id

Identificador del oso.

edad

edad en meses

mes

identificador del mes,dentro del año.

sexo

1 = macho, 2 = hembra

cabezaL

longitud de la cabeza, en cm

cabezaA

ancho de la cabeza, en cm

cuelloP

circunferencia del cuello, en cm

largo

longitud del cuerpo, en cm

pechoG

circunferencia del pecho, en cm

peso

peso corporal, en kg

obs

número de observación temporal para el oso

nombre

nombre dado al oso

Source

Segun el Prof. Timothy Gregoire de Yale University (New Haven, CT, USA), los datos fueron cedidos por Gary Alt. Minitab, Inc. La descripcion de los datos fue dada por él.

References

Algunas referencias generales estan en el Reader's Digest de Abril, 1979, y Sports Afield de Septiembre, 1981.

Examples

data(bears2)    
head(bears2) 
table(bears2$sexo)
boxplot(cabezaL~sexo, data=bears2)

Age and physical measurement data for wild bears (without missing values)

Description

Wild bears were anesthetized, and their bodies were measured and weighed. One goal of the study was to make a table (or perhaps a set of tables) for people interested in estimating the weight of a bear based on other measurements.

Usage

data(bearsdepu)

Format

Individual-level variables, as follows:

id

Bear identificator.

age

Age in total number of months.

month

Month number within a given year.

sex

Sex code: 1 =male, 2 = female.

headL

Length of head, in cm.

headW

Width of head, in cm.

neckG

Girth of neck, in cm.

length

Body length, in cm.

chestG

Girth of chest, in cm.

weight

Body weight, in kg.

obs

Temporal observation number for bear.

name

name given to bear

Source

According to Prof. Timothy Gregoire at Yale University (New Haven, CT, USA), the data set was supplied by Gary Alt.

References

Entertaining references are in Reader's Digest April, 1979, and Sports Afield September, 1981.

Examples

data(bearsdepu)    
head(bearsdepu)
table(bearsdepu$sex)
boxplot(headL~sex, data=bearsdepu) 

Edad y características biométricas de osos salvajes (sin datos faltantes)

Description

Los osos salvajes fueron anestesiados y sus cuerpos medidos. Uno de los objetivos del estudio fue hacer una tabla (o quizas un conjunto de tablas) para las personas interesadas en estimar el peso de un oso basandose en otras medidas. Esta dataframe es igual que "bears" pero sin valores perdidos.

Usage

data(bearsdepu2)

Format

Contiene variables de nivel individual, como se describen a continuacion:

id

Identificador del oso.

edad

edad en meses.

mes

identificador del mes,dentro del año.

sexo

1 = macho, 2 = hembra.

cabezaL

longitud de la cabeza, en cm.

cabezaA

ancho de la cabeza, en cm.

cuelloP

circunferencia del cuello, en cm.

largo

longitud del cuerpo, en cm.

pechoG

circunferencia del pecho, en cm.

peso

peso corporal, en kg.

obs

número de observación temporal para el oso.

nombre

nombre dado al oso.

Source

Segun el Prof. Timothy Gregoire de Yale University (New Haven, CT, USA), los datos fueron cedidos por Gary Alt. Minitab, Inc. La descripcion de los datos fue dada por él.

References

Algunas referencias generales estan en el Reader's Digest de Abril, 1979, y Sports Afield de Septiembre, 1981.

Examples

data(bearsdepu2)    
head(bearsdepu2)
table(bearsdepu2$sexo)
boxplot(cabezaL~sexo, data=bearsdepu2) 

Population density growth of beetles

Description

Temporal measurements of density of beetles (Tribolium confusum) growing in different controlled environments.

Usage

beetles

Format

days

Number of days.

diet

The quantities of flour (in grams) of the environments where the beetles were growing. Six levels of the factor diet.

type

The various stage of beetles, i.e., eggs, larvae, pupae, and adults.

density

The number of insects per environment.

Source

Data from Table No. 1, page 116, of Chapman (1928). Series of experiments under controlled conditions in which flour beetles (Tribolium confusum) are kept in environments of known size. The period from egg to adult is approximately forty days at 27C degrees. The data were entered by Miss Yamara Arancibia, a former student of Prof. Christian Salas-Eljatib.

References

Examples

data(beetles)
table(beetles$type)
name.diet<-unique(beetles$diet)
num.diet<-length(name.diet)
##Time series plot
#first, some computation
alys<-with(beetles,tapply(density,list(as.factor(days),as.factor(diet)),sum))
out<-as.data.frame(alys)
out$time<-row.names(out)
head(out)
#Figure 1 of the paper
matplot(out[,"time"], out[,1:num.diet], las=1, type=c("b"),pch=1,
        xlab="Time in days",ylab="Total individuals")
legend("topleft", legend = name.diet, title = "Diet (gr)",
       col = 1:6, lty = 1:6, pch = 1)

Crecimiento poblacional de escarabajos

Description

Mediciones temporales de densidad de escarabajos (Tribolium confusum) creciendo en diferentes ambientes controlados.

Usage

beetles2

Format

dias

Número de días.

dieta

La cantidad de harina (en gramos) de ambientes donde crecen los escarabajos. Seis niveles del factor dieta.

tipo

Estados de desarrollo de los escarabajos, i.e., huevos, larvas, pupas, y adultos.

densidad

Número total de individuos por ambiente de crecimiento.

Source

Datos del Cuadro No. 1, page 116, de Chapman (1928). Serie de experimentos bajo condiciones controladas donde escarabajos (Tribolium confusum) se mantienen en ambientes de tamaño conocido. El periodo desde huevo a adulto es de aproximadamente de cuarenta días a 27 grados Celsius. Los datos fueron digitados por la Srta. Yamara Arancibia, una estudiante del Prof. Christian Salas-Eljatib.

References

Examples

data(beetles2)    
table(beetles2$tipo)
nom.dieta<-unique(beetles2$dieta)
num.dieta<-length(nom.dieta)
##Grafico de serie de tiempo
#primero algunos calculos
alys<-with(beetles2,tapply(
          densidad,list(as.factor(dias),as.factor(dieta)),sum)
          )
out<-as.data.frame(alys)
out$tiempo<-row.names(out)
head(out)
##Figura 1 del paper
matplot(out[,"tiempo"], out[,1:num.dieta], las=1, type=c("b"),pch=1,
        xlab="Tiempo en dias",ylab="Densidad de individuos")
legend("topleft", legend = nom.dieta, title = "Dieta (gr)",
       col = 1:6, lty = 1:6, pch = 1)

Camera trap data on mammals in Ruaha National Park, southern Tanzania.

Description

Dataset contains 14604 observations and sampling was carried out for two months during the dry season of 2013 and two months during the wet season of 2014. Each camera station is associated with a randomly placed camera and a trail-based camer, with the aim of comparing communities resulting from the two camera trap placement strategies.

Usage

data(cameratrap)

Format

Contains 6 variables, as follows:

reference

Number of observation od datasets.

placement

Type of "placement" placed in each station (random or trail).

season

Season where were made the samplings.

station

Station where were collected the data.

specie

Name of specie medium to large terrestrial mammals.

date.time

The date and time of each photographic event is also given.

Source

The data were provided by Dr Jeremy Cusack.

References

Examples

data(cameratrap)    
head(cameratrap) 

Camaras trampa de mamiferos en el parque nacional Ruaha, en el sur de Tanzania

Description

Contains information of Camera trap data on medium to large terrestrial mammals collected at 54 camera stations in Ruaha National Park, southern Tanzania. Dataset contains 14604 observations and sampling was carried out for two months during the dry season of 2013 and two months during the wet season of 2014. Each camera station is associated with a randomly placed camera and a trail-based camer, with the aim of comparing communities resulting from the two camera trap placement strategies.

Usage

data(cameratrap2)

Format

Contiene 6 variables, como sigue:

referencia

Number of observation od datasets.

posicion

Type of "placement" placed in each station (random or trail).

temporada

Season where were made the samplings.

estacion

Station where were collected the data.

especie

Name of specie medium to large terrestrial mammals.

fecha.hora

The date and time of each photographic event is also given.

Source

Los datos fueron cedidos por el Dr Jeremy Cusack.

References

  1. Random versus game trail-based camera trap placement strategy for monitoring terrestrial mammal communities. PLoS ONE 10(5): e0126373.

Examples

data(cameratrap2)    
head(cameratrap2) 

Driver status after car accidents in Greece.

Description

A data frame showing the use of seat belt and the driver status after a car accident in Greece.

Usage

data(carAccidents)

Format

Contains the factor variables:

record

factor representing the driver status.

seatBelt

factor indicating whether the driver wore a setbelt.

Source

R package 'gginference'

Examples

data(carAccidents)    
head(carAccidents)
table(carAccidents)

Caribou survival

Description

Caribou survival

Usage

caribou

Format

Data frame con 91 filas y 3 columnas:

herd

Herd identifier.

wolf.density

Wolf density of the herd as wolf / 100 km².

alive

Caribou survival, 1 survives, 0 don't survive.

Examples

data(caribou)
table(caribou$alive, caribou$herd)

Sobrevivencia de caribú

Description

Sobrevivencia de caribú

Usage

caribou2

Format

Data frame con 91 filas y 3 columnas:

herd

Identificador de la manada.

wolf.density

Densidad de lobos, en número de lobos / 100 km².

alive

Sobrevivencia de un caribú, 1 sobrevive, 0 no sobrevive.

Examples

data(caribou2)
table(caribou2$alive, caribou2$herd)

Datos encuesta CASEN del 2022

Description

Encuesta de Caracterización Socioeconómica Nacional (CASEN) de Chile, es realizada por el Ministerio de Desarrollo Social y Familia con el objetivo de disponer de información que permita conocer situación de los hogares y de la población. Estos datos corresponden a los de la encuesta CASEN 2022.

Usage

data(casen)

Format

Este set de datos contiene las siguientes columnas:

id.vivienda

Identificador de la vivienda.

id.persona

Identificador de la persona.

region

Región administrativa de Chile.

comuna

Comuna.

edad

Edad de la persona, en años.

sexo

Sexo de la persona.

esc

Años de escolaridad (edad >= 15).

educ

Clasificación de educación recibida.

personas.hogar

Número de personas que habitan en el hogar.

tipohogar

Nivel de tipo de hogar según encuesta.

activ

Nivel de actividad actual de la persona según encuesta.

ytot

Ingreso total.

ytoth

Ingreso total del hogar.

ypch

Ingreso total per cápita del hogar.

ytotcor

Ingreso total corregido.

ytotcorh

Ingreso total corregido del hogar.

ypc

Ingreso total corregido per cápita del hogar.

mayor.nivel.edu

¿Cuál es el nivel educacional al que asiste o el más alto al cual asistió?

area.edu.cinef

Clasificación Internacional Normalizada de Educación (CINE-F).

subarea.edu.cinef

Clasificación Internacional Normalizada de Sub-Area de Educación (CINE-F).

previ.salud

Sistema de previsión de salud.

Source

Los datos fueron obtenidos desde el web https://observatorio.ministeriodesarrollosocial.gob.cl/encuesta-casen. Note que solo algunas columnas son utilizadas aca, así como el nombre de algunas columnas fueron levemente cambiados.

Examples

data(casen)    
head(casen) 
table(casen$region)
table(casen$region,casen$sexo)
tapply(casen$ytotcor,casen$sexo,sum)

Function to compute the cumulative distribution of a variable

Description

Builds the cumulative distribution of a vector, using a step% of the data as fixed-intervals.

Usage

cdf(y = y, step = 0.05)

Arguments

y

a vector of a random variable

step

a numeric proportion of the data used as increment interval for building the cdf of the random variable. The default value for 'step' is 0.05, representing a 5%.

Details

By default the cumulative distribution is build using 5% of the data as intervals, that is to say, from 0.05 (i.e., 5%) to 0.95 (i.e., 95%).

Value

returns a dataframe having two columns: the first contains the random variable values and the second the cumulative distribution for the variable.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples

y.var <- rnorm(10)
cdf(y.var)
cdf(y.var, step=0.1)

Chicken growth data.

Description

The body weights of the chicks were measured at birth and every second day thereafter until day 20. They were also measured on day 21. There were four groups on chicks on different protein diets.

Usage

data(chicksw)

Format

Contains four variables, as follows:

chick

An ordered factor with levels different giving a unique identifier for the chick. The ordering of the levels groups chicks on the same diet together and orders them according to their final weight (lightest to heaviest) within diet.

diet

A factor with levels 1,2,3 and 4 indicating which experimental diet the chick received.

time

A numeric vector giving the number of days since birth when the measurement was made.

weight

A numeric vector giving the body weight of the chick (gm).

Source

The data were obtained from the alr4 library.

References

Crowder M, Hand D. 1990. Analysis of Repeated Measures. Chapman and Hall

Examples

data(chicksw)    
head(chicksw) 

Crecimiento de pollos.

Description

El peso de pollos fueron medidos al momento de nacer y cada dia por medio hasta el dia 20. Ellos también fueron medidos el día 21. Hubo cuatro grupos de pollos en diferentes dietas de proteinas.

Usage

data(chicksw2)

Format

Contine cuatro variables, como sigue:

pollo

Un identificador único para cada pollo. La numeracion esta ordenado segun el peso final dentro de cada dieta.

dieta

Un factor con cuatro nivels: 1,2,3 y 4 indicando que dieta recibió el pollo.

tiempo

Número de días desde el nacimiento.

peso

Peso del pollo (gm).

Source

Los datos fueron obtenidos desde la librería alr4.

References

Crowder M, Hand D. 1990. Analysis of Repeated Measures. Chapman and Hall

Examples

data(chicksw2)    
head(chicksw2) 

CO2 emissions and temperature at country-level.

Description

Data obtained from the hockeystick package, which retrieves annual global carbon dioxide emissions since 1750 from the World Data repository https://github.com/owid/co2-data, as well as other climate-related variables.

Usage

data(co2temp)

Format

The data contains 75 variables, and the fully description can be reviewed in the references provided here.

country

Country.

year

Calendar year.

iso_code

TBA.

population

Population size, in number of people.

gdp

Gross domestic product, a measure of the value added created through the production of goods and services in a country.

cement_co2

TBA.

cement_co2_per_capita

TBA.

co2

TBA.

co2_growth_abs

TBA.

co2_growth_prct

TBA.

co2_including_luc

TBA.

co2_including_luc_growth_abs

TBA.

co2_including_luc_growth_prct

TBA.

co2_including_luc_per_capita

TBA.

co2_including_luc_per_gdp

TBA.

co2_including_luc_per_unit_energy

TBA.

co2_per_capita

TBA.

co2_per_gdp

TBA.

co2_per_unit_energy

TBA.

coal_co2

TBA.

coal_co2_per_capita

TBA.

consumption_co2

TBA.

consumption_co2_per_capita

TBA.

consumption_co2_per_gdp

TBA.

cumulative_cement_co2

TBA.

cumulative_co2

TBA.

cumulative_co2_including_luc

TBA.

cumulative_coal_co2

TBA.

cumulative_flaring_co2

TBA.

cumulative_gas_co2

TBA.

cumulative_luc_co2

TBA.

cumulative_oil_co2

TBA.

cumulative_other_co2

TBA.

energy_per_capita

TBA.

energy_per_gdp

TBA.

flaring_co2

TBA.

flaring_co2_per_capita

TBA.

gas_co2

TBA.

gas_co2_per_capita

TBA.

ghg_excluding_lucf_per_capita

TBA.

ghg_per_capita

TBA.

land_use_change_co2

TBA.

land_use_change_co2_per_capita

TBA.

methane

TBA.

methane_per_capita

TBA.

nitrous_oxide

TBA.

nitrous_oxide_per_capita

TBA.

oil_co2

TBA.

oil_co2_per_capita

TBA.

primary_energy_consumption

TBA.

share_global_cement_co2

TBA.

share_global_co2

TBA.

share_global_co2_including_luc

TBA.

share_global_coal_co2

TBA.

share_global_cumulative_cement_co2

TBA.

share_global_cumulative_co2

TBA.

share_global_cumulative_co2_including_luc

TBA.

share_global_cumulative_coal_co2

TBA.

share_global_cumulative_flaring_co2

TBA.

share_global_cumulative_gas_co2

TBA.

share_global_cumulative_luc_co2

TBA.

share_global_cumulative_oil_co2

TBA.

share_global_cumulative_other_co2

TBA.

share_global_flaring_co2

TBA.

share_global_gas_co2

TBA.

share_global_luc_co2

TBA.

share_global_oil_co2

TBA.

share_global_other_co2

TBA.

share_of_temperature_change_from_ghg

TBA.

temperature_change_from_ch4

TBA.

temperature_change_from_co2

TBA.

temperature_change_from_ghg

TBA.

temperature_change_from_n2o

TBA.

total_ghg

TBA.

total_ghg_excluding_lucf

TBA.

trade_co2

TBA.

trade_co2_share

TBA.

Source

The data were obtained from the hockeystick library of R. Notice that in the dataframe only a portion of countries have been kept.

References

Examples

data(co2temp)
names(co2temp)
table(co2temp$country)
lattice::xyplot(co2~year|country,data=co2temp,type="l",as.table=TRUE)

Function to compute the needed statistics for a given contrast

Description

The function computes the statistics for inference in a given contrast, subject to a given significance level. Those statistics are as follows: estimated contrast, standard error of the contrast, and the confidence interval of the contrast.

Usage

contrast(
  model = model,
  coef.cont = coef.cont,
  grp.m = grp.m,
  grp.n = grp.n,
  alpha = 0.05,
  full = TRUE
)

Arguments

model

object containing the fitted model

coef.cont

vector with the coefficients to establish the contrasts

grp.m

a vector having the sample mean per each group, or level of the factor under study.

grp.n

a vector having the sample size per each group, or level of the factor under study.

alpha

is the significance level for building the confidence intervals. Default value is 0.05, which is 95% confidence level.

full

FALSE if want short output, TRUE for longer (i.e. more details). Default is TRUE.

Details

The contrast is established based upon an already fitted statistical model that describe the relationship among variables. The significance level ('alpha') is defined by the user, although by default has been set to 0.05, that is to say, a 95% of statistical confidence.

Value

This function returns the above described statistics for a given contrast.

Author(s)

Christian Salas-Eljatib

References

Examples


data(fertiliza)
table(fertiliza$treat)
means.trt <- tapply(fertiliza$volume,fertiliza$treat,mean);means.trt
sds.trt <- tapply(fertiliza$volume,fertiliza$treat,sd);sds.trt
ns.trt <- tapply(fertiliza$volume,fertiliza$treat,length);ns.trt
m1 <- lm(volume ~ treat, data=fertiliza)
anova(m1)
## Coefficients to be used in the contrast
#c1: (tmoA1-A2) - (tmoA3-A4)
C1.coeff <- c(0,1,1,-1,-1)
contrast(model=m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.1,full=TRUE)
contrast(model=m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.1,full=FALSE)
contrast(m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt,alpha=0.05,full=TRUE)
contrast(m1,C1.coeff,grp.m=means.trt,grp.n=ns.trt)


Tree-level cork biomass data for Oak trees in Portugal

Description

Measurements of cork weight in Quercus suber (Oak) trees in Portugal.

Usage

corkoak

Format

tree

A correlative number for each sample tree.

csc

is tree circumference at 1.3 m outside bark, in cm.

cbc

is tree circumference at 1.3 m under bark, in cm.

bt

bark thickness, in cm.

hdeb

is debarking height, in m.

hblc

height to base of live crown, in m.

nb

number of branches debarked

cr.diam

crown diameter, in m.

w

total green weight of the stripped cork, in kg

stratum

Stratum

Source

Data supplied electronically to Prof. Timothy Gregoire (Yale University) by authors accompanied by a note which said "After the article was published we discovered a problem with 2 of the observations so Teresa and I decided it was best just to delete them."

References

Examples

data(corkoak)    
head(corkoak) 

Datos de biomasa de corcho en árboles de Encino en Portugal

Description

Mediciones de peso de corcho en árboles muestra de Quercus suber en Portugal.

Usage

corkoak2

Format

arbol

A correlative number for each sample tree.

perimetro.cc

is tree circumference at 1.3 m outside bark, in cm.

perimetro.sc

is tree circumference at 1.3 m under bark, in cm.

e.corteza

bark thickness, in cm.

h.desc

is debarking height, in m.

hcc

height to base of live crown, in m.

num.ram

number of branches debarked

diam.copa

crown diameter, in m.

biomasa

total green weight of the stripped cork, in kg

estrato

Estrato

Source

Datos cedidos por Prof. Timothy Gregoire (Yale University) y los autores originales mencionaron "After the article was published we discovered a problem with 2 of the observations so Teresa and I decided it was best just to delete them."

References

Examples

data(corkoak2)    
head(corkoak2) 

Deletes the first n-characters of a string

Description

Function to delete the last n-characters of a string from the left-hand side.

Usage

deleteLeft(fac, n)

Arguments

fac

is an object of class string or factor

n

is the number of characters to be deleted of a the string given in 'fac'.

Details

It is specially set to arrange data vector having alphanumeric format.

Value

This function returns an object having n-less characters from the left-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


plot.id <- c("BNE1","BNE2","PLE1")
deleteLeft(plot.id,1)
deleteLeft(plot.id,2)
deleteLeft(plot.id,3)

Deletes the last n-characters of a string

Description

Function to delete the last n-characters of a string from the right-hand side.

Usage

deleteRight(fac, n)

Arguments

fac

is an object of class string or factor

n

is the number of characters to be deleted of a the string given in 'fac'.

Details

It is specially set to arrange data vector having alphanumeric format.

Value

This function returns an object having n-less characters from the right-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


last.names.id <- c("Stage-1924","Gregoire-1958","Robinson-1967")
deleteRight(last.names.id,5)
deleteRight(last.names.id,4)

Creates a descriptive statistics table for continuous variables.

Description

Function to create a descriptive statistics table for continuous variables from a dataframe.

Usage

descstat(data = data, decnum = 3, eng = TRUE, full = FALSE)

Arguments

data

a dataframe containing numeric variables as columns.

decnum

the number of decimals to be used in the output. The default is set to 3.

eng

logical; if "TRUE" (by default), the language of the statistics will be in English; if "FALSE" will be in Spanish. descriptive statistics. The default is to "FALSE".

full

logical; if "TRUE", the output includes some extra descriptive statistics. The default is to "FALSE".

Details

The resulting table offers the main central and dispersion statistics.

Value

This function wraps descriptive statistics into a summarize table having the following statistics: sample size, minimum, maximum, mean, median, SD, and coefficient of variation. If the "full" option is set to "TRUE", the following statistics will be added to the table: 25th and 75th percentiles, the interquartile range, skewness, and kurtosis.

Author(s)

Christian Salas-Eljatib and Tomas Cayul.

References

Examples


df <- datana::idahohd
head(df)
df.h<-df[,c("dbh","height")]
## using the function
descstat(data=df.h)
descstat(data=df.h,decnum=1,eng=FALSE)
descstat(df.h,2)

Presidential election data of Florida (USA) in 2000.

Description

County-by-county vote for president in Florida in 2000 for Bush, Gore and Buchanan.

Usage

data(election)

Format

Contains three variables, as follows:

gore

Vote for Gore.

bush

Vote for Bush.

buchanan

Vote for Pat Buchanan.

Source

The data were obtained from the alr4 library.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(election)    
head(election) 

Elección presidencial en el estado de Florida (USA) en el 2000.

Description

Conteo de votos a nivel de condado en el estado de Florida, año 2000.

Usage

data(election2)

Format

Contiene las siguientes tres columnas:

gore

Votos para Gore. Número de votos para Al Gore.

bush

Votos para Bush. Número de votos para George W. Bush.

buchanan

Votos para Buchaman. Número de votos para Pat Buchanan.

Source

Los datos se obtuvieron desde el paquete alr4 de R.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(election2)    
head(election2) 

Puntaje ENDFID 2021 por carrera

Description

Puntaje promedio por carrera de la Evaluación Nacional Diagnóstica de la Formación Inicial Docente (ENDFID), enfocado en matemática. Se tienen 79 observaciones. Se incluyen dos variables binarias: cuech (pertenece 1 o no 0 al CUECH) y pace (tiene cupos PACE 1 o no 0).

Usage

data(endfid2)

Format

Variables se describen a continuación:

programa

Nombre de la carrera dictada

universidad

Universidad correspondiente al programa

zona

Ubicación de la sede de la carrera

region

Región de la sede de la carrera

tipo.programa

Tipo de carrera (1 Ped. En Matemáticas, 2 Enseñanza General Básica, 3 Programa formación pedagógica)

cuech

Universidad pertenece al Consejo de Universidades del Estado (1 si, 0 no)

pace

Carrera incluye cupos PACE (1 si, 0 no)

end.pcpg

Puntaje promedio de la carrera en la Prueba de Conocimientos Pedagógicos Generales

end.pcdd

Puntaje promedio de la carrera en la Prueba de Conocimientos Disciplinarios y Didácticos

matricula

Cantidad de estudiantes matriculados en la carrera el 2022

Source

Datos obtenidos desde el Centro de Perfeccionamiento, Experimentación e Investigaciones Pedagógicas (CPEIP) del Mineduc y desde los sitios web respectivo de cada universidad. Los datos fueron digitados por Diego Fernández, estudiante del Prof. Christian Salas-Eljatib.

Examples

data(endfid2)
head(endfid2)

Leaf measurements for Eucalyptus nitens trees in Tasmania, Australia.

Description

The length, width, and area of Eucalyptus nitens leaves were measured.

Usage

data(eucaleaf)

Format

Contains leaf-level variables, as follows:

time

Time factor, in two levels: early or Late.

tree

Sample tree code identificator.

shoot

Shoot description factor, in three levels.

l

Length of the leaf, in mm.

w

Width of the leaf, in mm.

la

leaf area, in cm^{2}.

Source

Although the original source of the measurements is the Dissertation of Dr Candy (1999), the data file used here was courtesy of Prof. Timothy Gregoire at Yale University (New Haven, CT, USA). Furthermore, these data were used by Gregoire and Salas (2009).

References

Examples

data(eucaleaf)    
head(eucaleaf) 

Mediciones foliares para árboles de Eucalyptus nitens en Tasmania, Australia.

Description

Mediciones de largo, ancho y area de hojas de Eucalyptus nitens.

Usage

data(eucaleaf2)

Format

Contiene variables a nivel de hoja, como sigue:

tiempo

Factor a dos niveles: Temprano o Tardío.

arbol

Identificador del árbol muestra.

meristema

Factor de la descripción del meristema, en tres niveles.

largo

Largo de la hoja, en mm.

ancho

Ancho de la hoja, en mm.

area

Área foliar, en cm^{2}.

Source

Aunque la fuente original de estas mediciones proviene de la tesis del Dr. Candy (1999), el archivo de datos fue cortesía del Prof. Timothy Gregoire de Yale University (New Haven, CT, USA). Además, estos datos fueron ocupados en el estudio de Gregoire y Salas (2009).

References

Examples

data(eucaleaf2)    
head(eucaleaf2) 

Leaf measurements (all, n=744) for Eucalyptus nitens trees in Tasmania, Australia.

Description

The length, width, and area of Eucalyptus nitens leaves were measured for all the samples of Candy (1999).

Usage

data(eucaleafAll)

Format

Contains leaf-level variables, as follows:

time

Time factor, in two levels: early or Late.

tree

Sample tree code identificator.

shoot

Shoot description factor, in three levels.

l

Length of the leaf, in mm.

w

Width of the leaf, in mm.

la

leaf area, in cm^{2}.

Source

Although the original source of the measurements is the Dissertation of Dr Candy (1999), the data file used here was courtesy of Prof. Timothy Gregoire at Yale University (New Haven, CT, USA). Furthermore, these data were used by Gregoire and Salas (2009).

References

Examples

data(eucaleafAll)    
head(eucaleafAll) 

Mediciones foliares (todas, n=744) para árboles de Eucalyptus nitens en Tasmania, Australia.

Description

Mediciones de largo, ancho y área de hojas de Eucalyptus nitens para toda la muestra de Candy (1999).

Usage

data(eucaleafAll2)

Format

Contiene variables a nivel de hoja, como sigue:

tiempo

Factor a dos niveles: Temprano o Tardío

arbol

Identificador del árbol muestra

meristema

Factor de la descripción del meristema, en tres niveles.

largo

Largo de la hoja, en mm

ancho

Ancho de la hoja, en mm

area

Área foliar, en cm^{2}

Source

Aunque la fuente original de estas mediciones proviene de la tesis del Dr. Candy (1999), el archivo de datos fue cortesía del Prof. Timothy Gregoire de Yale University (New Haven, CT, USA).

References

Examples

data(eucaleafAll2)    
head(eucaleafAll2) 

Extracts the last n-characters of a string

Description

Function to extract the first n-characters of a string from the left-hand side.

Usage

extractLeft(fac, n)

Arguments

fac

is an object of class string or factor

n

is the number of characters to be deleted of a the string given in 'fac'.

Details

It is specially set to arrange data vector having alphanumeric format.

Value

This function returns an object having the first n-characters from the left-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


plot.id <- c("BNE1","BNE2","PLE1")
extractLeft(plot.id,1)
extractLeft(plot.id,2)
extractLeft(plot.id,3)

Extracts the last n-characters of a string

Description

Function to extract the last n-characters of a string from the right-hand side.

Usage

extractRight(fac, n)

Arguments

fac

is an object of class string or factor

n

is the number of characters to be deleted of a the string given in 'fac'.

Details

It is specially set to arrange data vector having alphanumeric format.

Value

This function returns an object having the last n characters from the right-hand side.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


last.names.id <- c("Stage-1924","Gregoire-1958","Robinson-1967")
extractRight(last.names.id,4)
extractRight(last.names.id,2)

Foliar damage by Ozone

Description

Foliar damage by Ozone

Usage

fdamage

Format

Data frame con 52 filas y 2 columnas:

damage

Foliar decoloration, 1 with decoloration, 0 without decoloration.

ozone

Maximum charge of Ozone concentration.

Examples

data(fdamage)
table(fdamage$damage)

Daño foliar por Ozono

Description

Daño foliar por Ozono

Usage

fdamage2

Format

Data frame con 52 filas y 2 columnas:

damage

Decoloración foliar, 1 con decoloración, 0 sin decoloración.

ozone

Máxima carga de concentración de Ozono.

Examples

data(fdamage2)
table(fdamage2$damage)

Fertilization experiment data.

Description

Data contains volume data at plot-level for a fertilization experiment.

Usage

data(fertiliza)

Format

Contains two variables, as follows:

treat

Treatment level.

volume

Plot-level volume, in m^{3}.

Source

The data were provided by Dr Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

References

not yet

Examples

data(fertiliza)
head(fertiliza)
class(fertiliza$treat)
unique(fertiliza$treat)
means.g <- tapply(fertiliza$volume,fertiliza$treat,mean);means.g
sds.g <- tapply(fertiliza$volume,fertiliza$treat,sd);sds.g
ns.g <- tapply(fertiliza$volume,fertiliza$treat,length);ns.g

Experimento de fertilización

Description

Datos a nivel de parcela de un experimento de fertilización con tratamientos y replicas.

Usage

data(fertiliza2)

Format

Contiene tres columnas como sigue:

tmo

Tratamiento.Factor medido en diferentes niveles.

vol

Volumen de madera en la parcela experimental, en m^{3}.

Source

Datos cedidos por el Prof. Christian Salas.

References

not yet

Examples

data(fertiliza2)
head(fertiliza2)
class(fertiliza2$tmo)
unique(fertiliza2$tmo)
media.g <- tapply(fertiliza2$vol,fertiliza2$tmo,mean);media.g
desvst.g <- tapply(fertiliza2$vol,fertiliza2$tmo,sd);desvst.g
n.g <- tapply(fertiliza2$vol,fertiliza2$tmo,length);n.g


Diameter growth of trees

Description

The 'ficdiamgr' is a fictitious dataframe built to show the structure of longitudinal data. The dataframe has records of tree diameter growth of five sample trees, spanning three species.

Usage

data(ficdiamgr)

Format

A time series data containing the following columns:

tree.id

an ordered factor indicating the tree on which the measurement is made. The ordering is according to increasing maximum diameter.

time

a numeric vector giving the numbers of days since establishment.

dbh

a numeric vector of diameter at breast height, in cm.

site

a factor variable, representing site conditions with two levels.

spp

a factor variable, representing tree species with three levels.

Source

This dataframe was built from the 'Orange' data of the datasets package, by Christian Salas-Eljatib.

Examples

data(ficdiamgr)

coplot(dbh ~ time | tree, data = ficdiamgr, show.given = FALSE)

Crecimiento diametral de árboles

Description

Los datos 'ficdiamgr2' son ficticios, y fue construida para mostrar la estructura de datos longitudinales. Los datos tienen registro de crecimiento en cinco árboles muestra, representando a tres especies.

Usage

data(ficdiamgr2)

Format

Una serie de tiempo conteniendo las siguientes columnas:

arbol

indica el identificador del árbol.

tiempo

número de dias desde el inicio de las mediciones.

dap

diámetro a la altura del pecho, en cm.

sitio

un factor, representando condiciones de sitio, en dos niveles.

espe

un factor, representando especie del árbol, en tres niveles.

Source

Estos datos fueron modificados desde la dataframe 'Orange' de la librería 'datasets', por Christian Salas-Eljatib.

Examples

data(ficdiamgr2)

coplot(dap ~ tiempo | arbol, data = ficdiamgr2, show.given = FALSE)

Finds the position of a specific variable.

Description

Sometimes in data manipulation we face the task of locating the position of a specific variable within a dataframe. The function finds the position in which a column name is within an object.

Usage

findColumn.byname(data = data, col.name = col.name)

Arguments

data

is a dataframe

col.name

is a string specifying the name of the variable

Details

Although the function finds the position of a specific variable, can also be used for more than one variable.

Value

This function returns the number of a specific column-name.

Note

It can be used for a vector of specified column-names as well.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- data.frame(varX=1:5, varY=letters[1:5], varZ=rep("a",5),
varK=rep("b",5))
df
#using the function
findColumn.byname(df, c("varY","varZ"))
findColumn.byname(df, "varK")
#Creating an example vector
vector <- letters
vector
findColumn.byname(vector, c("h","z"))

Fish growth variables.

Description

Variables of small mouth bass (i.e, a fish) collected in West Bearskin Lake, Minnesota, in 1991.

Usage

data(fishgrowth)

Format

Contains three variables, as follows:

years

Year at capture.

length

Length at capture (mm).

scale

radius of a key scale (mm).

Source

The data were obtained from the alr4 library of R, specifically from the dataframe wblake that includes only fish of ages 8 or younger.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(fishgrowth)    
head(fishgrowth) 
plot(length~age, data=fishgrowth) 

Crecimiento de peces

Description

Variables de crecimiento de peces en el lago West Bearskin del estado de Minnesota, en 1991.

Usage

data(fishgrowth2)

Format

Contiene tres variables, como sigue:

edad

Year at capture.

largo

Length at capture, en mm.

escala

radius of a key scale, en mm.

Source

Datos obtenidos desde el paquete alr4 de R, de la dataframe wblake qie incluye peces de hasta 8 años.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. Hoboken NJ: Wiley

Examples

data(fishgrowth2)    
head(fishgrowth2)
plot(largo~edad,data=fishgrowth2)

Forest fire occurrence in central Chile

Description

Data of forest fire occurrence in central Chile having 7210 observations, with 890 cases of fire occurrence and 6320 cases of non-occurrence. The binary variable (Y) is the occurrence of forest fire, where Y=1 to denotes occurrence and Y=0, otherwise.

Usage

data(forestfire)

Format

The data frame contains four variables as follows:

fire

Occurrence of forest fire (1 yes, 0 no)

xcoord

Geographic coordinate x.utm

ycoord

Geographic coordinate y.utm

aspect

Exposure (degrees from north)

eleva

Elevation (m)

slope

Slope (degrees)

distr

Distance to dirt roads

distcity

Distance to cities

distriver

Distance to paved roads

covera

Land use classifications according to a polygon

coverb

Land use classifications according to a polygon

tempe

Minimum temperature of the coldest month

ppan

Annual precipitation

ndii

Normalized difference infrared index

nvdi

Normalized difference vegetation index

tempe2

Minimum temperature of the warmest month

ppan2

Precipitation of the driest month

frec.fire

Frequency of fires

perc.fire

Percentage of fire frequency

fireClass

Class for frecuency fire

asp.class

Class of variable exposure

eleva.class

Class of numerical variable elevation

slope.class

Class of numerical variable slope

ndii.class

Normalized difference infrared index class

nvdi.class

Normalized difference vegetation index class

Source

Data were provided by Dr Adison Altamirano at the Universidad de La Frontera (Temuco, Chile).

References

-Salas-Eljatib C, Fuentes-Ramírez A, Gregoire TG, Altamirano A, Yaitul V. 2018. A study on the effects of unbalanced data when fitting logistic regression models in ecology. Ecological Indicators 85:502-508. doi:10.1016/j.ecolind.2017.10.030

Examples

data(forestfire)
head(forestfire)

Ocurrencia de incendios forestales

Description

Datos de ocurrencia de incendios forestales en la zona central de Chile. Se tienen 7210 observaciones, de las cuales 890 tienen ocurrencia de incendios y 6320 casos de no ocurrencia. La variable binaria (Y) es la ocurrencia de un incendio forestal, donde Y=1 denota ocurrencia y Y=0, lo contrario.

Usage

data(forestfire2)

Format

Variables se describen a continuacion:

fire

Presencia de incendio forestal (1 si, 0 no)

xcoord

Coordenada geografica x.utm

ycoord

Coordenada geografica y.utm

aspect

Exposicion (grados desde el norte)

eleva

Elevacion (m)

slope

Pendiente (grados)

distr

Distancia a caminos de tierra

distcity

Distancia a ciudades

distriver

Distancia a caminos pavimentados

covera

Clasificaciones de uso del suelo segun un poligono

coverb

Clasificaciones de uso del suelo segun un poligono

tempe

Temperatura m?nima del mes m?s frio

ppan

Precipitacion anual

ndii

Indice infrarrojo de diferencia normalizado

nvdi

Indice de vegetacion de diferencia normalizado

tempe2

Temperatura m?nima del mes mas calido

ppan2

Precipitacion del mes mas seco

frec.fire

Frecuencia de incendios

perc.fire

Porcentajede la frecuencia de incendios

fireClass

Clase para variable frecuencia de incendio

asp.class

Clase de variable exposicion

eleva.class

Clase de variable numerica elevacion

slope.class

Clase de variable numerica pendiente

ndii.class

Clase de indice infrarrojo de diferencia normalizado

nvdi.class

Clase de indice de vegetacion de diferencia normalizado

Source

Datos fueron cedidos por el Dr. Adison Altamirano, Universidad de La Frontera, Temuco, Chile.

References

-Salas-Eljatib C, Fuentes-Ramírez A, Gregoire TG, Altamirano A, Yaitul V. 2018. A study on the effects of unbalanced data when fitting logistic regression models in ecology. Ecological Indicators 85:502-508. doi:10.1016/j.ecolind.2017.10.030

Examples

data(forestfire2)
head(forestfire2)

Prices of gasoline and crude oil

Description

Prices of gasoline and crude oil

Usage

gasoline

Format

Data frame of 14 rows and 3 columns:

year

Year of data

gasoline

Price of gasoline for year in cents / gallon

crude.oil

Price of crude oil fot year in $ / bbl

Source

McClave, James T. Benson, P.G. 1991. Statistics for Business and Economics, Fifth Edition. Dellen and Macmillan.

References

Statistial Abstract of the United States: 1989, pp476, 480.

Examples

data(gasoline)
plot(gasoline~year, data = gasoline, type = "b",
     ylab = "Gasoline price (cents/gallon)",
     xlab = "Year")

Precios de gasolina y petróleo

Description

Precios de gasolina y petróleo

Usage

gasoline2

Format

Data frame que contiene 14 filas y 3 columnas:

año

Año del precio

gasolina

Precio de la gasolina para el año en centavos / galón

petroleo

Precio del petróleo para el año en $ / bbl

Source

McClave, James T. Benson, P.G. 1991. Statistics for Business and Economics, Fifth Edition. Dellen and Macmillan.

References

Statistial Abstract of the United States: 1989, pp476, 480.

Examples

data(gasoline2)
plot(gasolina~año, data = gasoline2, type = "b",
     ylab = "Precio de la gasolina (centavos/galón)",
     xlab = "Año")

Datos GDP-per capita

Description

Datos del producto interno bruto per capita, por pais.

Usage

data(gdpcap)

Format

Este set de datos contiene las siguientes columnas:

pais

Nombre del país.

pais.cod

Codificación del país.

gdp.pc

GDP per capita, en US dollars.

y

GDP per capita, en miles de US dollars.

Source

Los datos fueron obtenidos desde la web https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Examples

data(gdpcap)    
head(gdpcap) 
unique(gdpcap$pais)
hist(gdpcap$y, breaks=20,xlab='PIB per capita (miles de US$)', col='orange', las=1)

Function to compute the geometric mean of a numeric vector

Description

Computes the geometric mean of a numeric vector. It is the n-th root of the product of n numbers, as follows.

y_g = \left(\prod_{i=1}^{n} y_i\right)^{1/n}

for y_i > 0. The geometric mean can be used a central position statistics of a random variable.

Usage

gmean(v)

Arguments

v

is a numeric vector

Details

Notice that can only be computed for positive values. For negative values, there are alternatives, but not covered here.

Value

This function returns the geometric mean, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


y.var <- runif(10, min=10, max=45)
gmean(y.var)

Tree height growth of Douglas-fir sample trees in the Northwest of the United States

Description

Data contains 148 observations on the height growth of dominant trees of Pseudotsguga mensiezzi in the Northwest of the United States.

Usage

data(hgrdfir)

Format

The data frame contains seven variables as follows:

natfor.id

Code identifier.

plot.code

Plot number identification

tree.code

Tree number identification.

dbh

Diameter at breast height at sampling, in in.

toth

Total height at sa,pling, in ft.

age

Age of tree, yr.

height

Height at a given age, in ft.

Source

The data were provided by Dr Christian Salas.

References

Examples

data(hgrdfir)
head(hgrdfir)
unique(hgrdfir$tree.code)
table(hgrdfir$plot.code,hgrdfir$tree.code)
tapply(hgrdfir$dbh, hgrdfir$tree.code, mean)
tapply(hgrdfir$dbh, hgrdfir$tree.code, mean) #dbh of each sample tree
tapply(hgrdfir$toth, hgrdfir$tree.code, mean) #toth of each sample tree

Crecimiento en altura de una muestra de árboles en los Estados Unidos

Description

Data contiene 148 obserrvaciones sobre el crecimiento en altura de árboles dominantes de Pseudotsguga mensiezzi en el Nor-Oeste de los Estados Unidos

Usage

data(hgrdfir2)

Format

La data frame contiene siete variables:

bosque.id

Codigo identificador del bosque.

parcela

Codigo identificador de la parcela.

arbol

Número de identificacion árbol.

dap

Diámetro a la altura del pecho, en pulgadas.

atot

Altura total, en pies

edad

Edad, en os

altura

Altura para cada edad del árbol, en pies

Source

La data fue cedida por el Dr Christian Salas-Eljatib.

References

Monserud RA. 1984. Height growth and site index curves for Inland Douglas-fir based on stem analysis data and forest habitat type. Forest Science 30(4):943-965.

Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

data(hgrdfir2)
head(hgrdfir2)
unique(hgrdfir2$arbol.id)
table(hgrdfir2$parcela,hgrdfir2$arbol.id)
tapply(hgrdfir2$dap, hgrdfir2$arbol.id, mean) #dap de cada arbol muestra
tapply(hgrdfir2$atot, hgrdfir2$arbol.id, mean) #atot de cada arbol muestra

Function for building a figure having both an histogram and a boxplot for a single random variable

Description

The function creates a figure having both an histogram and a boxplot for a random variable, as a way to help understanding its distribution.

Usage

histbxp(
  y = y,
  freqlab = "Frequency",
  varlab = "Variable",
  eng = TRUE,
  refval = NA,
  print.refval = FALSE,
  col.hist = "gray",
  col.bxp = "gray",
  portrait = TRUE,
  oma = c(3, 0.5, 2, 0),
  mar = c(1, 4, 0.2, 1),
  cex.varlab = 1.2,
  refval.symbol = expression(bar(y)),
  col.refval = "blue",
  varlim = NA,
  freqlim = NA
)

Arguments

y

A numeric vector representing the random variable.

freqlab

(optional) A string specifying the frequency label. The default is set to "Frequency".

varlab

(optional) A string specifying the random variable label. The default is set to "Variable".

eng

logical; if "TRUE" (by default), the language of some default text will be in English; if "FALSE" will be in Spanish. The default is to "TRUE".

refval

A numeric value to be used for printing as reference for the random variable. By default is set to the mean of the variable y.

print.refval

A logical statement to define whether a reference value should be printed, if set to TRUE, the mean of the y vector will be plotted. The default is FALSE.

col.hist

A string specifying the histogram color. The default is "gray".

col.bxp

A string specifying the boxplot color. The default is "gray".

portrait

A logical statement, if set to TRUE, the boxplot will be located under the histogram (2 rows, 1 column). If is set to FALSE, the boxplot will be located next to the histogram (1 row, 2 columns). The default is TRUE.

oma

As in the plot environment. The default is c(3, .5, 2, 0).

mar

As in the plot environment. The default is c(1, 4.0, 0.2, 1).

cex.varlab

A numeric value for the cex option of plotting to the assigned varlab element. The default value is set to 1.2 .

refval.symbol

A string of type expression with name of the refval being printed, if print.refval is set to TRUE. The default is expression(bar(y)).

col.refval

A string specifying the refval.symbol color, if print.refval is set to TRUE. The default is "blue"

varlim

(optional) A numeric vector having the minimum and maximum, respectively for the random variable.

freqlim

(optional) A numeric vector having the minimum and maximum, respectively for the frequency axis.

Details

The variable must be numeric.

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Examples

df <- datana::fishgrowth
histbxp(y=df$length)

### distribution of 'length'
## with mean refval
histbxp(y=df$length, print.refval = TRUE)

## with given refval
histbxp(y=df$length, print.refval = TRUE, refval = 250)

## changing labels
histbxp(y=df$length, print.refval = TRUE, refval = 250,
        freqlab = "FREQ", varlab = "LENGTH")

## changing colors
histbxp(y=df$length, print.refval = TRUE, refval = 250,
        freqlab = "FREQ", varlab = "LENGTH",
        col.hist = "blue",
        col.bxp = "green",
        col.refval = "red")


### distribution of 'scale'
## with mean refval
histbxp(y=df$scale, print.refval = TRUE)

## landscape mode
histbxp(y=df$scale, print.refval = TRUE, portrait = FALSE)

## with limits
histbxp(y=df$scale, print.refval = TRUE, portrait = FALSE,
        freqlim = c(0,100),
        varlim = c(0, max(df$scale)))


Function to compute the harmonic mean of a numeric vector

Description

Computes the harmonic mean of a numeric vector. It is the inverse of the mean of the recriprocals of n numbers, as follows.

y_h = \frac{n}{\left(\sum_{i=1}^{n} \frac{1}{y_i}\right)}

for y_i \neq 0. The harmonic mean can be used a central position statistics of a random variable.

Usage

hmean(v)

Arguments

v

is a numeric vector

Details

Notice that can only be computed for values different from cero.

Value

This function returns the harmonic mean, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


y.var <- runif(10, min=10, max=45)
hmean(y.var)

Tree height-diameter data from Idaho (USA)

Description

These data are forest inventory measures from the Upper Flat Creek stand of the University of Idaho Experimental Forest, dated 1991.

Usage

data(idahohd)

Format

Contains five variables, as follows:

plot

Plot number.

tree

Tree within plot.

species

A factor with levels DF = Douglas-fir, GF = Grand fir, SF = Subalpine fir, WL = Western larch, WC = Western red cedar, WP = White pine.

dbh

Diameter 137 cm perpendicular to the bole, cm.

height

Height of the tree, in m.

Source

The data were assembled from the 'ufc' dataframe from the alr4 library.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. New York: Wiley.

Examples

data(idahohd)    
head(idahohd) 
plot(height~dbh, data=idahohd)

Altura-diámetro de árboles en el estado de Idaho (USA)

Description

Estos datos provienen de un muestreo en el bosque experimental de la University of Idaho, en Upper Flat Creek, Idaho, USA. Medido en 1991.

Usage

data(idahohd2)

Format

Contiene cinco variables detalladas a continuación:

parce

Número de la parcela de muestreo.

arbol

Número del árbol dentro de la parcela.

spp

Especie del árbol, una variable factor con niveles DF = Douglas-fir, GF = Grand fir, SF = Subalpine fir, WL = Western larch, WC = Western red cedar, WP = White pine.

dap

Diámetro del fuste a los 1.3 m sobre el suelo, en cm.

atot

Altura del árbol, en m.

Source

Los datos fueron obtenidos desde la dataframe 'ufc' de la librería alr4.

References

Weisberg S. 2014. Applied Linear Regression. 4th edition. New York: Wiley.

Examples

data(idahohd2)    
head(idahohd2)
plot(atot~dap, data=idahohd2)

Índice Mensual de Actividad Económica (IMACEC)

Description

Base de datos con el Índice Mensual de Actividad Económica (IMACEC) de Chile, que incluye información desde enero de 1997 en adelante. La base cuenta con 340 observaciones, que representan meses, e incorpora diversas desagregaciones sectoriales. La variable principal es el IMACEC mensual, que representa una estimación de la evolución de la actividad económica del país respecto al mismo mes del año anterior.

Usage

data(imacec2)

Format

Variables se describen a continuación:

fecha

Fecha de la observación (formato Date, primer día del mes)

anho

Año de la observación

mes

Mes de la observación

imacec

Índice mensual de actividad económica total

crec.prod

Crecimiento del sector producción de bienes

crec.min

Crecimiento del sector minería

crec.ind

Crecimiento del sector industrial

crec.rest

Crecimiento del resto de bienes no mineros ni industriales

crec.com

Crecimiento del sector comercio

crec.serv

Crecimiento del sector servicios

imacec.fac

IMACEC ajustado por costo de factores

crec.imp

Crecimiento de los impuestos sobre los productos

imacec.nomin

Índice de actividad económica excluyendo minería

Source

Banco Central de Chile. Datos extraídos de la serie histórica de indicadores mensuales. Los datos fueron digitados por Saúl Ketterer, estudiante del Prof. Christian Salas-Eljatib.

References

Examples

data(imacec2)
head(imacec2)

Computes the sample kurtosis of a distribution

Description

The kurtosis is about the tailedness, or the degree of heaviness of the tails, in the frequency distribution. The function computes an estimator of the kurtosis.

Usage

kurto(x, na.rm = TRUE)

Arguments

x

a numeric vector of a random variable.

na.rm

logical operator to remove NA values. The default is set to TRUE.

Details

The kurtosis of a random variable is the fourth moment of the standardized variable. There are several ways of parameterizing a kurtosis estimator, such as depending on the fourth moment and the standard deviation of the random variable.

Value

An estimator of the kurtosis.

Author(s)

Christian Salas-Eljatib

Examples

y.var<-rnorm(100);x.var<-rbeta(100,.2,2)
kurto(y.var)
kurto(x.var)


Land-cover, environmental and sociodemographic data for the 34 municipalities composing the Greater Santiago area, Santiago, Chile.

Description

dataset contains 476 observations, 34 categorical and 442 numerical. Land-cover data was generated through remote sensing classification techniques using Sentinel-2 satellite images from year 2016. Temperatures were obtained from TIRS band 10 of Landsat 8 satellites images. Particulate matter concentrations were estimated using spatial modelling techniques from 10 pollution stations distributed in the city. Altitude was generated from a Digital Elevation Model. Population and poverty were gathered from Casen 2017 survey.

Usage

data(landcover)

Format

The data frame contains four variables as follows:

county

Name of Municipality

built.p

Percentage of surface covered by built-up area

vegeta.p

Percentage of surface covered by vegetation

naked.p

Percentage of surface covered by bare soil

grass.p

Percentage of surface covered by deciduous vegetation

p.Deciduo

Percentage of surface covered by evergreen vegetation

p.Siempreverde

Percentage of surface covered by evergreen vegetation

temp.winter

Land surface temperature in celsius degrees at 2pm on a winter 0% cloud day

temp.summer

Land surface temperature in celsius degrees at 2pm on a summer 0% cloud day

pm10.winter

Average particulate matter 10 micron during winter months

pm10.summer

Average particulate matter 10 micron during summer months

poor.p

Percentage of people under poverty line year 2017.

eleva

Average altitude of municipal area.

pop

Total population of municipality

Source

Data were provided by Dr Ignacio Fernandez at Universidad Adolfo Ibañez (Santiago, Chile).

References

Not yet

Examples

data(landcover)
head(landcover)

Cobertura territorial, ambiental y sociodemografica de los 34 municipios que componen el area del Gran Santiago, Santiago, Chile..

Description

El conjunto de datos contiene 476 observaciones, 34 categoricas y 442 numericas. Los datos de cobertura terrestre se generaron mediante tecnicas de clasificacion de teledeteccion utilizando imagenes de satelite Sentinel-2 del año 2016. Las temperaturas se obtuvieron de la banda TIRS 10 de las imagenes de los satelites Landsat 8. Las concentraciones de material particulado se estimaron mediante tecnicas de modelado espacial de 10 estaciones de contaminacion distribuidas en la ciudad. La altitud se genero a partir de un modelo de elevacion digital. La poblacion y la pobreza se obtuvieron de la encuesta Casen 2017.

Usage

data(landcover2)

Format

Variables se describen a continuacion:

comuna

Name of Municipality

const.p

Porcentaje de superficie cubierta por area construida

vegeta.p

Porcentaje de superficie cubierta por vegetacion

desnu.p

Porcentaje de superficie cubierta por suelo desnudo

pasto.p

Porcentaje de superficie cubierta por cesped

deci.p

Porcentaje de superficie cubierta por vegetacion de hoja caduca

sverde.p

Porcentaje de superficie cubierta por vegetacion siempre verde

temp.inv

Temperatura de la superficie terrestre en grados celsius a las 2 p.m.en un dia de invierno con 0% de nubes

temp.ver

Temperatura de la superficie de la tierra en grados celsius a las 2 p.m.en un dia de verano con 0% de nubes

pm10.inv

Material particulado promedio de 10 micrones durante los meses de invierno

pm10.ver

Material particulado promedio de 10 micrones durante los meses de verano

pobreza.p

Porcentaje de personas por debajo de la linea de pobreza año 2017

altitud

Altitud media del termino municipal

pob

Poblacion total del municipio

Source

Los datos fueron cedidos por el Dr Ignacio Fernandez de la Universidad Adolfo Ibañez (Santiago, Chile).

References

Not yet

Examples

data(landcover2)
head(landcover2)

Large trees in forests near Tolga, in Eastern Norway.

Description

The study area is situated in the municipality of Tolga, located in Hedmark County, Eastern Norway. Field plots 32 m × 32 m in size were established in forests. A total of 1109 plots were sampled. In each plot, Scots pines (Pinus sylvestris L.). trees with a stem diameter larger than 35 cm were measured and counted.

Usage

data(largetrees)

Format

Contains two variables, as follows:

plot

Plot code.

y

Number of large-diameter trees in a given sample plot.

Source

Although Christian Salas was part of the study, he just reproduced the needed data to mimic the distribution of the random variable of interest, as shown in the study of Korkhonen et al (2016).

References

Examples

data(largetrees)
head(largetrees)
hist(largetrees$y)

Árboles grandes en bosques cercanos a Tolga, en el Este de Noruega.

Description

El área de estudio esta ubicada en la municiplaidad de Tolga, en la comuna de Hedmark, al Este de Noruega. 1109 parcelas de muestreo de 32 m × 32 m se establecieron en los bosques. En cada parcela, los árboles de pino escoses (Pinus sylvestris L.). que tuvieran un diámetro mayor a 35 cm fueron medidos y contados.

Usage

data(largetrees2)

Format

Los datos poseen las siguientes dos columnas:

parc

Identificador de la parcela de muestreo.

y

Número de árboles de gran diámetro encontrados en una parcela de muestreo.

Source

Aunque el Prof. Christian Salas fue parte del estudio, acá se han reproducido los datos necesarios que imitan la distribución de la variable aleatoria de interés, tal como se muestra en el estudio de Korkhonen et al (2016).

References

Examples

data(largetrees2)
head(largetrees2)
hist(largetrees2$y)

Esperanza de vida de paises

Description

El repositorio del Observatorio Mundial de la Salud (GHO) de la Organización Mundial de la Salud (WHO) mantiene un registro del estado de salud como también otros factores relacionados, para todos los países. Las bases de datos son publicadas con el objetivo de analizarlos. La base de datos de esperanza de vida ha sido compilada en conjunto con datos económicos de las Naciones Unidas.

Usage

data(lifexpect)

Format

Este set de datos contiene 22 columnas:

country

País de origen

year

Año

status

Categoría del país Desarrollado/En desarrollo

life.expectancy

Esperanza de vida en años

adult.mortality

Mortalidad en adultos expresado como la probabilidad de morir entre 15 y 60 años de edad por cada 1000 habitantes

infant.deaths

Mortalidad en infantes cada 1000 habitantes

alcohol

Consumo de alcohol percapita en mayores de 15 años

percentage.expenditure

Porcentaje de vacunación

hepatitis.b

Porcentaje de vacunación contra hepatitis b

measles

Casos de sarampión cada 1000 habitantes

bmi

Índice de masa corporal (BMI) promedio

under.five.deaths

Muertes de menores de 5 años cada 1000 habitantes

polio

Porcentaje de vacunación contra polio

total.expenditure

Inversión en salud como porcentaje del GDP per cápita

diphtheria

Porcentaje de vacunación contra diphteria

hiv.aids

Porcentaje casos de VIH, ETS

gdp

GDP per cápita en USD

population

Población total

thinness10.19

Desnutrición entre 10 y 19 años de edad

thinness5.9

Desnutrición entre 5 y 9 años de edad

icr

Índice de desarrollo humano en términos de composición de ingresos

schooling

Promedio de años de educación

Source

Los datos fueron obtenidos desde la web https://rpubs.com/Alvian2022/LifeExpectancy. Note que solo los datos del año 2014 son utilizados acá.

Examples

data(lifexpect)    
head(lifexpect) 
table(lifexpect$status)
tapply(lifexpect$life.expectancy, lifexpect$status,mean)

Tree locations for a sample plot in the Llancahue experimental forest

Description

The Cartesian position, species, and diameter of trees within a plot were measured. The sample plot is rectangular of 130 m by 70 m. Further details can be #' reviewed in the reference.

Usage

data(llancahue)

Format

Contains tree-level variables, as follows:

tree.code

Tree identificator

spp

species abreviation as follows: AP=Aextocicon puncatatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna

dbh

diameter at breast height, in cm.

x.coord

Cartesian position in the X-axis, in m.

y.coord

Cartesian position in the Y-axis, in m.

Source

The data are provided courtesy of Prof. Daniel Soto at Universidad de Aysen (Coyhaique, Chile).

References

Examples

data(llancahue)    
head(llancahue) 
descstat(llancahue$dbh)
boxplot(dbh~spp, data=llancahue)

Ubicación cartesiana de árboles en el bosque de Llancahue

Description

Corresponde a la posición cartesiana, especie, y diámetro de árboles en una parcela de muestreo en el bosque de Llancahue, cerca de Valdivia, Chile. La parcela es rectangular con dimensiones de 130 m por 70 m. Mayores antecedentes aparecen en las referencias.

Usage

data(llancahue2)

Format

Contains tree-level variables, as follows:

arb.id

Identificador del árbol.

spp

Codificación de la especie como sigue: AP= Aextocicon puncatatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna.

dap

Diámetro a la altura del pecho, en cm.

coord.x

Posición cartesiana en el eje-X, en m.

coord.y

Posición cartesiana en el eje-Y, en m.

Source

Los datos fueron cedidos por el Prof. Daniel Soto de Universidad de Aysen (Coyhaique, Chile).

References

Examples

data(llancahue2)    
head(llancahue2) 
descstat(llancahue2$dap)
boxplot(dap~spp, data=llancahue2)

Performs a likelihood ratio test between two models being fitted by maximum likelihood.

Description

Function to perform a likelihood ratio test (LRT) between a reduced model (modA) versus a more complex model (modB), provided both models were fitted by maximum likelihood. The function requires to be filled with the needed values used to perform a LRT.

Usage

lrt(
  llma = llma,
  llmb = llmb,
  qa = qa,
  qb = qb,
  nfit = nfit,
  modA = "modA",
  modB = "modB",
  alpha = 0.05
)

Arguments

llma

maximized log-likelihood of the reduced model (or modA).

llmb

maximized log-likelihood of the more-complex model (or modB).

qa

the number of parameters of the reduced model.

qb

the number of parameters of the more-complex model.

nfit

the sample size used for fitted both models.

modA

is a character with a name to be assigned to object modA.

modB

is a character with a name to be assigned to object modB.

alpha

is the level of sifnificance to used for computing as a reference only, the tabulated value of the respective Chi-Squared statistic. By the defaul is set to 0.05.

Details

The resulting output offers statistical inference estimates of the LRT, as well as other maximum likelihood-based statistics. Notice that the function only works if the number of parameters for modA is lower than the ones of modB.

Value

This function wraps two outputs: (i) a table that computes the AIC, BIC and AICc goodness-of-fit statistics for both models, and (ii) the result of the likelihood ratio test, such as the value of the statistic being computed, its respective p-value, and the tabulated value of the statistics using the a defined alpha significance of level.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2025. Estadística Aplicada e Inferencial. Borrador de libro, Universidad de Chile, Santiago, Chile. https://eljatib.com/rlibro

Examples


#Maximized values for two probability mass functions
max.ll.pois<- -39.86337; max.ll.bneg<--33.823003
c(max.ll.pois,max.ll.bneg)
sample.size<-26
#Number of parameters
num.para.pois<- 1; num.para.bneg<- 3
c(num.para.pois, num.para.bneg)
#Names to be used for each model
 modA="Poisson"; modB="hiper"
outall<-lrt(llma=max.ll.pois,llmb=max.ll.bneg,qa=num.para.pois,
qb=num.para.bneg,nfit = sample.size,modA = "Poisson",
modB = "Hipergeometrico")
#Output1: A comparative table 
tab.out<-outall$tab.models
tab.out
#Output2: the results of the LRT
out<-outall$lrt.out
out$r.tab
out$Ldif

Computes a likelihood ratio test between a reduced model and a full model. Both models must be already fitted using and R function.

Description

Computes a likelihood ratio test between a reduced model (modr) and a full model (modr). Both models must be previously fitted by maximum likelihood using an R function such as nlme() and such, that are part of the generalized lineal models.

Usage

lrt.glm(modr, modf)

Arguments

modr

is the object containing a previously fitted reduced model, using a glm-type of function, having less parameters than modf.

modf

is the object containing a previously fitted full model, using a glm-type of function, having more parameters than modr.

Details

Double-check the order of the reduced and full model, before of using the model

Value

This function returns an object having the following elements: "loglik.Modr" maximized log-likelihood of modr; "loglik.Modf" maximized log-likelihood of modf; "dif.loglik" difference in log-likelihood between both models, and "dif.df" difference in degrees of freedong of both models, and "p-value" is the p-value for the LRT.

Author(s)

Christian Salas-Eljatib.

References

Pinheiro JC, and Bates DM. 2000. Mixed-effects models in S and Splus. Springer-Verlag, New York, NY. 528 p.

Examples


#not yet implemented

Computes the mode

Description

Computes the mode of a random variable.

Usage

moda(y = y)

Arguments

y

is a numeric vector.

Details

The mode is an statistics representing the most "used" value of the random variable as a way of central position.

Value

The function returns the mode, a numeric scalar.

Author(s)

Christian Salas-Eljatib.

Examples


set.seed(1234)
variable <- rnorm(10, mean=45,sd=6)
#using the function
moda(y=variable)
moda(variable)

Productividad científica de estudiantes de postgrado

Description

Corresponde a un estudio realizado en la Universidad de Indiana, sobre el número de papers publicados por estudiantes egresados de programas de doctorado en bioquímica luego de 3 años.

Usage

data(papersdocstu)

Format

Este set de datos contiene las siguientes columnas:

papers

Es el número de artículos cientificos publicados luego de 3 años de egresado.

genero

Hombre/mujer.

est.civil

Estado civil del egresado.

nin.men5

Número de hijos menores a 6 años que dependen del egresado.

prog.prest

Puntaje asignado al prestigio del programa de postgrado.

papers.guia

Número de papers publicados por el profesor(a) guía del egresado, en el mismo periodo de tiempo.

Source

Los datos fueron obtenidos desde el paquete 'AER'.

References

Long, J.S. (1997). The Origin of Sex Differences in Science.

Examples

data(papersdocstu)
df<-papersdocstu    
head(df)
barplot(table(df$papers),xlab="Numero de papers publicados",
   ylab="Frecuencia (num. de estudiantes)")
table(df$genero)
table(df$est.civil,df$genero)
tapply(df$papers,df$est.civil,summary)

Peso de hojas

Description

Peso de hojas

Usage

pesohojas

Format

Data frame con 64 filas y 2 columnas:

peso

peso foliar en gramos (g)

area

área foliar en centímetros cuadrados (cm²)

Examples

data(pesohojas)
plot(peso~area, data = pesohojas)

Presence or absence of sea ice from logbook records of annual cruises

Description

Data containing 52717 observations about presence of sea ice from logbook records of annual cruises to the B-C-B in an unbroken record between years 1850 to 1910.

Usage

data(presenceIce)

Format

The dataframe contains the following columns:

ship.id

The code number for ships.

move.type

Type of movement of ships. 0 indicates a sail-powered vessel and 1 indicates an auxiliary-powered vessel.

year

Year of registry.

month

Month of registry.

day

Day of registry.

lat.dec

Decimal latitude.

long.dec

Decimal longitude.

e.w

East or west of the Prime Meridian.

ice.cov

Sea Ice Observed. 0 no see (Not registered) and 1 presence sea ice (Registered).

Source

The data were provided from Sea Ice Group at the Geophysical Institute.

References

Mahoney A, Bockstoce J, Botkin D, Eicken H, Nisbet R. 2011. Sea-Ice Distribution in the Bering and Chukchi Seas: Information from Historical Whaleships' Logbooks and Journals ARCTIC. 64(4): 465-477.

Examples

data(presenceIce)
head(presenceIce)

Eleccion presidencial del 2021 en Chile.

Description

Datos de mesa de la eleccion presidencial del 2012 en Chile. La eleccion se llevo a cabo el 19 de Diciembre del 2021.

Usage

data(president)

Format

Los datos contienen las siguientes columnas:

region.no

Número de la region adminsitrativa de Chile.

region

Nombre de la region administrativa de Chile

provincia

Provincia.

circu.senatorial

Circunscripcion senatorial.

distrito

Distrit.

comuna

County.

circu.elec

Circunscripcion electoral.

local

Local de votacion. Generalmente es un colegio.

no.mesa

Número de mesa.

tipo.mesa

Tipo de mesa de votacion.

mesas.fusionadas

Mesa de votacion fucionada.

electores

Electores.

nro.en.voto

.

candidato

Candidato, ya sea Gabriel Boric o Jose A. Kast

votos.tricel

Número total de votos segun el TRICEL (Tribunal calificador de elecciones).

Source

Los datos fueron obtenidos desde el sitio web del Servicio Electoral del Gobierno de Chilean (SERVEL) en https://www.servel.cl. El archivo de datos descargado el 24 de Octubre del 2022 tenia el nombre Resultados mesa presidencial TRICEL 2v 2021-1.xlsx.

Examples

data(president)
head(president)

Elección primaria para la presidencia de Chile

Description

Datos a nivel de mesa de la votación para elecciones primarias para Presidente de Chile en 2021.

Usage

data(primarias)

Format

Este set de datos contiene las siguientes columnas:

region.no

Región administrativa de Chile.

region

Nombre de la región.

provincia

Provincia.

distrito

Distrito.

comuna

Comuna.

circu.elec

Circunscripción electoral.

local

Local de votación.

tipo.mesa

tipo de mesa.

mesa

Código identificador de la mesa.

mesas.fusionadas

Mesas fusionadas.

nro.voto

.

lista

Lista política del candidato.

pacto

Pacto político del candidato.

partido

Partido político del candidato.

candidato

Nombre del candidato.

votos

Número total de votos.

Source

Los datos fueron obtenidos desde el servicio electoral de Chile (SERVEL) en el web https://www.servel.cl. El nombre del archivo era Resultados Primarias Presidenciales 2021 CHILE.xlsx, y fue descargado el 4 de octubre del 2022. Los datos fueron ordenados, y solo aquellas filas que contenian información en la columna 'votos' son parte de la dataframe.

Examples

data(primarias)
head(primarias)
table(primarias$region)
table(primarias$region,primarias$candidato)
tapply(primarias$votos,primarias$candidato,sum)

Ubicación cartesiana de árboles en el bosque de Llancahue para uso del libro.

Description

Corresponde a la posición cartesiana, especie, y diámetro de árboles en una parcela de muestreo en el bosque de Llancahue, cerca de Valdivia, Chile. La parcela es rectangular con dimensiones de 130 m por 70 m. Mayores antecedentes aparecen en las referencias.

Usage

data(pspLlancahue)

Format

Contains tree-level variables, as follows:

arb.id

Identificador del árbol.

spp

Codificación de la especie como sigue: AP= Aextocicon puncatatum, EC=Eucryphia cordifolia, GA=Gevuina avellana, LP=Laureliopsis philippiana, LS=Laurelia sempervirens, ND=Nothofagus dombeyi, Ot=Other, PS=Podocarpus saligna.

dap

Diámetro a la altura del pecho, en cm.

coord.x

Posición cartesiana en el eje-X, en m.

coord.y

Posición cartesiana en el eje-Y, en m.

Source

Los datos fueron cedidos por el Prof. Daniel Soto de Universidad de Aysen (Coyhaique, Chile).

References

Examples

data(pspLlancahue)    
head(pspLlancahue) 
descstat(pspLlancahue$dap)
boxplot(dap~spp, data=pspLlancahue)

Tree spatial coordinates in the Rucamanque forest

Description

Tree-level variables and spatial coordinates in a permanent sample plot of 1 ha (100 x 100m) in the Rucamanque experimental forest, near Temuco, Chile.

Usage

data(pspruca)

Format

The data frame contains four variables for the standing-alive trees as follows:

tree.no

tree number

species

Species name, "N. obliqua" is Nothofagus obliqua, "Ap" is Aexitocicum puncatatum, etc.

crown.class

Crown class (1: superior, 2: intermediate, 3; inferior)

dbh

diameter at breast-height, in cm

x.coord

Cartesian position at the X-axis, in m

y.coord

Cartesian position at the Y-axis, in m

Source

Data were provided by Dr Christian Salas-Eljatib (Universidad de Chile, Santiago, Chile).

References

Salas C, LeMay V, Nunez P, Pacheco P, and Espinosa A. 2006. Spatial patterns in an old-growth Nothofagus obliqua forest in south-central Chile. Forest Ecology and Management 231(1-3): 38-46. doi:10.1016/j.foreco.2006.04.037

Examples

data(pspruca)
head(pspruca)
table(pspruca$species)

Ubicación espacial de árboles en el bosque de Rucamanque

Description

Medidas a nivel de árbol y coordenadas espaciales en un parcela de muestreo permanente de 1 ha (100 x 100m) en el bosque de Rucamanque, cerca de Temuco, Chile. Mayores antecedentes en las referencias.

Usage

data(pspruca2)

Format

Las columnas describen características de los árboles vivos en pie, como sigue:

arbol

Número del árbol

especie

Nombre de la especie, "N. obliqua" es Nothofagus obliqua, "Ap" es Aexitocicum puncatatum, etc.

clase.copa

Clase de copa (1: superior, 2: intermedio, 3; inferior)

dap

Diámetro a la altura del pecho, en cm

coord.x

Posicion cartesiana en el eje X, en m

coord.y

Posicion cartesiana en el eje Y, en m

Source

Los datos fueron cedidos por el Dr Christian Salas-Eljatib (Santiago, Chile).

References

Salas C, LeMay V, Nunez P, Pacheco P, and Espinosa A. 2006. Spatial patterns in an old-growth Nothofagus obliqua forest in south-central Chile. Forest Ecology and Management 231(1-3): 38-46. doi:10.1016/j.foreco.2006.04.037

Examples

data(pspruca2)
table(pspruca2$especie)

Height growth of Pinus taeda (Loblolly pine) trees

Description

The Loblolly data frame has 84 rows and tree columns of records of the tree height growth of Loblolly pine trees. This dataframe is a slight modification to the original dataframe "Loblolly" from the datasets R package.

Usage

data(ptaeda, package="datana")

Format

A dataframe containing the following columns:

seed.id

an ordered factor indicating the seed source for the tree. The ordering is according to increasing maximum height.

age

a numeric vector of tree ages, in yr.

toth

a numeric vector of tree heights, in m.

Source

Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS. Springer.

Examples


data(ptaeda, package="datana")
head(ptaeda)
plot(toth ~ age, data = subset(ptaeda, seed.id == 329),
     xlab = "Age (yr)", las = 1,
     ylab = "Height (m)")

Crecimiento en altura de Pinus taeda

Description

Esta dataframe contiene 84 folas y tres columnas de crecimiento en altura de árboles de Pinus taeda (Loblolly pine). Es una modificación de la dataframe "Loblolly" del paquete 'datasets' de R.

Usage

data(ptaeda2)

Format

Los datos contienen las siguientes columnas:

semilla.id

Un factor indicando el origen de la semilla del árbol.

edad

Edad del árbol, en años.

atot

Altura total, en m.

Source

Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS. Springer.

Examples


data(ptaeda2, package="datana")
head(ptaeda2)
plot(atot ~ edad, data = subset(ptaeda2, semilla.id == 329),
     xlab = "Edad (años)", las = 1,
     ylab = "Altura (m)")

Obtain the P-value for a Standard t-distributed random variable

Description

Function to compute the P-value for a Standard t-distributed random variable.

Usage

pvalt(t.value, df, decnum = 14)

Arguments

t.value

A numeric random variable following a t-student pdf distribution.

df

degrees of freedom of the random variable following a t-student pdf distribution.

decnum

the number of decimals to be used in the output. The default is set to 5.

Details

It is suited to compute the P-value for any random variable following a Standard t probability density function (pdf). For instance, to obtain the p-value in a t-test.

Value

The function returns the P-value or probability of getting a value as large as t.value.

Author(s)

Christian Salas-Eljatib

Examples

# Load dataset
 df <- datana::fertiliza2
 head(df)
 ## Computes the t-test statistics (from the 'stats' package)
t.value <- stats::t.test(df$vol)
t.value
 t.v <- as.numeric(t.value$statistic);t.v
 deg.f <- as.numeric(t.value$parameter);deg.f

 ## Obtaining the p ##  pvalt(t.v,deg.f)


Obtain the P-value for a Standard Gaussian random variable

Description

Function to computes the P-value for a Standard Gaussian random variable.

Usage

pvalz(zval, decnum = 5)

Arguments

zval

A numeric random variable following a Standard Gaussian distribution.

decnum

the number of decimals to be used in the output. The default is set to 5.

Details

It is suited to compute the P-value for any random variable following a Standard Gaussian probability density function.

Value

This function returns the P-value or probability of getting a value as large as 'zval'.

Author(s)

Christian Salas-Eljatib

Examples


pvalz(1.96)


Datos de precipitación en Californa

Description

Datos de precipitación medidos en distintos lugares de california, con las coordenadas de los puntos y su distancia a la costa.

Usage

data(rainfallCA)

Format

Este set de datos contiene las siguientes columnas:

saple.id

Identificador del punto de muestreo.

easting

Coordenada este del punto.

northing

Coordenada norte del punto.

pp

Precipitación, en pulgadas.

ele

Elevación, en pies.

lat

Latitud del punto.

d.coast

Distancia a la costa, en millas.

Source

Los datos provienen de mediciones hechas en California

Examples

data(rainfallCA)    
head(rainfallCA) 
plot(pp~ele, data=rainfallCA)
hist(rainfallCA$pp)

Height growth of Nothofagus alpina trees in Chile.

Description

Time series data of height for rauli (Nothofagus alpina) trees in south-central Chile. These sampled trees are part of the ones used in Salas-Eljatib (2021, Ecological Applications). The full citation is provided below.

Usage

data(raulihg)

Format

The data frame contains four variables as follows:

tree.code

tree id code

spp

species common name

bha.t

breast-height age, in yrs.

h.t

total height, in m.

Source

Data were provided by Dr Christian Salas-Eljatib (Santiago, Chile).

References

Examples

data(raulihg)
head(raulihg)

Crecimiento en altura de árboles de Nothofagus alpina.

Description

Datos de series de tiempo de altura para árboles muestreados de Nothofagus alpina (raulí) en el centro-sur de Chile. Estos árboles son parte de los usados en Salas-Eljatib (2021, Ecological Applications). La cita completa se da en referencias.

Usage

data(raulihg2)

Format

Contiene variables de nivel individual, como se describen a continuacion::

tree.code

Codigo del árbol

spp

Nombre comun especie

bha.t

Edad a la altura del pecho, en años.

h.t

Altura total, en m.

Source

Datos cedidos por el Prof. Christian Salas-Eljatib.

References

Examples

data(raulihg2)
head(raulihg2)

Rendimiento escolar por estudiante en Chile 2024

Description

Base de datos con información anónima de rendimiento escolar por estudiante, correspondiente al año 2024. Contiene 687033 observaciones de estudiantes de Enseñanza Media Humanístico Científica modalidad Jóvenes, pertenecientes a establecimientos municipales, particulares subvencionados y particulares pagados. Cada fila representa un estudiante y sus características básicas, incluyendo su promedio general, asistencia y situación final del curso.

Usage

data(rendesc2)

Format

Variables se describen a continuación:

region

Región de Chile del registro

comuna

Comuna de la region correspondiente

mrun

Identificador anónimo del estudiante

cod.depe

Código de dependencia administrativa del establecimiento (1 = municipal, 2 = particular subvencionado, 3 = particular pagado)

gen.alu

Género del estudiante (1 = hombre, 2 = mujer)

edad.alu

Edad del estudiante

prom.gral

Promedio general de notas (escala de 1.0 a 7.0)

asistencia

Porcentaje de asistencia anual del estudiante

sit.fin

Situación final del estudiante (P = promovido, R = reprobado)

Source

Ministerio de Educación de Chile (MINEDUC), portal de datos abiertos: https://datosabiertos.mineduc.cl/. Los datos fueron digitados por Saúl Ketterer, estudiante del Prof. Christian Salas-Eljatib.

References

Examples

data(rendesc2)
head(rendesc2)

Puntaje SIMCE 2023 en matemática 4to Básico por RBD

Description

Puntaje promedio por establecimiento del SIMCE 2023 en matemática de 4to Básico. Se tienen 6534 observaciones. La variable binaria (Y) es la presencia de convenio PIE en el establecimiento, donde Y=1 denota presencia y Y=0, lo contrario.

Usage

data(simce2)

Format

Variables se describen a continuación:

rbd

Rol Base de Datos del establecimiento

region

Región del establecimiento

comuna

Comuna del estableciimento

dependencia

Dependencia administrativa del establecimiento

prom.mate4b

Puntaje promedio del establecimiento en la prueba de matemática del SIMCE de 4to básico en 2023

mat.total

Cantidad de estudiantes matriculados en el establecimiento

convenio.pie

Establecimiento tiene convenio PIE (1 si, 0 no)

Source

Datos obtenidos desde la Agencia de Calidad de la Educación del Mineduc y desde el portal de DatosAbiertos del Mineduc (datosabiertos.mineduc.cl). Los datos fueron digitados por Diego Fernández, estudiante del Prof. Christian Salas-Eljatib.

Examples

data(simce2)
head(simce2)

Computes the skewness of a numeric vector

Description

The skewness is about the departure from symmetry of a frequency distribution. Therefore, It is about asymmetry. One way to assess asymmetry of a random variable is to compute an statistics representing its skewness. The current function an dimensionless statistics of the skewness of given vector.

Usage

skewn(x, na.rm = TRUE)

Arguments

x

A numeric vector representing a random variable.

na.rm

Logical value to remove NA values. The default is set to TRUE.

Details

The skewness of a random variable is the third moment of the standardized variable. There are several ways of parameterizing an skewness estimator, such as depending on the third moment and the standard deviation of the random variable.

Value

The value of the the skewness of given vector

Author(s)

Christian Salas-Eljatib.

Examples

y.var<-rnorm(100);x.var<-rbeta(100,.2,2)
skewn(y.var)
skewn(x.var)


Sludge data are at different cities, with a value of concentration zinc.

Description

Dataset contains 36 observations

Usage

data(sludge)

Format

Contains four variables, as follows:

city

Name of city.

rate

Concentration rate of sludge.

zinc

Value of concentration ( in ppm).

trt.comb

Combination between city and rate factors.

Source

The data were provided from.. still remember.

References

not yet

Examples

data(sludge2)    
table(sludge$city,sludge$rate) 
levels(sludge$city)
tapply(sludge$zinc, list(sludge$city,sludge$rate), mean)

Sludge data are at different cities, with a value of concentration zinc.

Description

Datos de contenido de Zinc en el tratamiento de lodos

Usage

data(sludge2)

Format

Contiene las siguinetes cuatro variables:

ciudad

Nombre de la ciudad.

tasa

Tasa de concentracion de lodo.

zinc

Concentracion de Zinc, en ppm.

trt.comb

Identificador de la combinacion de niveles entre los factores ciudad y tasa.

Source

The data were provided from.. still remember.

References

not yet

Examples

data(sludge2)    
table(sludge2$ciudad,sludge2$tasa) 
levels(sludge2$ciudad)
tapply(sludge2$zinc, list(sludge2$ciudad,sludge2$tasa), mean)

On the National System of State Protected Wild Areas (SNASPE) of Chile.

Description

Units of the National System of State Protected Wild Areas (SNASPE).

Usage

data(snaspe)

Format

Contains the following variables:

unit.id

Number for the unit.

unit

Name of the protected area.

category

Category of the unit. It can be either a National Park, a National Reserve or a Natural Monument.

county

Name of the county where the unit is located.

province

Province where the unit is located.

region

Region where the unit is located.

perim.km

Perimeter, in km.

area.ha

Area, in hectares.

area.m2

Area, in m^{2}.

Source

These data are freely available at https://ide.minagri.gob.cl

References

The Chilean SNASPE is under the direction of the Chilean Forest Service (CONAF). Further information and documentation can be found at https://www.conaf.cl

Examples

data(snaspe)
head(snaspe)
table(snaspe$category)
tapply(snaspe$area.ha,snaspe$category,mean)

Sistema nacional de areas protegidas del estado (SNASPE) de Chile

Description

Contiene variables general de las unidades del sistema de areas protegidas por el estado de Chile (SNASPE).

Usage

data(snaspe2)

Format

Contiene las siguientes variables para cada unidad del SNASPE:

uni.id

Número indentificador de la unidad.

unidad

Nombre de la unidad.

categoria

Categoría de la unidad. Puede ser Parque Nacional, Reserva Nacional, o Monumento Natural.

comuna

Nombre de la communa donde esta la unidad.

province

Nombre de la provincia donde esta la unidad.

region

Nombre de la región.

perim.km

Perimetro, en km.

area.ha

Área, en hectareas.

area.m2

Área, en m^{2}.

Source

Estos datos fueron obtenidos desde https://ide.minagri.gob.cl

References

EL SNASPE esta bajo la administración de la Corporación Nacional Forestal (CONAF) de Chile. Mayor información se puede encontrar en https://www.conaf.cl

Examples

data(snaspe2)
head(snaspe2)
table(snaspe2$categoria)
tapply(snaspe2$area.ha,snaspe2$categoria,mean)

Soil treatment experiment in tree seedlings

Description

A test was made of the effect of three soil treatments on the height growth of 2-year-old seedlings. Treatments were assigned at random to the three plots within each of 11 blocks. Each plot was made up of 50 seedlings. Average 5-year height growth was the criterion for evaluating treatments.

Usage

data(soiltreat)

Format

Contains the four following columns, at the plot-level,

block

Block unit.

treat

Treatment level.

ini.h

Initial height, in m.

inc.h

Increment in height during 5-year, in m.

Source

Table in page 71 of Freese (1967). The data were entered by Miss Nayeli Ramirez, a former student of Prof. Christian Salas-Eljatib.

References

Examples

data(soiltreat)
head(soiltreat)
tapply(soiltreat$inc.h,soiltreat$treat,summary)
tapply(soiltreat$inc.h,soiltreat$treat,sd)

Tratamientos del suelo en el crecimiento de plantulas.

Description

Un experimento sobre el efecto de tres tratamientos del suelo en el crecimiento en altura de plantulas de 2-años de edad. Los tratamientos fueron asignados aleatoriamente a tres parcelas dentro de cada uno de 11 bloques. Cada parcela esta constituida por hasta 50 plantulas. El promedio del incremento en altura de los últimos 5 años fue la variable de interes para evaluar los tratamientos.

Usage

data(soiltreat2)

Format

Los datos, a nivel de parcela, tienen las siguientes columnas,

bloque

Bloque del experimento.

tmo

Factor tratamiento, medido en tres nivels.

alt.ini

Altura initial, rn m.

alt.inc

Incremento en altura durante los últimos cinco años, en m.

Source

Cuadro de la página 71 de Freese (1967). Los datos fueron digitados por la Srta. Nayeli Ramirez, una estudiante del Prof. Christian Salas-Eljatib.

References

Examples

data(soiltreat2)
head(soiltreat2)
tapply(soiltreat2$alt.inc,soiltreat2$tmo,summary)
tapply(soiltreat2$alt.inc,soiltreat2$tmo,sd)

Tree locations for several plots of Norway spruce (Picea abies) in Austria

Description

The Austrian Research Center for Forests established a spacing experiment with Norway spruce (Picea abies) in the Vienna Woods. In the 'Hauersteig' experiment, several tree-level variables were measured within four sample plots over time. The current dataframe has only the measurements carried out in 1944.

Usage

data(spataustria)

Format

Contains cartesian position of trees, and covariates, in sample plots, as follows:

plot

Plot number.

tree

Tree number.

species

Species code as follows: PCAB=Picea abies, LADC=Larix decidua, PNSY=Pinus sylvestris, FASY=Fagus Sylvatica, QCPE=Quercus petraea, BTPE=Betula pendula.

x.coord

Cartesian position in the X-axis, in m.

y.coord

Cartesian position in the Y-axis, in m.

year

Measurement year.

dbh

diameter at breast-height, in cm.

References

  1. 109 years of forest growth measurements from individual Norway spruce trees. Sci. Data 5:180077 doi:10.1038/sdata.2018.77

Examples

data(spataustria)
head(spataustria)
df<-spataustria
oldpar<-par(mar=c(4,4,0,0))
bord<-data.frame(
 x=c(min(df$x.coord),max(df$x.coord),min(df$x.coord),max(df$x.coord)),
 y=c(min(df$y.coord),min(df$y.coord),max(df$y.coord),min(df$y.coord))
 )
plot(bord,type="n", xlab="x (m)", ylab="y (m)", asp=1, bty='n')
points(df$x.coord,df$y.coord,col=df$plot,cex=0.5)
par(oldpar)

Creates a LaTeX file having an ANOVA table for a previously fitted linear regression model

Description

Function to create a LaTeX file of an ANOVA table.

Function to create a LaTeX file for a table with the main fitting statistics from a fitted regression model.

Usage

tabtexanova(
  mod = mod,
  nametab = nametab,
  cap = cap,
  save.file = FALSE,
  filename = "tabregre.tex",
  eng = TRUE,
  rowlab = "Source of variation",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

tabtexregre(
  mod = mod,
  nametab = nametab,
  cap = cap,
  save.file = FALSE,
  filename = "tabregre.tex",
  eng = TRUE,
  rowlab = "Parameter",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

Arguments

mod

an object containing the fitted model by using the lm() function.

nametab

a string having a brief name to be used in both the label of the table and the file name. For instance, if "=mod1", the table can be refered in your LaTeX document by using ⁠\ref{tab:mod1}⁠

cap

a string having the caption of the LaTeX table.

save.file

The defauls is set to “FALSE”, if is set to TRUE, then the option filename must be provided.

filename

A string having the name of the resulting LaTeX file having the table. The default is set to "tabdescdata.tex".

eng

The language to be used in the output. English is the default, meanwhile if eng=FALSE, Spanish is used.

rowlab

a character with the name to be used as label for the column where the variables will be printed. The default is set to "Parameter".

decnum

the number of decimals to be used in the output. The default is set to 3.

font.size.tab

The defauls is set to "normalsize". You could also try with "footnotesize".

font.type.tab

The defauls is set to "normalfont".

Details

The resulting file is a LaTeX table, that can be added to your main LaTeX document by using ⁠\input{filename}⁠.

The resulting file is a LaTeX table, that can be added to your main LaTeX document by using ⁠\input{filename}⁠.

Value

This function creates a LaTeX file having an ANOVA table, from a fitted regression model.

This function creates a LaTeX file having the main fitting statistics of a linear regression model.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- datana::fishgrowth2
head(df)
descstat(df[,c("largo","edad")])
plot(largo ~ edad, data=df)
mod1<-lm(largo ~ edad, data=df)
##example 1
tabtexanova(mod=mod1,nametab="anovatab",
cap="ANOVA-style table of the fitted regression model")
##example 2
tabtexanova(mod=mod1,nametab="anovatab",
cap="Cuadro estilo ANOVA para modelo de regresion ajustado",
eng=FALSE)

df <- datana::fishgrowth2
head(df)
datana::descstat(df[,c("largo","edad")])
graphics::plot(largo ~ edad, data=df)
mod1<-stats::lm(largo ~ edad, data=df)
##example 1
tabtexregre(mod=mod1,nametab="basicmodel",
cap="Parameter estimates of the fitted regression model")
##example 2
tabtexregre(mod=mod1,nametab="basicmodel",
cap="Cuadro con parametros estimados del modelo de regresion",
eng=FALSE)

Creates a LaTeX file having a descriptive statistics table for continuous variables

Description

Function to create a LaTeX file for a table of descriptive statistics of continuous variables from a dataframe.

Usage

tabtexdescstat(
  data = data,
  colnames = colnames,
  varnames = varnames,
  cap = cap,
  nametab = nametab,
  save.file = FALSE,
  filename = "tabdescdata.tex",
  eng = TRUE,
  rowlab = "Variable",
  decnum = 3,
  font.size.tab = "normalsize",
  font.type.tab = "normalfont"
)

Arguments

data

a dataframe containing numeric variables as columns.

colnames

a string having the column names of the dataframe to which the descriptive statistics will be computed.

varnames

a string having the name of each of the variables to be used in the LaTeX table.

cap

a string having the caption of the LaTeX table.

nametab

a string having a brief name to be used in both the label of the table and the file name. For instance, if "=descdata", the table can be refered in your LaTeX document by using ⁠\ref{tab:descdata}⁠

save.file

The defauls is set to “FALSE”, if is set to TRUE, then the option filename must be provided.

filename

A string having the name of the resulting LaTeX file having the table. The default is set to "tabdescdata.tex".

eng

The language to be used in the output. English is the default, meanwhile if eng=FALSE, Spanish is used.

rowlab

a character with the name to be used as label for the column where the variables will be printed. The default is set to "Variables".

decnum

the number of decimals to be used in the output. The default is set to 3.

font.size.tab

The defauls is set to "normalsize". You could also try with "footnotesize".

font.type.tab

The defauls is set to "normalfont".

Details

The resulting file is a LaTeX table, that can be added to your main LaTeX document by using ⁠\input{filename}⁠.

Value

This function creates a LaTeX file having the following descriptive statistics: sample size, minimum, maximum, mean, median, SD, and coefficient of variation. If the full option is set to TRUE, the following statistics are added to the table: 25th and 75th percentiles, the interquartile range, skewness, and kurtosis.

Author(s)

Christian Salas-Eljatib.

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


df <- datana::idahohd
head(df)
##example 1
tabtexdescstat(data=df,nametab="idaho",
cap="Descriptive statistics table",
colnames=c("dbh","height"),varnames = c("Diameter","Height"))
##example 2
tabtexdescstat(data=df,nametab="idaho",
cap="Cuadro con estadistica descriptiva",
 colnames=c("dbh","height"),varnames = c("Diametro","Altura"),
 eng=FALSE)

Produces a time series plot

Description

Produces a time series plot, of variable 'y' as a function of 'x' by an observational unit factor.

Usage

timeserplot(
  data = data,
  y = y,
  x = x,
  obs.unit = obs.unit,
  factor1 = NA,
  factor2 = NA,
  only.lines = FALSE,
  ylab = NA,
  xlab = NA,
  linetype.lab = NA,
  factor2.line = TRUE,
  factor2.col = FALSE,
  col.lines = "black",
  max.y.all = NA,
  levels.i.want = FALSE,
  col.lev.i.want = FALSE
)

Arguments

data

a dataframe with at least tree columns representing the response variable ("y"), the main predictor variable ("x"), and a variable indicating the observational unit ("obs.unit").

y

a character giving the column name of the response variable or variable of interest.

x

a character giving the column name of the main predictor variable. Generally this variable is time.

obs.unit

a character giving the column name containing the info of the observational unit.

factor1

an optional character having the name of a column having a factor variable (e.g., treatment). The detault value is set to NULL.

factor2

an optional character having the name of a column having another factor variable (e.g., species). The detault value is set to NULL.

only.lines

a logic value if only lines, but not including dots, are going to be drwan in the plot. The detault value is set to FALSE.

ylab

Label for the Y-axis

xlab

Label for the X-axis

linetype.lab

is an optional string to be used as the title of the factor being represented by lines. It is only needed if factor1 and factor2 are defined. See example.

factor2.line

a logic value if the second factor, factor2, is going to be segregated according to the type of lines. The detault value is set to TRUE.

factor2.col

a logic value if the second factor, factor2, is going to be segregated according to the color of the lines only. The detault value is set to FALSE.

col.lines

A string specifying the single color to be used for the lines of the timeseries

max.y.all

A number representing the maximum level of Y-axis for all classes

levels.i.want

A vector having the levels for the factor under study

col.lev.i.want

A vector having the colors to be used for the factor under study

Details

Both 'y' and 'x' must be numeric variables, and the column representing the observational unit, must be a factor. This factor identifies the longitudinal context of the data, for instance, a student being measured on time. Besides, two more factors can be added to the plotting details, in order to represent the potential variability among them.

Value

This function returns a time series plot

Note

Please, uses the function with caution, and run first the examples to understand it better.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples

data(ficdiamgr, package="datana")
df <- ficdiamgr
head(df)
str(df)
df$site<-as.factor(df$site)
df$species<-as.factor(df$species)
table(df$tree,df$species)
table(df$species,df$site)
#
timeserplot(df, y="dbh", x="time", obs.unit = "tree")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", only.lines = TRUE)
#
## Otros ejemplos de uso de la funcion
timeserplot(df, y="dbh", x="time", obs.unit = "tree", col.lines = "blue",
only.lines = TRUE)
timeserplot(df, y="dbh", x="time", obs.unit = "tree", only.lines = FALSE)
#
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site",
factor2= "species")
timeserplot(df, y="dbh", x="time", obs.unit = "tree", factor1="site",
 factor2= "species", factor2.col = TRUE, only.lines = TRUE)

Diameter, height and volume for Black Cherry Trees

Description

This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees. The records are a slight modification to the original dataframe "trees" from the datasets R package.

Usage

data(treevol)

Format

A data frame with 31 observations and three variables

dbh

Diameter at breast height, in cm.

toth

Total height, in m.

vtot

Timber volume, in cubic meters.

Source

Ryan TA, Joiner BL, and Ryan BF. 1976. The Minitab Student Handbook. Duxbury Press.

Examples

pairs(treevol, panel = panel.smooth, main = "treevol dataframe")
plot(vtot ~ dbh, data = treevol, log = "xy")
coplot(log(vtot) ~ log(dbh) | toth, data = treevol,
       panel = panel.smooth)
summary(m1 <- lm(log(vtot) ~ log(dbh), data = treevol))
summary(m2 <- update(m1, ~ . + log(toth), data = treevol))
anova(m1,m2)

Volumen, altura, y diámetro para árboles de Black Cherry

Description

Estos datos provienen de mediciones de volumen, altura y diámetro en 31 árboles volteados de black cherry (Prunus serotina). Son una modificacion la dataframe 'trees' del paquete datasets de R.

Usage

data(treevol2)

Format

Datos con 31 observaciones y tres variables

dap

diámetro a la altura del pecho, en cm

atot

altural total, en m

vtot

volumen total, en m^{3}

Source

Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.

Examples

pairs(treevol2, panel = panel.smooth, main = "treevol dataframe")
plot(vtot ~ dap, data = treevol2, log = "xy")
coplot(log(vtot) ~ log(dap) | atot, data = treevol2,
       panel = panel.smooth)
summary(m1 <- lm(log(vtot) ~ log(dap), data = treevol2))
summary(m2 <- update(m1, ~ . + log(atot), data = treevol2))
anova(m1,m2)

Tree volume of roble (Nothofagus obliqua) in the Rucamanque forest

Description

These are tree-level measurement data of sample trees in the Rucamanque experimental forest, near Temuco, in the Araucania region in south-central Chile, measured in 1999. The data are the same as in the dataframe "treevolruca", but only having observations for the species Nothofagus obliqua (roble).

Usage

data(treevolroble)

Format

Contains tree-level variables, as follows:

tree.no

Tree id

dbh

Diameter at breast height, in cm

toth

Total height, in m.

d6

Upper-stem diameter at 6 m, in cm

totv

Tree gross volume, in m^{3} with bark.

Source

The data are provided courtesy of Dr Christian Salas at the Universidad de Chile (Santiago, Chile).

References

Examples

data(treevolroble)    
head(treevolroble) 

Volumen a nivel de árbol para roble (Nothofagus obliqua) especie en el bosque de Rucamanque

Description

Volumen, altura y diámetro, entre otras para árboles muestra de Nothofagus obliqua (roble) en el bosque de Rucamanque, cerca de Temuco, en la región de la Araucania, en el sur de Chile.

Usage

data(treevolroble2)

Format

Las siguientes columnas son parte de la dataframe:

arbol

Número del árbol.

especie

Especie.

dap

Diámetro a la altura del pecho, en cm.

atot

Altura total, en m.

d6

Diámetro fustal a los 6 m, en cm.

vtot

Volumen bruto total, en m^{3} with bark.

Source

Los datos son proporcionados por el Prof. Christian Salas (Universidad de Chile).

References

Examples

data(treevolroble2)    
head(treevolroble2) 

convert the first n-characters of a string to upper-case letters.

Description

Function to upper-case the first n-characters of a string from the left-hand side.

Usage

upperleft(fac, n = 1)

Arguments

fac

is an object of class string or factor

n

is the number of characters to be converted of a the string given in fac.

Details

It is specially set to arrange data vector having alphanumeric (i.e., letters) format.

Value

This function returns an object having the first n-characters from the left-hand side in upper-case.

Author(s)

Christian Salas-Eljatib

References

Salas-Eljatib, C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor, Santiago, Chile. 170 p. https://eljatib.com/rlibro

Examples


fac.x<-"willkommen"
upperleft(fac.x)
upperleft(fac.x,n = 2)
upperleft(fac.x,2)
upperleft(fac.x,3)
#A longer vector of characters
fac.x<-c("willkommen","welcome","bem-vindo","bievenido")
upperleft(fac.x,1)

Function to compute prediction statistics based on observed values

Description

Computes three prediction statistics as a way to compare observed versus predicted values of a response variable of interest. The statistics are: the aggregated difference (AD), the root mean square differences (RMSD), and the aggregated of the absolute value differences (AAD). All of them area based on

r_i = y_i - \widehat{y}_i

where y_i and \widehat{y}_i are the observed and the predicted value of the response variable y for the i-th observation, respectively. Both the observed and predicted values must be expressed in the same units.

Usage

valesta(y.obs = y.obs, y.pred = y.pred)

Arguments

y.obs

observed values of the variable of interest

y.pred

predicted values of the variable of interest

Details

The function computes the three aforementioned statistics expressed in both (a) the units of the response variable and (b) the percentage. Notice that to represent each statistic in percentual terms, we divided them by the mean observed value of the response variable.

Value

The main output following six prediction statistics as a vector: (RMSD, RMSD.p, AD, AD.p, AAD, AAD.p); where RMSD.p stands for RMSD expressed as a percentage, and the same applies to AD.p and AAD.p.

Author(s)

Christian Salas-Eljatib.

References

Examples


#Creates a fake dataframe
set.seed(1234)
df <- as.data.frame(cbind(Y=rnorm(30, 30,9), X=rnorm(30, 450,133)))
#fitting a candidate model
mod1 <- lm(Y~X, data=df)
#Using the valesta function
valesta(y.obs=df$Y,y.pred=fitted(mod1))

Function for building a scatterplot with superposing boxplots

Description

The function creates a scatterplot with superposing boxplots for the Y-axis variable segregated by classes (i.e., groups) of the X-axis variable. For a scatterplot between a response variable Y and a predictor variable X, this function superposes boxplots of the response by groups of the predictor variable. The main aim of the above described graph is to get a sense of the distribution of the response variable depending upon the predictor variable.

Usage

xyboxplot(
  x = x,
  y = y,
  col.dots = "blue",
  transp.dots = 0.1,
  xlab = NULL,
  ylab = NULL,
  num.classes = 10,
  segre.type = "percentile",
  limi.classes = NA,
  x.category = FALSE,
  pch.dots = 19,
  col.box = "red",
  transp.boxp = 0.07,
  xlim = NA,
  ylim = NA,
  class.ticks.lwd = 1,
  class.ticks.col = "red",
  class.marks.col = "black",
  cex.dots = 0.7,
  class.marks = FALSE,
  class.ticks = TRUE
)

Arguments

x

A numeric vector representing the X-axis variable.

y

A numeric vector representing the Y-axis variable (response).

col.dots

A string specifying the dot colors. The default value is "blue".

transp.dots

A numeric value to be used as transparency for the dots of the figure to be produced. The defauls is set to 0.2

xlab

(optional) A string specifying X-axis label.

ylab

(optional) A string specifying Y-axis label.

num.classes

The number of classes to be used for computing the prediction capabilities. The default is set to 10.

segre.type

A string specifying the type of segregation to build the classes. The types are: (a) percentile implies to segregate with the same amount, or close, of observations to each of the defined num.classes. (b) user.defined implies that the user must provided the limits of the num.classes-1. The default is set to percentile. Notice if user.defined is specified, the option

limi.classes

A vector of size num.classes-1 containing the limits to be used for defining the classes.

x.category

A logical statement, if set to TRUE, the X-axis variable will be treated as categorical for the drawing of the boxplots. The default is set to FALSE.

pch.dots

A numeric factor altering the shape of the dots.

col.box

A string specifying the boxplot color. The default is "red"

transp.boxp

A numeric value to be used as transparency for the boxpot of the figure to be produced. The defauls is set to 0.1

xlim

(optional) A numeric vector having the minimum and maximum, respectively for the X-axis variable.

ylim

(optional) A numeric vector having the minimum and maximum, respectively for the Y-axis variable.

class.ticks.lwd

The numeric width of the tick line for each of the X-axis variable classes. By default is set to 1.

class.ticks.col

A string with the color of the tick line for each of the X-axis variable classes. By default is set to "red".

class.marks.col

A string with the color of the mark value for each of the X-axis variable classes. By default is set to "black".

cex.dots

A numeric factor altering the size of the dots. The default value is 0.7.

class.marks

Whether (logic: TRUE or FALSE) the number value of each of the X-axis variable classes should be printed. By default is set to FALSE.

class.ticks

Whether (logic: TRUE or FALSE) the number tick of each of the X-axis variable classes should be printed. By default is set to TRUE.

Details

Notice that the superposing boxplots for the Y-axis variable are computed by grouping the X-axis variable in 10 classes. Those classes are set by computing the 0.1, 0.2, ..., 0.9-percentiles of the X-axis variable, therefore each group has the same number of observations. The wide of the boxplot represent the extend of the respective X-axis variable used for drawwing each boxplot.

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Examples

df <- datana::fishgrowth
xyboxplot(x=df$length,y=df$scale)
xyboxplot(x=df$length,y=df$scale,col.dots = "red",
xlab="Variable X")
xyboxplot(x=df$length,y=df$scale,xlab="Variable X")

## dots with alpha channel
xyboxplot(x=df$length,y=df$scale,xlab="Variable X",
transp.dots = 0.4)

## with categorical x
xyboxplot(x=df$age,y=df$length,x.category = TRUE)

## fixed x axis limits
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10))

## x marks width to .5
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = .5)

## x marks red and width 2
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = 2, class.ticks.col = "red")

## larger dots
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          cex.dots = 1.5)

## print classes ticks
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.marks = FALSE, class.ticks.col = "green")

### the x-variable not recorded such as a categorical variable
df <- datana::fishgrowth
## print classes ticks, by default with red color
xyboxplot(x=df$length, y=df$scale)

## don't print ticks
xyboxplot(x=df$length, y=df$scale, class.ticks=FALSE)

## print classes marks values
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE)

## print classes marks values without ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE, class.ticks=FALSE)

## change class marks and ticks colors
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue")

## bigger ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue", class.ticks.lwd=3)

## Changing the number of the X-variable classes
xyboxplot(x=df$length,y=df$scale,num.classes=5)

## Defining the classes not by percentiles, but by fixed values
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),
ylim=c(0,20),num.classes=4,
segre.type="fixed",limi.classes=c(140,195,250))

## Note that the limits must be in agreement with the num.classes
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),ylim=c(0,20),
num.classes=5,segre.type="fixed",limi.classes=c(100,160,200,250))


A scatterplot with marginal histograms

Description

The function produces a scatterplot between the 'y'-axis variable and the 'x'-axis variable, but also adding the marginal histograms for both variables.

Usage

xyhist(
  x = x,
  y = y,
  col.x = "blue",
  col.y = "red",
  xlab = NULL,
  ylab = NULL,
  x.lim = NULL,
  y.lim = NULL
)

Arguments

x

A numeric vector representing the X-axis variable

y

A numeric vector representing the Y-axis variable

col.x

(optional) A string specifying the color of the histogram of the X-variable. Default is "blue".

col.y

(optional) A string specifying the color of the histogram of the Y-variable. Default is "red".

xlab

(optional) A string specifying X-axis label. Default is "xvar".

ylab

(optional) A string specifying Y-axis label. Default is "yvar".

x.lim

(optional) A vector of two elements with the limits of the Y-axis. Default is the range of the X-variable.

y.lim

(optional) A vector of two elements with the limits of the Y-axis. Default is the range of the Y-variable.

Details

Both the response variable (Y-axis) and the predictor variable (X-axis) must be numeric.

Value

The function returns the above described graph.

Author(s)

Christian Salas-Eljatib

References

Examples

data(treevolroble)
df <- datana::treevolroble
head(df)
xyhist(x=df$dbh,y=df$toth)
xyhist(x=df$dbh,y=df$toth, xlab="Variable X",  ylab="Variable Y")
xyhist(x=df$dbh,y=df$toth, xlab="Variable X", ylab="Variable Y",
  col.x = "gray",col.y="white")

Figure of a matrix of scatterplots and histograms for several variables.

Description

The function produces a panel of multiple scatterplots and histograms, showing the correlation coefficient among all pairs of variables. Notice that the data must contain only numeric variables.

Usage

xymultiplot(
  x,
  smooth = TRUE,
  scale = FALSE,
  density = TRUE,
  digits = 2,
  method = "pearson",
  pch = 20,
  lm = FALSE,
  cor = TRUE,
  jiggle = FALSE,
  factor = 2,
  col.hist = "cyan",
  col.densi.curve = "black",
  show.points = TRUE,
  col.points = "gray",
  smoother = FALSE,
  col.smooth = "red",
  ellipses = FALSE,
  col.ellip = "blue",
  col.cent.point = "green",
  rug = TRUE,
  breaks = "Sturges",
  cex.cor = 1,
  ci = FALSE,
  alpha = 0.05,
  ...
)

Arguments

x

is a dataframe containing all the numeric variables to be used for drawing the panel plot

smooth

a logical value for drawing smooth curves. The default is set to TRUE.

scale

scales the correlation font by the size of the absolute correlation. The default is set to FALSE.

density

a logical value for drawing a density curve. The default is set to TRUE.

digits

an optional numeric value for the digits to be used for drawing the correlation coefficient in the panel. Defaults is set to 2.

method

a string giving the method to be used for computing the correlation coefficient. Default is set to "pearson".

pch

The plot character (The default is 20, which looks like '.').

lm

Plot the linear fit rather than the LOESS smoothed fits. The default is FALSE.

cor

If plotting regressions, should correlations be reported? The default is TRUE.

jiggle

Should the points be jittered before plotting? The default is FALSE.

factor

factor for jittering (1-5), therefore only needed if "jiggle" is set to TRUE.

col.hist

a string giving the color to be used for the histograms of the panel. Default is set to "cyan".

col.densi.curve

a string with the name of the color to be used for the density curve. The default is set to "black".

show.points

a logical value for drawing the points in the scatter-plots. Defauls is set to TRUE.

col.points

a string giving the color to be used for the data points. Default is set to "gray".

smoother

If TRUE, then smooth.scatter the data points-slow but pretty with lots of subjects

col.smooth

a string giving the color to be used for the smoothed curve of the scatterplot. Default is set to "red".

ellipses

an optional logical value for drawing an ellipse for the scatter-plots. The default is set to FALSE.

col.ellip

a string giving the color to be used for the ellipse of the scatterplot. The default is set to "blue".

col.cent.point

a string giving the color to be used for the centroid point of the ellipse of the scatterplot. The default is set to "blue".

rug

a logical value for drawing the rugs in the histograms. Defauls is set to TRUE.

breaks

a string giving the method to be used for obtaining the breaks of the histogram. Defauls is set to "Sturges".

cex.cor

If this is specified, this will change the size of the text in the correlations. this allows one to also change the size of the points in the plot by specifying the normal cex values. If just specifying cex, it will change the character size, if cex.cor is specified, then cex will function to change the point size.

ci

Draw confidence intervals for the linear model or for the loess fit, defaults to ci=FALSE. If confidence intervals are not drawn, the fitting function is lowess.

alpha

an optional numeric value for the significance level. Defauls is set to 0.05.

...

other graphical parameters (see par and section ‘Details’ below).

Details

Generates a multipanel (matrix) of scatterplots and histograms to explore potential relationships among variables.

Value

This function returns a multipanel of scatterplots and histograms

Author(s)

A modification of Christian Salas-Eljatib of the function pairs.panels of the package psych.

References

Examples

##First example
data(bears2)
head(bears2)
df <- bears2[,c('peso','edad','cabezaL','cabezaA','largo','pechoP')]
descstat(df)
xymultiplot(df)
xymultiplot(df,ellipse=TRUE)
xymultiplot(df,ellipses=TRUE,col.cent.point = "yellow",
 col.densi.curve = "dark green",col.hist = "white")

mirror server hosted at Truenetwork, Russian Federation.