introduction_to_commecometrics

2. Data Loading

We use example data from an ecometric analysis of carnivoran carnassial tooth relative blade length (RBL) from Siciliano-Martina et al (2024). The following datasets are used to assess ecometric relationships:

samplingPoints: Contains environmental data (precipitation and vegetation categories) as well as geographic coordinates for each sampling point.
traits: Includes trait (RBL) measurements for each species, identified by taxon name.
geography: Provides species range maps (from the IUCN), used to assemble carnivoran communities at each sampling point.

Note: To run this vignette locally with the full dataset, download and unzip the external data using the code below. The data is hosted on Figshare. This step is not run automatically to comply with CRAN policies.

options(timeout = 600)
download.file("https://ndownloader.figshare.com/files/56228033", destfile = "data.zip", mode = "wb")
unzip("data.zip")

samplingPoints <- read.csv("data/sampling_points.csv")
head(samplingPoints)
traits <- read.csv("Data/traits.csv")
head(traits)

geography <- sf::st_read("data/data_0.shp", quiet = TRUE)
geography$SCI_NAME <- gsub(" ", "_", geography$SCI_NAME)

3. Summarize Traits by Sampling Point

We begin by summarizing trait values (RBL) for each sampling location. This identifies where the species ranges overlap to assemble the communities and, by default, calculates the mean trait value (summ_trait_1) and the standard deviation (summ_trait_2) for each sampling point, along with the species richness (richness) per community. The mean and standard deviation are commonly used metrics in ecometric analyses. However, users can supply any custom summary metric functions if alternative descriptors are desired.

traitsByPoint <- summarize_traits_by_point(
  points_df = samplingPoints,
  trait_df = traits,
  species_polygons = geography,
  trait_column = "RBL",
  species_name_col = "SCI_NAME",
  continent = FALSE,
  parallel = FALSE
)

head(traitsByPoint[[1]])

4. Ecometric Modeling

This section demonstrates how to build ecometric models using both quantitative and qualitative environmental variables. Each type of model links community-level trait distributions to environmental conditions.

4.1 Quantitative Environmental Variables (Precipitation)

Now we model annual precipitation as a function of trait mean and standard deviation. We use the sampling points and apply a transformation (log(x + 1)) to the environmental variable (precipitation) to maintain normality, although this step is optional. The inv_transform_fun ensures that predictions can be back-transformed to the original units.

To calculate community trait mean and standard deviation, we filter the data to include only communities with at least three species. After filtering, we retained 90.1% of the original sampling points for a total of 48,721 communities out of the original 54,090.

This function also calculates the optimal number of bins for each trait metric (mean and standard deviation) using Scott’s rule. In this dataset, the mean (first trait summary metric) is divided into 94 bins, and the standard deviation (second trait summary metric) is divided into 90 bins.

ecoModel <- ecometric_model(
  points_df = traitsByPoint$points,
  env_var = "precip",
  transform_fun = function(x) log(x + 1),
  inv_transform_fun = function(x) exp(x) - 1,
  min_species = 3
)

4.2 Assessing the model

To evaluate the model, we can examine the relationship between community-level trait distributions (mean and standard deviation) and the environmental variable (precipitation) using the linear model fit. Here, we find a significant relationship (t = 311.1, p < 2e-16), indicating that trait distributions explain substantial variation in precipitation.

We can also assess the correlation between the predicted and observed precipitation using the Pearson’s Correlation Coefficient (R). In this case, the resulting R value of 0.815, corresponds to an R-squared of 0.665, meaning the the model explains roughly 66.5% of the variance in precipitation.

summary(ecoModel$model)

print(ecoModel$correlation)

4.3 Visualize Ecometric Space

Here, we visualize the ecometric space generated from the ecoModel. Communities are binned according to their trait values, mean on the x-axis and standard deviation on the y-axis. Each pixel represents at least one community with that combination of trait mean and standard deviation; in many cases, multiple communities fall within a single bin.

The bins are color-coded by the environmental variable (precipitation), with colors representing the estimation of the maximum likelihood estimate of precipitation for the communities in each trait bin.

Here, we see communities with higher trait variability (i.e., higher standard deviation on the y-axis) are often associated with greater precipitation. Given that RBL is an indicator of carnivoran diet, where higher values often reflect increased vertebrate prey consumption, this pattern suggests that communities in wetter environments exhibit greater dietary diversity.

ecoPlot <- ecometric_space(
  model_out = ecoModel,
  env_name = "Precipitation (log)",
  x_label = "Community mean",
  y_label = "Community standard deviation"
) 

print(ecoPlot)

4.4. Viewing communities per bin

We can examine how many communities are assigned to each trait bin within found within the ecometric space. In the example below, we display a subset of the bin count matrix, focusing on bins near the center of the trait distribution.

ecoModel$diagnostics$bin_counts[35:45, 26:37]

head(ecoModel$eco_space)

4.5 Sensitivity Analysis

To evaluate how varying sample sizes affect the sensitivity and transferability of the model, we can conduct a sensitivity analysis. This analysis repeatedly subsamples the data at different community sample sizes, ranging from 100 to 1000 in increments of 100 (sample_sizes = seq(100, 1000, 300)), and evaluates model performance at each level. The results are visualized in four plots and summarized in accompanying tables. Each plot shows performance metrics across a range of community sample sizes (x-axis), based on repeated subsampling of the data.

The four panels display the following metrics: A) Training correlation (how well the model fits the training data), B) Testing correlation (the model’s generalizability to new data), C) Training mean anomaly (the average prediction error within the training set), D) Testing mean anomaly (the average prediction error in the test set).

In Panel A, we observe that training correlation remains relatively consistent across sample sizes, with ful stabilization occurring around 800 communities. In panel B, the testing correlation becomes stable with roughly 700 communities. Panel C shows that the training prediction error decreases and stabilizates around 700 communities, while Panel D reveals that testing error declines and plateaus around 400 communities.

These plots help identify the minimum sample size needed for robust generalizable model performance. More detailed results can be examinined directly in the summary tables returned by the sensitivity analysis function.

sensitivityResults <- sensitivity_analysis(
  points_df = traitsByPoint$points,
  env_var = "precip",
  sample_sizes = seq(100, 1000, 100),
  iterations = 5,
  transform_fun = function(x) log(x + 1),
  parallel = FALSE
)

head(sensitivityResults$combined_results)

print(sensitivityResults$summary_results)

5. Qualitative Environmental Variables (Vegetation)

Alternatively, we can model the ecometric space using a categorical environmental variables. In this example, we use vegetation type, classifying communities into five categories: arctic, deciduous, desert, evergreen, grassland. Due to the slight difference in the available environmental data, filtering for communities with at least three species now retains 47,671 out of the original 52,306 communities. Using Scott’s Rule, the model identifies 82 bins for the trait mean and 85 bins for the trait standard deviation as the optimal binning scheme.

table(samplingPoints$VegSimple)

samplingPoints$VegSimple <- factor(samplingPoints$VegSimple,
                   levels = 1:5,
                   labels = c("Arctic", "Deciduous", "Desert", "Evergreen", "Grassland"))


ecoModelQual <- ecometric_model_qual(
  points_df = traitsByPoint$points,
  category_col = "VegSimple",
  min_species = 3
)

5.1 Visualize Ecometric Space

We can similarily visualize this ecometric space

ecoPlotQual <- ecometric_space_qual(
  model_out = ecoModelQual,
  x_label = "Community mean",
  y_label = "Community standard deviation"
) 

print(ecoPlotQual$ecometric_space_plot)  # Qualitative model
print(ecoPlotQual$probability_maps$Grassland)  # One of the category-specific maps

5.2. Sensitivity Analysis

sensitivityQual <- sensitivity_analysis_qual(
  points_df = traitsByPoint$points,
  category_col = "VegSimple",
  sample_sizes = seq(100, 1000, 100),
  iterations = 10,
  parallel = FALSE
)

print(sensitivityQual$summary_results)