Ecometrics quantify the relationships between functional traits and
environmental conditions at the community level. The commecometrics
package provides tools for conducting
ecometric analyses using trait data, species distributions, and
environmental variables. It supports workflows for both continuous and
categorical environmental variables, and includes utilities for fossil
data integration and sensitivity analyses.
This vignette introduces the main functions and workflows available in the package.
We use example data from an ecometric analysis of carnivoran carnassial tooth relative blade length (RBL) from Siciliano-Martina et al. (2024). The following datasets are used to assess ecometric relationships:
samplingPoints: Contains environmental data (precipitation and vegetation categories) as well as geographic coordinates for each sampling point.
traits: Includes trait (RBL) measurements for each species, identified by taxon name.
geography: Provides species range maps (from the IUCN), used to assemble carnivoran communities at each sampling point.

Note: To run this vignette locally with the full dataset, download and unzip the external data using the code below. The data is hosted on Figshare. This step is not run automatically to comply with CRAN policies.
options(timeout = 600)
download.file("https://ndownloader.figshare.com/files/56228033", destfile = "data.zip", mode = "wb")
unzip("data.zip")
We begin by summarizing trait values (RBL) for each sampling
location. This identifies where the species ranges overlap to assemble
the communities and, by default, calculates the mean trait value
(summ_trait_1) and the standard deviation (summ_trait_2) for each
sampling point, along with the species richness (richness) per
community. The mean and standard deviation are commonly used metrics in
ecometric analyses. However, users can supply any custom summary metric
functions if alternative descriptors are desired.
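As a rough illustration of what this summary step produces, the base R sketch below computes the same three quantities for a toy dataset. The occ and trait tables and all their values are made up for illustration, and the code is not the package's implementation, which assembles communities from range maps rather than from a ready-made occurrence table.

```r
# Hypothetical occurrence table: which taxa occur at which sampling point
occ <- data.frame(
  point = c("p1", "p1", "p1", "p2", "p2"),
  taxon = c("A", "B", "C", "A", "C")
)

# Hypothetical trait table: one RBL value per taxon
trait <- data.frame(taxon = c("A", "B", "C"), RBL = c(0.60, 0.70, 0.80))

# Attach trait values to each occurrence
comm <- merge(occ, trait, by = "taxon")

# Per-point summaries: mean, standard deviation, and species richness
trait_mean <- tapply(comm$RBL, comm$point, mean)
trait_sd   <- tapply(comm$RBL, comm$point, sd)
n_species  <- tapply(comm$taxon, comm$point, function(x) length(unique(x)))

community_summary <- data.frame(
  point        = names(trait_mean),
  summ_trait_1 = as.numeric(trait_mean),
  summ_trait_2 = as.numeric(trait_sd),
  richness     = as.integer(n_species)
)
print(community_summary)
```

Any other summary function (for example median or range) could be passed to tapply in the same way, which mirrors the idea of supplying custom summary metrics.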
This section demonstrates how to build ecometric models using both quantitative and qualitative environmental variables. Each type of model links community-level trait distributions to environmental conditions.
Now we model annual precipitation as a function of trait mean and
standard deviation. We use the sampling points and apply a
transformation (log(x + 1)) to the environmental variable
(precipitation) to reduce skew and better approximate normality,
although this step is optional. Supplying inv_transform_fun ensures
that predictions can be back-transformed to the original units.
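The transformation and its inverse form a matched pair, which can be checked with a quick round trip. In this sketch the name transform_fun and the precipitation values are illustrative assumptions; only inv_transform_fun and the log(x + 1) form come from the text above.

```r
# Forward transformation applied to the environmental variable
transform_fun <- function(x) log(x + 1)

# Matching inverse, used to return predictions to the original units
inv_transform_fun <- function(x) exp(x) - 1

# Made-up precipitation values, including zero (the reason for the + 1)
precip <- c(0, 250, 1200, 3400)

transformed <- transform_fun(precip)
recovered   <- inv_transform_fun(transformed)

# The round trip should reproduce the original values
print(all.equal(recovered, precip))
```

If a different transformation is used, its inverse has to change with it, otherwise back-transformed predictions will be on the wrong scale.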
To calculate the community trait mean and standard deviation, we filter the data to include only communities with at least three species. This filtering retains 90.1% of the original sampling points (48,721 of 54,090 communities).
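The filtering step and the reported retention rate are easy to reproduce by hand. The sketch below applies a three-species threshold to a made-up richness vector (an assumption for illustration) and then recomputes the percentage from the counts quoted above.

```r
# Hypothetical richness values for five sampling points
toy_points <- data.frame(
  point    = paste0("p", 1:5),
  richness = c(1, 3, 5, 2, 4)
)

# Keep only communities with at least three species
min_species <- 3
kept <- toy_points[toy_points$richness >= min_species, ]

# Share of points retained in the toy example
toy_retained_pct <- 100 * nrow(kept) / nrow(toy_points)
print(toy_retained_pct)

# Retention rate implied by the counts reported in the text
reported_pct <- round(100 * 48721 / 54090, 1)
print(reported_pct)
```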
This function also calculates the optimal number of bins for each trait metric (mean and standard deviation) using Scott’s rule. In this dataset, the mean (first trait summary metric) is divided into 94 bins, and the standard deviation (second trait summary metric) is divided into 90 bins.
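For readers unfamiliar with Scott's rule, the sketch below shows the textbook calculation: the bin width is proportional to the standard deviation and shrinks with the cube root of the sample size. The simulated trait values are made up, and the package's own implementation may differ in detail (for example in rounding or in the constant used), so this is an illustration rather than a reproduction of the bin counts reported above.

```r
# Textbook Scott's rule: bin width h = 3.49 * sd * n^(-1/3)
scott_bins <- function(x) {
  h <- 3.49 * sd(x) * length(x)^(-1 / 3)
  ceiling(diff(range(x)) / h)
}

# Simulated community trait means (hypothetical values)
set.seed(1)
x <- rnorm(1000, mean = 0.7, sd = 0.05)

n_bins_manual <- scott_bins(x)

# Base R ships a comparable helper, which uses a constant of 3.5
n_bins_base <- grDevices::nclass.scott(x)

print(c(manual = n_bins_manual, base = n_bins_base))
```

Because the bin width depends on n^(-1/3), larger datasets get more and narrower bins, which is why tens of thousands of communities lead to bin counts in the range of 80 to 100.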
To evaluate the model, we can examine the relationship between community-level trait distributions (mean and standard deviation) and the environmental variable (precipitation) using the linear model fit. Here, we find a significant relationship (t = 311.1, p < 2e-16), indicating that trait distributions explain substantial variation in precipitation.
We can also assess the correlation between the predicted and observed precipitation using Pearson’s correlation coefficient (R). In this case, the resulting R value of 0.815 corresponds to an R-squared of 0.665, meaning the model explains roughly 66.5% of the variance in precipitation.
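The link between R and R-squared can be checked on simulated data. In the sketch below, an ordinary linear model with made-up data stands in for the ecometric model (an assumption for illustration): the squared correlation between fitted and observed values equals the model's R-squared. Squaring the rounded value 0.815 gives about 0.664, so the reported 0.665 presumably comes from squaring the unrounded coefficient.

```r
# Simulated predictor and response (hypothetical values)
set.seed(42)
x <- rnorm(200)
y <- 1.5 * x + rnorm(200)

# Ordinary least squares fit with an intercept
fit <- lm(y ~ x)

# Pearson correlation between predicted and observed values
r <- cor(fitted(fit), y)

# Squared correlation and the model's reported R-squared agree
print(c(r_squared_from_cor = r^2, r_squared_from_lm = summary(fit)$r.squared))

# Squaring the rounded coefficient quoted in the text
print(0.815^2)
```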
Here, we visualize the ecometric space generated from the ecoModel.
Communities are binned according to their trait values, mean on the
x-axis and standard deviation on the y-axis. Each pixel represents at
least one community with that combination of trait mean and standard
deviation; in many cases, multiple communities fall within a single bin.
The bins are color-coded by the environmental variable (precipitation), with colors representing the maximum likelihood estimate of precipitation for the communities in each trait bin.
Here, we see communities with higher trait variability (i.e., higher standard deviation on the y-axis) are often associated with greater precipitation. Given that RBL is an indicator of carnivoran diet, where higher values often reflect increased vertebrate prey consumption, this pattern suggests that communities in wetter environments exhibit greater dietary diversity.
We can examine how many communities are assigned to each trait bin within the ecometric space. In the example below, we display a subset of the bin count matrix, focusing on bins near the center of the trait distribution.
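The idea of a bin count matrix can be sketched in base R with cut and table. The simulated trait summaries and the choice of 10 bins per axis are assumptions for illustration; the package stores its own bin assignments, which are not reproduced here.

```r
# Simulated community-level trait summaries (hypothetical values)
set.seed(7)
n <- 500
trait_mean <- rnorm(n, mean = 0.70, sd = 0.05)
trait_sd   <- runif(n, min = 0.02, max = 0.15)

# Divide each axis into equal-width bins
n_bins   <- 10
mean_bin <- cut(trait_mean, breaks = n_bins)
sd_bin   <- cut(trait_sd, breaks = n_bins)

# Cross-tabulate: rows are mean bins, columns are standard deviation bins
bin_counts <- table(mean_bin, sd_bin)

# Show a subset of bins near the center of the trait distribution
print(bin_counts[4:7, 4:7])
```

Bins with a count of zero correspond to empty cells in the ecometric space, while high counts mark trait combinations shared by many communities.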
To evaluate how varying sample sizes affect the sensitivity and
transferability of the model, we can conduct a sensitivity analysis.
This analysis repeatedly subsamples the data at different community
sample sizes between 100 and 1000 and evaluates model performance at
each level. Note that the setting shown here,
sample_sizes = seq(100, 1000, 300), steps through the sizes in
increments of 300 (100, 400, 700, and 1000). The results are visualized
in four plots and summarized in accompanying tables. Each plot shows
performance metrics across the range of community sample sizes (x-axis).
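The logic of such an analysis can be sketched in base R. Everything below is an assumption made for illustration: the data are simulated, a simple linear model stands in for the ecometric model, each subsample is split 80/20 into training and testing sets, and the mean anomaly is taken to be the mean absolute difference between predicted and observed values. The package's sensitivity analysis function may differ on each of these points.

```r
# Simulated communities: one trait summary and one environmental value
set.seed(123)
n <- 2000
dat <- data.frame(trait_mean = rnorm(n))
dat$env <- 2 * dat$trait_mean + rnorm(n)

sample_sizes <- seq(100, 1000, 300)  # 100, 400, 700, 1000
n_reps <- 20                         # subsamples drawn per sample size

# One subsample: split, fit on the training part, score both parts
one_run <- function(size) {
  sub <- dat[sample(nrow(dat), size), ]
  train_idx <- sample(size, floor(0.8 * size))
  train <- sub[train_idx, ]
  test  <- sub[-train_idx, ]
  fit <- lm(env ~ trait_mean, data = train)
  pred_test <- predict(fit, newdata = test)
  c(train_cor  = cor(fitted(fit), train$env),
    test_cor   = cor(pred_test, test$env),
    train_anom = mean(abs(residuals(fit))),
    test_anom  = mean(abs(pred_test - test$env)))
}

# Average the four metrics over the repeated subsamples at each size
results <- do.call(rbind, lapply(sample_sizes, function(size) {
  runs <- replicate(n_reps, one_run(size))  # 4 x n_reps matrix
  data.frame(sample_size = size, t(rowMeans(runs)))
}))
print(results)
```

Plotting each metric column against sample_size gives the same kind of four-panel view described here, with curves flattening once additional communities stop improving performance.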
The four panels display the following metrics: A) Training correlation (how well the model fits the training data), B) Testing correlation (the model’s generalizability to new data), C) Training mean anomaly (the average prediction error within the training set), D) Testing mean anomaly (the average prediction error in the test set).
In Panel A, we observe that training correlation remains relatively consistent across sample sizes, with full stabilization occurring around 800 communities. In Panel B, the testing correlation becomes stable with roughly 700 communities. Panel C shows that the training prediction error decreases and stabilizes around 700 communities, while Panel D reveals that testing error declines and plateaus around 400 communities.
These plots help identify the minimum sample size needed for robust, generalizable model performance. More detailed results can be examined directly in the summary tables returned by the sensitivity analysis function.
Alternatively, we can model the ecometric space using a categorical environmental variable. In this example, we use vegetation type, classifying communities into five categories: arctic, deciduous, desert, evergreen, and grassland. Due to the slight difference in the available environmental data, filtering for communities with at least three species now retains 47,671 out of the original 52,306 communities. Using Scott’s rule, the model identifies 82 bins for the trait mean and 85 bins for the trait standard deviation as the optimal binning scheme.
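# Count sampling points in each vegetation class (coded 1 to 5)
table(samplingPoints$VegSimple)
# Recode the numeric classes as a labelled factor
samplingPoints$VegSimple <- factor(samplingPoints$VegSimple,
                                   levels = 1:5,
                                   labels = c("Arctic", "Deciduous", "Desert", "Evergreen", "Grassland"))
# Fit the categorical ecometric model, keeping communities with at least three species
ecoModelQual <- ecometric_model_qual(
  points_df = traitsByPoint$points,
  category_col = "VegSimple",
  min_species = 3
)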
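One simple way to think about a categorical ecometric space is to ask which category is most common among the communities in each trait bin. The sketch below does this for simulated data along a single trait axis. The data, the five-bin grid, and the most-frequent-category rule are all assumptions for illustration; how ecometric_model_qual assigns categories to bins is not described here, so this should not be read as its algorithm.

```r
# Simulated communities with a vegetation class and a trait mean
set.seed(11)
n <- 300
veg <- factor(sample(1:5, n, replace = TRUE),
              levels = 1:5,
              labels = c("Arctic", "Deciduous", "Desert", "Evergreen", "Grassland"))
trait_mean <- rnorm(n, mean = 0.70, sd = 0.05)

# Bin communities along the trait axis
mean_bin <- cut(trait_mean, breaks = 5)

# Count communities of each vegetation class within each bin
counts <- table(mean_bin, veg)

# Most frequent class per bin; empty bins are set to NA
modal_veg <- colnames(counts)[apply(counts, 1, which.max)]
names(modal_veg) <- rownames(counts)
modal_veg[rowSums(counts) == 0] <- NA
print(modal_veg)
```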
We can similarly visualize this ecometric space.
This vignette demonstrates the key steps in an ecometric workflow
using commecometrics. Users can extend these examples to their own
data, customizing the trait variable and environment as needed.
For detailed help on each function, see the package documentation and individual function examples.