Fitting many curves using rTPC

Daniel Padfield

2023-08-16

A brief example of how to fit many models to multiple TPCs using rTPC, nls.multstart and the tidyverse.


# load packages
library(rTPC)
library(nls.multstart)
library(broom)
library(tidyverse)

In the final part of the general pipeline, we demonstrate how multiple models can be fitted to multiple TPCs. Instead of picking all 24 model formulations to demonstrate this approach, we will use only 2 models in this example: gaussian_1987() and sharpeschoolhigh_1981().

We can demonstrate the fitting of multiple curves by modelling all 60 TPCs from the example dataset of rTPC curve from the example dataset rTPC. These TPCs are of respiration and photosynthesis of the aquatic algae, Chlorella vulgaris. These algae differed in their growth temperature, growth_temp, and how long they had been grown at that temperature, process, either for ~100 or ~10 generations.

Using a similar approach to vignette('fit_many_models'), models can be fitted to each curve using list columns and purrr::map() to fit and store multiple models in a data frame.

When fitting lots of models at once, it is useful to know the progress the code as it may take a long time to run.The R package purrr does now has progress bars built in; you can add a progress bar to the map() function using the argument .progress. However, when fitting multiple models with multiple map() calls a new progress bar each time, so currently I have not found a way to have a single progress bar for multiple map() calls, so we will keep with what we have at the moment.

# load in data
data("chlorella_tpc")
d <- chlorella_tpc

# when scaling up our code to fit hundreds of models, its nice to have a progress bar
# edit nls_multstart to allow for a progress bar
nls_multstart_progress <- function(formula, data = parent.frame(), iter, start_lower, 
                                   start_upper, supp_errors = c("Y", "N"), convergence_count = 100, 
                                   control, modelweights, ...){
  if(!is.null(pb)){
    pb$tick()
  }
  nls_multstart(formula = formula, data = data, iter = iter, start_lower = start_lower, 
                start_upper = start_upper, supp_errors = supp_errors, convergence_count = convergence_count, 
                control = control, modelweights = modelweights, ...)
}

# start progress bar and estimate time it will take
number_of_models <- 2
number_of_curves <- length(unique(d$curve_id))

# setup progress bar
pb <- progress::progress_bar$new(total = number_of_curves*number_of_models,
                                 clear = FALSE,
                                 format ="[:bar] :percent :elapsedfull")

# fit two chosen model formulation in rTPC
d_fits <- nest(d, data = c(temp, rate)) %>%
  mutate(gaussian = map(data, ~nls_multstart_progress(rate~gaussian_1987(temp = temp, rmax,topt,a),
                        data = .x,
                        iter = c(3,3,3),
                        start_lower = get_start_vals(.x$temp, .x$rate, model_name = 'gaussian_1987') - 10,
                        start_upper = get_start_vals(.x$temp, .x$rate, model_name = 'gaussian_1987') + 10,
                        lower = get_lower_lims(.x$temp, .x$rate, model_name = 'gaussian_1987'),
                        upper = get_upper_lims(.x$temp, .x$rate, model_name = 'gaussian_1987'),
                        supp_errors = 'Y',
                        convergence_count = FALSE)),
         sharpeschoolhigh = map(data, ~nls_multstart_progress(rate~sharpeschoolhigh_1981(temp = temp, r_tref,e,eh,th, tref = 15),
                        data = .x,
                        iter = c(3,3,3,3),
                        start_lower = get_start_vals(.x$temp, .x$rate, model_name = 'sharpeschoolhigh_1981') - 10,
                        start_upper = get_start_vals(.x$temp, .x$rate, model_name = 'sharpeschoolhigh_1981') + 10,
                        lower = get_lower_lims(.x$temp, .x$rate, model_name = 'sharpeschoolhigh_1981'),
                        upper = get_upper_lims(.x$temp, .x$rate, model_name = 'sharpeschoolhigh_1981'),
                        supp_errors = 'Y',
                        convergence_count = FALSE)))
[=========================================================] 100% 00:00:40

Like previous vignettes, the predictions of each model can be estimated using broom::augment(). To do this, we first create a new list column containing high resolution temperature values by taking the min and max of each curve. Next we stack the models and finally we get the new predictions using the map2(), which allows us to use both fit and new_data list columns. After unnesting the preds column, we are then left with high resolution predictions for each curve. As this code covers a lot of steps, each line of the code is commented.

# create new list column of for high resolution data
d_preds <- mutate(d_fits, new_data = map(data, ~tibble(temp = seq(min(.x$temp), max(.x$temp), length.out = 100)))) %>%
  # get rid of original data column
  select(., -data) %>%
  # stack models into a single column, with an id column for model_name
  pivot_longer(., names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>%
  # create new list column containing the predictions
  # this uses both fit and new_data list columns
  mutate(preds = map2(fit, new_data, ~augment(.x, newdata = .y))) %>%
  # select only the columns we want to keep
  select(curve_id, growth_temp, process, flux, model_name, preds) %>%
  # unlist the preds list column
  unnest(preds)

glimpse(d_preds)
#> Rows: 12,000
#> Columns: 7
#> $ curve_id    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ growth_temp <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20…
#> $ process     <chr> "acclimation", "acclimation", "acclimation", "acclimation"…
#> $ flux        <chr> "respiration", "respiration", "respiration", "respiration"…
#> $ model_name  <chr> "gaussian", "gaussian", "gaussian", "gaussian", "gaussian"…
#> $ temp        <dbl> 16.00000, 16.33333, 16.66667, 17.00000, 17.33333, 17.66667…
#> $ .fitted     <dbl> 0.02790076, 0.03175769, 0.03607053, 0.04088150, 0.04623512…

We can then plot the predictions of each curve using ggplot2.

# plot
ggplot(d_preds) +
  geom_line(aes(temp, .fitted, col = model_name)) +
  geom_point(aes(temp, rate), d) +
  facet_wrap(~curve_id, scales = 'free_y', ncol = 6) +
  theme_bw() +
  theme(legend.position = 'none') +
  scale_color_brewer(type = 'qual', palette = 2) +
  labs(x = 'Temperature (ºC)',
       y = 'Metabolic rate',
       title = 'All fitted thermal performance curves',
       subtitle = 'gaussian in green; sharpeschoolhigh in orange')

The traits of each thermal performance curve can also easily be calculated.

# stack models and calculate extra params
d_params <- pivot_longer(d_fits, names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>%
  mutate(params = map(fit, calc_params)) %>%
  select(curve_id, growth_temp, process, flux, model_name, params) %>%
  unnest(params)

glimpse(d_params)
#> Rows: 120
#> Columns: 16
#> $ curve_id              <dbl> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, …
#> $ growth_temp           <dbl> 20, 20, 20, 20, 23, 23, 27, 27, 27, 27, 30, 30, …
#> $ process               <chr> "acclimation", "acclimation", "acclimation", "ac…
#> $ flux                  <chr> "respiration", "respiration", "respiration", "re…
#> $ model_name            <chr> "gaussian", "sharpeschoolhigh", "gaussian", "sha…
#> $ rmax                  <dbl> 1.4972473, 1.8127063, 1.7935416, 1.9798352, 0.94…
#> $ topt                  <dbl> 36.338, 41.646, 35.554, 39.007, 34.615, 35.170, …
#> $ ctmin                 <dbl> 18.695, 2.539, 18.835, 6.086, 17.368, 11.970, 15…
#> $ ctmax                 <dbl> 53.981, 45.558, 52.272, 47.680, 51.861, 53.064, …
#> $ e                     <dbl> 0.6758954, 0.5802496, 0.8427748, 0.6621701, 1.02…
#> $ eh                    <dbl> 1.0767825, 11.4840311, 1.0954099, 2.4774826, 1.1…
#> $ q10                   <dbl> 2.373766, 2.063147, 2.927011, 2.305129, 3.722119…
#> $ thermal_safety_margin <dbl> 17.643, 3.912, 16.718, 8.673, 17.246, 17.894, 18…
#> $ thermal_tolerance     <dbl> 35.286, 43.019, 33.437, 41.594, 34.493, 41.094, …
#> $ breadth               <dbl> 9.651, 5.371, 9.165, 6.495, 9.442, 7.714, 10.354…
#> $ skewness              <dbl> -4.008871e-01, -1.090378e+01, -2.526351e-01, -1.…

mirror server hosted at Truenetwork, Russian Federation.