Welcome to the Climate Futures Toolbox’s Firehose function

This is an abridged version of the full firehose vignette. The vignette can be viewed in its entirety at https://github.com/earthlab/cft-CRAN/blob/master/full_vignettes/firehose.md

This vignette provides a walk-through of a common use case of the Firehose function.

The purpose of the Firehose functions is to download the data as quickly as possible by distributing tasks across multiple processors. The more cores you use, the faster this process will go.

Note that the Firehose function is best used for downloading MACA climate model data for multiple climate variables from multiple climate models and emission scenarios in their entirety for a single lat/long location. If you want to download data about multiple variables from multiple models and emission scenarios over a larger geographic region and a shorter time period, you should use the available_data function. A vignette about how to use the available_data function is available at https://github.com/earthlab/cft-CRAN/blob/master/full_vignettes/available-data.md.

Load the cft package and other libraries required for vignette. If you need to install cft, install it from CRAN.

library(cft)
library(future)
library(furrr)
library(sf)
library(tidync)

We will start by setting up our computer to run code on multiple cores instead of just one. The availableCores() function first checks your local computer to see how many cores are available and then subtracts one so that you still have an available core for running your operating system. The plan() function then starts a back-end structure where tasks can be assigned. These backend systems can sometimes have difficulty shutting down after the process is done, especially if you force quite an operation in the works. If you find your code stalling without good explanation, it’s good to restart your computer to clear any of these structures that may be stuck in memory.

n_cores <- availableCores() - 1
plan(multicore, workers = n_cores)

We pull all of our data from the internet. Since internet connections can be a little variable, we try to make a strong link between our computer and the data server by creating an src object. Run this code to establish the connection and then use the src object you created to call on that connection for information. Because this src object is a connection, it will need to be reconnected each time you want to use it. You cannot make it once and then use if forever.

web_link = "https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future"

# Change to "https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_historical" for historical data. 

src <- tidync::tidync(web_link)
## not a file: 
## ' https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future '
## 
## ... attempting remote connection
## Connection succeeded.

After a connection is made to the server, we can run the available_data() function to check that server and see what data it has available to us. The available_data() function produces three outputs:

  1. a raw list of available data
  2. a table of date times available
  3. a table summarizing available variables and the attributes of those variables

Here we print that list of variables. This may take up to a minutes as you retrieve the information from the server.

# This is your menu
inputs <- cft::available_data()
## Trying to connect to the USGS.gov API
## not a file: 
## ' https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future '
## 
## ... attempting remote connection
## Connection succeeded.
## Reading results
## Converting into an R data.table
inputs[[1]]
## # A tibble: 350 × 9
##    Available varia…¹ Varia…² Units Model Model…³ Scena…⁴ Varia…⁵ Model…⁶ Scena…⁷
##    <chr>             <chr>   <chr> <chr> <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 huss_BNU-ESM_r1i… Specif… kg k… Beij… r1i1p1  RCP 4.5 huss    BNU-ESM rcp45  
##  2 huss_BNU-ESM_r1i… Specif… kg k… Beij… r1i1p1  RCP 8.5 huss    BNU-ESM rcp85  
##  3 huss_CCSM4_r6i1p… Specif… kg k… Comm… r6i1p1  RCP 4.5 huss    CCSM4   rcp45  
##  4 huss_CCSM4_r6i1p… Specif… kg k… Comm… r6i1p1  RCP 8.5 huss    CCSM4   rcp85  
##  5 huss_CNRM-CM5_r1… Specif… kg k… Cent… r1i1p1  RCP 4.5 huss    CNRM-C… rcp45  
##  6 huss_CNRM-CM5_r1… Specif… kg k… Cent… r1i1p1  RCP 8.5 huss    CNRM-C… rcp85  
##  7 huss_CSIRO-Mk3-6… Specif… kg k… Comm… r1i1p1  RCP 4.5 huss    CSIRO-… rcp45  
##  8 huss_CSIRO-Mk3-6… Specif… kg k… Comm… r1i1p1  RCP 8.5 huss    CSIRO-… rcp85  
##  9 huss_CanESM2_r1i… Specif… kg k… Cana… r1i1p1  RCP 4.5 huss    CanESM2 rcp45  
## 10 huss_CanESM2_r1i… Specif… kg k… Cana… r1i1p1  RCP 8.5 huss    CanESM2 rcp85  
## # … with 340 more rows, and abbreviated variable names ¹​`Available variable`,
## #   ²​Variable, ³​`Model ensemble type (only CCSM4 relevant)`, ⁴​Scenario,
## #   ⁵​`Variable abbreviation`, ⁶​`Model abbreviation`, ⁷​`Scenario abbreviation`

From the table that was returned, we want to decide which variables we would like to request. If you are using the Firehose function, you likely have a long list of variables you’d like to download in their entirety. Use the filter() function to select the variables you’d like to download. Store those choices in an object called input_variables to pass on to the Firehose function.

input_variables <- inputs$variable_names %>% 
  filter(Variable %in% c("Minimum Relative Humidity",          
                       "Minimum Temperature",                 
                       "Precipitation")) %>% 
  filter(Scenario %in% c( "RCP 8.5", "RCP 4.5")) %>% 
  filter(Model %in% c(
    "Beijing Climate Center - Climate System Model 1.1",
    "Beijing Normal University - Earth System Model")) %>%
  
  pull("Available variable")

You can check the object you created to see that it is a list of formatted variable names ready to send to the server.

# This is the list of things you would like to order off the menu
input_variables
##  [1] "pr_BNU-ESM_r1i1p1_rcp45"        "pr_BNU-ESM_r1i1p1_rcp85"       
##  [3] "pr_bcc-csm1-1_r1i1p1_rcp45"     "pr_bcc-csm1-1_r1i1p1_rcp85"    
##  [5] "rhsmin_BNU-ESM_r1i1p1_rcp45"    "rhsmin_BNU-ESM_r1i1p1_rcp85"   
##  [7] "rhsmin_bcc-csm1-1_r1i1p1_rcp45" "rhsmin_bcc-csm1-1_r1i1p1_rcp85"
##  [9] "tasmin_BNU-ESM_r1i1p1_rcp45"    "tasmin_BNU-ESM_r1i1p1_rcp85"   
## [11] "tasmin_bcc-csm1-1_r1i1p1_rcp45" "tasmin_bcc-csm1-1_r1i1p1_rcp85"

Set your area of interest

The Firehose function downloads all the requested variables, for all available time points, for a SINGLE geographic location. The underlying data are summarized to points on a grid and don’t include every conceivable point you could enter. You need to first suggest a lat/long pair and then find the aggregated download point that is closest to your suggested point. In the code here, we use Open Street Map to download a polygon for Rocky Mountain National Park and then find the centroid as our suggested point.

aoi_name <- "colorado"
bb <- getbb(aoi_name)
my_boundary <- opq(bb, timeout=300) %>%
  add_osm_feature(key = "boundary", value = "national_park") %>%
  osmdata_sf()
my_boundary
my_boundary$osm_multipolygons
## Simple feature collection with 13 features and 42 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -109.3377 ymin: 36.28472 xmax: -103.4169 ymax: 40.74693
## Geodetic CRS:  WGS 84
## First 10 features:
##          osm_id                                       name
## 390960   390960               Rocky Mountain National Park
## 395552   395552 Black Canyon of the Gunnison National Park
## 395555   395555         Curecanti National Recreation Area
## 5453168 5453168                 Colorado National Monument
## 5725308 5725308             Great Sand Dunes National Park
## 5725309 5725309         Great Sand Dunes National Preserve
## 5749022 5749022            Browns Canyon National Monument
## 5749060 5749060   Florissant Fossil Beds National Monument
## 5749153 5749153                 Dinosaur National Monument
## 5749169 5749169              Yucca House National Monument
##                   attribution      boundary  boundary.type  ele gnis.county_id
## 390960  National Park Service national_park protected_area                    
## 395552  National Park Service national_park                2045            085
## 395555  National Park Service national_park protected_area                    
## 5453168                       national_park                                   
## 5725308 National Park Service national_park protected_area 2590            109
## 5725309 National Park Service national_park protected_area                    
## 5749022                       national_park protected_area                    
## 5749060                       national_park                                   
## 5749153                       national_park                                   
## 5749169                       national_park                                   
##         gnis.created gnis.feature_id gnis.state_id heritage heritage.operator
## 390960                                                                       
## 395552    10/13/1978          203271            08                           
## 395555                                                                       
## 5453168                                                                      
## 5725308   08/31/1992          192558            08                           
## 5725309                                                                      
## 5749022                                                                      
## 5749060                                                                      
## 5749153                                                                      
## 5749169                       202628                                         
##                leisure name.ar name.de name.en name.es name.fr name.ja name.nl
## 390960                                                                        
## 395552  nature_reserve                                                        
## 395555  nature_reserve                                                        
## 5453168 nature_reserve                                                        
## 5725308 nature_reserve                                                        
## 5725309 nature_reserve                                                        
## 5749022 nature_reserve                                                        
## 5749060 nature_reserve                                                        
## 5749153 nature_reserve                                                        
## 5749169                                                                       
##         name.ru name.zh                            operator operator.short
## 390960                                National Park Service            NPS
## 395552                  United States National Park Service               
## 395555                                National Park Service            NPS
## 5453168                                                                   
## 5725308                               National Park Service            NPS
## 5725309                               National Park Service            NPS
## 5749022                                            BLM;USFS               
## 5749060                 United States National Park Service               
## 5749153                 United States National Park Service               
## 5749169                 United States National Park Service               
##         operator.type operator.wikidata       operator.wikipedia ownership
## 390960         public           Q308439 en:National Park Service  national
## 395552                                                            national
## 395555         public           Q308439                           national
## 5453168                                                           national
## 5725308        public           Q308439 en:National Park Service  national
## 5725309        public           Q308439 en:National Park Service  national
## 5749022                                                           national
## 5749060                                                           national
## 5749153                                                           national
## 5749169                                                           national
##         protect_class protect_id protect_title  protected
## 390960              2          2 National Park perpetuity
## 395552                         2               perpetuity
## 395555              5                          perpetuity
## 5453168             3                          perpetuity
## 5725308                        2               perpetuity
## 5725309                        2               perpetuity
## 5749022             3                          perpetuity
## 5749060             5                                    
## 5749153             3                                    
## 5749169            22                                    
##                 protection_title ref.whc
## 390960             National Park        
## 395552             National Park        
## 395555  National Recreation Area        
## 5453168        National Monument        
## 5725308            National Park        
## 5725309        National Preserve        
## 5749022        National Monument        
## 5749060        National Monument        
## 5749153        National Monument        
## 5749169        National Monument        
##                                                                  source
## 390960                                                                 
## 395552                                                                 
## 395555                   http://science.nature.nps.gov/im/gis/index.cfm
## 5453168 http://science.nature.nps.gov/im/gis/index.cfm;en.wikipedia.org
## 5725308                                                                
## 5725309                                                                
## 5749022                                                                
## 5749060                                                                
## 5749153                                                                
## 5749169                                                                
##         source.geometry     type                            website
## 390960                  boundary           http://www.nps.gov/romo/
## 395552                  boundary                                   
## 395555                  boundary                                   
## 5453168                 boundary https://www.nps.gov/colm/index.htm
## 5725308                 boundary                                   
## 5725309                 boundary                                   
## 5749022                 boundary                                   
## 5749060                 boundary https://www.nps.gov/flfo/index.htm
## 5749153                 boundary https://www.nps.gov/dino/index.htm
## 5749169                 boundary           http://www.nps.gov/yuho/
##         whc.criteria whc.inscription_date wikidata
## 390960                                     Q777183
## 395552                                     Q305880
## 395555                                            
## 5453168                                   Q1111312
## 5725308                                    Q609097
## 5725309                                    Q609097
## 5749022                                           
## 5749060                                   Q1430179
## 5749153                                   Q1226698
## 5749169                                    Q602414
##                                              wikipedia
## 390960                 en:Rocky Mountain National Park
## 395552   en:Black Canyon of the Gunnison National Park
## 395555                                                
## 5453168                  en:Colorado National Monument
## 5725308 en:Great Sand Dunes National Park and Preserve
## 5725309 en:Great Sand Dunes National Park and Preserve
## 5749022                                               
## 5749060    en:Florissant Fossil Beds National Monument
## 5749153                  en:Dinosaur National Monument
## 5749169               en:Yucca House National Monument
##                               geometry
## 390960  MULTIPOLYGON (((-105.7615 4...
## 395552  MULTIPOLYGON (((-107.8019 3...
## 395555  MULTIPOLYGON (((-107.6507 3...
## 5453168 MULTIPOLYGON (((-108.7384 3...
## 5725308 MULTIPOLYGON (((-105.517 37...
## 5725309 MULTIPOLYGON (((-105.5522 3...
## 5749022 MULTIPOLYGON (((-105.9675 3...
## 5749060 MULTIPOLYGON (((-105.2509 3...
## 5749153 MULTIPOLYGON (((-108.9678 4...
## 5749169 MULTIPOLYGON (((-108.6855 3...
my_boundary$osm_multipolygons[1,]
## Simple feature collection with 1 feature and 42 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -105.9137 ymin: 40.15777 xmax: -105.4936 ymax: 40.55379
## Geodetic CRS:  WGS 84
##        osm_id                         name           attribution      boundary
## 390960 390960 Rocky Mountain National Park National Park Service national_park
##         boundary.type ele gnis.county_id gnis.created gnis.feature_id
## 390960 protected_area                                                
##        gnis.state_id heritage heritage.operator leisure name.ar name.de name.en
## 390960                                                                         
##        name.es name.fr name.ja name.nl name.ru name.zh              operator
## 390960                                                 National Park Service
##        operator.short operator.type operator.wikidata       operator.wikipedia
## 390960            NPS        public           Q308439 en:National Park Service
##        ownership protect_class protect_id protect_title  protected
## 390960  national             2          2 National Park perpetuity
##        protection_title ref.whc source source.geometry     type
## 390960    National Park                                boundary
##                         website whc.criteria whc.inscription_date wikidata
## 390960 http://www.nps.gov/romo/                                    Q777183
##                              wikipedia                       geometry
## 390960 en:Rocky Mountain National Park MULTIPOLYGON (((-105.7615 4...
boundaries <- my_boundary$osm_multipolygons[1,] 
pulled_bb <-  st_bbox(boundaries)
pulled_bb
##       xmin       ymin       xmax       ymax 
## -105.91371   40.15777 -105.49358   40.55379
pt <- st_coordinates(st_centroid(boundaries))
## Warning in st_centroid.sf(boundaries): st_centroid assumes attributes are
## constant over geometries of x

So, our suggested point is pt.

pt
##           X        Y
## 1 -105.6973 40.35543

Now we can check to see which point in the dataset most closely resembles our suggested point.

lat_pt <- pt[1,2]
lon_pt <- pt[1,1]
lons <- src %>% activate("D2") %>% hyper_tibble()
lats <- src %>% activate("D1") %>% hyper_tibble()
known_lon <- lons[which(abs(lons-lon_pt)==min(abs(lons-lon_pt))),]
known_lat <- lats[which(abs(lats-lat_pt)==min(abs(lats-lat_pt))),]
chosen_pt <- st_as_sf(cbind(known_lon,known_lat), coords = c("lon", "lat"), crs = "WGS84", agr = "constant")

The chosen point is relatively close to the suggested point.

chosen_pt
## Simple feature collection with 1 feature and 0 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -105.6891 ymin: 40.3545 xmax: -105.6891 ymax: 40.3545
## Geodetic CRS:  WGS 84
##                    geometry
## 1 POINT (-105.6891 40.3545)

Run the Firehose function

You will need to supply the Firehose function with two things for it to work properly:

  1. an input_variable object specifying the properly formatted list of variables you want to download
  2. the latitude and longitude coordinates for the available data point you want to extract

Provide those two things to the single_point_firehose() function and it will download your data as fast as possible, organize those data into an sf spatial dataframe, and return that dataframe.

Some notes of caution.

When you use numerous cores to make a lot of simultaneous requests, the server occasionally mistakes you for a DDOS attack and kills your connection. The firehose function deals with this uncertainty, and other forms of network disruption, by conducting two phases of downloads. It first tries to download everything you requested through a ton of independent api requests, it then scans all of the downloads to see which ones failed and organizes a follow-up download run to recapture downloads that failed on your first attempt. Do not worry if you see the progress bar get disrupted during either one of these passes, it will hopefully capture all of the errors and retry those downloads.

This function will run faster if you provide it more cores and a faster internet connection. In our test runs with 11 cores on a 12 core laptop and 800 MB/s internet it took 40-80 minutes to download 181 variables. The variance was largely driven by our network connection speed and error rate (because they require a second try to download).

out <- single_point_firehose(input_variables, known_lat, known_lon )
out
## Simple feature collection with 34333 features and 13 fields
## Attribute-geometry relationship: 13 constant, 0 aggregate, 0 identity
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -105.6891 ymin: 40.3545 xmax: -105.6891 ymax: 40.3545
## Geodetic CRS:  WGS 84
## First 10 features:
##    pr_BNU-ESM_r1i1p1_rcp85  time pr_bcc-csm1-1_r1i1p1_rcp45
## 1                     32.2 38716                        0.0
## 2                      1.1 38717                        1.0
## 3                      1.9 38718                        0.0
## 4                      7.8 38719                        0.0
## 5                      0.7 38720                        0.0
## 6                      5.4 38721                        0.0
## 7                      1.1 38722                        0.6
## 8                      0.0 38723                        0.7
## 9                      0.0 38724                        3.1
## 10                     0.6 38725                        1.9
##    rhsmin_BNU-ESM_r1i1p1_rcp45 rhsmin_BNU-ESM_r1i1p1_rcp85
## 1                           66                          65
## 2                           40                          40
## 3                           41                          40
## 4                           55                          55
## 5                           54                          53
## 6                           53                          53
## 7                           34                          35
## 8                           38                          38
## 9                           51                          55
## 10                          48                          50
##    rhsmin_bcc-csm1-1_r1i1p1_rcp45 tasmin_BNU-ESM_r1i1p1_rcp45
## 1                              59                       271.2
## 2                              54                       269.9
## 3                              50                       266.6
## 4                              54                       270.4
## 5                              51                       274.6
## 6                              49                       272.9
## 7                              31                       267.1
## 8                              37                       265.8
## 9                              42                       265.8
## 10                             33                       267.0
##    tasmin_BNU-ESM_r1i1p1_rcp85 tasmin_bcc-csm1-1_r1i1p1_rcp45
## 1                        271.0                          264.2
## 2                        269.7                          269.7
## 3                        266.6                          265.3
## 4                        269.8                          264.7
## 5                        274.9                          260.9
## 6                        272.8                          260.3
## 7                        267.2                          261.0
## 8                        265.9                          264.3
## 9                        265.7                          268.8
## 10                       266.9                          269.5
##    pr_BNU-ESM_r1i1p1_rcp45 pr_bcc-csm1-1_r1i1p1_rcp85
## 1                     34.9                        0.0
## 2                      1.2                        1.2
## 3                      2.1                        0.0
## 4                      7.7                        0.0
## 5                      0.5                        0.0
## 6                      5.0                        0.0
## 7                      1.3                        0.0
## 8                      0.0                        1.8
## 9                      0.0                        0.9
## 10                     0.4                        1.2
##    rhsmin_bcc-csm1-1_r1i1p1_rcp85 tasmin_bcc-csm1-1_r1i1p1_rcp85
## 1                              56                          263.9
## 2                              54                          269.3
## 3                              51                          265.2
## 4                              56                          263.5
## 5                              51                          261.5
## 6                              50                          261.4
## 7                              35                          261.0
## 8                              39                          263.3
## 9                              39                          267.0
## 10                             34                          268.9
##                     geometry
## 1  POINT (-105.6891 40.3545)
## 2  POINT (-105.6891 40.3545)
## 3  POINT (-105.6891 40.3545)
## 4  POINT (-105.6891 40.3545)
## 5  POINT (-105.6891 40.3545)
## 6  POINT (-105.6891 40.3545)
## 7  POINT (-105.6891 40.3545)
## 8  POINT (-105.6891 40.3545)
## 9  POINT (-105.6891 40.3545)
## 10 POINT (-105.6891 40.3545)

mirror server hosted at Truenetwork, Russian Federation.