Swallowtail Climate Change

Data and code for North American Swallowtail and larval host plant distributions in relation to climate change

Currently under development

General approach:

The project retrieves data from online sources (in this case, the Global Biodiversity Information Facility, GBIF) and performs quality control to ensure observations are only from Canada, Mexico, and the United States of America. The data are analyzed to create species distribution models based on presence and pseudo-absence data using a variety of models (e.g. Maximum Entropy, generalized linear model). The models are then used to predict presence or absence under a variety of conditions, including current climate and forecast climate models. These predictions are used to estimate change in the range sizes of individual butterfly species and the relative size of range overlap with their known host plant species. Finally, these predictions are combined with land use data to assess the importance of protected areas for maintaining suitable habitat for Papilio species.
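As a rough illustration of the range-size comparison described above, the sketch below counts predicted-presence cells in two toy grids and computes the proportional change. The grids, the helper name range_area, and the constant cell area are all assumptions made for this example; they are not taken from the project code. Real rasters in geographic coordinates have cell areas that vary with latitude, so an actual calculation would need to weight cells by area.

```r
# Toy presence/absence grids (1 = predicted present, 0 = absent).
# Values are invented for illustration and do not come from the project.
current <- matrix(c(1, 1, 0,
                    1, 0, 0,
                    0, 0, 0), nrow = 3, byrow = TRUE)
forecast <- matrix(c(1, 1, 1,
                     0, 1, 1,
                     0, 0, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: area of predicted presence, assuming every cell
# covers the same area (cell_area_km2 is an assumed constant).
range_area <- function(pa, cell_area_km2 = 1) {
  sum(pa == 1, na.rm = TRUE) * cell_area_km2
}

area_current <- range_area(current)
area_forecast <- range_area(forecast)

# Proportional change in range size relative to the current range
range_change <- (area_forecast - area_current) / area_current
```

A positive value of range_change indicates a predicted expansion and a negative value a predicted contraction.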

Dependencies

The project uses the following additional R packages:

Data to support much of the rnaturalearth package functionality are stored in two data packages that should be installed using the following:

devtools::install_github("ropensci/rnaturalearthdata")
devtools::install_github("ropensci/rnaturalearthhires")
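Before installing, it may help to check which of the two data packages are already present. The following is a minimal sketch using base R only; the helper name missing_packages is hypothetical and not part of the project.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
# requireNamespace() returns FALSE (rather than raising an error) when a
# package is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("rnaturalearthdata", "rnaturalearthhires")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  # Install these with the devtools::install_github() calls listed above
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```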

Workflow

The workflow has the general structure of:

  1. Data retrieval and cleaning
  2. Preparing R scripts for analyses of individual species
  3. Bulk processing of single-species analyses; for each species distribution method (boosted regression trees (BRT), generalized additive model (GAM), Lasso, maximum entropy (MaxEnt), and random forest (RF)):
    1. Evaluate models using spatial cross validation (CV)
    2. Using best models from CV step, above, re-estimate model parameters using all observational data (no training/testing split)
    3. Predict suitability values for each species, for each of five climate models (one contemporary, four forecast); combine these for an ensemble suitability raster and presence/absence prediction
    4. Combine predicted consensus distributions of each insect species with the consensus predictions for all of its respective host plants to create a single raster with distributional information (see documentation in functions/overlap_raster.R for interpretations of values in those rasters)
  4. Synthesizing results of single-species analyses
  5. Comparing areas of suitable habitat to areas that are categorized as protected by the IUCN.
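The ensemble and overlap steps in the workflow above can be sketched with toy data. This example makes several assumptions that the workflow description does not confirm: the ensemble is an unweighted mean across methods, presence is assigned with a threshold of 0.5, and the overlap coding (0 to 3) is invented for illustration. The coding the project actually uses is documented in functions/overlap_raster.R.

```r
# Toy suitability values (0 to 1) for four grid cells from three SDM
# methods; numbers are invented for illustration.
suitability <- rbind(
  brt    = c(0.9, 0.6, 0.2, 0.1),
  gam    = c(0.8, 0.4, 0.3, 0.2),
  maxent = c(0.7, 0.8, 0.1, 0.0)
)

# Ensemble suitability: unweighted mean across methods (an assumption)
ensemble <- colMeans(suitability)

# Presence/absence using an assumed threshold of 0.5
insect_pa <- as.integer(ensemble >= 0.5)

# Toy host plant predictions: a cell counts as having a host when at
# least one host species is predicted present.
hosts_pa <- rbind(
  host_a = c(1, 0, 0, 0),
  host_b = c(0, 1, 1, 0)
)
any_host <- as.integer(colSums(hosts_pa) > 0)

# Hypothetical coding: 0 = neither, 1 = insect only,
# 2 = host only, 3 = insect and at least one host
overlap <- insect_pa + 2L * any_host
```

Encoding both layers in a single integer keeps the insect-only, host-only, and shared areas distinguishable in one raster, which is the general idea behind combining the predictions.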

Scripts, in order of use

Descriptions below are limited to scripts that are part of the analysis workflow, including data retrieval and preparation. They do not include some scripts that are used for quality control purposes (e.g. src/data/count-gbif-names.R).

  1. Data retrieval and cleaning (in src/data)
    1. src/data/gbif-1-download.R: Download observational data from GBIF to the data folder; note by default the data files that are downloaded by this script are not under version control
    2. src/data/gbif-2-filter.R: Run quality assurance on downloaded data, and retain only those records that:
      1. are not observations based on barcodes only,
      2. are observations from 2000-2023,
      3. are in locations with climate data (which effectively restricts observations to North America),
      4. are thinned to a max of X observations per grid cell (of climate raster), and
      5. are inside the 98% contour of observations.
    3. src/data/gbif-3-presence-absence.R: Generate a presence/absence dataset for each species, to be used in any species distribution model; also create a shapefile defining geographical limits of predictions.
    4. src/data/prep-aridity-data.R: DEPRECATED Download measure(s) of aridity and calculate mean and median values for each insect species; data stored as a csv in data/aridity-statistics.csv.
    5. src/data/prep-climate-data.R: Download monthly climate data for time span of interest (2000-2018) and calculate the average values for the 19 standard bioclimatic variables (should not need to be run locally; data are available in data/wc2-1 directory); resulting rasters are in 2.5 minute resolution.
    6. src/data/prep-forecast-data.R: Download monthly climate data for ensemble of forecast climate models and calculate the average values for the 19 standard bioclimatic variables (should not need to be run locally; data are available in data/ensemble sub-directories); resulting rasters are in 2.5 minute resolution.
    7. src/data/protected-areas-management.R: Categorize all polygons in IUCN protected area shapefile into four management types (National, State, Local, Private); resulting shapefile too large for GitHub, so currently stored on Google Drive.
  2. Bulk processing of single-species analyses (see below for example graphic)
    1. src/run-indiv/run-all-1-CV.R: Run model evaluation for individual species; can toggle on/off to run all species or just a subset. Output includes full data estimation for MaxEnt method.
    2. src/run-indiv/run-all-2-SDMs-full.R: Estimate SDMs on full data set for individual species (for BRT, GAM, Lasso, and RF methods); can toggle on/off to run all species or just a subset.
    3. src/run-indiv/run-all-3-predict.R: Predict suitabilities and distributions (presence/absence rasters) for individual species (for BRT, GAM, Lasso, MaxEnt, and RF methods); can toggle on/off to run all species or just a subset.
  3. Running analyses on HPC (if the run-indiv scripts of step 3 are to be run on a high-performance computing cluster)
    1. src/hpc/run-all-1-CV.slurm: Run the R script src/run-indiv/run-all-1-CV.R via slurm
    2. src/hpc/run-all-2-SDMs-full.slurm: Run the R script src/run-indiv/run-all-2-SDMs-full.R via slurm
    3. src/hpc/run-all-3-predict.slurm: Run the R script src/run-indiv/run-all-3-predict.R via slurm
  4. Synthesizing results of single-species analyses
    1. src/summary/summary-1-create-overlap-rasters.R: Create predicted overlap rasters for each species of insect; see details of raster cell values in the script. Will also create maps (ggplot-produced png files) if indicated.
    2. src/summary/summary-2-compare-ranges.R: Compare the ranges of current to forecast distributions, both considering insect ranges alone, and considering only the areas where insects are predicted to overlap with one or more host plant species; several metrics calculated, including area and median latitude. Also create raster of predicted differences in range between contemporary climate and forecast climate models. Will also create maps (currently png files) if indicated.
    3. src/summary/summary-3-draw-species-richness-maps.R: Draw maps of Papilio species richness for current and forecast climate conditions and a map showing the change between current and forecast estimates.
    4. src/summary/create-delta-maps.R: Create maps (graphics files) showing changes in areas predicted as suitable between time periods.
    5. src/summary/create-observations-maps.R: Create graphics files (currently png) of filtered observations on a map.
    6. src/protected-areas-1-calc-species.R: Calculate proportion of species' distributions that are in protected areas.
    7. src/protected-areas-2-calc-hotspots.R: Calculate proportion of areas with high swallowtail species richness that are protected.
    8. src/protected-areas-3-plot-species.R: Create plot of area (sq km) and proportion of individual species' suitable range that is on land with some form of protection.
    9. src/protected-areas-4-plot-hotspots.R: Create plot of area (sq km) and proportion of areas deemed "hotspots" (currently areas with >= 4 Papilio species) that is on land with some form of protection.
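The hotspot calculation described for the protected-areas scripts can be illustrated with toy grids. The sketch below uses the ">= 4 Papilio species" definition given above; the grids are invented and every cell is assumed to cover the same area, which would not hold for rasters in geographic coordinates. It is not the project's implementation.

```r
# Toy grids; values are invented for illustration.
# richness: number of Papilio species predicted present in each cell
richness <- matrix(c(5, 4, 2,
                     4, 3, 1,
                     6, 0, 0), nrow = 3, byrow = TRUE)
# protected: 1 where the cell falls on protected land, 0 otherwise
protected <- matrix(c(1, 0, 0,
                      1, 1, 0,
                      0, 0, 1), nrow = 3, byrow = TRUE)

# Hotspots: cells with at least four species predicted present
hotspot <- richness >= 4

n_hotspot <- sum(hotspot)
n_protected_hotspot <- sum(hotspot & protected == 1)

# Proportion of hotspot cells that are protected (equal-area assumption)
prop_protected <- n_protected_hotspot / n_hotspot
```

The same pattern applies to a single species: replace the hotspot mask with that species' predicted presence grid.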

Analysis workflow example

Analysis workflow with Papilio rumiko and one of its host plants, Casimiroa greggii, from bulk processing of single-species analyses (step 3, above).

[Figure: Example of analysis workflow with Papilio rumiko and one of its host plants, Ptelea trifoliata]

Directory structure

Miscellany

Additional resources