ingestr

The package ingestr provides functions to extract (ingest) environmental point data (given longitude, latitude, and required dates) from large global files or remote data servers and to create time series at a user-specified temporal resolution (currently, only daily resolution is implemented).

It aims to simplify downloading and reading site-scale data by offering a common interface, with one function each for single-site and multi-site ingest, and a common, tidy format of ingested data across a variety of data sources and original file formats. Sources refers both to data sets hosted remotely and accessed through an API, and to local data sets. ingestr is particularly suited for preparing model forcing and offers functionality to transform original data into common standardized formats and units. This includes interpolation methods for converting monthly climate data (currently CRU TS) to daily time steps.
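
To illustrate the idea of converting monthly data to daily time steps, here is a minimal base R sketch. It is not ingestr's own interpolation method: it assumes that each monthly value applies to the 15th of the month and interpolates linearly between those dates. The temperature values are made up for illustration.

```r
# Made-up monthly mean temperatures (deg C), assigned to mid-month dates.
monthly <- data.frame(
  date = as.Date(sprintf("2018-%02d-15", 1:12)),
  temp = c(-1.2, 0.5, 4.1, 8.3, 12.9, 16.4, 18.7, 18.1, 14.2, 9.0, 3.6, 0.1)
)

# All days of the year for which daily values are wanted.
daily_dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")

# Linear interpolation between mid-month values. rule = 2 holds the first
# and last monthly value constant outside the range of mid-month dates.
daily <- data.frame(
  date = daily_dates,
  temp = approx(
    x    = as.numeric(monthly$date),
    y    = monthly$temp,
    xout = as.numeric(daily_dates),
    rule = 2
  )$y
)

head(daily)
```

Linear interpolation does not preserve monthly means exactly, so this only shows the general shape of the problem.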

The key functions are ingest_bysite() and ingest() for a single-site data ingest and a multi-site data ingest, respectively. For the multi-site data ingest, site meta information is provided through the argument siteinfo which takes a data frame with columns lon for longitude, lat for latitude, and (for time series downloads) year_start and year_end, specifying required dates (including all days of respective years). Sites are organised along rows. An example site meta info data frame is provided as part of this package for sites included in the FLUXNET2015 Tier 1 data set (siteinfo_fluxnet2015, additional columns are not required by ingest_bysite() and ingest()).
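
As a sketch of what such a site meta information data frame could look like, the following builds one in base R with the columns named above. The site names and coordinates are hypothetical, and the sitename column is an assumption added here for identifying rows; it is not listed among the required columns above.

```r
# Hypothetical site meta information: one row per site.
siteinfo <- data.frame(
  sitename   = c("site_a", "site_b"),  # assumed identifier column
  lon        = c(8.36, 3.60),          # longitude, decimal degrees
  lat        = c(47.48, 43.74),        # latitude, decimal degrees
  year_start = c(2007, 2010),          # first year to ingest (all days)
  year_end   = c(2014, 2012),          # last year to ingest (all days)
  stringsAsFactors = FALSE
)

# Check that the columns described in the text are present.
required <- c("lon", "lat", "year_start", "year_end")
stopifnot(all(required %in% names(siteinfo)))

print(siteinfo)
```

A data frame of this shape would then be passed through the siteinfo argument for a multi-site ingest.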

The following sources can be handled currently:

| Data source | Data type | Coverage | Source ID | Reading from | Remark |
|---|---|---|---|---|---|
| FLUXNET | time series by site | site | fluxnet | local files | Extraction by site name |
| WATCH-WFDEI | time series raster map | global | watch_wfdei | local files | |
| WFDE5 | time series raster map | global | wfde5 | local files | Cucchi et al. (2020) |
| CRU | time series raster map | global | cru | local files | |
| MODIS LP DAAC | time series raster map | global | modis | remote server | using MODISTools |
| Google Earth Engine | time series raster map | global | gee | remote server | using Koen Hufkens' gee_subset library |
| ETOPO1 | raster map | global | etopo1 | local files | |
| Mauna Loa CO2 | time series | site | co2_mlo | remote server | using the climate R package |
| HWSD | raster map, database | global | hwsd | local files | using an adaptation of David LeBauer's rhwsd R package |
| WWF Ecoregions | shapefile map | global | wwf | local files | Olson et al. (2001) |
| N deposition | time series raster map | global | ndep | local files | Lamarque et al. (2011) |
| SoilGrids | raster map | global | soilgrids | remote server | Hengl et al. (2017) |
| ISRIC WISE30sec | raster map | global | wise | local files | Batjes (2016) |
| GSDE Soil | raster map | global | gsde | local files | Shangguan et al. (2014) |
| WorldClim | raster map | global | worldclim | local files | Fick & Hijmans (2017) |

Examples to read data for a single site for each data type are given in Section 'Examples for a single site'. Handling ingestion for multiple sites is described in Section 'Example for a set of sites'. Unless remarked otherwise, extraction goes by longitude/latitude values. Note that this package does not provide the original data. Please follow links to data sources above where data is read from local files, and always cite original references.

Variable names and units

All ingested data follows standardized variable naming and SI units. For example:

| Variable | Variable name | Units |
|---|---|---|
| Gross primary production | gpp | g CO$_2$ m$^{-2}$ |
| Air temperature | temp | $^\circ$C |
| Daily minimum air temperature | tmin | $^\circ$C |
| Daily maximum air temperature | tmax | $^\circ$C |
| Precipitation | prec | mm s$^{-1}$ |
| Vapour pressure deficit | vpd | Pa |
| Atmospheric pressure | patm | Pa |
| Net radiation | netrad | J m$^{-2}$ s$^{-1}$ = W m$^{-2}$ |
| Photosynthetic photon flux density | ppfd | mol m$^{-2}$ s$^{-1}$ |
| Elevation (altitude) | elv | m a.s.l. |

Use these variable names to specify which variables in the original data source they correspond to (see argument getvars to functions ingest() and ingest_bysite()). gpp is cumulative, corresponding to the time scale of the data. For example, if daily data is read, gpp is the total gross primary production per day (g CO$_2$ m$^{-2}$ d$^{-1}$).
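
The following base R sketch illustrates two points from this section. First, it shows one plausible shape for a name mapping: a named list whose names are the standardized variable names and whose values are names in the original data. That getvars takes exactly this form, and the FLUXNET2015 column names used, are assumptions here. Second, it shows how rates given per second (as for prec and ppfd in the table above) relate to daily totals. All numeric values are made up.

```r
# Assumed form of a mapping from standardized names (list names) to
# variable names in the original data (list values). Hypothetical example.
getvars <- list(gpp = "GPP_NT_VUT_REF", temp = "TA_F")

# Number of seconds in a day, used to convert per-second rates to daily totals.
seconds_per_day <- 60 * 60 * 24

# Made-up daily mean precipitation rates (mm s-1) and their daily totals (mm d-1).
prec_mm_s <- c(0, 1e-5, 5e-5)
prec_mm_d <- prec_mm_s * seconds_per_day

# Made-up daily mean PPFD (mol m-2 s-1) and the daily totals (mol m-2 d-1).
ppfd_mol_m2_s <- c(2e-4, 5e-4, 8e-4)
ppfd_mol_m2_d <- ppfd_mol_m2_s * seconds_per_day

print(data.frame(prec_mm_s, prec_mm_d, ppfd_mol_m2_s, ppfd_mol_m2_d))
```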

Installation

To install and load the latest version of the ingestr package from GitHub, run the following commands in your R console:

if(!require(devtools)){install.packages("devtools")}
devtools::install_github("geco-bern/ingestr")
library(ingestr)

Dependencies

The ingestr package relies heavily on the tidyverse. Dependencies are dplyr, purrr, lubridate, tidyr, raster, stringi, stringr, sp, ncdf4, signal, and climate. To install all required packages, do:

list_pkgs <- c("dplyr", "purrr", "lubridate", "tidyr", "raster", "stringi", "stringr", "sp", "ncdf4", "signal", "climate", "rgdal", "hwsdr", "gdalUtils", "MODISTools")
new_pkgs <- list_pkgs[!(list_pkgs %in% installed.packages()[,"Package"])]
if(length(new_pkgs)) install.packages(new_pkgs)

Example

Examples are described in the vignette example, available here.

Usage and contribution

This package is designed to be extensible to ingesting other data types (sources). The developer (Beni Stocker) would appreciate it if you made sure that your developments can be fed back to this repository. To do so, please use git. See here for a brief introduction to git.

I recommend the following steps if you would just like to use this package (no development):

devtools::install_github("stineb/ingestr")

I recommend the following steps if you would like to use and further develop the package (even just for your own application, but keep in mind that others may benefit from your efforts too!):

  1. Make sure you have a GitHub account.
  2. Log on to GitHub, go to https://github.com/stineb/ingestr, and click on 'Fork' in the upper right corner. This makes a copy of the repository that belongs to you, meaning that you can modify, commit, and push changes back to your forked repository as you please.
  3. Clone your fork to your local computer by entering the following in your terminal (here, it is cloned to a subdirectory ingestr placed in your home directory):
cd ~
git clone https://github.com/<your_github_username>/ingestr.git
  4. In RStudio, create a new project in your local directory ~/ingestr/. This opens the repository in RStudio and gives you access to the code where all ingestr functions are implemented (see subdirectory ./R/).
  5. In RStudio, after having edited code, select the 'Build' tab and click on 'Install and Restart' to build the package again. For quick edits and checks, you may simply source the edited files instead of re-building the whole package. If you would like to add new functions, create a new source file in subdirectory ./R/, write a nice roxygen header (see other source files for an example), then click on 'Build' -> 'More' -> 'Document', and then again on 'Install and Restart'.
  6. If you're happy with your new edits and additions to the package, you may want to have them fed back to the original repository. To do so, please create a new pull request in GitHub: click on 'New pull request' on the repository page and follow the intuitive steps. Thanks!

This package is still in its maturing phase. To stay up-to-date with the latest version, regularly re-install from GitHub (devtools::install_github("stineb/ingestr")), or - if you're building from a locally (git) cloned repository - regularly do a git pull and re-install the package.