Awesome
<!-- README.md is generated from README.Rmd. Please edit that file -->CWAC
<!-- badges: start --> <!-- badges: end -->This packages provides functionality to access, download, and manipulate data from the Coordinated Waterbird Counts Project. It is possible to download these same data using the CWAC API, but being able to pull these data straight into R, in a standard format, should make them more accessible, easier to analyse, and eventually make our analyses more reliable and reproducible.
There is another package named
ABAP
that provides similar
functionality, but in this case to download count data from the African
Bird Atlas Project. In addition, there is a companion package the
ABDtools
package, which
adds the functionality necessary to annotate different data formats
(points and polygons) with environmental information from the Google
Earth Engine data
catalog.
INSTRUCTIONS TO INSTALL
To install CWAC
from GitHub using the
remotes package, run:
install.packages("remotes")
remotes::install_github("AfricaBirdData/CWAC")
DOWNLOAD DATA FOR A SITE
There are two possible ways of downloading data from CWAC. We can select a site and download data for all species in that site or, vice versa, we can select a species and download all data for that species across all sites. In this section, we will explore the former and download all data associated with a CWAC site.
Let’s use Barberspan, a CWAC site located in the North West province of
South Africa as an example. The first thing we need to do is find our
site’s ID code. We can use the function listCwacSites
to list all the
sites in the North West and find this code.
library(CWAC)
# List all sites at the North West province
nw_sites <- listCwacSites(.region_type = "province", .region = "North West")
# Find the code for Barberspan
site_id <- nw_sites[nw_sites$LocationName == "Barberspan", "LocationCode", drop = TRUE]
# We can find more info about this site with getCwacSiteInfo()
getCwacSiteInfo(site_id)
Once we have the code for the CWAC site, we can download count data
collected there using the function getCwacSiteCounts
. This will
download all the CWAC cards submitted for Barberspan.
bp_counts <- getCwacSiteCounts(site_id)
We can do all of this using a dplyr approach.
library(dplyr, warn.conflicts = FALSE)
bp_counts <- listCwacSites(.region_type = "province",
.region = "North West") %>%
filter(LocationName == "Barberspan") %>%
pull("LocationCode") %>%
getCwacSiteCounts()
DOWNLOAD DATA FOR A SPECIES
We might not be interested in any particular site, but in all records of certain species. We can follow a similar procedure: 1) find the species code, and 2) download the data.
To find the species code we can use the function listCwacSpp
, which
lists all CWAC species, and from this list we can find the one we want.
Say we are interested in the African Black Duck.
# List all species
spp_all <- listCwacSpp()
# Find the code for Black Duck
sp_id <- spp_all[spp_all$Common_species == "African Black" &
spp_all$Common_group == "Duck",
"SppRef", drop = TRUE]
Once we have the code, we can download the data for this species across
all CWAC sites with the function getCwacSppCounts()
bd_counts <- getCwacSppCounts(sp_id)
You may get some warnings from the parser because data weren’t entered in the standard format they were supposed to. We preferred printing these messages, so that you can make sure your analysis won’t be impacted by this.
Again, we can also do all this using dplyr.
library(dplyr, warn.conflicts = FALSE)
bd_counts <- listCwacSpp() %>%
filter(Common_species == "African Black",
Common_group == "Duck") %>%
pull("SppRef") %>%
getCwacSppCounts()
DOWNLOAD SITE BOUNDARIES
With the function getCwacSiteBoundary
we can download the polygons
enclosing CWAC sites. Please, note that these polygons are not always up
to date or even present for all sites in the database and therefore, we
should always check that they are accurate after downloading.
getCwacSiteBoundary
can retrieve boundaries for multiple sites at
once, in a single API call, making the process much more efficient. It
is always a good idea to retrieve all sites in a single call instead of
one at a time.
# Download our counts for the Black Duck just in case they got lost
counts <- listCwacSpp() %>%
filter(Common_species == "African Black",
Common_group == "Duck") %>%
pull("SppRef") %>%
getCwacSppCounts()
# Then let's extract the boundaries of the CWAC sites in our data
# At the moment getCwacSiteBoundary() can only retrieve boundaries from sites of
# one province/country at a time
# First identify the countries in our data
unique(counts$Country)
# It's only South Africa at the time, so we can pull all the sites at once. Otherwise,
# we would just repeat the process for the different countries.
sites <- unique(counts$LocationCode)
boundaries <- getCwacSiteBoundary(loc_code = sites,
region_type = "country",
region = "South Africa")
# Add boundaries to the count data
counts_bd <- counts %>%
dplyr::left_join(boundaries, by = "LocationCode")
INSTRUCTIONS TO CONTRIBUTE CODE
First clone the repository to your local machine:
- In RStudio, create a new project
- In the ‘Create project’ menu, select ‘Version Control’/‘Git’
- Copy the repository URL (click on the ‘Code’ green button and copy the link)
- Choose the appropriate directory and ‘Create project’
- Remember to pull the latest version regularly
For site owners:
There is the danger of multiple people working simultaneously on the project code. If you make changes locally on your computer and, before you push your changes, others push theirs, there might be conflicts. This is because the HEAD pointer in the main branch has moved since you started working.
To deal with these lurking issues, I would suggest opening and working on a topic branch. This is a just a regular branch that has a short lifespan. In steps:
- Open a branch at your local machine
- Push to the remote repo
- Make your changes in your local machine
- Commit and push to remote
- Create pull request:
- In the GitHub repo you will now see an option that notifies of changes in a branch: click compare and pull request.
- Delete the branch. When you are finished, you will have to delete the
new branch in the remote repo (GitHub) and also in your local machine.
In your local machine you have to use Git directly, because apparently
RStudio doesn´t do it:
- In your local machine, change to master branch.
- Either use the Git GUI (go to branches/delete/select branch/push).
- Or use the console typing ‘git branch -d your_branch_name’.
- It might also be necessary to prune remote branches with ‘git remote prune origin’.
Opening branches is quick and easy, so there is no harm in opening multiple branches a day. However, it is important to merge and delete them often to keep things tidy. Git provides functionality to deal with conflicting branches. More about branches here:
https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Another idea is to use the ‘issues’ tab that you find in the project header. There, we can identify issues with the package, assign tasks and warn other contributors that we will be working on the code.