Tsammalex Data
This repository holds the data served by http://tsammalex.clld.org/. The data is licensed under a Creative Commons Attribution 4.0 International License.
The data is stored as a collection of CSV files, suitable for editing with LibreOffice.
Adding images
Adding an image is done in two steps:
- Adding a row to staged_images.csv, specifying a publicly available URL to access the image in the id column (this may be a temporary GitHub repository, Wikimedia Commons or other publicly available webspace).
- Providing the image at the specified URL for download.
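The first step can also be scripted. The following is a minimal sketch, assuming staged_images.csv has a header row that includes an id column; the function name append_staged_image is hypothetical, and all other columns are simply left empty.

```python
import csv
import io


def append_staged_image(csv_text: str, url: str) -> str:
    """Return csv_text with one extra row whose id column holds the image URL.

    Hypothetical helper: assumes the header row contains an "id" column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or ["id"]
    rows = list(reader)
    # All other columns are left empty; fill them in as far as metadata is known.
    new_row = {name: "" for name in fieldnames}
    new_row["id"] = url
    rows.append(new_row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

To apply it to the real file, read staged_images.csv into a string, call the function, and write the result back.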
Periodically (or upon request), a process is run, which
- loops through staged_images.csv,
- retrieving the files and uploading them to our file server (computing the md5 hash) on the way,
- enriching the metadata, in case the image is from a known provider (Wikimedia, Flickr, EOL, ...),
- moving the metadata from staged_images.csv to images.csv, replacing the id.
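The loop described above might look roughly like the following sketch. It is not the actual import script: the upload and metadata enrichment steps are omitted, fetch is an injected callable so no particular HTTP library is implied, and two things are assumptions made for illustration: that the md5 hash becomes the new id, and that the original URL is kept in a hypothetical source_url column.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def process_staged(
    staged: List[Row], images: List[Row], fetch: Callable[[str], bytes]
) -> Tuple[List[Row], List[Row]]:
    """Return (rows still staged, updated images rows)."""
    remaining: List[Row] = []
    processed = list(images)
    for row in staged:
        try:
            content = fetch(row["id"])
        except OSError:
            # Image not (yet) available at the URL: leave the row staged.
            remaining.append(row)
            continue
        checksum = hashlib.md5(content).hexdigest()
        # ASSUMPTION: the md5 hash replaces the URL as id; the URL is kept
        # in a hypothetical "source_url" column.
        processed.append({**row, "id": checksum, "source_url": row["id"]})
    return remaining, processed
```

Rows whose image cannot be retrieved stay in the staged list, so they are picked up again on the next run.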
Image providers
For several providers of flora and fauna imagery we provide support for downloading images and associated metadata (by specifying matching URLs as id in staged_images.csv):
- EOL images specified by a URL of the form http://media.eol.org/data_objects/21916329. (You typically get to a page with such a URL by clicking on any image you encounter while browsing EOL.)
- Flickr images specified by a URL of the form https://www.flickr.com/photos/damouns/78968973, i.e. a photo's details page.
- African Plants photos from Senckenberg, specified by a URL of the form http://www.westafricanplants.senckenberg.de/root/index.php?page_id=14&id=722#image=26800, i.e. the URL you see in your browser's location bar after clicking to enlarge an image.
- Wikimedia
- Flora of Zimbabwe or Flora of Mozambique
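To check whether a URL will be recognized before staging it, a pattern match along the following lines can help. This is only a sketch: the regular expressions are derived from the three example URLs above and are an assumption about what counts as "matching"; no patterns are given for Wikimedia or the Flora of Zimbabwe/Mozambique sites because their URL forms are not specified here. The names PROVIDER_PATTERNS and detect_provider are hypothetical.

```python
import re
from typing import Optional

# Patterns inferred from the example URLs in this README (assumption).
PROVIDER_PATTERNS = {
    "eol": re.compile(r"^https?://media\.eol\.org/data_objects/\d+$"),
    "flickr": re.compile(r"^https?://www\.flickr\.com/photos/[^/]+/\d+/?$"),
    "senckenberg": re.compile(
        r"^https?://www\.westafricanplants\.senckenberg\.de/root/index\.php"
        r"\?page_id=\d+&id=\d+#image=\d+$"
    ),
}


def detect_provider(url: str) -> Optional[str]:
    """Return the name of the first provider whose pattern matches, else None."""
    for name, pattern in PROVIDER_PATTERNS.items():
        if pattern.match(url):
            return name
    return None
```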
Referential Integrity
Currently, ensuring referential integrity of the data is the responsibility of the editor. In particular, editors have to make sure
- id columns hold unique values,
- <name>__id columns hold values which exist as id in the referenced table,
- <name>__ids columns hold comma-separated lists of values which exist as id in the referenced table.
Upon push, referential integrity will be checked by travis-ci.
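Editors who want to check their changes before pushing could run something like the sketch below. It is not the check that runs on travis-ci; it is a minimal illustration of the three rules above, assuming that <name> in a column name is also the key of the referenced table and that empty cells mean "no reference". The function name check_integrity is hypothetical.

```python
from typing import Dict, List, Set

Table = List[Dict[str, str]]


def check_integrity(tables: Dict[str, Table]) -> List[str]:
    """Return a list of human-readable integrity errors (empty if all is well)."""
    errors: List[str] = []
    ids: Dict[str, Set[str]] = {}
    # Rule 1: id columns hold unique values.
    for name, rows in tables.items():
        seen: Set[str] = set()
        for row in rows:
            if row["id"] in seen:
                errors.append(f"{name}: duplicate id {row['id']}")
            seen.add(row["id"])
        ids[name] = seen
    # Rules 2 and 3: <name>__id and <name>__ids must point at existing ids.
    for name, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                if column.endswith("__ids"):
                    target = column[: -len("__ids")]
                    refs = [v.strip() for v in value.split(",") if v.strip()]
                elif column.endswith("__id"):
                    target = column[: -len("__id")]
                    refs = [value] if value else []
                else:
                    continue
                for ref in refs:
                    if ref not in ids.get(target, set()):
                        errors.append(
                            f"{name}: {column} references missing {target} id {ref}"
                        )
    return errors
```

Each table would be loaded with csv.DictReader from the corresponding CSV file and passed in under its name, for example the rows of species.csv under the key "species".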
Changing the scientific name of a species
Since the id of a species is created from its scientific name, changing this name involves the following steps:
- Add a new species with the updated name and id to species.csv.
- Update all references to the id in images.csv and words.csv.
- TODO redirect the old species to the new one by ...
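The second step, updating references, can be sketched as follows. The column name species__id in the usage example is an assumption based on the <name>__id convention; the actual column names in images.csv and words.csv may differ, and columns holding comma-separated lists would need to be split first. The function name rename_references is hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def rename_references(rows: List[Row], column: str, old_id: str, new_id: str) -> List[Row]:
    """Return copies of rows with old_id replaced by new_id in the given column."""
    return [
        {**row, column: new_id} if row.get(column) == old_id else dict(row)
        for row in rows
    ]
```

For example, rename_references(image_rows, "species__id", old_id, new_id) would be applied to the rows read from images.csv before writing them back.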
Supplemental Data
In addition to the data managed in this repository, we fetch data from other sources to enhance the usability of the site.
- Given a proper scientific name for a species we can fetch a corresponding identifier from EOL.
- Given an EOL identifier, we can fetch information about the biological classification of a species and a common English name.
- We also use information on terrestrial ecoregions from WWF, which can be referenced using ecoregion identifiers in the format AT0101.
- WWF ecoregions and countries in which a species occurs can be computed by matching occurrence information from GBIF against region borders.
Notes
Notes to self:
- To diff sorted lines, you may run:
  diff -b <(sort words.csv) <(sort words_CN.csv)