dl_image_segmentation

Contains code for retrieving and preparing data for running image segmentation deep learning models, with a focus on the Descartes Labs API.

The repository consists of a Python package, dl_segmentation_utils, and three Jupyter notebooks demonstrating the package functionality, as described below:

Creation of training data

The notebook create_training_samples.ipynb contains code for generating training data: it retrieves imagery from the Descartes Labs catalog and creates corresponding label data from a provided spatial dataset. Data retrieval is built on the Descartes Labs API, which is used both to divide the AOI into tiles and to fetch imagery from the catalog; registration/authentication with Descartes Labs is therefore required before use.
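As a rough sketch of the tiling and retrieval steps the notebook builds on (using the v1 descarteslabs scenes API; the product ID, band names, and parameters here are illustrative, not the package's defaults):

```python
import descarteslabs as dl  # assumes prior authentication, e.g. `descarteslabs auth login`

# A small illustrative AOI as a GeoJSON polygon.
aoi = {
    "type": "Polygon",
    "coordinates": [[[-4.0, 52.0], [-3.9, 52.0], [-3.9, 52.1], [-4.0, 52.1], [-4.0, 52.0]]],
}

# Divide the AOI into 256x256 pixel tiles on the DL tiling grid.
tiles = dl.scenes.DLTile.from_shape(aoi, resolution=10.0, tilesize=256, pad=0)

# Search the catalog for imagery covering one tile and rasterise a mosaic.
scenes, ctx = dl.scenes.search(
    tiles[0],
    products=["sentinel-2:L1C"],   # illustrative product ID
    start_datetime="2021-01-01",
    end_datetime="2021-12-31",
    cloud_fraction=0.2,
)
arr = scenes.mosaic("red green blue", ctx)  # masked array of shape (bands, 256, 256)
```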

The training data are created in the form of two parallel folders named /images and /labels, in which each image chip is paired with an identically named label raster.

This format is equivalent to that created by the ArcGIS Export Training Data For Deep Learning tool in the "Classified Tiles" format. Following the ESRI convention, we refer to these files as "image chips".
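For illustration, the resulting layout might look like this (the filenames are hypothetical; the convention is simply that each image has an identically named label):

```
training_data/
├── images/
│   ├── chip_00001.tif
│   └── chip_00002.tif
└── labels/
    ├── chip_00001.tif
    └── chip_00002.tif
```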

The retrieved images comprise a mosaic of the available imagery, optionally filtered by date range and cloud cover. The output value at each pixel is selected by one of two methods: it is either taken from the image closest in time to a specified reference date (optionally restricted to a min/max date range and to a maximum cloud cover), or computed as the median of the pixels remaining after cloud masking and optional date filtering.
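As a simplified, single-band sketch of the two pixel-selection methods (in-memory NumPy only; the array names are illustrative and the real implementation lives in dl_segmentation_utils):

```python
import numpy as np

# stack:  (n_scenes, H, W) pixel values
# cloudy: (n_scenes, H, W) boolean cloud mask (True = cloudy)
# dates:  (n_scenes,) np.datetime64 acquisition dates

def median_composite(stack, cloudy):
    """Per-pixel median of the cloud-free observations."""
    masked = np.where(cloudy, np.nan, stack.astype(float))
    return np.nanmedian(masked, axis=0)

def nearest_date_composite(stack, cloudy, dates, ref_date):
    """Per-pixel value from the cloud-free scene closest in time to ref_date."""
    order = np.argsort(np.abs(dates - np.datetime64(ref_date)))
    out = np.full(stack.shape[1:], np.nan)
    for i in order:                        # nearest scenes first
        fill = np.isnan(out) & ~cloudy[i]  # only fill still-empty pixels
        out[fill] = stack[i][fill]
    return out
```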

Image retrieval can be run in parallel on a single machine; speed depends on the response time of the DL API, which in turn varies with the size of the images and the size of the catalog being searched. As a guide, retrieving VHR RGB data from the Airbus Pleiades dataset into 256x256 pixel tiles averages around 4 images per second when appropriately parallelised.
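Since retrieval is I/O-bound (waiting on the DL API), a thread pool is one straightforward way to parallelise it; `fetch_tile` below is a hypothetical per-tile wrapper, not a function from the package:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_tile(tile):
    """Hypothetical wrapper: search the catalog and save imagery for one DLTile."""
    ...

# `tiles` as produced by DLTile.from_shape above.
with ThreadPoolExecutor(max_workers=16) as pool:
    futures = {pool.submit(fetch_tile, tile): tile for tile in tiles}
    for future in as_completed(futures):
        future.result()  # re-raises any retrieval error for futures[future]
```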

Translation of image chips to TFRecords

The notebook translate_chips_to_tfrecords.ipynb contains code for translating image chips in the format described above into sharded TFRecord files. This code is not specific to Descartes Labs and can be used on image chip datasets created by other means such as the ESRI toolset.
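A minimal sketch of sharded TFRecord writing (the function and feature names are illustrative; this variant stores the raw encoded file bytes, one of the options discussed below):

```python
import os
import tensorflow as tf

def write_shards(pairs, out_dir, shard_size=500):
    """Write (image_path, label_path) pairs to sharded TFRecord files."""
    os.makedirs(out_dir, exist_ok=True)
    for shard, start in enumerate(range(0, len(pairs), shard_size)):
        shard_path = os.path.join(out_dir, f"chips-{shard:05d}.tfrecord")
        with tf.io.TFRecordWriter(shard_path) as writer:
            for image_path, label_path in pairs[start:start + shard_size]:
                feature = {
                    "image": tf.train.Feature(bytes_list=tf.train.BytesList(
                        value=[open(image_path, "rb").read()])),
                    "label": tf.train.Feature(bytes_list=tf.train.BytesList(
                        value=[open(label_path, "rb").read()])),
                }
                example = tf.train.Example(features=tf.train.Features(feature=feature))
                writer.write(example.SerializeToString())
```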

Various options are provided for storing the data in the TFRecords, depending on the trade-off between minimising file size and ease of use. These boil down to a choice between storing the raw bytes of the (compressed) JPG/PNG/GeoTIFF-encoded imagery in the TFRecord, and decoding the images to numerical arrays and storing those. Where the images are decoded, there is a further choice between using the TF I/O image decoders and using more flexible Python libraries (GDAL/Rasterio). In the former case the decoding is highly optimised and can be multithreaded, but only 8-bit RGB images in PNG/JPG/BMP format are supported. In the latter case any GDAL-compatible format can be used, but the decoding is slightly slower and parallelisation must be done by multiprocessing. In practice the penalty is not large: a folder of ~6000 256x256 pixel RGB images can be converted in a few seconds by either approach.
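For the decoded-array option, a Rasterio-based sketch might look as follows (the feature keys are illustrative and must match whatever the parsing code expects):

```python
import rasterio
import tensorflow as tf

def example_from_decoded(image_path, label_path):
    """Decode with Rasterio (any GDAL-compatible format) and store the arrays."""
    with rasterio.open(image_path) as src:
        image = src.read()        # (bands, H, W)
    with rasterio.open(label_path) as src:
        label = src.read(1)       # single-band label raster, (H, W)
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(
            value=[image.tobytes()])),
        "label": tf.train.Feature(bytes_list=tf.train.BytesList(
            value=[label.tobytes()])),
        # Shape must be stored (and the dtype known) to reconstruct the arrays.
        "shape": tf.train.Feature(int64_list=tf.train.Int64List(
            value=list(image.shape))),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```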

Parsing TFRecords

The notebook parse_tfrecords.ipynb contains sample code for parsing TFRecord datasets created by the above means. Rather than an end-to-end workflow, it demonstrates how to parse the TFRecords for use within a model development and training pipeline.
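As a sketch of the parsing side, assuming the raw-bytes layout from the writing sketch above with PNG-encoded chips (the feature keys must match those used at write time):

```python
import tensorflow as tf

FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(serialised):
    parsed = tf.io.parse_single_example(serialised, FEATURES)
    image = tf.io.decode_png(parsed["image"], channels=3)  # raw-bytes variant
    label = tf.io.decode_png(parsed["label"], channels=1)
    return tf.cast(image, tf.float32) / 255.0, label

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("tfrecords/chips-*.tfrecord"))
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1024)
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)
```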