

Using machine learning to improve free topography data for flood modelling

As part of the requirements for the Master of Disaster Risk & Resilience programme at the University of Canterbury, this research project explored the potential for machine learning models to make free Digital Surface Models (such as the widely-used SRTM) more applicable for flood modelling, by stripping away vertical biases relating to vegetation & built-up areas to get a "bare earth" Digital Terrain Model.

The image below visualises the performance of one of these models (a fully-convolutional neural network) in one of the three test zones considered (i.e. data unseen during model training & validation, used to assess the model's ability to generalise to new locations). A more detailed description is provided in the associated open-access journal article: Meadows & Wilson 2021.

[Figure: graphical abstract]

Python scripts

All Python code fragments used during this research are shared here (covering input data preparation, building & training three different ML models, and visualising the results), in the hope that they'll be useful to others doing related work or extending/improving this approach. Please note that this code includes lots of exploratory steps & some dead ends, and is not a refined step-by-step template for applying this approach in a new location.

Scripts are stored in folders relating to the virtual environments within which they were run, along with a text file summarising all packages loaded in each environment:

Brief summary of datasets used

The data processed for use in this project comprised the feature data (free, global datasets relevant to the vertical bias in DSMs, to be used as inputs to the machine learning models), target data (the reference "bare earth" DTM from which the models learn to predict vertical bias), and some supplementary datasets (not essential to the modelling but used to explore/understand the results).
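To make the relationship between the DSM, the target DTM and the "vertical bias" concrete, here is a minimal sketch. It assumes the DSM and the resampled reference DTM are co-registered NumPy arrays on the same grid, with NaN marking no-data. The bias is taken here as DSM minus DTM (this sign convention is an assumption), and a predicted bias is subtracted from the DSM to estimate bare-earth elevations. Both function names are hypothetical, not taken from the project scripts.

```python
import numpy as np

def vertical_bias(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-pixel vertical bias of a DSM relative to a reference DTM.

    Assumes both arrays share the same grid; NaN (no-data) propagates.
    Positive values mean the DSM sits above the bare-earth surface,
    e.g. because of vegetation or buildings.
    """
    return dsm.astype(float) - dtm.astype(float)

def correct_dsm(dsm: np.ndarray, predicted_bias: np.ndarray) -> np.ndarray:
    """Hypothetical helper: subtract a (model-predicted) bias to estimate bare-earth elevations."""
    return dsm.astype(float) - predicted_bias

# Toy arrays for illustration only (not project data)
dsm = np.array([[12.0, 15.5], [9.0, np.nan]])
dtm = np.array([[10.0, 11.5], [9.0, 8.0]])
bias = vertical_bias(dsm, dtm)       # the quantity a model would learn to predict
corrected = correct_dsm(dsm, bias)   # with a perfect prediction this recovers the DTM
```

In practice the models only ever see the feature data at prediction time, so `predicted_bias` would come from a trained model rather than from the DTM itself.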

Feature data

A guiding principle for the project was that all feature (input) data should be available for free and with global (or near-global) coverage, so as to maximise applicability in low-income countries/contexts. Although these datasets are too large to store here, all of them can be downloaded for free and relatively easily (some require signing up to the provider platform) by following the notes below.

Digital Surface Models (DSMs)

Multi-spectral imagery

Night-time light

Others

Target data

In order to learn how to predict (and then correct) the vertical biases present in DSMs, the models need reference data - "bare earth" DTMs assumed to be the "ground truth" that we're aiming for. For this project, we used three of the high-resolution LiDAR-derived DTMs published online by the New Zealand Government, accessible to all via the Land Information New Zealand (LINZ) Data Service. The specific LiDAR surveys used are summarised below, from the Marlborough & Tasman Districts (in the north of Aotearoa New Zealand's South Island):

To find similar target/reference DTM data in other parts of the world, the OpenTopography initiative maintains a catalogue of freely available sources.

Supplementary data

A few other datasets are referred to in the code, not as inputs to the machine learning models but just as references to better understand the results.
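As an illustration of how such a reference layer can help interpret results, here is a minimal sketch that breaks the RMSE down by class. It assumes the supplementary layer has been rasterised to class labels on the same grid as an error map (e.g. corrected DTM minus reference DTM); the function name and the toy values are hypothetical.

```python
import numpy as np

def rmse_by_class(error: np.ndarray, classes: np.ndarray) -> dict:
    """Hypothetical helper: RMSE of `error` within each class label, ignoring NaN errors."""
    error = np.asarray(error, dtype=float).ravel()
    classes = np.asarray(classes).ravel()
    valid = np.isfinite(error)
    out = {}
    for c in np.unique(classes[valid]):
        e = error[valid & (classes == c)]
        out[c.item()] = float(np.sqrt(np.mean(e ** 2)))  # .item() gives a plain Python key
    return out

# Toy example: class 0 might be "open ground", class 1 "forest" (labels are illustrative)
summary = rmse_by_class(np.array([1.0, -1.0, 3.0, np.nan]), np.array([0, 0, 1, 1]))
```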

Brief summary of approach taken

The broad approach taken is summarised below as succinctly as possible, with further details provided as comments in the relevant scripts.

  1. For each available LiDAR survey zone, process the DSMs and DTM in tandem: clipping each DSM (SRTM, ASTER and AW3D30) to the extent covered by the LiDAR survey, and resampling the DTM to the same resolution & grid alignment as each DSM. Various DSM derivatives (such as slope, aspect & topographical index products) are also prepared here.

  2. Based on a comparison of differences between each DSM and the DTM (resampled to match that particular DSM), the SRTM DSM was selected as the "base" for all further processing (script).

  3. Process all other input datasets - resampling to match the SRTM resolution & grid alignment, masking out clouds for the multi-spectral imagery, applying bounds where appropriate (e.g. for percentage variables):

    • Landsat-7 multi-spectral imagery (script)
    • Landsat-8 multi-spectral imagery (script)
    • ASTER DEM (script)
    • AW3D30 DEM (script)
    • Night-time light (script)
    • Global forest canopy height (script)
    • Global forest cover (script)
    • Global surface water (script)
    • OpenStreetMap layers (script)
  4. Divide all available data into training (90%), validation (5%) and testing (5%) subsets, and prepare for input to the pixel-based approaches (random forest & standard neural network) and patch-based approach (convolutional neural network) (script).

  5. Use sequential floating forward selection (SFFS) (with a random forest estimator) to select relevant features based on the training & validation datasets (script).

  6. Train the random forest model, tuning hyperparameters with reference to the validation data subset (script).

  7. Train the densely-connected neural network model, tuning hyperparameters with reference to the validation data subset (script).

  8. Train the fully-convolutional neural network model, tuning hyperparameters with reference to the validation data subset (script).

  9. Visualise results for the three zones of the testing data subset (unseen during model development) (script).
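For step 1, slope and aspect can be derived from a DSM array using finite differences. The sketch below is a minimal NumPy version assuming a north-up grid (row 0 is the northernmost row) with square cells whose size is given in the same units as elevation; the project may well have used GIS tools for this instead, and topographical index products are not covered. `slope_aspect` is a hypothetical name.

```python
import numpy as np

def slope_aspect(dem: np.ndarray, cell_size: float):
    """Return slope (degrees) and aspect (degrees clockwise from north) for a DEM.

    Assumes a north-up grid with square cells of `cell_size`; uses central
    differences in the interior (one-sided at the edges) via np.gradient.
    """
    # np.gradient returns the derivative along axis 0 (rows) then axis 1 (columns)
    d_row, d_col = np.gradient(dem.astype(float), cell_size)
    dz_dx = d_col    # change towards the east
    dz_dy = -d_row   # rows increase southwards, so flip the sign for northward change
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect = compass bearing of the downhill direction (-dz_dx east, -dz_dy north)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

For example, a surface rising eastwards by one cell size per cell has a 45° slope facing west (aspect 270°).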
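For steps 2 and 9, simple error statistics against the reference DTM are a natural starting point. The sketch below assumes co-registered NumPy arrays with NaN as no-data. Choosing the base DSM by lowest RMSE, and summarising test results as a percentage RMSE reduction, are illustrative assumptions rather than necessarily the measures used in the scripts. The function names are hypothetical, and the arrays are toy values, not project data.

```python
import numpy as np

def error_summary(surface, dtm) -> dict:
    """Summarise vertical errors of `surface` against a reference DTM, ignoring NaN no-data."""
    diff = np.asarray(surface, dtype=float) - np.asarray(dtm, dtype=float)
    diff = diff[np.isfinite(diff)]
    return {
        "mean_error": float(diff.mean()),            # overall (signed) vertical bias
        "mae": float(np.abs(diff).mean()),           # mean absolute error
        "rmse": float(np.sqrt(np.mean(diff ** 2))),  # root-mean-square error
        "n_pixels": int(diff.size),
    }

def rmse_reduction_pct(dsm, corrected, dtm) -> float:
    """Percentage reduction in RMSE after correction (positive = improvement)."""
    before = error_summary(dsm, dtm)["rmse"]
    after = error_summary(corrected, dtm)["rmse"]
    return 100.0 * (before - after) / before

# Toy comparison of two candidate DSMs against one reference DTM
reference = np.array([[10.0, 11.0], [12.0, np.nan]])
candidates = {
    "dsm_a": np.array([[11.0, 12.0], [13.0, 14.0]]),
    "dsm_b": np.array([[10.5, 13.0], [12.0, 9.0]]),
}
summaries = {name: error_summary(d, reference) for name, d in candidates.items()}
base = min(summaries, key=lambda name: summaries[name]["rmse"])  # lowest RMSE wins
```

In this toy case `dsm_b` has the smaller mean error but `dsm_a` has the smaller RMSE, which shows why the choice of metric matters. Computing `rmse_reduction_pct` separately for each test zone gives one summary number per zone to accompany the maps.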
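For step 3, cloud masking and bounding percentage variables might look like the sketch below. It assumes cloud flags come from an integer quality-assessment (QA) band. Bit 3 is used purely as a placeholder, because the bit that flags cloud depends on the Landsat product and collection and must be checked against its documentation. All function names are hypothetical.

```python
import numpy as np

def qa_bit_set(qa: np.ndarray, bit: int) -> np.ndarray:
    """True where the given bit of an integer QA band is set."""
    return ((qa >> bit) & 1).astype(bool)

def mask_clouds(band: np.ndarray, cloud: np.ndarray) -> np.ndarray:
    """Return a float copy of `band` with cloud-flagged pixels set to NaN."""
    out = band.astype(float)  # astype returns a new array, so the input is untouched
    out[cloud] = np.nan
    return out

def bound_percentage(values: np.ndarray) -> np.ndarray:
    """Clip a percentage layer (e.g. forest cover) to 0-100; NaN no-data stays NaN."""
    return np.clip(values.astype(float), 0.0, 100.0)

# Placeholder example: treat bit 3 of the QA band as the cloud flag
qa = np.array([[0, 8], [16, 24]], dtype=np.uint16)
red = np.array([[0.10, 0.35], [0.12, 0.40]])
red_clear = mask_clouds(red, qa_bit_set(qa, 3))
cover = bound_percentage(np.array([-2.0, 55.0, 104.0, np.nan]))
```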
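For steps 4 and 8, the patch-based approach needs gridded inputs cut into patches, and the fully-convolutional network's per-patch outputs stitched back into a map. The sketch assumes square, non-overlapping patches in row-major order; edge pixels that do not fill a whole patch are dropped. The random split shown is also an assumption. Because the project's test data form three separate zones, splitting whole patches or zones rather than individual pixels is closer to that design and limits spatial leakage. All function names are hypothetical.

```python
import numpy as np

def split_indices(n: int, fractions=(0.90, 0.05, 0.05), seed: int = 0):
    """Randomly partition n sample units (e.g. patches) into train/validation/test indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def extract_patches(grid: np.ndarray, size: int) -> np.ndarray:
    """Cut a (rows, cols[, bands]) grid into non-overlapping size x size patches, row-major."""
    n_rows, n_cols = grid.shape[0] // size, grid.shape[1] // size
    trimmed = grid[: n_rows * size, : n_cols * size]
    patches = trimmed.reshape(n_rows, size, n_cols, size, *grid.shape[2:])
    patches = patches.swapaxes(1, 2)  # -> (n_rows, n_cols, size, size[, bands])
    return patches.reshape(n_rows * n_cols, size, size, *grid.shape[2:])

def stitch_patches(patches: np.ndarray, n_rows: int, n_cols: int) -> np.ndarray:
    """Inverse of extract_patches: reassemble row-major patches into a single grid."""
    size = patches.shape[1]
    extra = patches.shape[3:]
    grid = patches.reshape(n_rows, n_cols, size, size, *extra).swapaxes(1, 2)
    return grid.reshape(n_rows * size, n_cols * size, *extra)
```

A round trip such as `stitch_patches(extract_patches(g, 2), 2, 3)` returns a 4 x 6 grid `g` unchanged. Overlapping patches with blended edges are a common refinement but are not shown here.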
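For step 5, sequential floating forward selection alternates between adding the most helpful feature and then dropping features as long as doing so beats the best subset of that smaller size found so far. Off-the-shelf implementations exist (for example, mlxtend's `SequentialFeatureSelector` with `floating=True`). The stand-alone sketch below takes a generic `score` callable (higher is better). In this project that score would presumably be the validation performance of a random forest trained on the candidate subset, but that wiring is an assumption and is not shown. `sffs` is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def sffs(features: Sequence[str],
         score: Callable[[Tuple[str, ...]], float],
         k_max: int) -> Tuple[Tuple[str, ...], float]:
    """Sequential floating forward selection; returns the best subset seen and its score."""
    selected: List[str] = []
    best: Dict[int, Tuple[Tuple[str, ...], float]] = {}  # best subset found for each size
    while len(selected) < k_max:
        # Inclusion: add the single remaining feature that gives the highest score
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        f_add = max(candidates, key=lambda f: score(tuple(selected + [f])))
        selected.append(f_add)
        s = score(tuple(selected))
        if len(selected) not in best or s > best[len(selected)][1]:
            best[len(selected)] = (tuple(selected), s)
        # Conditional exclusion: drop a feature only if that strictly beats
        # the best subset already recorded for the smaller size
        while len(selected) > 2:
            f_drop = max(selected,
                         key=lambda f: score(tuple(x for x in selected if x != f)))
            reduced = [x for x in selected if x != f_drop]
            s_red = score(tuple(reduced))
            if s_red > best[len(reduced)][1]:
                selected = reduced
                best[len(selected)] = (tuple(selected), s_red)
            else:
                break
    # Best-scoring subset across all sizes visited
    return max(best.values(), key=lambda item: item[1])
```

Requiring a strict improvement in the exclusion step guarantees the search terminates, since each backtrack must improve on a previously recorded subset.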
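For steps 6 to 8, "tuning hyperparameters with reference to the validation data subset" can be read as training on the training subset for each candidate setting and keeping the setting with the lowest validation error. The sketch below is a minimal exhaustive grid search under that reading. `fit_and_score` is a hypothetical callable supplied by the user, and the project scripts may have searched the space differently.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

def tune_on_validation(grid: Dict[str, Iterable[Any]],
                       fit_and_score: Callable[..., float]
                       ) -> Tuple[Optional[Dict[str, Any]], float]:
    """Exhaustive search over a hyperparameter grid, scored on a fixed validation subset.

    `fit_and_score(**params)` is assumed to train a model on the training subset
    and return an error metric (lower = better) on the validation subset.
    """
    names = list(grid)
    best_params, best_err = None, float("inf")
    for values in product(*(list(grid[n]) for n in names)):
        params = dict(zip(names, values))
        err = fit_and_score(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

With scikit-learn, for example, `fit_and_score` could fit `RandomForestRegressor(**params)` on the training subset and return the RMSE of its predictions on the validation subset; `n_estimators` and `max_depth` are genuine parameters of that class. The keys in `grid` must match the keyword arguments that `fit_and_score` accepts.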
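For step 7, neural networks are usually sensitive to the scale of their inputs. A common preparatory step is to standardise each feature using statistics from the training subset only, then apply the same transform to the validation and test subsets. Whether the scripts do exactly this is an assumption. The sketch uses NumPy, and both function names are hypothetical.

```python
import numpy as np

def fit_standardiser(X_train: np.ndarray):
    """Per-feature mean and standard deviation from the training subset (NaN ignored)."""
    mean = np.nanmean(X_train, axis=0)
    std = np.nanstd(X_train, axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant features unscaled instead of dividing by zero
    return mean, std

def apply_standardiser(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply the training-set transform to any subset (train, validation or test)."""
    return (X - mean) / std
```

Fitting the statistics on the training subset alone keeps information from the validation and test data out of model development.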