ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

ResDepth

This repository provides the code to train and evaluate ResDepth, an efficient and easy-to-use neural architecture for learned DSM refinement from satellite imagery. It is the official implementation of the paper:

ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

Corinne Stucker, Konrad Schindler

Abstract: Modern optical satellite sensors enable high-resolution stereo reconstruction from space. But the challenging imaging conditions when observing the Earth from space push stereo matching to its limits. In practice, the resulting digital surface models (DSMs) are fairly noisy and often do not attain the accuracy needed for high-resolution applications such as 3D city modeling. Arguably, stereo correspondence based on low-level image similarity is insufficient and should be complemented with a-priori knowledge about the expected surface geometry beyond basic local smoothness. To that end, we introduce ResDepth, a convolutional neural network that learns such an expressive geometric prior from example data. ResDepth refines an initial, raw stereo DSM while conditioning the refinement on the images. I.e., it acts as a smart, learned post-processing filter and can seamlessly complement any stereo matching pipeline. In a series of experiments, we find that the proposed method consistently improves stereo DSMs both quantitatively and qualitatively. We show that the prior encoded in the network weights captures meaningful geometric characteristics of urban design, which also generalize across different districts and even from one city to another. Moreover, we demonstrate that, by training on a variety of stereo pairs, ResDepth can acquire a sufficient degree of invariance against variations in imaging conditions and acquisition geometry.

Requirements

This code has been developed and tested on Ubuntu 18.04 with Python 3.7, PyTorch 1.9, and GDAL 2.2.3. It may work for other setups but has not been tested thoroughly.

On Ubuntu 18.04, GDAL can be installed with apt:

sudo apt update
sudo apt install libgdal-dev gdal-bin

Setup

Before proceeding, make sure that GDAL is installed and set up correctly.

To create a Python virtual environment and install the required dependencies, run the following commands in your working directory:

git clone https://github.com/stuckerc/ResDepth.git
cd ResDepth
python3 -m venv tmp/resdepth
source tmp/resdepth/bin/activate
(resdepth) $ pip install --upgrade pip setuptools wheel
(resdepth) $ pip install -r requirements.txt

Next, use the following command to install GDAL into the previously created virtual environment:

(resdepth) $ pip install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==`gdal-config --version`

Quick Start

ResDepth follows a residual learning strategy, i.e., it is trained to refine an imperfect input DSM by regressing a per-pixel correction to the height, using both the DSM and ortho-rectified panchromatic (stereo) images as input.
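The residual formulation can be illustrated with a minimal NumPy sketch. It assumes that the guidance images are stacked as extra channels after the DSM, and it uses a hypothetical `predict_correction` callable as a stand-in for the trained network. The channel order and all names here are illustrative assumptions, not the repository's API.

```python
from typing import Callable, Optional

import numpy as np


def refine_dsm(
    initial_dsm: np.ndarray,
    images: Optional[np.ndarray],
    predict_correction: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Residual refinement: refined DSM = initial DSM + predicted per-pixel correction.

    initial_dsm: (H, W) heights; images: (n_images, H, W) ortho-images, or None
    for ResDepth-0. predict_correction is a hypothetical stand-in for the network.
    """
    channels = [initial_dsm[np.newaxis]]  # DSM as the first channel (assumed order)
    if images is not None:
        channels.append(images)  # 1 image (mono) or 2 images (stereo)
    net_input = np.concatenate(channels, axis=0)  # (1 + n_images, H, W)
    correction = predict_correction(net_input)  # (H, W) height offsets
    return initial_dsm + correction


# Toy "network" that predicts a constant +0.5 m correction everywhere.
dsm = np.zeros((4, 4))
stereo = np.ones((2, 4, 4))
refined = refine_dsm(dsm, stereo, lambda x: np.full(x.shape[1:], 0.5))
print(refined.mean())  # 0.5
```

Because the network only has to predict the correction, an all-zero output leaves the initial DSM unchanged, which is what makes the refinement act as a post-processing filter.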

Data Preparation

We assume that the initial surface reconstruction has been generated with existing multi-view stereo matching and/or depth map fusion techniques. Furthermore, we assume that the images have already been ortho-rectified with the help of the initial surface estimate. Please note that this repository does not provide any functionality for data pre-processing (initial surface reconstruction, ortho-rectification) or for image pair selection.

When preparing your data as input for ResDepth, make sure to meet the following requirements:

Note: Throughout our experiments, we use DSMs with a grid spacing of 0.25 m. By construction, ResDepth is generic and can be trained to refine any DSM. However, if the spatial resolution deviates from our setting, it may be necessary to adapt the tile size tile_size of the DSM patches and/or the depth depth of the U-Net (number of downsampling and upsampling layers).
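One way to reason about these two parameters is the ground footprint of a tile: 256 pixels at 0.25 m cover 64 m. The sketch below assumes a U-Net whose `depth` levels each halve the resolution, so tile_size should be a multiple of 2**depth, and it rescales a fixed ground footprint to another grid spacing. Keeping the footprint constant is only one possible heuristic, and depth=5 is an illustrative value, not the repository default.

```python
import math


def tile_footprint_m(tile_size: int, gsd_m: float) -> float:
    """Ground side length (metres) covered by a square tile of tile_size pixels."""
    return tile_size * gsd_m


def tile_size_for_footprint(footprint_m: float, gsd_m: float, depth: int) -> int:
    """Smallest tile size (pixels) that covers footprint_m and is divisible by 2**depth.

    Assumes every U-Net level halves the resolution, so the input size must be a
    multiple of 2**depth for the skip connections to line up.
    """
    factor = 2 ** depth
    pixels = math.ceil(footprint_m / gsd_m)
    return math.ceil(pixels / factor) * factor


print(tile_footprint_m(256, 0.25))            # 64.0
print(tile_size_for_footprint(64.0, 0.5, 5))  # 128
print(tile_size_for_footprint(64.0, 0.3, 5))  # 224
```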

For evaluation, ResDepth accepts the following additional rasters as input:

Note: All rasters (DSMs, ortho-images, mask rasters) must be co-registered and cropped to the same spatial (rectangular) extent. Furthermore, the spatial resolution must be the same.
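Because mismatched grids are an easy mistake to make, it can help to verify co-registration before training. The sketch below compares GDAL geotransforms (origin and pixel size), raster dimensions, and projection strings. It uses the standard `osgeo.gdal` bindings installed above, but the helper names are hypothetical, and the strict textual WKT comparison may flag equivalent CRS definitions as mismatches.

```python
from typing import NamedTuple, Sequence, Tuple


class RasterGrid(NamedTuple):
    geotransform: Tuple[float, ...]  # GDAL geotransform (6 values)
    width: int
    height: int
    projection: str  # WKT string


def read_grid(path: str) -> RasterGrid:
    """Read the grid definition of a raster with the GDAL Python bindings."""
    from osgeo import gdal  # lazy import: grids_match() itself does not need GDAL

    ds = gdal.Open(path)
    if ds is None:
        raise IOError(f"cannot open raster {path}")
    return RasterGrid(tuple(ds.GetGeoTransform()), ds.RasterXSize,
                      ds.RasterYSize, ds.GetProjection())


def grids_match(grids: Sequence[RasterGrid], tol: float = 1e-6) -> bool:
    """True if all rasters share origin, pixel size, raster size, and CRS string."""
    ref = grids[0]
    for g in grids[1:]:
        if (g.width, g.height) != (ref.width, ref.height):
            return False
        if g.projection != ref.projection:  # strict textual WKT comparison
            return False
        if any(abs(a - b) > tol for a, b in zip(g.geotransform, ref.geotransform)):
            return False
    return True
```

For example, `grids_match([read_grid(p) for p in ["path/to/initial_DSM.tif", "path/to/ground_truth_DSM.tif"]])` should return True before the two rasters are used together.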

Data Structure

We do not expect any particular structure for how the data is stored. The paths to every initial DSM, to the corresponding ground truth DSM, and, optionally, to raster masks must be listed in the configuration file (see below). The ortho-images and the definition of the image pairs must be provided as text files.

Image List

Prepare a text file imagelist.txt that lists the absolute paths to the pre-computed ortho-rectified satellite images (one file path per line):

path/to/ortho-image1.tif
path/to/ortho-image2.tif
path/to/ortho-image3.tif
path/to/ortho-image4.tif
path/to/ortho-image5.tif
path/to/ortho-image6.tif
path/to/ortho-image7.tif
...

It is possible to use the same image list for training, validation, and testing.

Note: Every image listed in the image list must have a unique filename, irrespective of its absolute file path, because the image pair list (see below) refers to images by filename only.
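The uniqueness requirement matters because filenames are the lookup key. A minimal sketch of such a filename-to-path index, with a duplicate check, could look as follows; the function names are hypothetical and this is not the repository's own loader.

```python
import os
from typing import Dict, Iterable


def build_image_index(lines: Iterable[str]) -> Dict[str, str]:
    """Map each image filename to its full path, rejecting duplicate filenames."""
    index: Dict[str, str] = {}
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        name = os.path.basename(path)
        if name in index:
            raise ValueError(f"duplicate filename {name!r}: {index[name]} and {path}")
        index[name] = path
    return index


def read_image_list(list_file: str) -> Dict[str, str]:
    """Read imagelist.txt (one path per line) into a filename -> path index."""
    with open(list_file) as f:
        return build_image_index(f)
```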

Image Pair List

Prepare a text file pairlist.txt in which every line defines one image pair as a comma-separated list of filenames. If multiple image pairs are specified, all pairs must have the same length (i.e., the same number of images per image pair).

Example image pair list for ResDepth-stereo: The following image pair list defines a single stereo pair composed of the images ortho-image1.tif and ortho-image2.tif. The absolute paths to the ortho-images will be derived by matching the image filenames listed in pairlist.txt and imagelist.txt.

ortho-image1.tif, ortho-image2.tif

Example image pair list for ResDepth-stereo, generalized across viewpoints: To train a ResDepth-stereo network that generalizes across variations in acquisition geometry and the images' radiometry, one simply has to provide multiple stereo pairs in the image pair list, for example:

ortho-image1.tif, ortho-image2.tif
ortho-image2.tif, ortho-image5.tif
ortho-image4.tif, ortho-image5.tif

In this example, ResDepth-stereo will be trained using the three image pairs (ortho-image1.tif, ortho-image2.tif), (ortho-image2.tif, ortho-image5.tif), and (ortho-image4.tif, ortho-image5.tif).

Example image pair list for ResDepth-mono: The following example shows an image pair list to train ResDepth-mono using the single image ortho-image1.tif as guidance:

ortho-image1.tif, 
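The pair list formats above (stereo pairs, and a single filename with a trailing comma for ResDepth-mono) can be parsed as sketched below. This is a hypothetical parser written for illustration, not the one in the repository; it drops empty entries, enforces equal pair lengths, and resolves filenames through a filename-to-path dictionary built from imagelist.txt.

```python
from typing import Dict, Iterable, List


def parse_pair_list(lines: Iterable[str]) -> List[List[str]]:
    """Parse pairlist.txt lines into lists of filenames (one list per image pair)."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Split on commas and drop empty entries, e.g. the trailing comma in
        # the ResDepth-mono example "ortho-image1.tif, ".
        names = [name.strip() for name in line.split(",") if name.strip()]
        pairs.append(names)
    lengths = {len(pair) for pair in pairs}
    if len(lengths) > 1:
        raise ValueError(f"all image pairs must have the same length, got {sorted(lengths)}")
    return pairs


def resolve_pairs(pairs: List[List[str]], index: Dict[str, str]) -> List[List[str]]:
    """Map filenames to absolute paths using a filename -> path index of imagelist.txt."""
    missing = sorted({n for pair in pairs for n in pair if n not in index})
    if missing:
        raise KeyError(f"filenames not found in image list: {missing}")
    return [[index[name] for name in pair] for pair in pairs]


stereo = parse_pair_list(["ortho-image1.tif, ortho-image2.tif\n",
                          "ortho-image2.tif, ortho-image5.tif\n"])
mono = parse_pair_list(["ortho-image1.tif, \n"])
print(stereo)  # [['ortho-image1.tif', 'ortho-image2.tif'], ['ortho-image2.tif', 'ortho-image5.tif']]
print(mono)    # [['ortho-image1.tif']]
```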

Example image pair list for ResDepth-0: The network variant ResDepth-0 does not leverage any satellite views. Therefore, to train or evaluate ResDepth-0, neither the image list imagelist.txt nor the image pair list pairlist.txt has to be provided.

Training

To train ResDepth, run the script train.py with a JSON configuration file as its only argument:

(resdepth) $ python train.py config.json

The configuration file config.json specifies the input data and the output directory (see below for details). Furthermore, it configures the model architecture, hyperparameters, and training settings. All parameters and their default settings are described in ./lib/config.py.

To monitor and visualize the training process, you can start a TensorBoard session with:

(resdepth) $ tensorboard --logdir <tboard_log_dir>

Evaluation

To evaluate the ResDepth prior, run the script test.py with a JSON configuration file as its only argument:

(resdepth) $ python test.py config_test.json

The configuration file config_test.json specifies the input data, the model architecture of ResDepth and its weights, and the output directory (see below for details).

The script test.py uses a tiling-based strategy to refine the given input DSM. First, it cuts the DSM into a regular grid of overlapping tiles, where the tile size amounts to tile_size and the stride to 0.5*tile_size. The DSM patches are then refined individually. Lastly, the refined DSM patches are merged into a single refined DSM raster. If multiple images (ResDepth-mono) or image pairs (ResDepth-stereo) are provided in the image pair list, the same initial DSM is refined multiple times, using every image (pair) once for guidance. Finally, the error metrics are reported both over all predictions and for every prediction separately.
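The tiling scheme can be sketched in a few lines of NumPy. The stride of tile_size // 2 follows the description above; how test.py handles the raster border and weights overlapping predictions is not stated here, so the extra border-aligned tile and the plain averaging below are assumptions, and `refine_tile` is a hypothetical stand-in for the network.

```python
from typing import Callable, List

import numpy as np


def tile_origins(length: int, tile_size: int) -> List[int]:
    """Start offsets of overlapping tiles along one axis (stride = tile_size // 2)."""
    if length < tile_size:
        raise ValueError("raster is smaller than one tile")
    stride = tile_size // 2
    origins = list(range(0, length - tile_size + 1, stride))
    if origins[-1] != length - tile_size:
        origins.append(length - tile_size)  # assumed: extra tile flush with the border
    return origins


def refine_tiled(dsm: np.ndarray, tile_size: int,
                 refine_tile: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Refine overlapping tiles independently and average them where they overlap."""
    acc = np.zeros(dsm.shape, dtype=np.float64)
    count = np.zeros(dsm.shape, dtype=np.float64)
    for r in tile_origins(dsm.shape[0], tile_size):
        for c in tile_origins(dsm.shape[1], tile_size):
            window = (slice(r, r + tile_size), slice(c, c + tile_size))
            acc[window] += refine_tile(dsm[window])
            count[window] += 1.0
    return acc / count  # every pixel is covered by at least one tile


print(tile_origins(300, 128))  # [0, 64, 128, 172]
```

Averaging is the simplest merge rule; a weighting that down-weights tile borders would be a natural alternative if seams appear.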

Configuration File: Training

The following example shows the bare minimum JSON configuration file config.json to train ResDepth. It consists of two objects, datasets and output, with mandatory and optional key-value pairs that need to be completed by the user.

{
  "datasets": [
    {
      "name": "my_dataset",
      "raster_gt": "path/to/ground_truth_DSM.tif",
      "raster_in": "path/to/initial_DSM.tif",
      "path_image_list": "path/to/imagelist.txt",
      "path_pairlist_training": "path/to/pairlist_training.txt",
      "path_pairlist_validation": "path/to/pairlist_validation.txt",
      "area_type": "train+val",
      "allocation_strategy": "5-crossval_vertical",
      "test_stripe": 1,
      "crossval_training": false,
      "n_training_samples": 20000
    }
  ],
  "output": {
    "suffix": "",
    "output_directory": "path/to/output_directory",
    "tboard_log_dir": "path/to/tboard_log_directory"
  }
}

Input Data

The datasets object defines a list of objects with mandatory and optional key-value pairs. Every object in the list describes a dataset, i.e., a (rectangular) geographic region for which an initial DSM, a corresponding ground truth DSM, and ortho-images are available. For training, ResDepth expects at least one training dataset and one validation dataset (i.e., the list contains at least two objects). Alternatively, ResDepth accepts one or more datasets, each split into mutually exclusive stripes for training, validation, and testing (i.e., the list contains at least one object). Every dataset (object in the list) has the following mandatory key-value pairs:

The keys path_image_list, path_pairlist_training, and path_pairlist_validation are not required for ResDepth-0.

Additionally, the user can specify the following optional key-value pairs:

Warning: At runtime, all rasters (DSMs and ortho-images listed in imagelist.txt) are loaded into memory.
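Since everything is held in RAM, a back-of-the-envelope estimate before training can save a failed run. The sketch below assumes single-band rasters stored as float32 (4 bytes per pixel) in memory; the actual in-memory data type used by the code is an assumption here, and the example region size is hypothetical.

```python
def raster_memory_gib(width: int, height: int, n_rasters: int,
                      bytes_per_pixel: int = 4) -> float:
    """Approximate RAM (GiB) for n_rasters single-band rasters of width x height pixels."""
    return width * height * n_rasters * bytes_per_pixel / 2 ** 30


# Example: a 2.5 km x 2.5 km region at 0.25 m is 10,000 x 10,000 pixels.
# Initial DSM + ground truth DSM + 5 ortho-images = 7 rasters, assumed float32.
print(round(raster_memory_gib(10_000, 10_000, 7), 2))  # 2.61
```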

Output Settings

The results folder is named after the date and time of execution, followed by an optional suffix: YYYY-MM-DD_HH-MM_${suffix}. The output object consists of the following key-value pairs:

The results directory is structured as follows:

${output_directory}/YYYY-MM-DD_HH-MM_${suffix}/
├── checkpoints
│   ├── Model_after_${save_model_rate}_epochs.pth
│   ├── Model_after_${2*save_model_rate}_epochs.pth
│   ├── ...
│   └── Model_best.pth
├── config.json
├── config.json.orig
├── DSM_normalization_parameters.p
├── Image_normalization_parameters.p
├── model_config.json
├── run.log
└── training.log
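The folder and checkpoint names in the tree above follow a simple pattern, which is handy when scripting evaluations over several runs. The sketch below reproduces that pattern; the helper names and the total epoch count `n_epochs` are hypothetical, and dropping the trailing underscore for an empty suffix is an assumption about the naming.

```python
from datetime import datetime
from typing import List


def run_dir_name(start: datetime, suffix: str = "") -> str:
    """Results folder name YYYY-MM-DD_HH-MM_${suffix}."""
    stamp = start.strftime("%Y-%m-%d_%H-%M")
    return f"{stamp}_{suffix}" if suffix else stamp  # empty-suffix handling assumed


def checkpoint_names(save_model_rate: int, n_epochs: int) -> List[str]:
    """Periodic checkpoints every save_model_rate epochs, plus the best model."""
    names = [f"Model_after_{epoch}_epochs.pth"
             for epoch in range(save_model_rate, n_epochs + 1, save_model_rate)]
    return names + ["Model_best.pth"]


print(run_dir_name(datetime(2022, 1, 5, 9, 7), "zur1"))  # 2022-01-05_09-07_zur1
print(checkpoint_names(10, 35))
```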

It stores the following files:

Switching between ResDepth-stereo, ResDepth-mono, and ResDepth-0

The content of the image pair list pairlist.txt determines whether ResDepth is trained using stereo information or a single image as guidance. To train ResDepth-0, neither the image list imagelist.txt nor the image pair list pairlist.txt has to be provided.

In addition to modifying the image pair list pairlist.txt, one must also adjust the network architecture accordingly. For ResDepth-stereo, add the following settings to your JSON configuration file:

  "model": {
    "input_channels": "geom-stereo"
  },
  "stereopair_settings": {
    "use_all_stereo_pairs": true,
    "permute_images_within_pair": true
  }

These are the default settings to train ResDepth-stereo, which generalizes across variations in acquisition geometry and imaging conditions. Ideally, the image pair list specified by path_pairlist_training comprises more than one image pair. Set permute_images_within_pair to false if the goal is to train a ResDepth-stereo prior tailored to the specific image characteristics and acquisition geometry of a single image pair.

For ResDepth-mono, please specify:

  "model": {
    "input_channels": "geom-mono"
  }

Similarly, for ResDepth-0, specify:

  "model": {
    "input_channels": "geom"
  }

Lastly, we also provide the option to train a U-Net variant that directly regresses a DSM from an ortho-rectified stereo pair [1]:

  "model": {
    "input_channels": "stereo",
    "outer_skip": false
  }
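Since the variant is fixed by the number of images per line in pairlist.txt, the matching configuration fragment can be derived from it. The sketch below only emits the settings listed in this section; the function name and the `generalize` flag (mapped onto permute_images_within_pair) are hypothetical conveniences, and the direct-regression "stereo" variant is left out.

```python
import json


def variant_settings(images_per_pair: int, generalize: bool = True) -> dict:
    """Config fragment selecting ResDepth-0, ResDepth-mono, or ResDepth-stereo."""
    if images_per_pair == 0:  # ResDepth-0: no image list / pair list needed
        return {"model": {"input_channels": "geom"}}
    if images_per_pair == 1:  # ResDepth-mono
        return {"model": {"input_channels": "geom-mono"}}
    if images_per_pair == 2:  # ResDepth-stereo
        return {
            "model": {"input_channels": "geom-stereo"},
            "stereopair_settings": {
                "use_all_stereo_pairs": True,
                # False tailors the prior to a single image pair
                "permute_images_within_pair": generalize,
            },
        }
    raise ValueError("expected 0, 1, or 2 images per pair")


print(json.dumps(variant_settings(2, generalize=False), indent=2))
```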

Changing the Default Model and Training Settings (Hyperparameters, Optimizer, Learning Rate Scheduler)

We provide a detailed description of all parameters and their default settings in ./lib/config.py. Most likely, the parameters depth and tile_size have to be fine-tuned if the spatial resolution of the DSMs deviates from 0.25 m (our setting). Add the parameters that you wish to modify to your JSON configuration file to overwrite the respective default setting.
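The override behaviour can be pictured as a recursive merge of the user's JSON onto the defaults. This is a minimal sketch under the assumption that nested objects are merged key by key and everything else is replaced; the default values and the placement of depth under model are placeholders for illustration only, the real defaults live in ./lib/config.py.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user settings on defaults without mutating either dict."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace the default
    return merged


# Placeholder defaults for illustration only (not the values in ./lib/config.py).
defaults = {"general": {"tile_size": 256}, "model": {"depth": 5, "input_channels": "geom-stereo"}}
user = {"general": {"tile_size": 128}, "model": {"depth": 4}}
print(merge_config(defaults, user))
```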

Templates

We provide several template files in the directory ./configs/ to train ResDepth-0, ResDepth-mono, and ResDepth-stereo on a single dataset. Furthermore, we provide a template for the generalized ResDepth-stereo variant.

Configuration File: Evaluation

In the following, we show an example JSON configuration file config_test.json to evaluate ResDepth:

{
  "datasets": [
    {
      "name": "my_dataset",
      "raster_gt": "path/to/ground_truth_DSM.tif",
      "raster_in": "path/to/initial_DSM.tif",
      "path_image_list": "path/to/imagelist.txt",
      "path_pairlist": "path/to/pairlist_test.txt",
      "mask_ground_truth": "path/to/ground_truth_mask.tif",
      "mask_building": "path/to/building_mask.tif",
      "mask_water": "path/to/water_mask.tif",
      "mask_forest": "path/to/forest_mask.tif",
      "area_type": "test",
      "allocation_strategy": "5-crossval_vertical",
      "test_stripe": 1,
      "crossval_training": false
    }
  ],
  "model": {
    "weights": "${output_directory}/YYYY-MM-DD_HH-MM_${suffix}/checkpoints/Model_best.pth",
    "architecture": "${output_directory}/YYYY-MM-DD_HH-MM_${suffix}/model_config.json",
    "normalization_geom": "${output_directory}/YYYY-MM-DD_HH-MM_${suffix}/DSM_normalization_parameters.p",
    "normalization_image": "${output_directory}/YYYY-MM-DD_HH-MM_${suffix}/Image_normalization_parameters.p"
  },
  "general": {
    "tile_size": 256
  },
  "output": {
    "directory": "path/to/results/folder"
  }
}

The key-value pairs of the datasets object are the same as those used in the training configuration file. Additionally, the user can specify the file paths of ground truth masks, building masks, water masks, and forest masks (see Section 'Data Preparation' above). To evaluate a cross-validation run, set crossval_training to true and area_type to 'val'. Furthermore, use the same values for allocation_strategy and test_stripe as during training.

The model object specifies the model weights, the model architecture, and the parameters used for data normalization. The directory ${output_directory}/YYYY-MM-DD_HH-MM_${suffix} corresponds to the output folder of the training script train.py. Note that normalization_image is required for ResDepth-mono and ResDepth-stereo but not for ResDepth-0.

Finally, it is essential to set the same tile size tile_size as used during training.
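Filling in the model object by hand is error-prone when several training runs exist. The sketch below builds it from a training output folder using the file names shown in the results directory; the function name and the `uses_images` flag are hypothetical.

```python
import json
import os
from typing import Dict


def model_section(run_dir: str, uses_images: bool = True) -> Dict[str, str]:
    """Build the "model" object of config_test.json from a training output folder."""
    section = {
        "weights": os.path.join(run_dir, "checkpoints", "Model_best.pth"),
        "architecture": os.path.join(run_dir, "model_config.json"),
        "normalization_geom": os.path.join(run_dir, "DSM_normalization_parameters.p"),
    }
    if uses_images:  # ResDepth-mono and ResDepth-stereo only, not ResDepth-0
        section["normalization_image"] = os.path.join(run_dir, "Image_normalization_parameters.p")
    return section


print(json.dumps(model_section("path/to/output_directory/2022-01-05_09-07_zur1"), indent=2))
```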

Pretrained Models

We provide the checkpoints of two ResDepth-stereo models used to test geographical generalization between Berlin and Zurich (see Section Geographical Generalization Across Cities in the paper). Furthermore, we provide the checkpoint of our ResDepth-stereo multi-city model (see Section Multi-city Model in the paper).

To download these models, please run:

(resdepth) $ bash ./scripts/download_pretrained_models.sh

Additionally, we provide all the models used in the ablation studies (see Section Influence of Image Guidance in the paper).

To download these models, please run:

(resdepth) $ bash ./scripts/download_pretrained_models_ablations.sh

All the models will be downloaded and extracted to ./logs/pretrained_models/ and ./logs/pretrained_models_ablations/.

Demo

Due to the commercial nature of VHR imagery, we cannot share our complete datasets. In this demo, we thus provide two variants of DSM refinement using a single DSM patch of 256×256 pixels only (64×64 m in world coordinates).

Example Dataset

To download the demo, please run:

(resdepth) $ bash ./scripts/download_demo.sh

The data, configuration files, and pretrained models will be downloaded and extracted to ./demo/.

The data is stored in ./demo/data/:

./demo/data/
├── dsm
│   ├── DSM_Zurich_ground_truth.tif
│   └── DSM_Zurich_initial.tif
├── image_selection
│   ├── imagelist.txt
│   ├── pairlist_simple.txt
│   └── pairlist_generalized.txt
├── mask
│   └── Zurich_building_mask.tif
└── satellite_views
    ├── 15MAR17102414-P1BS-502980289080_01_P002.tif
    ├── 17OCT07103158-P1BS-501653123070_01_P002.tif
    ├── 18APR10105411-P1BS-502091706050_01_P002.tif
    ├── 18JAN29104120-P1BS-502980288020_01_P005.tif
    └── 18MAR24105605-P1BS-501687882040_02_P006.tif

We provide an initial DSM patch DSM_Zurich_initial.tif, the corresponding ground truth DSM patch DSM_Zurich_ground_truth.tif, and a building mask Zurich_building_mask.tif. The example DSM is located in the test stripe of the region ZUR1 in Zurich. Furthermore, we provide five ortho-images stored in the subdirectory ./demo/data/satellite_views/. The subdirectory ./demo/data/image_selection/ comprises the image list imagelist.txt and two pair lists: pairlist_simple.txt defines a single stereo pair, and pairlist_generalized.txt defines two stereo pairs.

Pretrained Models

The pretrained model weights are stored in ./demo/models/:

./demo/models/
├── ResDepth-stereo
│   ├── checkpoints
│   │    └── Model_best.pth
│   ├── DSM_normalization_parameters.p
│   ├── Image_normalization_parameters.p
│   └── model_config.json
└── ResDepth-stereo_generalized
    ├── checkpoints
    │    └── Model_best.pth
    ├── DSM_normalization_parameters.p
    ├── Image_normalization_parameters.p
    └── model_config.json    

The subdirectory ./demo/models/ResDepth-stereo/ contains the pretrained weights of a ResDepth-stereo prior. We used the training stripes of the region ZUR1 in Zurich and the single stereo pair listed in ./demo/data/image_selection/pairlist_simple.txt for training.

Similarly, the subdirectory ./demo/models/ResDepth-stereo_generalized/ specifies the pretrained weights of a ResDepth-stereo prior that has learned to generalize to unseen viewing directions, lighting conditions, and urban styles. For training, we sampled training data from all training stripes in ZUR1, ZUR2, and ZUR3 and leveraged multiple stereo pairs that are different from the ones listed in ./demo/data/image_selection/pairlist_generalized.txt.

Running the Demo

Use the available test configuration ./demo/configs/config_simple.json to refine the initial DSM using the simple ResDepth-stereo prior (tailored to the specific image characteristics and acquisition geometry of the training image pair):

(resdepth) $ python test.py ./demo/configs/config_simple.json

To refine the initial DSM using the generalized ResDepth-stereo prior, use the test configuration ./demo/configs/config_generalized.json and run:

(resdepth) $ python test.py ./demo/configs/config_generalized.json

Results

The refined DSMs will be stored in ./demo/results/ResDepth-stereo/ and ./demo/results/ResDepth-stereo_generalized/. For visualization, you can use Python visualization packages or off-the-shelf DSM visualization software such as Quick Terrain Reader or planlauf/TERRAIN.

As reference, ./demo/results_expected/ stores the expected results.
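For a quick numerical sanity check of your own output against the ground truth (or against the expected results), a few height-error statistics are enough. The metric set below (mean absolute error, RMSE, median absolute error) is an assumption and may not match exactly what test.py reports; it also assumes nodata pixels are NaN and that an optional mask (e.g. the building mask) marks the pixels to evaluate.

```python
from typing import Dict, Optional

import numpy as np


def dsm_errors(pred: np.ndarray, gt: np.ndarray,
               mask: Optional[np.ndarray] = None) -> Dict[str, float]:
    """Height error statistics over pixels that are finite in both DSMs (and inside mask)."""
    valid = np.isfinite(pred) & np.isfinite(gt)
    if mask is not None:
        valid &= mask.astype(bool)
    if not valid.any():
        raise ValueError("no valid pixels to compare")
    diff = pred[valid] - gt[valid]
    abs_diff = np.abs(diff)
    return {
        "mae": float(abs_diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "median_abs_error": float(np.median(abs_diff)),
    }
```

The rasters can be loaded with the GDAL bindings, e.g. `gdal.Open(path).ReadAsArray().astype(float)`, after mapping any nodata value to NaN.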

Preview of the results: Demo

Contact

If you run into any problems or have questions, please contact Corinne Stucker.

Citations

If you find this code or work helpful, please cite:

@article{stucker2022resdepth,
title = {{ResDepth}: A deep residual prior for 3D reconstruction from high-resolution satellite images},
author = {Stucker, Corinne and Schindler, Konrad},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
volume = {183},
pages = {560--580},
year = {2022}
}

Footnotes

  1. C. Stucker, K. Schindler, ResDepth: Learned residual stereo reconstruction, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 707-716.