Lausanne tree canopy

Tree canopy map of Lausanne at 1 m resolution, obtained with DetecTree [1] from SWISSIMAGE 2016. The raster file (231.3 MB) can be downloaded from Zenodo.

Figure

Technical specifications

Citation

If you use this dataset, the source, i.e., SWISSIMAGE 2016, must be acknowledged. Additionally, a citation to DetecTree would be appreciated. Note that DetecTree is based on the methods of Yang et al. [2], so it seems fair to reference their work too. An example citation in an academic paper might read as follows:

The tree canopy dataset for the agglomeration of Lausanne has been obtained from the SWISSIMAGE 2016 aerial imagery dataset with the Python library DetecTree (Bosch, 2020), which is based on the approach of Yang et al. (2009).

Steps to reproduce

To reproduce this workflow, you need access to the SWISSIMAGE 2016 dataset, e.g., through GeoVITe. If you have access, download it for the following extent (in CH1903/LV03 coordinates):

E: 524843 546153
N: 148578 159128

and place it in the data/raw/swissimage.tif path of this repository.
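As a quick sanity check on the download, the size of the requested extent can be derived from its corner coordinates. The sketch below only does that arithmetic; the variable names are illustrative and not part of this repository.

```python
# Extent corners in CH1903/LV03 (EPSG:21781), copied from the values above.
east_min, east_max = 524843, 546153
north_min, north_max = 148578, 159128

width_m = east_max - east_min      # extent width in meters
height_m = north_max - north_min   # extent height in meters
area_km2 = width_m * height_m / 1e6

print(f"{width_m} m x {height_m} m, about {area_km2:.1f} km2")
# prints: 21310 m x 10550 m, about 224.8 km2
```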

1. Split SWISSIMAGE TIF into tiles

To obtain the train/test split of the dataset, data/raw/swissimage.tif must first be split into a set of tiles, which can be done with:

make swissimage_tiles

The generated image tiles will be stored in the data/interim/swissimage-tiles directory.
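To illustrate what splitting a raster into tiles involves, here is a minimal sketch that computes the pixel windows of a regular tiling. It is not the code that the Makefile runs: the function name `tile_windows` and the raster and tile sizes are hypothetical, chosen only for the example.

```python
def tile_windows(width, height, tile_width, tile_height):
    """Yield (col_off, row_off, w, h) windows covering a width x height raster."""
    for row_off in range(0, height, tile_height):
        for col_off in range(0, width, tile_width):
            # Clip the last row/column of tiles at the raster edge.
            w = min(tile_width, width - col_off)
            h = min(tile_height, height - row_off)
            yield col_off, row_off, w, h


# Hypothetical 1000 x 700 pixel raster split into tiles of at most 512 x 512.
windows = list(tile_windows(1000, 700, 512, 512))
print(windows)
# prints: [(0, 0, 512, 512), (512, 0, 488, 512), (0, 512, 512, 188), (512, 512, 488, 188)]
```

Each tuple follows the argument order of `rasterio.windows.Window(col_off, row_off, width, height)`, so it could be used to read the corresponding part of the TIF.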

2. Compute the train/test split

This step is optional, since this repository already includes a CSV train/test split. If you still want to generate your own train/test split, you might do so as in:

make swissimage_tiles

This will generate a CSV with the train/test split data frame at data/interim/swissimage-tiles/split.csv. Note that since the train/test split uses a randomized k-Means algorithm, the generated CSV file will likely differ from the one committed in this repository. See the detectree-example repository for more details.

3. Make the response tiles

The list of tiles for which a ground-truth mask must be provided manually can be obtained with Python as follows:

import pandas as pd

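# Load the train/test split; the first CSV column is used as the index
split_df = pd.read_csv('path/to/split.csv', index_col=0)
# File paths of the tiles selected for training (boolean `train` column)
split_df.loc[split_df['train'], 'img_filepath']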

The ground truth masks can be generated with image editing software such as GIMP, and must be saved in grayscale mode, with the same file name as the corresponding tile, in the data/interim/response-tiles directory. Note: it is very important that the ground truth mask consists of two and only two pixel values. By default, DetecTree will process pixel values of 255 (white) as trees and pixel values of 0 (black) as non-trees (although this can be customized by means of the tree_val and nontree_val arguments of the Classifier class). To ensure that the ground truth masks are well suited for DetecTree, you might use the following Python snippet:

import numpy as np
import rasterio as rio

# Print the distinct pixel values found in the ground truth mask
with rio.open("path/to/response-tile.tif") as src:
    print(np.unique(src.read()))

and ensure that it outputs only the tree and non-tree pixel values (e.g., 255 and 0 respectively).
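The same check can be written as a reusable function, together with a simple way to repair masks that contain stray gray values (e.g., from anti-aliased brush strokes). This is only a sketch: `is_valid_mask` and `binarize` are hypothetical helpers, not part of DetecTree, and the threshold of 128 is an arbitrary assumption. Note that `is_valid_mask` also accepts a tile that is entirely tree or entirely non-tree.

```python
import numpy as np


def is_valid_mask(mask, tree_val=255, nontree_val=0):
    """Return True if `mask` contains no values other than the two given ones."""
    values = set(np.unique(mask).tolist())
    return values <= {tree_val, nontree_val}


def binarize(mask, threshold=128, tree_val=255, nontree_val=0):
    """Map pixels >= threshold to tree_val and all others to nontree_val."""
    return np.where(mask >= threshold, tree_val, nontree_val).astype(np.uint8)


# Toy mask with two stray gray values (254 and 3).
mask = np.array([[0, 255], [254, 3]], dtype=np.uint8)
print(is_valid_mask(mask))            # False
print(is_valid_mask(binarize(mask)))  # True
```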

4. Train the classifier (one for each cluster of tiles)

With the proper ground truth masks stored in the data/interim/response-tiles directory, the classifiers might be trained as follows:

make train_classifiers

which will train the classifiers and dump them into the models directory.

5. Classify the tiles

In order to use the trained classifiers to detect the tree/non-tree pixels in the tiles, you can do:

make classify_tiles

which will classify the tiles at scale with Dask and dump them into data/interim/classified-tiles.

6. Mosaic the classified tiles into a single TIF file

You might assemble the classified tiles into a single TIF file as in:

make tree_canopy_map

which will dump the final output to data/processed/tree-canopy.tif in Swiss CH1903+/LV95 coordinates (EPSG:2056). Note that all the other TIF files are in Swiss CH1903/LV03 coordinates (EPSG:21781). See the Makefile for more details.
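To locate the study extent in the output coordinate system, LV03 coordinates can be shifted to approximate LV95 values by adding 2,000,000 m to the easting and 1,000,000 m to the northing. The sketch below is only this constant-offset approximation, which can be off by up to the order of a meter; it is not the reprojection performed by the Makefile, and an exact conversion should go through a proper transformation (e.g., with GDAL or pyproj). The function name is hypothetical.

```python
def lv03_to_lv95_approx(east, north):
    """Approximate CH1903/LV03 -> CH1903+/LV95 conversion by a constant offset."""
    return east + 2_000_000, north + 1_000_000


# Lower-left and upper-right corners of the extent used in this workflow.
print(lv03_to_lv95_approx(524843, 148578))  # (2524843, 1148578)
print(lv03_to_lv95_approx(546153, 159128))  # (2546153, 1159128)
```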

7. Validation

The resulting tree canopy map can be validated by computing the classification accuracy on a randomly sampled tile that has not been used for training, e.g.:

import pandas as pd

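# Load the train/test split; the first CSV column is used as the index
split_df = pd.read_csv('path/to/split.csv', index_col=0)
# Randomly pick one tile that was not used for training
split_df.loc[~split_df['train'], 'img_filepath'].sample(1)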

which will output the path to a randomly sampled tile. As with the response tiles, a ground truth mask for such a tile can be generated with image editing software such as GIMP, and must be saved in grayscale mode, with the same file name, in the data/interim/validation-tiles directory. Then, the following command:

make confusion_df

will generate a confusion data frame with the proportion of tree and non-tree pixels that have been classified correctly. The trace of such confusion data frame corresponds to the estimated classification accuracy.
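The relation between the confusion data frame and the accuracy can be illustrated with a small sketch. It assumes that the confusion matrix is normalized by the total number of pixels, which is what makes its trace equal to the accuracy; the function `confusion_proportions` and the toy arrays are hypothetical and do not reproduce the code behind `make confusion_df`.

```python
import numpy as np


def confusion_proportions(truth, pred, tree_val=255, nontree_val=0):
    """2x2 confusion matrix as proportions of all pixels.

    Rows are ground truth, columns are predictions, both ordered as
    (non-tree, tree).
    """
    truth = np.asarray(truth).ravel()
    pred = np.asarray(pred).ravel()
    labels = (nontree_val, tree_val)
    counts = np.array(
        [[np.sum((truth == t) & (pred == p)) for p in labels] for t in labels]
    )
    return counts / truth.size


# Toy 2 x 4 ground truth mask and prediction (6 of 8 pixels agree).
truth = np.array([[0, 0, 255, 255], [0, 255, 255, 0]])
pred = np.array([[0, 255, 255, 255], [0, 255, 0, 0]])

conf = confusion_proportions(truth, pred)
print(conf)            # proportions: [[0.375, 0.125], [0.125, 0.375]]
print(np.trace(conf))  # 0.75, the share of correctly classified pixels
```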

Acknowledgments

References

  1. Bosch, M. (2020). DetecTree: Tree detection from aerial imagery in Python. Journal of Open Source Software (under review).

  2. Yang, L., Wu, X., Praun, E., & Ma, X. (2009). Tree detection from aerial imagery. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 131-137). ACM.