The Canadian Cropland Dataset

This repository houses a novel patch-based dataset compiled using optical satellite images of Canadian agricultural croplands retrieved from Sentinel-2.

1 - Dataset Description

This repository contains instructions and code for use with the Canadian Cropland Dataset, a novel patch-based dataset inspired by the EuroSAT dataset. It is compiled from optical satellite images of Canadian agricultural croplands retrieved from Sentinel-2. A total of 78,536 high-resolution geo-referenced images (Figure 1) of 10 main crop types over 5 months (June-October) and 4 years (2017-2020) were extracted using Google Earth Engine (GEE) and automatically labelled with the Canadian Crop Inventory. Each image contains 12 main spectral bands as well as a selection of bands corresponding to vegetation indices (GNDVI, NDVI, NDVI45, OSAVI and PSRI). Images were collected using a list of 6,633 geographical points of Canadian agricultural fields.
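To make the vegetation-index bands more concrete, here is a minimal sketch of the three normalized-difference indices (NDVI, GNDVI, NDVI45) computed from per-pixel reflectance values. The function names and the band mapping in the comments are assumptions based on common Sentinel-2 conventions; they are not taken from this repository's code. OSAVI and PSRI are left out because their formulations vary between sources.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denominator = a + b
    if denominator == 0:
        return 0.0
    return (a - b) / denominator


def ndvi(nir, red):
    # Assumed Sentinel-2 mapping: nir = B8, red = B4.
    return normalized_difference(nir, red)


def gndvi(nir, green):
    # Assumed Sentinel-2 mapping: nir = B8, green = B3 (some tools use B7 for NIR).
    return normalized_difference(nir, green)


def ndvi45(red_edge, red):
    # Assumed Sentinel-2 mapping: red_edge = B5, red = B4.
    return normalized_difference(red_edge, red)


if __name__ == "__main__":
    # Hypothetical reflectance values for a single vegetated pixel.
    print(round(ndvi(nir=0.5, red=0.1), 3))
```

Dense, healthy vegetation reflects strongly in the near-infrared and weakly in the red, so its NDVI tends toward 1, while bare soil and water sit near or below 0.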

The dataset can be accessed through this Google Drive link. In the coming months, we will also host it on a website at Université du Québec à Montréal. The Drive contains the preprocessed and cleaned images from all years, organized into train/validation/test splits. The splits are identical across image types (RGB, GNDVI, etc.): each split contains the same points regardless of image type. The paper describing this dataset is available on arXiv: https://arxiv.org/abs/2306.00114.

Figure 1: An overview of sample patches of the crop classes in the dataset. The images measure 64 x 64 pixels and have a spatial resolution of 10 m/pixel.

2 - Running the Software

Python version

Other libraries

You can install the required libraries with the conda install -c conda-forge <library_name> command:

conda install -c conda-forge earthengine-api
conda install -c conda-forge keras-gpu  
conda install -c conda-forge imutils
conda install -c conda-forge scikit-learn
conda install -c conda-forge scikit-image
conda install -c conda-forge numpy
conda install -c conda-forge opencv
conda install -c conda-forge pandas
conda install -c conda-forge pillow
conda install -c conda-forge jupyterlab
conda install -c conda-forge matplotlib
conda install -c conda-forge rasterio
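
After installation, a quick check can confirm that every library is importable. The sketch below maps each conda package listed above to the module name it is usually imported under and reports any that cannot be found. The helper name `find_missing` is hypothetical and not part of this repository.

```python
import importlib.util

# conda package name -> Python import name
REQUIRED_MODULES = {
    "earthengine-api": "ee",
    "keras-gpu": "keras",
    "imutils": "imutils",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "numpy": "numpy",
    "opencv": "cv2",
    "pandas": "pandas",
    "pillow": "PIL",
    "jupyterlab": "jupyterlab",
    "matplotlib": "matplotlib",
    "rasterio": "rasterio",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the sorted package names whose module cannot be located."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only locates the modules without importing them, so it stays fast even for heavy packages such as keras.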

3 - Description of Repository

Data Cleaning

Contains functions for manipulating images and moving files to create label-specific directories and training/validation/test sets. We use a Java application to remove cloudy and noisy images from the collection (this application is not included in this repository).
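
As an illustration of this kind of file organization, the sketch below copies images from label-specific directories into train/validation/test folders. It is not the repository's own code: the function name `split_by_label`, the `src_dir/<label>/` layout, the default ratios and the per-file (rather than per-point) split are all assumptions.

```python
import random
import shutil
from pathlib import Path


def split_by_label(src_dir, dst_dir, ratios=(0.7, 0.15, 0.15), seed=0):
    """Copy src_dir/<label>/* into dst_dir/<split>/<label>/ and return counts."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {"train": 0, "validation": 0, "test": 0}
    for label_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in label_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        parts = {
            "train": files[:n_train],
            "validation": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],  # remainder goes to test
        }
        for split, members in parts.items():
            out_dir = dst_dir / split / label_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, out_dir / f.name)
            counts[split] += len(members)
    return counts
```

Splitting within each label keeps class proportions similar across the three sets. Files are copied rather than moved so the source directory stays intact.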

Figure 2: A screenshot of the data cleaning software developed in Java. The dataset is curated by manually removing cloudy images, noisy images and images with missing pixels.

Data Collection

Contains multiple Python scripts for downloading the Sentinel-2 images for each point in the sample .csv file. It also contains some data visualization code.
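
To give a sense of what a per-point download needs, the sketch below reads points from a CSV file and computes the bounding box of a 64 x 64 pixel patch at 10 m/pixel (640 m on a side) centred on each point. It is a rough illustration, not the repository's download script: the column names `longitude` and `latitude`, the function names and the spherical-Earth metres-per-degree constant are all assumptions.

```python
import csv
import io
import math

PATCH_PIXELS = 64               # patch width and height in pixels
PIXEL_SIZE_M = 10.0             # spatial resolution in metres per pixel
METERS_PER_DEG_LAT = 111_320.0  # spherical-Earth approximation (assumption)


def patch_bounds(lon, lat):
    """Return (west, south, east, north) in degrees for a patch centred on a point."""
    half_side_m = PATCH_PIXELS * PIXEL_SIZE_M / 2.0
    dlat = half_side_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def read_points(csv_file):
    """Read (longitude, latitude) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return [(float(row["longitude"]), float(row["latitude"])) for row in reader]


if __name__ == "__main__":
    # Hypothetical one-row sample file.
    sample = io.StringIO("longitude,latitude\n-73.5,45.5\n")
    for lon, lat in read_points(sample):
        print(patch_bounds(lon, lat))
```

A bounding box like this could then be passed as the region of interest when requesting imagery, although a projected coordinate system would give more exact patch dimensions than this degree-based approximation.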

Dataset Statistics

Contains spreadsheets and figures depicting the distribution of the images within the dataset.

Earth Engine

Contains the JavaScript code used to collect points of agricultural fields across Canada from June 2017 to October 2020 (Figure 3). The code can be viewed directly in GEE using this link. Note that you need a registered and activated GEE account to view the script and run it on the cloud.

Figure 3: Map representing an overview of the selected geographical locations used in the Canadian Cropland Dataset. Markers are randomly chosen fields and are color-coded by the 2019 crop types.

Machine Learning

Contains Python code for the deep learning benchmark models (ResNet-50, LRCN, 3D-CNN, etc.).

4 - References

If you have used this dataset, please consider citing our paper:

APA Style

Boatswain Jacques, A. A., Diallo, A. B., & Lord, E. (2023). The Canadian Cropland Dataset: A New Land Cover Dataset for Multitemporal Deep Learning Classification in Agriculture. arXiv [Cs.CV]. Retrieved from http://arxiv.org/abs/2306.00114

BibTeX

@misc{boatswainjacques2023canadian,
      title={The Canadian Cropland Dataset: A New Land Cover Dataset for Multitemporal Deep Learning Classification in Agriculture}, 
      author={Amanda A. {Boatswain Jacques} and {Abdoulaye Baniré} Diallo and Etienne Lord},
      year={2023},
      eprint={2306.00114},
      archivePrefix={arXiv},
      primaryClass={cs.CV}}

5 - Contact

For any questions or concerns regarding this repository, please contact boatswain_jacques.amanda@courrier.uqam.ca.