MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer

Model Overview

Figure: The overall architecture of MMST-ViT.

This repository provides the official implementation of our proposed Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), developed for predicting crop yields at the county level across the United States. It consists of a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer leverages satellite images and meteorological data during the growing season to capture the direct impact of short-term weather variations on crop growth. The Spatial Transformer learns the high-resolution spatial dependency among counties for precise crop tracking. The Temporal Transformer captures the effects of long-term climate change on crops.
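As a rough illustration of the data flow described above, the sketch below composes three placeholder modules. The function names, stub computations, and toy scalar "features" are ours for illustration only; they are not the repository's actual attention modules:

```python
"""Illustrative MMST-ViT data flow (placeholder stubs, not real attention)."""

def multimodal_transformer(images, short_term_weather):
    # Fuse visual and growing-season weather features (placeholder: sum).
    return [img + w for img, w in zip(images, short_term_weather)]

def spatial_transformer(county_feats):
    # Mix information across counties (placeholder: add the batch mean).
    mean = sum(county_feats) / len(county_feats)
    return [f + mean for f in county_feats]

def temporal_transformer(feats, long_term_climate):
    # Aggregate features conditioned on long-term climate (placeholder).
    return sum(feats) / len(feats) + long_term_climate

def mmst_vit(images, short_term_weather, long_term_climate):
    fused = multimodal_transformer(images, short_term_weather)
    spatial = spatial_transformer(fused)
    return temporal_transformer(spatial, long_term_climate)

print(mmst_vit([1.0, 2.0], [0.5, 0.5], 0.1))
```

The real model replaces each stub with a Transformer block, but the ordering of the three stages (multi-modal fusion, then spatial mixing, then temporal aggregation) matches the description above.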

Datasets

The Tiny CropNet Dataset

This paper utilizes the Tiny CropNet dataset for accurate county-level crop yield predictions. Specifically, the Tiny CropNet dataset is composed of three modalities: Sentinel-2 Imagery, the WRF-HRRR Computed Dataset, and the USDA Crop Dataset.

The Tiny CropNet dataset is now available at HuggingFace Datasets.

The CropNet Dataset

As of March 2024, we have released our CropNet dataset at HuggingFace Datasets, with tutorials available on GitHub. The CropNet dataset extends the Tiny CropNet dataset. It comprises the same three modalities, i.e., Sentinel-2 Imagery, the WRF-HRRR Computed Dataset, and the USDA Crop Dataset, spanning 2017 to 2022 (i.e., 6 years) and covering 2291 of the 3143 U.S. counties, with its geographic distribution illustrated below. We also report the number of counties associated with each crop type in the USDA Crop Dataset (see the rightmost bar chart in the figure), since crop planting is highly geography-dependent.

Figure: Geographic distribution of the CropNet dataset across U.S. counties.

Requirements

Our model depends on the libraries listed in requirements.txt. You can install all of them with:

# install requirements
pip install -r requirements.txt

The following command generates the JSON configuration file for the data loader (using soybean prediction as an example):

python config/build_config_soybean.py
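For reference, a configuration file of this kind could be produced along the following lines. Note that the field names below are hypothetical placeholders for illustration, not the actual schema emitted by config/build_config_soybean.py:

```python
import json

# Illustrative data-loader configuration (field names are hypothetical).
config = {
    "crop_type": "soybean",
    "years": list(range(2017, 2023)),  # 2017-2022, as covered by CropNet
    "modalities": ["sentinel-2", "wrf-hrrr", "usda-crop"],
}

# Write the configuration to disk for the data loader to consume.
with open("config_soybean.json", "w") as f:
    json.dump(config, f, indent=2)
```

To adapt the real script to another crop, consult its source for the actual fields it writes.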

Pre-training

Figure: Our multi-modal self-supervised pre-training method (PVT backbone with SimCLR-style contrastive learning).

The above figure illustrates the architecture of our proposed multi-modal self-supervised pre-training.
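Since the pre-training builds on PVT and SimCLR-style contrastive learning (per the code bases acknowledged below), here is a minimal, dependency-free sketch of the NT-Xent contrastive loss that SimCLR-style objectives typically optimize. It illustrates the general objective only, not the repository's implementation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss over a batch of paired (augmented) embeddings."""
    z = views_a + views_b              # 2N embeddings in total
    n = len(views_a)
    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)          # index of i's positive pair
        pos = math.exp(cosine(z[i], z[j]) / temperature)
        denom = sum(math.exp(cosine(z[i], z[k]) / temperature)
                    for k in range(2 * n) if k != i)
        loss += -math.log(pos / denom)
    return loss / (2 * n)
```

Embeddings whose two views agree (the positive pairs are most similar) yield a lower loss than embeddings whose views are mismatched, which is what drives the representation learning.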

To pre-train MMST-ViT, please run the following command:

# pre-train
python main_pretrain_mmst_vit.py

Fine-tuning

To fine-tune MMST-ViT for crop yield predictions, use the following command:

# fine-tune
python main_finetune_mmst_vit.py

License

This repository is under the CC-BY-NC 4.0 license. Please refer to LICENSE for details.

Acknowledgment

This repository is based on the official implementation of PVT and MAE. We thank the authors for releasing the code.

Citation

@inproceedings{fudong:iccv23:mmst_vit,
    author    = {Lin, Fudong and Crawford, Summer and Guillot, Kaleb and Zhang, Yihe and Chen, Yan and Yuan, Xu and Chen, Li and Williams, Shelby and Minvielle, Robert and Xiao, Xiangming and Gholson, Drew and Ashwell, Nicolas and Setiyono, Tri and Tubana, Brenda and Peng, Lu and Bayoumi, Magdy and Tzeng, Nian-Feng},
    title     = {MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer},
    booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2023},
    pages     = {5774-5784}
}