PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models

🔥 The pre-print is out!

📚 Introduction

While geospatial foundation models (GFMs) have proliferated rapidly, their evaluations remain inconsistent and narrow. Existing works often rely on suboptimal downstream datasets (e.g., EuroSAT) and tasks (e.g., land cover classification), which limits comparability and real-world usability. In addition, evaluation protocols lack diversity in image resolution and sensor types, which makes a comprehensive assessment of GFM performance harder.

To bridge this gap, we propose a standardized evaluation protocol that incorporates a wide-ranging selection of datasets, tasks, resolutions, and sensor types, establishing a robust and widely applicable benchmark for GFMs.

<img src=".github/geofmbenchmark.png" alt="PANGAEA: a global and inclusive benchmark for geospatial foundation models" width="90%">

In this repo, you can find the code to benchmark GFMs. We currently include several GFMs that represent different approaches, and we look forward to adding new models and datasets.

The following models are currently supported:

| Model | Paper | GitHub | Keywords |
|---|---|---|---|
| SSL4EOS12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal <br> Dataset for Self-Supervised Learning in Earth Observation | link | DINO, MAE, DATA2VEC, MOCO |
| Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | link | Masked Autoencoders, Multiscale |
| SatlasNet | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | link | Supervised, Multi-temporal |
| GFM | Towards Geospatial Foundation Models via Continual Pretraining | link | Swin, Continual Pre-training |
| SpectralGPT | SpectralGPT: Spectral Remote Sensing Foundation Model | link | MAE, Multi-spectral |
| DOFA | Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation | link | MAE, Dynamic bands |
| CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | link | Contrastive Learning, MAE |
| Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | link | MAE, Multi-temporal |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | link | Contrastive Learning |

And the following datasets:

| Dataset | Download | Domain | Task | Sensors | Location |
|---|---|---|---|---|---|
| HLS Burn Scars | link | Wildfire | Semantic Segmentation | HLS (Harmonized Landsat Sentinel-2) | USA |
| MADOS | link | Marine | Semantic Segmentation | S2 | Global |
| PASTIS-R | link | Agriculture | Semantic Segmentation | S1, S2, SPOT-6 | France |
| Sen1Floods11 | link | Flood | Semantic Segmentation | S1, S2 | Global |
| xView2 | link | HADR | Change Detection | Maxar | Global |
| Five Billion Pixels | original version <br> (custom version coming soon) | (Urban) Land Cover | Semantic Segmentation | Gaofen-2 | China |
| DynamicEarthNet | link | (Urban) Land Cover | Semantic Segmentation | PlanetFusion | Global |
| CropTypeMapping-South Sudan | link | Agriculture | Semantic Segmentation | S1, S2, Planet | South Sudan |
| SpaceNet 7 | link | Urban | Change Detection / <br> Semantic Segmentation | Planet | Global |
| AI4SmallFarms | link | Agriculture | Semantic Segmentation | S2 | Cambodia/Vietnam |
| BioMassters | link | Forest | Regression | S1, S2 | Finland |

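The repository supports the following tasks using geospatial (foundation) models: single-temporal semantic segmentation, multi-temporal semantic segmentation, change detection, single-temporal regression, and multi-temporal regression.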

It is also possible to train some supervised baselines, based on UNet and ViT.

🗺️ Datasets details

Please refer to the Dataset Guide to understand the processing requirements and commands specific to each dataset.

For fast prototyping, you may want to run quick experiments on smaller datasets. We suggest starting with MADOS, HLSBurnScars, SpaceNet7, Sen1Floods11 and AI4SmallFarms, which offer good diversity in satellites and domains. In the future, we will release stratified subsets of each dataset to facilitate fast prototyping across all datasets.

🛠️ Setup

Clone the repository:

git clone https://github.com/VMarsocci/pangaea-bench.git
cd pangaea-bench

Dependencies

We provide several ways to install the dependencies.

  1. Using either Conda or Mamba:

    conda env create -f environment.yaml
    conda activate pangaea-bench
    

    Optional: install Mamba for faster resolution times

    wget https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Mambaforge-24.3.0-0-Linux-x86_64.sh
    sh ./Mambaforge-24.3.0-0-Linux-x86_64.sh
    
    mamba env create -f environment.yaml
    mamba activate pangaea-bench
    
  2. Using pip, create a Python native virtual environment and install dependencies into it:

    export PANGAEA_PATH=/path/to/venv/pangaea-bench # change this
    python3 -m venv ${PANGAEA_PATH}
    source ${PANGAEA_PATH}/bin/activate
    
    pip install -r requirements.txt
    

Then install the code repository as a development package:

pip install --no-build-isolation --no-deps -e .
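
After the editable install, a quick import check can confirm that the package is visible to the active environment. The sketch below assumes the package is importable as `pangaea` (inferred from the `pangaea/run.py` path used in this README, not confirmed here). `is_importable` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pangaea" is an assumed package name, inferred from the pangaea/run.py path.
    name = "pangaea"
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```

If the check reports that the package is not found, the most likely cause is that the environment created above is not the active one.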

🏋️ Training

To run experiments, please refer to configs/train.yaml. In addition to some basic training settings (e.g. finetune to also fine-tune the encoder, limited_label_train to train the model on a stratified subset of labels, num_workers, batch_size and so on), it contains 5 basic configs:

The other 3 configs set further training parameters:

We provide several example command lines to launch different training tasks on a single GPU.

Please note:

💻 Decoder Finetuning

Single Temporal Semantic Segmentation

Take the HLSBurnScars dataset, the RemoteCLIP encoder and the UPerNet segmentation decoder as an example:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=hlsburnscars \
   encoder=remoteclip \
   decoder=seg_upernet \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=segmentation
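
When many encoder/dataset combinations need to be launched, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper that only reproduces the argument layout shown in this README and appends optional `key=value` overrides at the end.

```python
import shlex
from typing import Dict, List, Optional


def build_train_command(
    dataset: str,
    encoder: str,
    decoder: str,
    preprocessing: str,
    criterion: str,
    task: str,
    overrides: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Return the torchrun argument list for a single-GPU training run."""
    cmd = [
        "torchrun", "--nnodes=1", "--nproc_per_node=1",
        "pangaea/run.py",
        "--config-name=train",
        f"dataset={dataset}",
        f"encoder={encoder}",
        f"decoder={decoder}",
        f"preprocessing={preprocessing}",
        f"criterion={criterion}",
        f"task={task}",
    ]
    # Optional key=value overrides (e.g. batch_size=16) are appended last.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(
        dataset="hlsburnscars",
        encoder="remoteclip",
        decoder="seg_upernet",
        preprocessing="seg_default",
        criterion="cross_entropy",
        task="segmentation",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting issues when a value (for example a dataset path) contains spaces.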

If you want to overwrite some parameters (e.g. turn off wandb, change the batch size and the path to the dataset, and use a 50% stratified subset for training):

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=hlsburnscars \
   encoder=remoteclip \
   decoder=seg_upernet \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=segmentation \
   dataset.root_path=/path/to/the/dataset/hlsburnscars \
   batch_size=16 \
   use_wandb=False \
   limited_label_train=0.5 \
   limited_label_strategy=stratified
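
To illustrate what a stratified 50% subset means in general, here is a minimal sketch. It is not the repository's implementation: how strata are derived for segmentation masks is not described in this README, so the sketch assumes each sample already carries a single, precomputed stratum label. `stratified_subset` and its arguments are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_subset(
    strata: Sequence[Hashable], fraction: float, seed: int = 0
) -> List[int]:
    """Select roughly `fraction` of the sample indices from every stratum.

    strata[i] is the (assumed, precomputed) stratum label of sample i.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)  # local RNG keeps the selection reproducible
    by_stratum = defaultdict(list)
    for index, label in enumerate(strata):
        by_stratum[label].append(index)

    selected: List[int] = []
    for indices in by_stratum.values():
        # Keep at least one sample per stratum so no group disappears.
        k = max(1, round(len(indices) * fraction))
        selected.extend(rng.sample(indices, k))
    return sorted(selected)


if __name__ == "__main__":
    labels = ["burned"] * 8 + ["unburned"] * 4  # toy stratum labels
    subset = stratified_subset(labels, fraction=0.5)
    print(len(subset), "of", len(labels), "samples kept")
```

Sampling within each stratum keeps the proportions of the full training set, which a plain random 50% draw does not guarantee on small datasets.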

Multi-Temporal Semantic Segmentation

An example using SSL4EO-DINO on CropTypeMapping:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=croptypemapping \
   encoder=ssl4eo_dino \
   decoder=seg_upernet_mt_ltae \
   preprocessing=seg_resize \
   criterion=cross_entropy \
   task=segmentation

To use the SatlasNet encoder, configs/encoder/satlasnet_mi.yaml is required:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=croptypemapping \
   encoder=satlasnet_mi \
   decoder=seg_upernet_mt_ltae \
   preprocessing=seg_resize \
   criterion=cross_entropy \
   task=segmentation

To overwrite parameters, please check the Single Temporal Semantic Segmentation example.

Change Detection

One of the change detection decoders should be used: configs/decoder/seg_siamupernet_conc.yaml employs a feature concatenation strategy, while configs/decoder/seg_siamupernet_diff.yaml uses a feature differencing strategy. For example, the Prithvi encoder on xView2:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=xview2 \
   encoder=prithvi \
   decoder=seg_siamupernet_conc \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=change_detection
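
The difference between the two fusion strategies named above can be sketched on plain feature vectors. This is an illustration only, not the decoders' code: it assumes the pre- and post-event features at one spatial location are channel vectors of equal length, and the sign convention of the difference (post minus pre, no absolute value) is an assumption. `fuse_concat` and `fuse_diff` are hypothetical names.

```python
from typing import List, Sequence


def fuse_concat(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Concatenation: keep both feature vectors side by side (C + C channels)."""
    return list(pre) + list(post)


def fuse_diff(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Differencing: element-wise post - pre (C channels)."""
    if len(pre) != len(post):
        raise ValueError("pre and post features must have the same length")
    return [b - a for a, b in zip(pre, post)]


if __name__ == "__main__":
    pre_features = [1.0, 2.0]
    post_features = [3.0, 5.0]
    print(fuse_concat(pre_features, post_features))  # 4 channels
    print(fuse_diff(pre_features, post_features))    # 2 channels
```

Concatenation doubles the channel count and leaves the comparison to the decoder, while differencing keeps the channel count and passes only the change signal.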

To overwrite parameters, please check the Single Temporal Semantic Segmentation example.

Single Temporal Regression

The regression decoder (e.g. configs/decoder/reg_upernet.yaml) and the regression task (e.g. configs/task/regression.yaml) configs should be used. For example, the Prithvi encoder on BioMassters:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=biomassters \
   encoder=prithvi \
   decoder=reg_upernet \
   preprocessing=reg_default \
   criterion=mse \
   task=regression

To use the SatlasNet encoder, configs/encoder/satlasnet_si.yaml is required. To overwrite parameters, please check the Single Temporal Semantic Segmentation example.
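
For reference, the regression example above uses `criterion=mse`, which presumably selects mean squared error. The sketch below shows the plain formula on flat lists of values. Whether the repository's loss applies any masking or a different reduction is not stated here, so treat this as the textbook definition only.

```python
from typing import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error: average of squared prediction errors."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]  # toy predicted values
    reference = [1.0, 2.0, 5.0]  # toy reference values
    print(mse(predicted, reference))
```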

Multi-Temporal Regression

The multi-temporal regression decoder (e.g. configs/decoder/reg_upernet_mt_ltae.yaml or configs/decoder/reg_upernet_mt_linear.yaml) and the regression task (e.g. configs/task/regression.yaml) configs should be used.

Take the Prithvi encoder on BioMassters as an example:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=biomassters \
   encoder=prithvi \
   decoder=reg_upernet_mt_ltae \
   preprocessing=reg_default \
   criterion=mse \
   task=regression

To use SatlasNet encoder, please refer to the multi-temporal semantic segmentation example. To overwrite parameters, please check the Single Temporal Semantic Segmentation example.

💻 End-to-end Finetuning

It is enough to add finetune=True to the command line.

For example, for single-temporal semantic segmentation:

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=hlsburnscars \
   encoder=remoteclip \
   decoder=seg_upernet \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=segmentation \
   finetune=True

💻 Fully Supervised Baseline

The repo also supports training fully supervised baselines (i.e. UNet and ViT). To run these, follow the same command line rules as for other models. Keep in mind that setting finetune=True is necessary, since the fully supervised approach trains the model from scratch. An example for single-temporal semantic segmentation with UNet (Sen1Floods11 dataset):

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=sen1floods11 \
   encoder=unet_encoder \
   decoder=seg_unet \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=segmentation \
   finetune=True

Multi-temporal UNet is not supported.

An example for multi-temporal semantic segmentation with ViT (CropTypeMapping-SS dataset):

torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
   --config-name=train \
   dataset=croptypemapping \
   encoder=vit_scratch \
   decoder=seg_upernet_mt_ltae \
   preprocessing=seg_default \
   criterion=cross_entropy \
   task=segmentation \
   task.evaluator.inference_mode=whole \
   finetune=True

🔧 Customization

Using Your Own Dataset

Refer to: Adding a new downstream dataset

Using Your Own Model

Refer to: Adding a new geospatial foundation model

🏃 Evaluation

An evaluation step is always run after the training.

To run only the evaluation, set ckpt_dir to the directory where the checkpoints and configurations are stored:

torchrun pangaea/run.py --config-name=test ckpt_dir=path_to_ckpt_dir

✏️ Contributing

We appreciate all contributions. Please refer to Contributing Guidelines.

⚠️ TO DO

🧮 Some results

<img src=".github/boxplot.png" alt="results" width="60%">

Check the paper for all the insights!

NOTE: if you want to benchmark your model, do not change the hparams in the configs, so that the comparison stays fair! We will soon also publish a set of "benchmark-configs" to support automatic running.

📝 Citation

If you find this work useful, please cite:

@misc{marsocci2024pangaeaglobalinclusivebenchmark,
      title={PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models}, 
      author={Valerio Marsocci and Yuru Jia and Georges Le Bellier and David Kerekes and Liang Zeng and Sebastian Hafner and Sebastian Gerard and Eric Brune and Ritu Yadav and Ali Shibli and Heng Fang and Yifang Ban and Maarten Vergauwen and Nicolas Audebert and Andrea Nascetti},
      year={2024},
      eprint={2412.04204},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.04204}, 
}