# Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications

<p align="center"> <img src="https://i.imgur.com/waxVImv.png" alt="Oryx Prithvi-EO-2.0"> </p>

Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, João Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Trevor Keenan, Paulo Arévalo, Wenwen Li, Hamed Alemohammad, Pontus Olofsson, Christopher Hain, Robert Kennedy, Bianca Zadrozny, Gabriele Cavallaro, Campbell Watson, Manil Maskey, Rahul Ramachandran, Juan Bernabe Moreno

**IBM Research, NASA Marshall Space Flight Center, The University of Alabama in Huntsville, University of Iceland, Jülich Supercomputing Centre, Virginia Tech, Arizona State University, Oregon State University, Clark University, Boston University, University of California, Berkeley, Earth from Space Institute**

Website | Paper

This repository contains code and examples based on the TerraTorch library for fine-tuning Prithvi-EO-2.0, a more powerful version of the Prithvi foundation model developed by IBM and NASA. Trained on 4.2M global time series samples on the JUWELS HPC system at the Jülich Supercomputing Centre (JSC) using NASA's Harmonized Landsat and Sentinel-2 (HLS) data at 30m resolution, it offers significant improvements over its predecessor.

## 📢 Latest Updates

## Architecture Overview

Prithvi-EO-2.0 is based on the ViT architecture, pretrained using a masked autoencoder (MAE) approach, with two major modifications as shown in the figure below.

*Figure: Prithvi-EO-2.0 model architecture (model_architecture_v2).*

First, we replaced the 2D patch embeddings and 2D positional embeddings with 3D versions to support inputs with spatiotemporal characteristics, i.e., a sequence of T images of size (H, W). Our 3D patch embeddings consist of a 3D convolutional layer that divides the 3D input into non-overlapping cubes of size (t, h, w) for the time, height, and width dimensions, respectively. For the 3D positional encodings, we first generate 1D sin/cos encodings individually for each dimension and then combine them into a single 3D positional encoding.
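
For illustration, here is a minimal PyTorch sketch of these two components. The module names, the `(1, 16, 16)` patch size, and the split of channels across the three axes are assumptions for the example, not the released implementation:

```python
import torch
import torch.nn as nn


def sincos_1d(embed_dim: int, positions: torch.Tensor) -> torch.Tensor:
    """1D sin/cos encoding of `positions` with `embed_dim` channels (even)."""
    half = embed_dim // 2
    omega = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.float().unsqueeze(1) * omega.unsqueeze(0)   # (N, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)  # (N, embed_dim)


class PatchEmbed3D(nn.Module):
    """3D patch embedding: one Conv3d splits a (B, C, T, H, W) input into
    non-overlapping (t, h, w) cubes and projects each cube to a token."""

    def __init__(self, in_chans: int = 6, embed_dim: int = 1024,
                 patch_size: tuple = (1, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, T'*H'*W', D)


def pos_embed_3d(embed_dim: int, t: int, h: int, w: int) -> torch.Tensor:
    """Generate 1D sin/cos encodings per axis, then concatenate them into a
    single 3D positional encoding (the per-axis channel split is illustrative)."""
    d_t, d_s = embed_dim // 2, embed_dim // 4
    grid_t = torch.arange(t).repeat_interleave(h * w)   # token order: t, h, w
    grid_h = torch.arange(h).repeat_interleave(w).repeat(t)
    grid_w = torch.arange(w).repeat(t * h)
    return torch.cat([sincos_1d(d_t, grid_t),
                      sincos_1d(d_s, grid_h),
                      sincos_1d(d_s, grid_w)], dim=1)   # (t*h*w, embed_dim)


# Example: 4 timesteps of 6-band 224x224 input -> 784 tokens of width 1024
tokens = PatchEmbed3D()(torch.randn(1, 6, 4, 224, 224))
tokens = tokens + pos_embed_3d(1024, 4, 14, 14).unsqueeze(0)
```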

Second, we considered geolocation (center latitude and longitude) and date of acquisition (year and day-of-year, ranging from 1 to 365) in pretraining. Both the encoder and the decoder receive the time and location information for each sample and encode them independently using 2D sin/cos encodings. These are added to the embedded tokens via a weighted sum with learned weights: one for time and one for location, with separate weights for the encoder and the decoder. Since this metadata is often unavailable, we added a drop mechanism during pretraining that randomly drops the geolocation and/or temporal data, so the model learns to handle the absence of this information.
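
A minimal sketch, in the same hypothetical PyTorch style, of how this metadata conditioning can be implemented; the module name `MetadataEncoding`, the 10% drop probability, and the exact channel split are illustrative assumptions:

```python
import torch
import torch.nn as nn


class MetadataEncoding(nn.Module):
    """Illustrative sketch (not the released implementation): encode
    (lat, lon) and (year, day-of-year) with 2D sin/cos encodings and add
    them to the tokens via a weighted sum with learned scalar weights."""

    def __init__(self, embed_dim: int, drop_prob: float = 0.1):
        super().__init__()
        assert embed_dim % 4 == 0
        self.drop_prob = drop_prob                   # chance to drop each metadata type
        self.w_time = nn.Parameter(torch.zeros(1))   # learned weight for time
        self.w_loc = nn.Parameter(torch.zeros(1))    # learned weight for location

    @staticmethod
    def _sincos_2d(values: torch.Tensor, dim: int) -> torch.Tensor:
        """Encode a (B, 2) pair of scalars into a (B, dim) 2D sin/cos vector."""
        quarter = dim // 4
        omega = 1.0 / (10000 ** (torch.arange(quarter, dtype=torch.float32,
                                              device=values.device) / quarter))
        parts = []
        for i in range(2):                                # e.g. (lat, lon)
            angles = values[:, i:i + 1].float() * omega   # (B, dim/4)
            parts += [torch.sin(angles), torch.cos(angles)]
        return torch.cat(parts, dim=1)                    # (B, dim)

    def forward(self, tokens, latlon, yeardoy):
        # tokens: (B, N, D); latlon, yeardoy: (B, 2)
        B, _, D = tokens.shape
        loc = self._sincos_2d(latlon, D)
        time = self._sincos_2d(yeardoy, D)
        if self.training:  # randomly drop metadata so its absence is handled later
            loc = loc * (torch.rand(B, 1, device=loc.device) > self.drop_prob)
            time = time * (torch.rand(B, 1, device=time.device) > self.drop_prob)
        return tokens + (self.w_time * time + self.w_loc * loc).unsqueeze(1)
```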

## Pre-trained Models

| Model | Details | Weights |
|-------|---------|---------|
| Prithvi-EO-2.0-300M | Pretrained 300M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M |
| Prithvi-EO-2.0-300M-TL | Pretrained 300M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL |
| Prithvi-EO-2.0-600M | Pretrained 600M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M |
| Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL |
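
The checkpoints can be used as backbones through TerraTorch. A minimal sketch, assuming the models are registered under names like `prithvi_eo_v2_300_tl` (check the TerraTorch documentation for the exact identifiers):

```python
import torch
from terratorch.registry import BACKBONE_REGISTRY

# The registry name below is an assumption for this example.
model = BACKBONE_REGISTRY.build("prithvi_eo_v2_300_tl", pretrained=True)

# One timestep of 6-band HLS input: (batch, channels, time, height, width)
x = torch.randn(1, 6, 1, 224, 224)
features = model(x)  # intermediate features to feed a downstream decoder
```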

## Benchmarking

We validated the Prithvi-EO-2.0 models through extensive experiments using GEO-Bench, the most popular and rigorous benchmark framework available for Earth Observation foundation models. Prithvi-EO-2.0-600M-TL outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (from 0.1m to 15m).

<img src="https://github.com/user-attachments/assets/b7e49289-810c-4bbc-b127-a361427a259a" width="750" height="450">

## Fine-tuning

We have fine-tuned Prithvi-EO-2.0 for downstream tasks in different domains of interest using TerraTorch (see instructions on how to get started here). Below we provide a list of the downstream tasks, along with links to the datasets, sample TerraTorch configuration files (or custom code, in the case of Gross Primary Productivity) and sample notebooks for fine-tuning.
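
As a minimal sketch of how such a run might be launched, the snippet below is the programmatic equivalent of `terratorch fit --config sen1floods11.yaml` in the shell; the `build_lightning_cli` entry point and the local config path are assumptions to adjust for your installation:

```python
# Launch a TerraTorch fine-tuning run with one of the sample configs listed
# below, assuming the YAML file has been downloaded to the working directory.
from terratorch.cli_tools import build_lightning_cli

build_lightning_cli(["fit", "--config", "sen1floods11.yaml"])
```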

### Sample configs

| Task | Dataset | TerraTorch Config/Code |
|------|---------|------------------------|
| Flood Detection | https://github.com/cloudtostreet/Sen1Floods11 | sen1floods11.yaml |
| Wildfire Scar Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars | firescars.yaml |
| Burn Scar Intensity | https://huggingface.co/datasets/ibm-nasa-geospatial/burn_intensity | burnintensity.yaml |
| Landslide Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/Landslide4sense | landslide.yaml |
| Multi-temporal Crop Segmentation (US) | https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification | multicrop.yaml |
| Multi-temporal Land Cover and Crop Classification (Europe) | https://datapub.fz-juelich.de/sen4map/ | sen4map_land-cover.yaml, sen4map_crops.yaml |
| Above Ground Biomass Estimation | https://huggingface.co/datasets/ibm-nasa-geospatial/BioMassters | biomassters.yaml |
<!--- |Gross Primary Productivity Estimation|[https://huggingface.co/datasets/ibm-nasa-geospatial/hls_merra2_gppFlux](https://huggingface.co/datasets/ibm-nasa-geospatial/hls_merra2_gppFlux)|[carbon_flux](https://github.com/NASA-IMPACT/Prithvi-EO-2.0/tree/main/examples/carbon_flux)| --->

### Sample Fine-tuning Notebooks

<!--- * [Gross Primary Productivity Estimation](https://github.com/NASA-IMPACT/Prithvi-EO-2.0/blob/refactory/examples/carbon_flux/main_flux_finetune_baselines_trainer.ipynb) --->