# Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications

<p align="center"> <img src="https://i.imgur.com/waxVImv.png" alt="Oryx Prithvi-EO-2.0"> </p>

Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, João Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Trevor Keenan, Paulo Arévalo, Wenwen Li, Hamed Alemohammad, Pontus Olofsson, Christopher Hain, Robert Kennedy, Bianca Zadrozny, Gabriele Cavallaro, Campbell Watson, Manil Maskey, Rahul Ramachandran, Juan Bernabe Moreno

**IBM Research, NASA Marshall Space Flight Center, The University of Alabama in Huntsville, University of Iceland, Jülich Supercomputing Centre, Virginia Tech, Arizona State University, Oregon State University, Clark University, Boston University, University of California, Berkeley, Earth from Space Institute**

Website | Paper

This repository contains code and examples based on the TerraTorch library for fine-tuning Prithvi-EO-2.0, a more powerful version of the Prithvi foundation model developed by IBM and NASA. Trained on 4.2M global time series samples on the JUWELS HPC system at the Jülich Supercomputing Centre (JSC) using NASA's Harmonized Landsat and Sentinel-2 (HLS) data at 30m resolution, it offers significant improvements over its predecessor.

## 📢 Latest Updates

## Architecture Overview

Prithvi-EO-2.0 is based on the ViT architecture, pretrained using a masked autoencoder (MAE) approach, with two major modifications as shown in the figure below.

*Figure: Prithvi-EO-2.0 model architecture (model_architecture_v2).*

First, we replaced the 2D patch embeddings and 2D positional embeddings with 3D versions to support inputs with spatiotemporal characteristics, i.e., a sequence of T images of size (H, W). Our 3D patch embeddings consist of a 3D convolutional layer that divides the 3D input into non-overlapping cubes of size (t, h, w) for the time, height, and width dimensions, respectively. For the 3D positional encodings, we first generate 1D sin/cos encodings individually for each dimension and then combine them into a single 3D positional encoding.
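
For illustration, here is a minimal PyTorch sketch of these two components. The module names, the `(1, 16, 16)` patch size, and the split of channels across the three axes are assumptions for the example, not the released implementation:

```python
import torch
import torch.nn as nn


def sincos_1d(embed_dim: int, positions: torch.Tensor) -> torch.Tensor:
    """1D sin/cos encoding of `positions` with `embed_dim` channels (even)."""
    half = embed_dim // 2
    omega = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.float().unsqueeze(1) * omega.unsqueeze(0)   # (N, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)  # (N, embed_dim)


class PatchEmbed3D(nn.Module):
    """3D patch embedding: one Conv3d splits a (B, C, T, H, W) input into
    non-overlapping (t, h, w) cubes and projects each cube to a token."""

    def __init__(self, in_chans: int = 6, embed_dim: int = 1024,
                 patch_size: tuple = (1, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, T'*H'*W', D)


def pos_embed_3d(embed_dim: int, t: int, h: int, w: int) -> torch.Tensor:
    """Generate 1D sin/cos encodings per axis, then concatenate them into a
    single 3D positional encoding (the per-axis channel split is illustrative)."""
    d_t, d_s = embed_dim // 2, embed_dim // 4
    grid_t = torch.arange(t).repeat_interleave(h * w)   # token order: t, h, w
    grid_h = torch.arange(h).repeat_interleave(w).repeat(t)
    grid_w = torch.arange(w).repeat(t * h)
    return torch.cat([sincos_1d(d_t, grid_t),
                      sincos_1d(d_s, grid_h),
                      sincos_1d(d_s, grid_w)], dim=1)   # (t*h*w, embed_dim)


# Example: 4 timesteps of 6-band 224x224 input -> 784 tokens of width 1024
tokens = PatchEmbed3D()(torch.randn(1, 6, 4, 224, 224))
tokens = tokens + pos_embed_3d(1024, 4, 14, 14).unsqueeze(0)
```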

Second, we considered geolocation (center latitude and longitude) and date of acquisition (year and day-of-year, ranging from 1 to 365) in pretraining. Both the encoder and the decoder receive the time and location information for each sample and encode them independently using 2D sin/cos encodings. These are added to the embedded tokens via a weighted sum with learned weights: one for time and one for location, with separate weights for the encoder and the decoder. Since this metadata is often unavailable, we added a drop mechanism during pretraining that randomly drops the geolocation and/or temporal data, so the model learns to handle the absence of this information.
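
A minimal sketch, in the same hypothetical PyTorch style, of how this metadata conditioning can be implemented; the module name `MetadataEncoding`, the 10% drop probability, and the exact channel split are illustrative assumptions:

```python
import torch
import torch.nn as nn


class MetadataEncoding(nn.Module):
    """Illustrative sketch (not the released implementation): encode
    (lat, lon) and (year, day-of-year) with 2D sin/cos encodings and add
    them to the tokens via a weighted sum with learned scalar weights."""

    def __init__(self, embed_dim: int, drop_prob: float = 0.1):
        super().__init__()
        assert embed_dim % 4 == 0
        self.drop_prob = drop_prob                   # chance to drop each metadata type
        self.w_time = nn.Parameter(torch.zeros(1))   # learned weight for time
        self.w_loc = nn.Parameter(torch.zeros(1))    # learned weight for location

    @staticmethod
    def _sincos_2d(values: torch.Tensor, dim: int) -> torch.Tensor:
        """Encode a (B, 2) pair of scalars into a (B, dim) 2D sin/cos vector."""
        quarter = dim // 4
        omega = 1.0 / (10000 ** (torch.arange(quarter, dtype=torch.float32,
                                              device=values.device) / quarter))
        parts = []
        for i in range(2):                                # e.g. (lat, lon)
            angles = values[:, i:i + 1].float() * omega   # (B, dim/4)
            parts += [torch.sin(angles), torch.cos(angles)]
        return torch.cat(parts, dim=1)                    # (B, dim)

    def forward(self, tokens, latlon, yeardoy):
        # tokens: (B, N, D); latlon, yeardoy: (B, 2)
        B, _, D = tokens.shape
        loc = self._sincos_2d(latlon, D)
        time = self._sincos_2d(yeardoy, D)
        if self.training:  # randomly drop metadata so its absence is handled later
            loc = loc * (torch.rand(B, 1, device=loc.device) > self.drop_prob)
            time = time * (torch.rand(B, 1, device=time.device) > self.drop_prob)
        return tokens + (self.w_time * time + self.w_loc * loc).unsqueeze(1)
```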

## Pre-trained Models

| Model | Details | Weights |
|-------|---------|---------|
| Prithvi-EO-2.0-300M | Pretrained 300M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M |
| Prithvi-EO-2.0-300M-TL | Pretrained 300M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL |
| Prithvi-EO-2.0-600M | Pretrained 600M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M |
| Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL |
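
The checkpoints can be used as backbones through TerraTorch. A minimal sketch, assuming the models are registered under names like `prithvi_eo_v2_300_tl` (check the TerraTorch documentation for the exact identifiers):

```python
import torch
from terratorch.registry import BACKBONE_REGISTRY

# The registry name below is an assumption for this example.
model = BACKBONE_REGISTRY.build("prithvi_eo_v2_300_tl", pretrained=True)

# One timestep of 6-band HLS input: (batch, channels, time, height, width)
x = torch.randn(1, 6, 1, 224, 224)
features = model(x)  # intermediate features to feed a downstream decoder
```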

## Benchmarking

We validated the Prithvi-EO-2.0 models through extensive experiments using GEO-Bench, the most popular and rigorous benchmark framework available for Earth Observation foundation models. Prithvi-EO-2.0-600M-TL outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (from 0.1m to 15m).

<img src="https://github.com/user-attachments/assets/b7e49289-810c-4bbc-b127-a361427a259a" width="750" height="450">

## Fine-tuning

We have fine-tuned Prithvi-EO-2.0 for downstream tasks in different domains of interest using TerraTorch (see instructions on how to get started here). Below we provide a list of the downstream tasks, along with links to the datasets, sample TerraTorch configuration files (or custom code, in the case of Gross Primary Productivity) and sample notebooks for fine-tuning.
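
As a minimal sketch of how such a run might be launched, the snippet below is the programmatic equivalent of `terratorch fit --config sen1floods11.yaml` in the shell; the `build_lightning_cli` entry point and the local config path are assumptions to adjust for your installation:

```python
# Launch a TerraTorch fine-tuning run with one of the sample configs listed
# below, assuming the YAML file has been downloaded to the working directory.
from terratorch.cli_tools import build_lightning_cli

build_lightning_cli(["fit", "--config", "sen1floods11.yaml"])
```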

### Sample configs

| Task | Dataset | TerraTorch Config/Code |
|------|---------|------------------------|
| Flood Detection | https://github.com/cloudtostreet/Sen1Floods11 | sen1floods11.yaml |
| Wildfire Scar Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars | firescars.yaml |
| Burn Scar Intensity | https://huggingface.co/datasets/ibm-nasa-geospatial/burn_intensity | burnintensity.yaml |
| Landslide Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/Landslide4sense | landslide.yaml |
| Multi-temporal Crop Segmentation (US) | https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification | multicrop.yaml |
| Multi-temporal Land Cover and Crop Classification (Europe) | https://datapub.fz-juelich.de/sen4map/ | sen4map_land-cover.yaml, sen4map_crops.yaml |
| Above Ground Biomass Estimation | https://huggingface.co/datasets/ibm-nasa-geospatial/BioMassters | biomassters.yaml |
<!--- |Gross Primary Productivity Estimation|[https://huggingface.co/datasets/ibm-nasa-geospatial/hls_merra2_gppFlux](https://huggingface.co/datasets/ibm-nasa-geospatial/hls_merra2_gppFlux)|[carbon_flux](https://github.com/NASA-IMPACT/Prithvi-EO-2.0/tree/main/examples/carbon_flux)| --->

### Sample Fine-tuning Notebooks

<!--- * [Gross Primary Productivity Estimation](https://github.com/NASA-IMPACT/Prithvi-EO-2.0/blob/refactory/examples/carbon_flux/main_flux_finetune_baselines_trainer.ipynb) --->