<img alt="GitHub watchers" src="https://img.shields.io/github/watchers/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social"> <img alt="GitHub stars" src="https://img.shields.io/github/stars/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social"> <img alt="GitHub forks" src="https://img.shields.io/github/forks/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social">

<p align="center">Awesome Remote Sensing Foundation Models</p>

:star2: A collection of papers, datasets, benchmarks, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).

## 📢 Latest Updates

:fire::fire::fire: Last Updated on 2024.08.08 :fire::fire::fire:

## Table of Contents

- Remote Sensing <ins>Vision</ins> Foundation Models
- Remote Sensing <ins>Vision-Language</ins> Foundation Models
- Remote Sensing <ins>Generative</ins> Foundation Models
- Remote Sensing <ins>Vision-Location</ins> Foundation Models
- Remote Sensing <ins>Vision-Audio</ins> Foundation Models
- Remote Sensing <ins>Task-specific</ins> Foundation Models
- Remote Sensing Agents
- Benchmarks for RSFMs
- (Large-scale) Pre-training Datasets
- Relevant Projects
- Survey Papers
- Citation

## Remote Sensing <ins>Vision</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
| - | Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding | CVPRW2021 | Paper | link |
| GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
| SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | link |
| DINO-MM | Self-supervised Vision Transformers for Joint SAR-optical Representation Learning | IGARSS2022 | DINO-MM | link |
| SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | SatMAE | link |
| RS-BYOL | Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images | JSTARS2022 | RS-BYOL | null |
| GeCo | Geographical Supervision Correction for Remote Sensing Representation Learning | TGRS2022 | GeCo | null |
| RingMo | RingMo: A remote sensing foundation model with masked image modeling | TGRS2022 | RingMo | Code |
| RVSA | Advancing plain vision transformer toward remote sensing foundation model | TGRS2022 | RVSA | link |
| RSP | An Empirical Study of Remote Sensing Pretraining | TGRS2022 | RSP | link |
| MATTER | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | CVPR2022 | MATTER | null |
| CSPT | Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | RS2022 | CSPT | link |
| - | Self-supervised Vision Transformers for Land-cover Segmentation and Classification | CVPRW2022 | Paper | link |
| BFM | A billion-scale foundation model for remote sensing images | Arxiv2023 | BFM | null |
| TOV | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | link |
| CMID | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | TGRS2023 | CMID | link |
| RingMo-Sense | RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling | TGRS2023 | RingMo-Sense | null |
| IaI-SimCLR | Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery | CVPRW2023 | IaI-SimCLR | null |
| CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | link |
| SatLas | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatLas | link |
| GFM | Towards Geospatial Foundation Models via Continual Pretraining | ICCV2023 | GFM | link |
| Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | ICCV2023 | Scale-MAE | link |
| DINO-MC | DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops | Arxiv2023 | DINO-MC | link |
| CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | NeurIPS2023 | CROMA | link |
| Cross-Scale MAE | Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing | NeurIPS2023 | Cross-Scale MAE | link |
| DeCUR | DeCUR: decoupling common & unique representations for multimodal self-supervision | Arxiv2023 | DeCUR | link |
| Presto | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Arxiv2023 | Presto | link |
| CtxMIM | CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | Arxiv2023 | CtxMIM | null |
| FG-MAE | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Arxiv2023 | FG-MAE | link |
| Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | Arxiv2023 | Prithvi | link |
| RingMo-lite | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Arxiv2023 | RingMo-lite | null |
| - | A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion | IGARSS2023 | Paper | null |
| EarthPT | EarthPT: a foundation model for Earth Observation | NeurIPS2023 CCAI workshop | EarthPT | link |
| USat | USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery | Arxiv2023 | USat | link |
| FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | link |
| AIEarth | Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data | Arxiv2023 | AIEarth | link |
| - | Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture | Arxiv2023 | Paper | link |
| Clay | Clay Foundation Model | - | null | link |
| Hydro | Hydro--A Foundation Model for Water in Satellite Imagery | - | null | link |
| U-BARN | Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series | JSTARS2024 | Paper | link |
| GeRSP | Generic Knowledge Boosted Pre-training For Remote Sensing Images | Arxiv2024 | GeRSP | GeRSP |
| SwiMDiff | SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image | Arxiv2024 | SwiMDiff | null |
| OFA-Net | One for All: Toward Unified Foundation Models for Earth Vision | Arxiv2024 | OFA-Net | null |
| SMLFR | Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation | TGRS2024 | SMLFR | link |
| SpectralGPT | SpectralGPT: Spectral Foundation Model | TPAMI2024 | SpectralGPT | link |
| S2MAE | S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data | CVPR2024 | S2MAE | null |
| SatMAE++ | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | CVPR2024 | SatMAE++ | link |
| msGFM | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | CVPR2024 | msGFM | link |
| SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon |
| MTP | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Arxiv2024 | MTP | link |
| DOFA | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Arxiv2024 | DOFA | link |
| PIS | Pretrain A Remote Sensing Foundation Model by Promoting Intra-instance Similarity | - | null | link |
| MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Arxiv2024 | MMEarth | link |
| SARATR-X | SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition | Arxiv2024 | SARATR-X | link |
| LeMeViT | LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation | IJCAI2024 | LeMeViT | link |
| SoftCon | Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining | Arxiv2024 | SoftCon | link |
| RS-DFM | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Arxiv2024 | RS-DFM | null |
| A2-MAE | A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder | Arxiv2024 | A2-MAE | null |
| HyperSIGMA | HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | Arxiv2024 | HyperSIGMA | link |
| SelectiveMAE | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | Arxiv2024 | SelectiveMAE | link |
| OmniSat | OmniSat: Self-Supervised Modality Fusion for Earth Observation | ECCV2024 | OmniSat | link |
| MM-VSF | Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications | Arxiv2024 | MM-VSF | null |
| MA3E | Masked Angle-Aware Autoencoder for Remote Sensing Images | ECCV2024 | MA3E | link |
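
Many of the vision backbones above release plain ViT-style encoder weights. As a hedged illustration only (the checkpoint path, class count, and backbone name are assumptions, not any specific model's release format), a typical downstream setup loads such an encoder into a `timm` ViT and attaches a fresh classification head:

```python
# Hedged sketch: load a released ViT encoder checkpoint into a timm backbone
# and fine-tune it on a remote sensing scene-classification task.
# "rsfm_vit_base.pth" and the 10-class head are illustrative assumptions.
import timm
import torch

model = timm.create_model(
    "vit_base_patch16_224",  # many RSFMs ship ViT-B/16 encoders; match the actual release
    pretrained=False,
    num_classes=10,          # downstream classes for the target dataset
)

state_dict = torch.load("rsfm_vit_base.pth", map_location="cpu")
# Encoder-only checkpoints usually lack the new head, so load non-strictly
# and inspect what was skipped.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
# ...standard supervised fine-tuning loop over the downstream dataset...
```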

## Remote Sensing <ins>Vision-Language</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv2023 | RemoteCLIP | link |
| GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | GeoRSCLIP | link |
| GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | null |
| - | Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | Arxiv2023 | Paper | link |
| - | Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Arxiv2024 | Paper | link |
| SkyEyeGPT | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Arxiv2024 | Paper | link |
| EarthGPT | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Arxiv2024 | Paper | null |
| SkyCLIP | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyCLIP | link |
| GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | CVPR2024 | GeoChat | link |
| LHRS-Bot | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Arxiv2024 | Paper | link |
| H2RSVLM | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Arxiv2024 | Paper | link |
| RS-LLaVA | RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | RS2024 | Paper | link |
| SkySenseGPT | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | Paper | link |
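
Several of the CLIP-style entries above (e.g., RemoteCLIP, GeoRSCLIP, SkyCLIP) follow the standard contrastive image-text recipe, so zero-shot scene classification looks like ordinary CLIP inference. Below is a minimal sketch with `open_clip`; the checkpoint path, architecture name, and prompt templates are illustrative assumptions, not any project's official loading code:

```python
# Hedged sketch: zero-shot remote sensing scene classification with a
# CLIP-style checkpoint via open_clip. Paths and prompts are illustrative.
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained=None  # architecture must match the released checkpoint
)
ckpt = torch.load("remote_sensing_clip.pt", map_location="cpu")
model.load_state_dict(ckpt)
model.eval()

tokenizer = open_clip.get_tokenizer("ViT-B-32")
classes = ["airport", "forest", "harbor", "residential area"]
text = tokenizer([f"a satellite photo of a {c}" for c in classes])
image = preprocess(Image.open("scene.jpg")).unsqueeze(0)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(classes, probs[0].tolist())))
```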

## Remote Sensing <ins>Generative</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| Seg2Sat | Seg2Sat - Segmentation to aerial view using pretrained diffuser models | Github | null | link |
| - | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | NeurIPSW2023 | Paper | link |
| GeoRSSD | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | Paper | link |
| DiffusionSat | DiffusionSat: A Generative Foundation Model for Satellite Imagery | ICLR2024 | DiffusionSat | link |
| CRS-Diff | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Arxiv2024 | Paper | null |
| MetaEarth | MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation | Arxiv2024 | Paper | link |
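
Most of the diffusion-based entries above build on latent diffusion backbones, so prompted generation follows the usual `diffusers` pattern. A hedged sketch only: the base model ID below is a generic Stable Diffusion checkpoint, and a remote-sensing-tuned checkpoint or adapter from one of the projects above would be substituted according to its own release instructions:

```python
# Hedged sketch: prompt-conditioned image generation with diffusers.
# The base checkpoint is generic; RS-specific weights are an assumption here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base; swap in a tuned RS checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a high-resolution satellite image of a coastal city with a harbor"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("generated_scene.png")
```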

## Remote Sensing <ins>Vision-Location</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
| GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
| SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | link |
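
The vision-location models above share one core idea: embed a geographic coordinate and an image into the same space and score pairs by cosine similarity. The toy sketch below only illustrates that retrieval step with a made-up sinusoidal location encoder and random image embeddings; it is not the architecture of CSP, GeoCLIP, or SatCLIP:

```python
# Toy sketch of location-image alignment: a sinusoidal lon/lat encoder feeds a
# small MLP, and images are retrieved by cosine similarity. Entirely illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLocationEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_freqs: int = 8):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(num_freqs)
        self.mlp = nn.Sequential(nn.Linear(4 * num_freqs, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, lonlat: torch.Tensor) -> torch.Tensor:
        # lonlat: (N, 2) in degrees -> radians -> multi-frequency sin/cos features
        rad = lonlat * math.pi / 180.0
        feats = [torch.sin(rad[:, :, None] * self.freqs), torch.cos(rad[:, :, None] * self.freqs)]
        return self.mlp(torch.cat(feats, dim=-1).flatten(1))

loc_encoder = ToyLocationEncoder()
query = loc_encoder(torch.tensor([[114.36, 30.54]]))            # one query location
image_embeddings = F.normalize(torch.randn(1000, 256), dim=-1)  # stand-in image bank
scores = F.normalize(query, dim=-1) @ image_embeddings.T
print("best-matching image index:", scores.argmax(dim=-1).item())
```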

## Remote Sensing <ins>Vision-Audio</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| - | Self-supervised audiovisual representation learning for remote sensing data | JAG2022 | Paper | link |

## Remote Sensing <ins>Task-specific</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights | Task |
|---|---|---|---|---|---|
| SS-MAE | SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification | TGRS2023 | Paper | link | Image Classification |
| TTP | Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection | Arxiv2023 | Paper | link | Change Detection |
| CSMAE | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Arxiv2024 | Paper | link | Image Retrieval |
| RSPrompter | RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model | TGRS2024 | Paper | link | Instance Segmentation |
| BAN | A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | TGRS2024 | Paper | link | Change Detection |
| - | Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) | Arxiv2024 | Paper | null | Change Detection (Optical & OSM data) |
| AnyChange | Segment Any Change | Arxiv2024 | Paper | null | Zero-shot Change Detection |
| RS-CapRet | Large Language Models for Captioning and Retrieving Remote Sensing Images | Arxiv2024 | Paper | null | Image Caption & Text-image Retrieval |
| - | Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation | Arxiv2024 | Paper | null | Image Segmentation (Noisy labels) |
| RSBuilding | RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Arxiv2024 | Paper | link | Building Extraction and Change Detection |
| SAM-Road | Segment Anything Model for Road Network Graph Extraction | Arxiv2024 | Paper | link | Road Extraction |
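
Several task-specific entries above (RSPrompter, AnyChange, SAM-Road, and the SAM-based change detection work) build on the Segment Anything Model. As a minimal prompting sketch with the official `segment_anything` package: the image file and the point prompt coordinates are assumptions, and the checkpoint is the publicly released ViT-B SAM weight file:

```python
# Hedged sketch: point-prompted mask prediction on an aerial image with SAM.
# The image path and prompt coordinates are illustrative assumptions.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("aerial_tile.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point (x, y) placed on the object of interest, e.g. a building roof.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[int(scores.argmax())]
print("mask pixels:", int(best.sum()))
```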

## Remote Sensing Agents

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| GeoLLM-QA | Evaluating Tool-Augmented Agents in Remote Sensing Platforms | ICLR 2024 ML4RS Workshop | Paper | null |
| RS-Agent | RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents | Arxiv2024 | Paper | null |

## Benchmarks for RSFMs

| Abbreviation | Title | Publication | Paper | Link | Downstream Tasks |
|---|---|---|---|---|---|
| - | Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | link | Classification |
| GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | Paper | link | Classification & Segmentation |
| FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon | Classification & Segmentation & Detection for forest monitoring |
| PhilEO | PhilEO Bench: Evaluating Geo-Spatial Foundation Models | Arxiv2024 | Paper | link | Segmentation & Regression estimation |
| SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon | Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification |
| VLEO-Bench | Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Arxiv2024 | VLEO-bench | link | Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection |
| VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Arxiv2024 | VRSBench | link | Image Captioning & Object Referring & Visual Question Answering |
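
Many of these benchmarks evaluate frozen RSFM features with a linear probe or light fine-tuning. The sketch below is a generic illustration of the linear-probing protocol, not the exact pipeline of any benchmark above; `backbone`, `feature_dim`, and `train_loader` are assumed to exist:

```python
# Hedged sketch of linear probing: freeze a backbone, train only a linear head.
import torch
import torch.nn as nn

def linear_probe(backbone: nn.Module, feature_dim: int, num_classes: int,
                 train_loader, epochs: int = 10, device: str = "cuda"):
    backbone.eval().to(device)
    for p in backbone.parameters():
        p.requires_grad_(False)          # keep the foundation model frozen

    head = nn.Linear(feature_dim, num_classes).to(device)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                features = backbone(images)   # (B, feature_dim) embeddings
            loss = criterion(head(features), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head
```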

## (Large-scale) Pre-training Datasets

| Abbreviation | Title | Publication | Paper | Attribute | Link |
|---|---|---|---|---|---|
| fMoW | Functional Map of the World | CVPR2018 | fMoW | Vision | link |
| SEN12MS | SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion | - | SEN12MS | Vision | link |
| BEN-MM | BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval | GRSM2021 | BEN-MM | Vision | link |
| MillionAID | On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID | JSTARS2021 | MillionAID | Vision | link |
| SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | Vision | link |
| fMoW-S2 | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | fMoW-S2 | Vision | link |
| TOV-RS-Balanced | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | Vision | link |
| SSL4EO-S12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation | GRSM2023 | SSL4EO-S12 | Vision | link |
| SSL4EO-L | SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Arxiv2023 | SSL4EO-L | Vision | link |
| SatlasPretrain | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatlasPretrain | Vision (Supervised) | link |
| CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | Vision | Coming soon |
| SAMRS | SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model | NeurIPS2023 | SAMRS | Vision | link |
| RSVG | RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | TGRS2023 | RSVG | Vision-Language | link |
| RS5M | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | RS5M | Vision-Language | link |
| GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | GEO-Bench | Vision (Evaluation) | link |
| RSICap & RSIEval | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | Vision-Language | Coming soon |
| Clay | Clay Foundation Model | - | null | Vision | link |
| SATIN | SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models | ICCVW2023 | SATIN | Vision-Language | link |
| SkyScript | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyScript | Vision-Language | link |
| ChatEarthNet | ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing | Arxiv2024 | ChatEarthNet | Vision-Language | link |
| LuoJiaHOG | LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval | Arxiv2024 | LuoJiaHOG | Vision-Language | null |
| MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Arxiv2024 | MMEarth | Vision | link |
| SeeFar | SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models | Arxiv2024 | SeeFar | Vision | link |
| FIT-RS | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | Paper | Vision-Language | link |
| RS-GPT4V | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Arxiv2024 | Paper | Vision-Language | link |
| RS-4M | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | Arxiv2024 | RS-4M | Vision | link |
| Major TOM | Major TOM: Expandable Datasets for Earth Observation | Arxiv2024 | Major TOM | Vision | link |
| VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Arxiv2024 | VRSBench | Vision-Language | link |
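
Many of the vision datasets above are consumed by MAE-style pretraining, whose core operation is simply dropping a large random subset of image patches before encoding. The generic sketch below shows that masking step; the 16-pixel patch size and 75% mask ratio follow the common MAE defaults and nothing here is specific to any dataset or model listed above:

```python
# Hedged sketch: MAE-style random patch masking for a batch of satellite tiles.
import torch

def random_masking(images: torch.Tensor, patch: int = 16, mask_ratio: float = 0.75):
    """Split images into patches and keep a random 25% subset per sample."""
    B, C, H, W = images.shape
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)   # (B, C, H/p, W/p, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    num_patches = patches.shape[1]
    num_keep = int(num_patches * (1 - mask_ratio))

    noise = torch.rand(B, num_patches)                 # random score per patch
    keep_idx = noise.argsort(dim=1)[:, :num_keep]      # lowest-noise patches are kept
    visible = torch.gather(patches, 1, keep_idx[:, :, None].expand(-1, -1, patches.shape[-1]))
    return visible, keep_idx                           # visible patches go to the encoder

tiles = torch.randn(8, 3, 224, 224)                    # fake batch of RGB tiles
visible, keep_idx = random_masking(tiles)
print(visible.shape)                                   # torch.Size([8, 49, 768])
```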

## Relevant Projects

(TODO: this section recommends other relevant and impactful projects, in the hope of promoting the development of the RS community. :smile: :rocket:)

| Title | Link | Brief Introduction |
|---|---|---|
| RSFMs (Remote Sensing Foundation Models) Playground | link | An open-source playground to streamline the evaluation and fine-tuning of RSFMs on various datasets. |

## Survey Papers

| Title | Publication | Paper | Attribute |
|---|---|---|---|
| Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS2023 | Paper | Vision & Vision-Language |
| The Potential of Visual ChatGPT For Remote Sensing | Arxiv2023 | Paper | Vision-Language |
| 遥感大模型:进展与前瞻 (Remote Sensing Large Models: Progress and Prospects) | 武汉大学学报 (信息科学版) 2023 | Paper | Vision & Vision-Language |
| 地理人工智能样本:模型、质量与服务 (GeoAI Samples: Models, Quality, and Services) | 武汉大学学报 (信息科学版) 2023 | Paper | - |
| Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | JSTARS2023 | Paper | Vision & Vision-Language |
| Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | Vision |
| An Agenda for Multimodal Foundation Models for Earth Observation | IGARSS2023 | Paper | Vision |
| Transfer learning in environmental remote sensing | RSE2024 | Paper | Transfer learning |
| 遥感基础模型发展综述与未来设想 (A Review of the Development of Remote Sensing Foundation Models and Future Prospects) | 遥感学报2023 | Paper | - |
| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Arxiv2023 | Paper | Vision-Language |
| Vision-Language Models in Remote Sensing: Current Progress and Future Trends | IEEE GRSM2024 | Paper | Vision-Language |
| On the Foundations of Earth and Climate Foundation Models | Arxiv2024 | Paper | Vision & Vision-Language |
| Towards Vision-Language Geo-Foundation Model: A Survey | Arxiv2024 | Paper | Vision-Language |
| AI Foundation Models in Remote Sensing: A Survey | Arxiv2024 | Paper | Vision |

## Citation

If you find this repository useful, please consider giving it a star :star: and a citation:

```bibtex
@inproceedings{guo2024skysense,
  title={Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery},
  author={Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27672--27683},
  year={2024}
}
```