<img alt="GitHub watchers" src="https://img.shields.io/github/watchers/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social"> <img alt="GitHub stars" src="https://img.shields.io/github/stars/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social"> <img alt="GitHub forks" src="https://img.shields.io/github/forks/Jack-bo1220/Awesome-Remote-Sensing-Foundation-Models?style=social">

<p align="center">Awesome Remote Sensing Foundation Models</p>

:star2: A collection of papers, datasets, benchmarks, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).

## 📢 Latest Updates

:fire::fire::fire: Last Updated on 2024.08.08 :fire::fire::fire:

## Table of Contents

- Remote Sensing <ins>Vision</ins> Foundation Models
- Remote Sensing <ins>Vision-Language</ins> Foundation Models
- Remote Sensing <ins>Generative</ins> Foundation Models
- Remote Sensing <ins>Vision-Location</ins> Foundation Models
- Remote Sensing <ins>Vision-Audio</ins> Foundation Models
- Remote Sensing <ins>Task-specific</ins> Foundation Models
- Remote Sensing Agents
- Benchmarks for RSFMs
- (Large-scale) Pre-training Datasets
- Relevant Projects
- Survey Papers
- Citation

## Remote Sensing <ins>Vision</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
| - | Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding | CVPRW2021 | Paper | link |
| GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
| SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | link |
| DINO-MM | Self-supervised Vision Transformers for Joint SAR-optical Representation Learning | IGARSS2022 | DINO-MM | link |
| SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | SatMAE | link |
| RS-BYOL | Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images | JSTARS2022 | RS-BYOL | null |
| GeCo | Geographical Supervision Correction for Remote Sensing Representation Learning | TGRS2022 | GeCo | null |
| RingMo | RingMo: A remote sensing foundation model with masked image modeling | TGRS2022 | RingMo | Code |
| RVSA | Advancing plain vision transformer toward remote sensing foundation model | TGRS2022 | RVSA | link |
| RSP | An Empirical Study of Remote Sensing Pretraining | TGRS2022 | RSP | link |
| MATTER | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | CVPR2022 | MATTER | null |
| CSPT | Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | RS2022 | CSPT | link |
| - | Self-supervised Vision Transformers for Land-cover Segmentation and Classification | CVPRW2022 | Paper | link |
| BFM | A billion-scale foundation model for remote sensing images | Arxiv2023 | BFM | null |
| TOV | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | link |
| CMID | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | TGRS2023 | CMID | link |
| RingMo-Sense | RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling | TGRS2023 | RingMo-Sense | null |
| IaI-SimCLR | Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery | CVPRW2023 | IaI-SimCLR | null |
| CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | link |
| SatLas | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatLas | link |
| GFM | Towards Geospatial Foundation Models via Continual Pretraining | ICCV2023 | GFM | link |
| Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | ICCV2023 | Scale-MAE | link |
| DINO-MC | DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops | Arxiv2023 | DINO-MC | link |
| CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | NeurIPS2023 | CROMA | link |
| Cross-Scale MAE | Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing | NeurIPS2023 | Cross-Scale MAE | link |
| DeCUR | DeCUR: decoupling common & unique representations for multimodal self-supervision | Arxiv2023 | DeCUR | link |
| Presto | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Arxiv2023 | Presto | link |
| CtxMIM | CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | Arxiv2023 | CtxMIM | null |
| FG-MAE | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Arxiv2023 | FG-MAE | link |
| Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | Arxiv2023 | Prithvi | link |
| RingMo-lite | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Arxiv2023 | RingMo-lite | null |
| - | A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion | IGARSS2023 | Paper | null |
| EarthPT | EarthPT: a foundation model for Earth Observation | NeurIPS2023 CCAI workshop | EarthPT | link |
| USat | USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery | Arxiv2023 | USat | link |
| FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | link |
| AIEarth | Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data | Arxiv2023 | AIEarth | link |
| - | Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture | Arxiv2023 | Paper | link |
| Clay | Clay Foundation Model | - | null | link |
| Hydro | Hydro--A Foundation Model for Water in Satellite Imagery | - | null | link |
| U-BARN | Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series | JSTARS2024 | Paper | link |
| GeRSP | Generic Knowledge Boosted Pre-training For Remote Sensing Images | Arxiv2024 | GeRSP | GeRSP |
| SwiMDiff | SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image | Arxiv2024 | SwiMDiff | null |
| OFA-Net | One for All: Toward Unified Foundation Models for Earth Vision | Arxiv2024 | OFA-Net | null |
| SMLFR | Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation | TGRS2024 | SMLFR | link |
| SpectralGPT | SpectralGPT: Spectral Foundation Model | TPAMI2024 | SpectralGPT | link |
| S2MAE | S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data | CVPR2024 | S2MAE | null |
| SatMAE++ | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | CVPR2024 | SatMAE++ | link |
| msGFM | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | CVPR2024 | msGFM | link |
| SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon |
| MTP | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Arxiv2024 | MTP | link |
| DOFA | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Arxiv2024 | DOFA | link |
| PIS | Pretrain A Remote Sensing Foundation Model by Promoting Intra-instance Similarity | - | null | link |
| MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Arxiv2024 | MMEarth | link |
| SARATR-X | SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition | Arxiv2024 | SARATR-X | link |
| LeMeViT | LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation | IJCAI2024 | LeMeViT | link |
| SoftCon | Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining | Arxiv2024 | SoftCon | link |
| RS-DFM | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Arxiv2024 | RS-DFM | null |
| A2-MAE | A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder | Arxiv2024 | A2-MAE | null |
| HyperSIGMA | HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | Arxiv2024 | HyperSIGMA | link |
| SelectiveMAE | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | Arxiv2024 | SelectiveMAE | link |
| OmniSat | OmniSat: Self-Supervised Modality Fusion for Earth Observation | ECCV2024 | OmniSat | link |
| MM-VSF | Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications | Arxiv2024 | MM-VSF | null |
| MA3E | Masked Angle-Aware Autoencoder for Remote Sensing Images | ECCV2024 | MA3E | link |
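
Many of the vision backbones above release plain ViT-style encoder weights. As a hedged illustration only (the checkpoint path, class count, and backbone name are assumptions, not any specific model's release format), a typical downstream setup loads such an encoder into a `timm` ViT and attaches a fresh classification head:

```python
# Hedged sketch: load a released ViT encoder checkpoint into a timm backbone
# and fine-tune it on a remote sensing scene-classification task.
# "rsfm_vit_base.pth" and the 10-class head are illustrative assumptions.
import timm
import torch

model = timm.create_model(
    "vit_base_patch16_224",  # many RSFMs ship ViT-B/16 encoders; match the actual release
    pretrained=False,
    num_classes=10,          # downstream classes for the target dataset
)

state_dict = torch.load("rsfm_vit_base.pth", map_location="cpu")
# Encoder-only checkpoints usually lack the new head, so load non-strictly
# and inspect what was skipped.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
# ...standard supervised fine-tuning loop over the downstream dataset...
```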

## Remote Sensing <ins>Vision-Language</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv2023 | RemoteCLIP | link |
| GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | GeoRSCLIP | link |
| GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | null |
| - | Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | Arxiv2023 | Paper | link |
| - | Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Arxiv2024 | Paper | link |
| SkyEyeGPT | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Arxiv2024 | Paper | link |
| EarthGPT | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Arxiv2024 | Paper | null |
| SkyCLIP | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyCLIP | link |
| GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | CVPR2024 | GeoChat | link |
| LHRS-Bot | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Arxiv2024 | Paper | link |
| H2RSVLM | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Arxiv2024 | Paper | link |
| RS-LLaVA | RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | RS2024 | Paper | link |
| SkySenseGPT | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | Paper | link |
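
Several of the CLIP-style entries above (e.g., RemoteCLIP, GeoRSCLIP, SkyCLIP) follow the standard contrastive image-text recipe, so zero-shot scene classification looks like ordinary CLIP inference. Below is a minimal sketch with `open_clip`; the checkpoint path, architecture name, and prompt templates are illustrative assumptions, not any project's official loading code:

```python
# Hedged sketch: zero-shot remote sensing scene classification with a
# CLIP-style checkpoint via open_clip. Paths and prompts are illustrative.
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained=None  # architecture must match the released checkpoint
)
ckpt = torch.load("remote_sensing_clip.pt", map_location="cpu")
model.load_state_dict(ckpt)
model.eval()

tokenizer = open_clip.get_tokenizer("ViT-B-32")
classes = ["airport", "forest", "harbor", "residential area"]
text = tokenizer([f"a satellite photo of a {c}" for c in classes])
image = preprocess(Image.open("scene.jpg")).unsqueeze(0)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(classes, probs[0].tolist())))
```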

## Remote Sensing <ins>Generative</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| Seg2Sat | Seg2Sat - Segmentation to aerial view using pretrained diffuser models | Github | null | link |
| - | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | NeurIPSW2023 | Paper | link |
| GeoRSSD | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | Paper | link |
| DiffusionSat | DiffusionSat: A Generative Foundation Model for Satellite Imagery | ICLR2024 | DiffusionSat | link |
| CRS-Diff | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Arxiv2024 | Paper | null |
| MetaEarth | MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation | Arxiv2024 | Paper | link |
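
Most of the diffusion-based entries above build on latent diffusion backbones, so prompted generation follows the usual `diffusers` pattern. A hedged sketch only: the base model ID below is a generic Stable Diffusion checkpoint, and a remote-sensing-tuned checkpoint or adapter from one of the projects above would be substituted according to its own release instructions:

```python
# Hedged sketch: prompt-conditioned image generation with diffusers.
# The base checkpoint is generic; RS-specific weights are an assumption here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base; swap in a tuned RS checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a high-resolution satellite image of a coastal city with a harbor"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("generated_scene.png")
```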

## Remote Sensing <ins>Vision-Location</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
| GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
| SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | link |
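
The vision-location models above share one core idea: embed a geographic coordinate and an image into the same space and score pairs by cosine similarity. The toy sketch below only illustrates that retrieval step with a made-up sinusoidal location encoder and random image embeddings; it is not the architecture of CSP, GeoCLIP, or SatCLIP:

```python
# Toy sketch of location-image alignment: a sinusoidal lon/lat encoder feeds a
# small MLP, and images are retrieved by cosine similarity. Entirely illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLocationEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_freqs: int = 8):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(num_freqs)
        self.mlp = nn.Sequential(nn.Linear(4 * num_freqs, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, lonlat: torch.Tensor) -> torch.Tensor:
        # lonlat: (N, 2) in degrees -> radians -> multi-frequency sin/cos features
        rad = lonlat * math.pi / 180.0
        feats = [torch.sin(rad[:, :, None] * self.freqs), torch.cos(rad[:, :, None] * self.freqs)]
        return self.mlp(torch.cat(feats, dim=-1).flatten(1))

loc_encoder = ToyLocationEncoder()
query = loc_encoder(torch.tensor([[114.36, 30.54]]))            # one query location
image_embeddings = F.normalize(torch.randn(1000, 256), dim=-1)  # stand-in image bank
scores = F.normalize(query, dim=-1) @ image_embeddings.T
print("best-matching image index:", scores.argmax(dim=-1).item())
```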

## Remote Sensing <ins>Vision-Audio</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| - | Self-supervised audiovisual representation learning for remote sensing data | JAG2022 | Paper | link |

## Remote Sensing <ins>Task-specific</ins> Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights | Task |
|---|---|---|---|---|---|
| SS-MAE | SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification | TGRS2023 | Paper | link | Image Classification |
| TTP | Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection | Arxiv2023 | Paper | link | Change Detection |
| CSMAE | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Arxiv2024 | Paper | link | Image Retrieval |
| RSPrompter | RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model | TGRS2024 | Paper | link | Instance Segmentation |
| BAN | A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | TGRS2024 | Paper | link | Change Detection |
| - | Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) | Arxiv2024 | Paper | null | Change Detection (Optical & OSM data) |
| AnyChange | Segment Any Change | Arxiv2024 | Paper | null | Zero-shot Change Detection |
| RS-CapRet | Large Language Models for Captioning and Retrieving Remote Sensing Images | Arxiv2024 | Paper | null | Image Caption & Text-image Retrieval |
| - | Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation | Arxiv2024 | Paper | null | Image Segmentation (Noisy labels) |
| RSBuilding | RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Arxiv2024 | Paper | link | Building Extraction and Change Detection |
| SAM-Road | Segment Anything Model for Road Network Graph Extraction | Arxiv2024 | Paper | link | Road Extraction |
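
Several task-specific entries above (RSPrompter, AnyChange, SAM-Road, and the SAM-based change detection work) build on the Segment Anything Model. As a minimal prompting sketch with the official `segment_anything` package: the image file and the point prompt coordinates are assumptions, and the checkpoint is the publicly released ViT-B SAM weight file:

```python
# Hedged sketch: point-prompted mask prediction on an aerial image with SAM.
# The image path and prompt coordinates are illustrative assumptions.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("aerial_tile.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point (x, y) placed on the object of interest, e.g. a building roof.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[int(scores.argmax())]
print("mask pixels:", int(best.sum()))
```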

## Remote Sensing Agents

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| GeoLLM-QA | Evaluating Tool-Augmented Agents in Remote Sensing Platforms | ICLR 2024 ML4RS Workshop | Paper | null |
| RS-Agent | RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents | Arxiv2024 | Paper | null |

## Benchmarks for RSFMs

| Abbreviation | Title | Publication | Paper | Link | Downstream Tasks |
|---|---|---|---|---|---|
| - | Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | link | Classification |
| GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | Paper | link | Classification & Segmentation |
| FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon | Classification & Segmentation & Detection for forest monitoring |
| PhilEO | PhilEO Bench: Evaluating Geo-Spatial Foundation Models | Arxiv2024 | Paper | link | Segmentation & Regression estimation |
| SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon | Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification |
| VLEO-Bench | Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Arxiv2024 | VLEO-bench | link | Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection |
| VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Arxiv2024 | VRSBench | link | Image Captioning & Object Referring & Visual Question Answering |
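
Many of these benchmarks evaluate frozen RSFM features with a linear probe or light fine-tuning. The sketch below is a generic illustration of the linear-probing protocol, not the exact pipeline of any benchmark above; `backbone`, `feature_dim`, and `train_loader` are assumed to exist:

```python
# Hedged sketch of linear probing: freeze a backbone, train only a linear head.
import torch
import torch.nn as nn

def linear_probe(backbone: nn.Module, feature_dim: int, num_classes: int,
                 train_loader, epochs: int = 10, device: str = "cuda"):
    backbone.eval().to(device)
    for p in backbone.parameters():
        p.requires_grad_(False)          # keep the foundation model frozen

    head = nn.Linear(feature_dim, num_classes).to(device)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                features = backbone(images)   # (B, feature_dim) embeddings
            loss = criterion(head(features), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head
```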

## (Large-scale) Pre-training Datasets

| Abbreviation | Title | Publication | Paper | Attribute | Link |
|---|---|---|---|---|---|
| fMoW | Functional Map of the World | CVPR2018 | fMoW | Vision | link |
| SEN12MS | SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion | - | SEN12MS | Vision | link |
| BEN-MM | BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval | GRSM2021 | BEN-MM | Vision | link |
| MillionAID | On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID | JSTARS2021 | MillionAID | Vision | link |
| SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | Vision | link |
| fMoW-S2 | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | fMoW-S2 | Vision | link |
| TOV-RS-Balanced | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | Vision | link |
| SSL4EO-S12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation | GRSM2023 | SSL4EO-S12 | Vision | link |
| SSL4EO-L | SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Arxiv2023 | SSL4EO-L | Vision | link |
| SatlasPretrain | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatlasPretrain | Vision (Supervised) | link |
| CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | Vision | Coming soon |
| SAMRS | SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model | NeurIPS2023 | SAMRS | Vision | link |
| RSVG | RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | TGRS2023 | RSVG | Vision-Language | link |
| RS5M | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | RS5M | Vision-Language | link |
| GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | GEO-Bench | Vision (Evaluation) | link |
| RSICap & RSIEval | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | Vision-Language | Coming soon |
| Clay | Clay Foundation Model | - | null | Vision | link |
| SATIN | SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models | ICCVW2023 | SATIN | Vision-Language | link |
| SkyScript | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyScript | Vision-Language | link |
| ChatEarthNet | ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing | Arxiv2024 | ChatEarthNet | Vision-Language | link |
| LuoJiaHOG | LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval | Arxiv2024 | LuoJiaHOG | Vision-Language | null |
| MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Arxiv2024 | MMEarth | Vision | link |
| SeeFar | SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models | Arxiv2024 | SeeFar | Vision | link |
| FIT-RS | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | Paper | Vision-Language | link |
| RS-GPT4V | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Arxiv2024 | Paper | Vision-Language | link |
| RS-4M | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | Arxiv2024 | RS-4M | Vision | link |
| Major TOM | Major TOM: Expandable Datasets for Earth Observation | Arxiv2024 | Major TOM | Vision | link |
| VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Arxiv2024 | VRSBench | Vision-Language | link |
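
Many of the vision datasets above are consumed by MAE-style pretraining, whose core operation is simply dropping a large random subset of image patches before encoding. The generic sketch below shows that masking step; the 16-pixel patch size and 75% mask ratio follow the common MAE defaults and nothing here is specific to any dataset or model listed above:

```python
# Hedged sketch: MAE-style random patch masking for a batch of satellite tiles.
import torch

def random_masking(images: torch.Tensor, patch: int = 16, mask_ratio: float = 0.75):
    """Split images into patches and keep a random 25% subset per sample."""
    B, C, H, W = images.shape
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)   # (B, C, H/p, W/p, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    num_patches = patches.shape[1]
    num_keep = int(num_patches * (1 - mask_ratio))

    noise = torch.rand(B, num_patches)                 # random score per patch
    keep_idx = noise.argsort(dim=1)[:, :num_keep]      # lowest-noise patches are kept
    visible = torch.gather(patches, 1, keep_idx[:, :, None].expand(-1, -1, patches.shape[-1]))
    return visible, keep_idx                           # visible patches go to the encoder

tiles = torch.randn(8, 3, 224, 224)                    # fake batch of RGB tiles
visible, keep_idx = random_masking(tiles)
print(visible.shape)                                   # torch.Size([8, 49, 768])
```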

## Relevant Projects

(TODO: this section recommends other relevant and impactful projects, in the hope of promoting the development of the RS community. :smile: :rocket:)

| Title | Link | Brief Introduction |
|---|---|---|
| RSFMs (Remote Sensing Foundation Models) Playground | link | An open-source playground to streamline the evaluation and fine-tuning of RSFMs on various datasets. |

## Survey Papers

| Title | Publication | Paper | Attribute |
|---|---|---|---|
| Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS2023 | Paper | Vision & Vision-Language |
| The Potential of Visual ChatGPT For Remote Sensing | Arxiv2023 | Paper | Vision-Language |
| 遥感大模型:进展与前瞻 (Remote Sensing Large Models: Progress and Prospects) | 武汉大学学报 (信息科学版) 2023 | Paper | Vision & Vision-Language |
| 地理人工智能样本:模型、质量与服务 (GeoAI Samples: Models, Quality, and Services) | 武汉大学学报 (信息科学版) 2023 | Paper | - |
| Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | JSTARS2023 | Paper | Vision & Vision-Language |
| Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | Vision |
| An Agenda for Multimodal Foundation Models for Earth Observation | IGARSS2023 | Paper | Vision |
| Transfer learning in environmental remote sensing | RSE2024 | Paper | Transfer learning |
| 遥感基础模型发展综述与未来设想 (A Review of the Development of Remote Sensing Foundation Models and Future Prospects) | 遥感学报2023 | Paper | - |
| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Arxiv2023 | Paper | Vision-Language |
| Vision-Language Models in Remote Sensing: Current Progress and Future Trends | IEEE GRSM2024 | Paper | Vision-Language |
| On the Foundations of Earth and Climate Foundation Models | Arxiv2024 | Paper | Vision & Vision-Language |
| Towards Vision-Language Geo-Foundation Model: A Survey | Arxiv2024 | Paper | Vision-Language |
| AI Foundation Models in Remote Sensing: A Survey | Arxiv2024 | Paper | Vision |

## Citation

If you find this repository useful, please consider giving it a star :star: and a citation:

```bibtex
@inproceedings{guo2024skysense,
  title={Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery},
  author={Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27672--27683},
  year={2024}
}
```