Awesome remote sensing vision language models

This repository collects vision-language models for remote sensing, including recent methods and commonly used datasets across applications such as pretraining, image captioning, image-text retrieval, and visual question answering.

If you find any relevant papers that are not included here, please feel free to open a pull request at any time.

Surveys
Remote Sensing Vision Language Model
Applications
Dataset

Surveys

Paper	Published in	Code/Project
Vision-Language Models in Remote Sensing: Current Progress and Future Trends	arxiv 2023	-
The Potential of Visual ChatGPT For Remote Sensing	arxiv 2023	-
Brain-inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey	J-STARS 2023	-

Remote Sensing Vision Language Model

Paper	Published in	Code/Project
RSGPT: A Remote Sensing Vision Language Model and Benchmark	arxiv 2023	code
RemoteGLM	2023	code
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis	arxiv 2023	-
Towards Automatic Satellite Images Captions Generation Using Large Language Models	arxiv 2023	-
GeoChat: Grounded Large Vision-Language Model for Remote Sensing	arxiv 2023	code
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing	AAAI 2024	code

Applications

Pretraining

Paper	Published in	Code/Project
S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions	arxiv 2023	code
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing	arxiv 2023	code
RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model	arxiv 2023	Project

Image Captioning

Paper	Published in	Code/Project
Deep Semantic Understanding of High Resolution Remote Sensing Image	CITS 2016	-
Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image?	TGRS 2017	-
Exploring models and data for remote sensing image caption generation	TGRS 2017	code
Natural language description of remote sensing images based on deep learning	IGARSS 2017	-
Description Generation for Remote Sensing Images Using Attribute Attention Mechanism	Remote Sensing 2019	-
VAA: Visual aligning attention model for remote sensing image captioning	IEEE Access 2019	-
Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning	IEEE Access 2019	-
A multi-level attention model for remote sensing image captions	Remote Sensing 2020	-
Remote sensing image captioning via variational autoencoder and reinforcement learning	Knowledge-Based Systems 2020	-
Truncation cross entropy loss for remote sensing image captioning	TGRS 2020	-
Word–Sentence Framework for Remote Sensing Image Captioning	TGRS 2020	code
A novel SVM-based decoder for remote sensing image captioning	TGRS 2021	-
High-resolution remote sensing image captioning based on structured attention	TGRS 2021	code
Exploring transformer and multilabel classification for remote sensing image captioning	GRSL 2022	-
NWPU-Captions dataset and MLCA-Net for remote sensing image captioning	TGRS 2022	-
Remote Sensing Image Change Captioning With Dual-Branch Transformers: A New Method and a Large Scale Dataset	TGRS 2022	code
Transforming remote sensing images to textual descriptions	INT J APPL EARTH OBS 2022	-
Remote-sensing image captioning based on multilayer aggregated transformer	GRSL 2022	-
VLCA: Vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning	J SYST ENG ELECTRON 2023	-
Multi-source interactive stair attention for remote sensing image captioning	Remote Sensing 2023	-
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning	arxiv 2023	code
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning	arxiv 2023	code

Text-based Image Generation

Paper	Published in	Code/Project
Retro-Remote Sensing: Generating Images From Ancient Texts	J-STARS 2019	-
Remote sensing image augmentation based on text description for waterside change detection	Remote Sensing 2021	-
Text-to-remote-sensing-image generation with structured generative adversarial networks	GRSL 2021	-
Txt2Img-MHN: Remote sensing image generation from text using modern Hopfield network	arxiv 2022	code

Image-text Retrieval

Paper	Published in	Code/Project
TextRS: Deep bidirectional triplet network for matching text to remote sensing images	Remote Sensing 2020	-
Deep unsupervised embedding for remote sensing image retrieval using textual cues	Applied Sciences 2020	-
A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing	J-STARS 2021	-
A lightweight multi-scale crossmodal text-image retrieval method in remote sensing	TGRS 2021	code
Remote sensing cross-modal text-image retrieval based on global and local information	TGRS 2022	code
Multilanguage transformer for improved text to remote sensing image retrieval	J-STARS 2022	-
Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval	TGRS 2022	code
Contrasting dual transformer architectures for multi-modal remote sensing image retrieval	Applied Sciences 2023	-
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval	arxiv 2023	-
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval	arxiv 2023	-

Visual Question Answering

Paper	Published in	Code/Project
RSVQA: Visual question answering for remote sensing data	TGRS 2020	code
Mutual Attention Inception Network for Remote Sensing Visual Question Answering	TGRS 2021	code
How to find a good image-text embedding for remote sensing visual question answering?	ECML-PKDD 2021	-
Cross-Modal Visual Question Answering for Remote Sensing Data	DICTA 2021	-
RSVQA meets BigEarthNet: a new, large-scale, visual question answering dataset for remote sensing	IGARSS 2021	code
Self-Paced Curriculum Learning for Visual Question Answering on Remote Sensing Data	IGARSS 2021	-
From easy to hard: Learning language-guided curriculum for visual question answering on remote sensing data	TGRS 2022	code
Language transformers for remote sensing visual question answering	IGARSS 2022	-
Open-ended remote sensing visual question answering with transformers	IJRS 2022	-
Bi-modal transformer-based approach for visual question answering in remote sensing imagery	TGRS 2022	-
Prompt-RSVQA: Prompting visual context to a language model for remote sensing visual question answering	CVPRW 2022	-
Change detection meets visual question answering	TGRS 2022	code
A spatial hierarchical reasoning network for remote sensing visual question answering	TGRS 2023	-
Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images	JURSE 2023	-
LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing	IGARSS 2023	code
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs	arxiv 2023	code

Visual Grounding

Paper	Published in	Code/Project
Visual Grounding in Remote Sensing Images	ACMMM 2022	data
RSVG: Exploring data and models for visual grounding on remote sensing data	TGRS 2023	code

Scene Classification

Paper	Published in	Code/Project
Zero-shot scene classification for high spatial resolution remote sensing images	TGRS 2017	-
Fine-grained object recognition and zero-shot learning in remote sensing imagery	TGRS 2017	-
Structural alignment based zero-shot classification for remote sensing scenes	ICECE 2018	-
A distance-constrained semantic autoencoder for zero-shot remote sensing scene classification	J-STARS 2021	-
Learning deep crossmodal embedding networks for zero-shot remote sensing image scene classification	TGRS 2021	-
Generative adversarial networks for zero-shot remote sensing scene classification	Applied Sciences 2022	-
APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP	CVPR 2023	code

Object Detection

Paper	Published in	Code/Project
Text semantic fusion relation graph reasoning for few-shot object detection on remote sensing images	Remote Sensing 2023	-
Few-shot object detection in aerial imagery guided by textmodal knowledge	TGRS 2023	-

Semantic Segmentation

Paper	Published in	Code/Project
Semi-supervised contrastive learning for few-shot segmentation of remote sensing images	Remote Sensing 2022	-
Few-shot segmentation of remote sensing images using deep metric learning	GRSL 2022	-
Language-aware domain generalization network for cross-scene hyperspectral image classification	TGRS 2023	code
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model	arxiv 2023	code
RRSIS: Referring Remote Sensing Image Segmentation	arxiv 2023	-
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting	arxiv 2023	-

Others

Dataset

Image Captioning Dataset

Dataset	Home/Github	Download link
RSICD	Github	[BaiduYun] [Google Drive]
Sydney-Captions	Github	[BaiduYun]
UCM-Captions	Github	[BaiduYun]
NWPU-RESISC45	Github	[BaiduYun] [OneDrive]
DIOR-Captions	-	-
RS-5M	Github	[HuggingFace]
LEVIR-CC	Github	[Google Drive]
SkyScript	Github	-

Text-based Image Generation Dataset

Text-based Image Retrieval Dataset

Dataset	Home/Project	Download link
RSITMD	Github	[BaiduYun] [Google Drive]

Visual Question Answering Dataset

Dataset	Home/Project	Download link
RSVQA	Home	[data]
RSVQA×BEN	[Github] [Home]	-
RSIVQA	Github	-
CDVQA	Github	-

Visual Grounding Dataset

Dataset	Home/Project	Download link
DIOR-RSVG	Github	[Google Drive]

Scene Classification Dataset

Dataset	Home/Project	Download link
NWPU-RESISC45	Home	[OneDrive] [BaiduYun]
AID	Home	[OneDrive] [BaiduYun]
UC Merced Land-Use (UCM)	Home	-
SATIN	Home	[HuggingFace]

Object Detection Dataset

Dataset	Home/Project	Download link
NWPU VHR-10	Home	[OneDrive] [BaiduYun]
DIOR	Home	[Google Drive] [BaiduYun]
FAIR1M	-	[BaiduYun]

Semantic Segmentation Dataset

Dataset	Home/Project	Download link
Vaihingen	Home	[BaiduYun]
Potsdam	Home	[BaiduYun]
Toronto	Home	-
GID	Home	[BaiduYun, code: GID5] [OneDrive]