Awesome-Remote-Sensing-Multimodal-Large-Language-Models
🔥🔥🔥 Multimodal Large Language Models for Remote Sensing: A Survey
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
<div align='center'> :sparkles: The <b>first survey</b> of Multimodal Large Language Models for Remote Sensing (RS-MLLMs). </div>

✨✨✨ Behold our meticulously curated trove of RS-MLLMs resources!!!

The website will be updated in real time to track the latest state of RS-MLLMs!!!

Feast your eyes on an assortment of model architectures, training pipelines, datasets, comprehensive evaluation benchmarks, intelligent agents for remote sensing, techniques for instruction tuning, and much more.

A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.
<p align="center"> <img src="./images/1-timeline.jpg" width="100%" height="100%"> </p>
<font size=7><div align='center' > :apple: Multimodal Large Language Models for Remote Sensing </div></font>
<p align="center"> <img src="./images/6-timeline-agent.jpg" width="70%" height="100%"> </p>
<font size=7><div align='center' > :apple: Intelligent Agents for Remote Sensing </div></font>

Please share a <font color='orange'>STAR ⭐</font> if this project helps you.
📢 Latest Updates
In this repository, we collect and document researchers and their outstanding work on remote sensing multimodal large language models (vision-language).
- The list will be continuously updated 🔥🔥
- Coming soon!
- May-22-2024: The first RS-MLLMs review manuscript has been submitted for review. 🔥🔥
<font size=5><center><b> Table of Contents </b> </center></font>
- Awesome Papers
- Awesome Datasets
- Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks
Awesome Papers
Multimodal Large Language Models for Remote Sensing
Intelligent Agents for Remote Sensing
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents <br>W. Xu, Z. Yu, Y. Wang, J. Wang, and M. Peng.<br> | arXiv | 2024-06-11 | - | - |
GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots <br>S. Singh, M. Fore, D. Stamoulis, and D. Group.<br> | arXiv | 2024-04-23 | - | - |
Evaluating Tool-Augmented Agents in Remote Sensing Platforms <br>S. Singh, M. Fore, and D. Stamoulis.<br> | arXiv | 2024-04-23 | - | - |
| | arXiv | 2024-04-01 | Github | - |
| | arXiv | 2024-01-17 | Github | - |
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis <br>S. Du, S. Tang, W. Wang, X. Li, and R. Guo.<br> | arXiv | 2023-10-07 | - | - |
Vision-Language Pre-training Models for Remote Sensing
Title | Venue | Date | Code | Note |
---|---|---|---|---|
| | arXiv | 2024-01-02 | Github | - |
| | T-GRS | 2024-04-18 | Github | arXiv |
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment <br>U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala.<br> | ICLR | 2024-01-16 | Project | arXiv |
| | JAG | 2023-09-18 | Github | - |
| | T-GRS | 2023-08-28 | Github | arXiv |
Survey Papers for Remote Sensing Vision-Language Tasks
Title | Venue | Date | Code | Note |
---|---|---|---|---|
| | arXiv | 2024-06-13 | Github | arXiv |
Vision-Language Models in Remote Sensing: Current progress and future trends <br>X. Li, C. Wen, Y. Hu, Z. Yuan, and X. X. Zhu.<br> | MGRS | 2024-04-22 | - | - |
Language Integration in Remote Sensing: Tasks, datasets, and future directions <br>L. Bashmal, Y. Bazi, F. Melgani, M. M. Al Rahhal, and M. A. Al Zuair.<br> | MGRS | 2023-10-11 | - | - |
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey <br>L. Jiao et al.<br> | JSTARS | 2023-09-18 | - | - |
Others
Title | Venue | Date | Code | Note |
---|---|---|---|---|
On the Foundations of Earth and Climate Foundation Models <br>X. X. Zhu et al.<br> | arXiv | 2024-05-07 | Github | - |
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications <br>C. Tan et al.<br> | arXiv | 2023-12-23 | - | - |
| | arXiv | 2023-11-24 | Github | - |
The Potential of Visual ChatGPT for Remote Sensing <br>L. P. Osco, E. L. de Lemos, W. N. Gonçalves, A. P. M. Ramos, and J. Marcato Junior.<br> | Remote Sensing | 2023-06-22 | - | - |
Awesome Datasets
Datasets of Pre-Training for Alignment
Title | Venue | Date | Code | Note |
---|---|---|---|---|
| | arXiv | 2024-02-17 | Github | Link |
| | arXiv | 2024-01-02 | Github | - |
| | AAAI | 2024-03-24 | Github | arXiv |
Datasets of Multimodal Instruction Tuning
Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks
Remote Sensing Image Captioning and Aerial Video Captioning
<p align="center"> <img src="./images/caption.jpg" width="80%" height="100%"> </p>

Remote Sensing Visual Question Answering and Remote Sensing Visual Grounding
<p align="center"> <img src="./images/vqavg.jpg" width="80%" height="100%"> </p>

Remote Sensing Image-Text Retrieval
<p align="center"> <img src="./images/itretrieval.jpg" width="80%" height="100%"> </p>

Remote Sensing Scene Classification
<p align="center"> <img src="./images/rssc.jpg" width="80%" height="100%"> </p>

Contact
If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.