# Awesome-Remote-Sensing-Multimodal-Large-Language-Models
## 🔥🔥🔥 Multimodal Large Language Models for Remote Sensing: A Survey
[Project Page]
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
<div align='center'> :sparkles: The <b>first survey</b> of Multimodal Large Language Models for Remote Sensing (RS-MLLMs). </div>

✨✨✨ Behold our meticulously curated trove of RS-MLLM resources!
💡 This page will be updated in real time to track the latest state of RS-MLLMs!
- Feast your eyes on an assortment of model architectures, training pipelines, datasets, comprehensive evaluation benchmarks, intelligent agents for remote sensing, techniques for instruction tuning, and much more.
- 📢 A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.
<p align="center"> <img src="./images/1-timeline.jpg" width="100%" height="100%"> </p>

<font size=7><div align='center' > :apple: Multimodal Large Language Models for Remote Sensing </div></font>
<p align="center"> <img src="./images/6-timeline-agent.jpg" width="70%" height="100%"> </p>

<font size=7><div align='center' > :apple: Intelligent Agents for Remote Sensing </div></font>

Please share a <font color='orange'>STAR ⭐</font> if this project helps you.
## 📢 Latest Updates
In this repository, we collect and document researchers and their outstanding work on remote sensing multimodal large language models (vision-language).
- The list will be continuously updated. 🔥🔥
- 📦 Coming soon!
- May-22-2024: The first RS-MLLM survey manuscript has been submitted for review. 🔥🔥
<font size=5><center><b> Table of Contents </b> </center></font>
- Awesome Papers
- Awesome Datasets
- Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks
## Awesome Papers

### Multimodal Large Language Models for Remote Sensing

### Intelligent Agents for Remote Sensing
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents <br>W. Xu, Z. Yu, Y. Wang, J. Wang, and M. Peng.<br> | arXiv | 2024-06-11 | - | - |
GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots <br>S. Singh, M. Fore, D. Stamoulis, and D. Group.<br> | arXiv | 2024-04-23 | - | - |
Evaluating Tool-Augmented Agents in Remote Sensing Platforms <br>S. Singh, M. Fore, and D. Stamoulis.<br> | arXiv | 2024-04-23 | - | - |
Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis <br>C. Liu, K. Chen, H. Zhang, Z. Qi, Z. Zou, and Z. Shi.<br> | arXiv | 2024-04-01 | Github | - |
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models <br>H. Guo, X. Su, C. Wu, B. Du, L. Zhang, and D. Li.<br> | arXiv | 2024-01-17 | Github | - |
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis <br>S. Du, S. Tang, W. Wang, X. Li, and R. Guo.<br> | arXiv | 2023-10-07 | - | - |
### Vision-Language Pre-training Models for Remote Sensing
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing <br>Z. Zhang, T. Zhao, Y. Guo, and J. Yin.<br> | arXiv | 2024-01-02 | Github | accepted by IEEE-TGRS |
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing <br>F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, and J. Zhou.<br> | T-GRS | 2024-04-18 | Github | arXiv |
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment <br>U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala.<br> | ICLR | 2024-01-16 | Project | arXiv |
RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision <br>X. Li, C. Wen, Y. Hu, and N. Zhou.<br> | JAG | 2023-09-18 | Github | - |
Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval <br>Y. Yuan, Y. Zhan, and Z. Xiong.<br> | T-GRS | 2023-08-28 | Github | arXiv |
### Survey Papers for Remote Sensing Vision-Language Tasks
Title | Venue | Date | Code | Note |
---|---|---|---|---|
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey <br>C. Liu, J. Zhang, K. Chen, M. Wang, Z. Zou, and Z. Shi.<br> | arXiv | 2024-12-03 | Github | arXiv |
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing <br>X. Sun, B. Peng, C. Zhang, F. Jin, Q. Niu, J. Liu, K. Chen, M. Li, P. Feng, Z. Bi, M. Liu, and Y. Zhang.<br> | arXiv | 2024-11-05 | - | - |
Foundation Models for Remote Sensing and Earth Observation: A Survey <br>A. Xiao, W. Xuan, J. Wang, J. Huang, D. Tao, S. Lu, and N. Yokoya.<br> | arXiv | 2024-10-22 | Github | arXiv |
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques <br>L. Tao, H. Zhang, H. Jing, Y. Liu, K. Yao, C. Li, and X. Xue.<br> | arXiv | 2024-10-15 | Github | arXiv |
Towards Vision-Language Geo-Foundation Model: A Survey <br>Y. Zhou, L. Feng, Y. Ke, X. Jiang, J. Yan, and W. Zhang.<br> | arXiv | 2024-06-13 | Github | arXiv |
Vision-Language Models in Remote Sensing: Current progress and future trends <br>X. Li, C. Wen, Y. Hu, Z. Yuan, and X. X. Zhu.<br> | MGRS | 2024-04-22 | - | - |
Language Integration in Remote Sensing: Tasks, datasets, and future directions <br>L. Bashmal, Y. Bazi, F. Melgani, M. M. Al Rahhal, and M. A. Al Zuair.<br> | MGRS | 2023-10-11 | - | - |
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey <br>L. Jiao et al.<br> | JSTARS | 2023-09-18 | - | - |
### Others
Title | Venue | Date | Code | Note |
---|---|---|---|---|
On the Foundations of Earth and Climate Foundation Models <br>X. X. Zhu et al.<br> | arXiv | 2024-05-07 | Github | - |
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications <br>C. Tan et al.<br> | arXiv | 2023-12-23 | - | - |
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs <br>J. Roberts, T. Lüddecke, R. Sheikh, K. Han, and S. Albanie.<br> | arXiv | 2023-11-24 | Github | - |
The Potential of Visual ChatGPT for Remote Sensing <br>L. P. Osco, E. L. de Lemos, W. N. Gonçalves, A. P. M. Ramos, and J. Marcato Junior.<br> | Remote Sensing | 2023-06-22 | - | - |
## Awesome Datasets

### Datasets of Pre-Training for Alignment
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models <br>J. Ge, Y. Zheng, K. Guo, and J. Liang.<br> | arXiv | 2024-08-27 | Github | Link |
ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing <br>Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu.<br> | arXiv | 2024-02-17 | Github | Link |
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing <br>Z. Zhang, T. Zhao, Y. Guo, and J. Yin.<br> | arXiv | 2024-01-02 | Github | - |
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing <br>Z. Wang, R. Prabha, T. Huang, J. Wu, and R. Rajagopal.<br> | AAAI | 2024-03-24 | Github | arXiv |
### Datasets of Multimodal Instruction Tuning

## Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks

### Remote Sensing Image Captioning and Aerial Video Captioning
<p align="center"> <img src="./images/caption.jpg" width="80%" height="100%"> </p>

### Remote Sensing Visual Question Answering and Remote Sensing Visual Grounding

<p align="center"> <img src="./images/vqavg.jpg" width="80%" height="100%"> </p>

### Remote Sensing Image-Text Retrieval

<p align="center"> <img src="./images/itretrieval.jpg" width="80%" height="100%"> </p>

### Remote Sensing Scene Classification

<p align="center"> <img src="./images/rssc.jpg" width="80%" height="100%"> </p>

## 🤗 Contact
If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.