

<font size='5'>RSGPT: A Remote Sensing Vision Language Model and Benchmark</font>

Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li☨

☨corresponding author

<!-- <a href='https://rsgpt.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> -->

<a href='https://arxiv.org/abs/2307.15266'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

This is an ongoing project. We are working on increasing the dataset size.

Related Projects

<font size='5'>VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding</font>

Xiang Li, Jian Ding, Mohamed Elhoseiny

<a href='https://vrsbench.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2406.12384'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/datasets/xiang709/VRSBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

<font size='5'>Vision-language models in remote sensing: Current progress and future trends</font>

Xiang Li☨, Congcong Wen, Yuan Hu, Zhenghang Yuan, Xiao Xiang Zhu

<a href='https://ieeexplore.ieee.org/abstract/document/10506064/'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

:fire: Updates



The idea of fine-tuning our vision-language model is borrowed from MiniGPT-4. Our model is obtained by fine-tuning InstructBLIP on our RSICap dataset.


If you're using RSGPT in your research or applications, please cite using this BibTeX:

@article{hu2023rsgpt,
  title={RSGPT: A Remote Sensing Vision Language Model and Benchmark},
  author={Hu, Yuan and Yuan, Jianlong and Wen, Congcong and Lu, Xiaonan and Li, Xiang},
  journal={arXiv preprint arXiv:2307.15266},
  year={2023}
}