Awesome

RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding

Linrui Xu, Ling Zhao, Wang Guo, Qiujun Li, Kewang Long, Kaiqi Zou, Yuhan Wang, Haifeng Li☨

News

We will be releasing the complete dataset, scripts, and model weights soon!

[2024/06/18]: 🔥 Our paper now is available at arxiv.
[2024/06/25]: 🔥 Our data will be released to huggingface soon.
[2024/07/01]: 🔥 Our data has been released to onedrive.

RS-GPT4V Dataset

RS-GPT4V integrates advanced tasks using both vision and language data. The dataset facilitates complex reasoning and detailed understanding of remote sensing images through multimodal instruction-following formats. Below are visual representations of the dataset's principles and structure:

Evolution of Remote Sensing Tasks and Data

Evolution from simple remote sensing tasks to complex instruction-based tasks using multimodal data.

Design Principles and Characteristics of the RS-GPT4V Dataset

Illustrates the dataset's design principles focusing on unity, diversity, correctness, complexity, richness, and robustness.

Principles-Driven Pipeline for RS-GPT4V Dataset Construction

The construction process follows a structured approach integrating data collection, instruction-response generation, and instruction-annotation adaptation.

Citation

If you find RS-GPT4V useful for your research and applications, please cite using this BibTeX:

@ARTICLE{10197260,
  author={Xu, Linrui and Guo, Wang and Li, Qiujun and Long, Kewang and Zou, Kaiqi and Wang, Yuhan and Li, Haifeng},
  title={RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding}, 
  year={2024},
  volume={},
  number={},
  pages={1-14},
  journal={arXiv}, 
  doi={https://arxiv.org/abs/2406.12479}
}