RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu, Ling Zhao, Wang Guo, Qiujun Li, Kewang Long, Kaiqi Zou, Yuhan Wang, Haifeng Li✨
News
We will be releasing the complete dataset, scripts, and model weights soon!
- [2024/06/18]: 🔥 Our paper is now available on arXiv.
- [2024/06/25]: 🔥 Our data will be released on Hugging Face soon.
- [2024/07/01]: 🔥 Our data has been released on OneDrive.
RS-GPT4V Dataset
RS-GPT4V unifies remote sensing vision-language tasks in a single multimodal instruction-following format. The dataset is designed to support complex reasoning and detailed understanding of remote sensing images. The figures below illustrate the dataset's principles and structure:
Evolution of Remote Sensing Tasks and Data
<p align="center"> <img src="https://github.com/GeoX-Lab/RS-GPT4V/assets/36953734/ec7f90f3-a25f-427a-9d98-206cd20aba3d" width="100%" alt="Evolution of Remote Sensing Tasks and Data"> </p>

Evolution from simple remote sensing tasks to complex instruction-based tasks using multimodal data.
Design Principles and Characteristics of the RS-GPT4V Dataset
<p align="center"> <img src="https://github.com/GeoX-Lab/RS-GPT4V/assets/36953734/3e14241a-c05e-48fd-b29f-963708b53b74" width="100%" alt="Design Principles and Characteristics"> </p>

The dataset's design principles: unity, diversity, correctness, complexity, richness, and robustness.
Principles-Driven Pipeline for RS-GPT4V Dataset Construction
<p align="center"> <img src="https://github.com/GeoX-Lab/RS-GPT4V/assets/36953734/bc0dfed8-3c9a-45f3-91e7-b08b51ae8817" width="100%" alt="Dataset Construction Pipeline"> </p>

The construction process follows a structured approach integrating data collection, instruction-response generation, and instruction-annotation adaptation.
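
To make the instruction-annotation adaptation step more concrete, here is a minimal sketch of how an existing annotation (for example, a scene label) might be wrapped into an instruction-response record. Everything in it is an assumption for illustration only: the field names (`image`, `task`, `instruction`, `response`), the prompt templates, the `adapt_annotation` helper, and the example image path are hypothetical and do not reflect the actual RS-GPT4V schema or scripts.

```python
import json

# Hypothetical prompt templates; the real RS-GPT4V prompts are not given in this README.
TEMPLATES = {
    "classification": "What is the scene category of this remote sensing image?",
    "captioning": "Describe this remote sensing image in one sentence.",
}


def adapt_annotation(image_path, task, annotation):
    """Wrap an existing annotation as an instruction-response record.

    The record layout is an assumed one, chosen only for illustration.
    """
    if task not in TEMPLATES:
        raise ValueError(f"unsupported task: {task}")
    return {
        "image": image_path,
        "task": task,
        "instruction": TEMPLATES[task],
        "response": str(annotation),
    }


# Example: turn a (hypothetical) scene-classification label into a record.
record = adapt_annotation("images/0001.png", "classification", "airport")
print(json.dumps(record, indent=2))
```

Keeping the templates in one dictionary keyed by task is just one way to give every task the same record shape; the released dataset may organize its records differently.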
Citation
If you find RS-GPT4V useful for your research and applications, please cite using this BibTeX:
@ARTICLE{10197260,
author={Xu, Linrui and Zhao, Ling and Guo, Wang and Li, Qiujun and Long, Kewang and Zou, Kaiqi and Wang, Yuhan and Li, Haifeng},
title={RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding},
year={2024},
volume={},
number={},
pages={1-14},
journal={arXiv},
url={https://arxiv.org/abs/2406.12479}
}