🌏 EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

Official repository for EarthGPT. :smile:

Authors: Wei Zhang*, Miaoxin Cai*, Tong Zhang, Yin Zhuang, and Xuerui Mao

:mega: News

:sparkles: Introduction

EarthGPT is a pioneering model that unifies multi-sensor, diverse remote sensing intelligent visual interpretation tasks in a single framework, guided by user language instructions. EarthGPT handles visual-language dialogues across optical, SAR, and infrared images, and its capabilities extend to a wide range of tasks including scene classification, image description, visual question answering, target description, visual localization, and object detection.

<div align="center"> <img src="images/examples.png"> </div>

:sparkles: MMRS-1M: Multi-sensor remote sensing instruction dataset

<u>The entire data of MMRS-1M is coming soon! 🚀</u>

MMRS-1M is the largest multi-modal multi-sensor RS instruction-following dataset, consisting of over 1M image-text pairs that include optical, SAR, and infrared RS images.
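To illustrate what an instruction-following image-text pair can look like, here is a minimal sketch. The field names (`image`, `modality`, `conversations`) are assumptions for illustration only, not the released MMRS-1M schema:

```python
# Hypothetical illustration of one instruction-following record.
# Field names are assumptions, not the official MMRS-1M schema.
sample = {
    "image": "optical/scene_000001.jpg",   # path to an optical, SAR, or infrared image
    "modality": "optical",
    "conversations": [
        {"from": "human", "value": "Describe the scene in this remote sensing image."},
        {"from": "gpt", "value": "An airport with two runways and several parked aircraft."},
    ],
}

def count_turns(record):
    """Count the dialogue turns in an instruction record."""
    return len(record["conversations"])

print(count_turns(sample))  # prints 2
```

Each record pairs one multi-sensor image with a language instruction and its response, which is the general shape such datasets take.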

We release 9,000+ image-text pairs at the following link.

Link: https://pan.baidu.com/s/1hN7RXQv5xo5Fyq0nHzUlzg

Password: haha

:bookmark: Citation

```bibtex
@article{zhang2024earthgpt,
  title={EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain},
  author={Zhang, Wei and Cai, Miaoxin and Zhang, Tong and Zhuang, Yin and Mao, Xuerui},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}
```

:memo: Acknowledgment

This work benefits from LLaMA. Thanks for their wonderful work.

:envelope: Contact

If you have any questions about EarthGPT, please feel free to contact w.w.zhanger@gmail.com.