SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

<br> <p align="center"> <img src="images/SkyEyeGPT.png" width="250"/> </p> <br> <div align="center"> <strong>Authors: Yang Zhan, Zhitong Xiong, Yuan Yuan</strong>

<strong>School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University</strong>

</div>

This is the official repository for the paper "SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model". [paper] [SkyEye-968k]

Please share a <font color='orange'>STAR ⭐</font> if this project helps you.

This project focuses on multimodal (vision-language) large language models for remote sensing.

📢 Latest Updates

This is an ongoing project, and we will continue to improve it.


💬 SkyEyeGPT: Remote Sensing Multi-modal Chatbot

The online demo will be released.

<div align="center"> <img src="images/chatbot.png"/> </div>

<img src="images/SkyEyeGPT.png" height="30"> SkyEyeGPT: Architecture

The model and checkpoint are coming soon! 🚀

<div align="center"> <img src="images/model.png"/> </div>

🌋 SkyEye-968k: Unified RS Vision-Language Instruction Dataset

The unified remote sensing vision-language instruction dataset is now available for download! 🚀

Download link: https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k
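For scripted downloads, the minimal sketch below uses the `snapshot_download` function from the `huggingface_hub` library. It assumes the dataset is stored as a regular Hugging Face dataset repository (as the URL above suggests) and that `huggingface_hub` is installed (`pip install huggingface_hub`). The helper names `repo_id_from_url` and `download_skyeye` and the default local folder are hypothetical and not part of this project; if the repository turns out to be gated, you may need to run `huggingface-cli login` first.

```python
from urllib.parse import urlparse

# Dataset page given in this README.
DATASET_URL = "https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k"


def repo_id_from_url(url: str) -> str:
    """Return the '<owner>/<name>' repo id from a Hugging Face dataset URL.

    Expects URLs of the form https://huggingface.co/datasets/<owner>/<name>;
    anything else raises ValueError.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_skyeye(local_dir: str = "SkyEye-968k") -> str:
    """Download every file of the dataset repo into local_dir (hypothetical helper).

    Returns the local folder path reported by snapshot_download.
    """
    # Imported lazily so repo_id_from_url works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(DATASET_URL),
        repo_type="dataset",  # the repo lives under /datasets/, not /models/
        local_dir=local_dir,
    )
```

Calling `download_skyeye()` fetches the whole repository. If only part of it is needed, `snapshot_download` also accepts an `allow_patterns` argument to filter files by glob pattern.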

<div align="center"> <img src="images/dataset.png" height="400"/> </div>

📦 Performance

<div align="center"> <img src="images/performance.png" height="400"/> </div>

👁️ Visualization

1. Detailed description

<div align="center"> <img src="images/detailed_descr.png"/> </div>

2. Test samples for captioning, grounding, and VQA

<div align="center"> <img src="images/some_sample.png"/> </div>

👁️ Qualitative results

1. Remote Sensing Visual Grounding

<div align="center"> <img src="images/RSVG.png"/> </div>

2. Remote Sensing Phrase Grounding

<div align="center"> <img src="images/RSPG.png"/> </div>

3. Remote Sensing Image Captioning

<div align="center"> <img src="images/RSIC.png"/> </div>

4. UAV Aerial Video Captioning

<div align="center"> <img src="images/UAVC.png"/> </div>

5. Remote Sensing Visual Question Answering

<div align="center"> <img src="images/RSVQA.png"/> </div>

6. Remote Sensing Referring Expression Generation

<div align="center"> <img src="images/RSREG.png"/> </div>

7. Remote Sensing Scene Classification

<div align="center"> <img src="images/RSSC.png"/> </div>

🔍 Quantitative results

1. Remote Sensing Image Captioning

<div align="center"> <img src="images/T_RSIC1.png"/> </div> <div align="center"> <img src="images/T_RSIC2.png"/> </div>

2. UAV Aerial Video Captioning

<div align="center"> <img src="images/T_UAVC.png"/> </div>

3. Remote Sensing Visual Grounding

<div align="center"> <img src="images/T_RSVG.png" height="250"/> </div>

4. Remote Sensing Visual Question Answering

<div align="center"> <img src="images/T_RSVQA1.png"/> </div> <div align="center"> <img src="images/T_RSVQA2.png" height="250"/> </div>

📜 Citation

@misc{zhan2024skyeyegpt,
      title={SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model}, 
      author={Yang Zhan and Zhitong Xiong and Yuan Yuan},
      year={2024},
      eprint={2401.09712},
      archivePrefix={arXiv}
}

🙏 Acknowledgement

Our code is based on MiniGPT-4, Shikra, and MiniGPT-v2. We sincerely thank their authors for their contributions and for releasing their source code. We are also grateful to the EVA and LLaMA 2 teams for open-sourcing their models. I would like to thank Zhitong Xiong and Yuan Yuan for their help with the manuscript. I also thank the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University, for supporting this work.

🤖 Contact

If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.