RSAdapter

The official PyTorch implementation of the paper "RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering".

If you find our work useful in your research, please cite:

@article{wang2024rsadapter,
  title={RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering},
  author={Wang, Yuduo and Ghamisi, Pedram},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}

Introduction

In this work, we introduce RSAdapter, a method designed for runtime and parameter efficiency. RSAdapter comprises two key components: a Parallel Adapter and an additional linear transformation layer inserted after each fully connected (FC) layer within the Adapter. This design improves adaptation of pretrained multimodal models. It also allows the parameters of each linear transformation layer to be merged into the preceding FC layer at inference time, which reduces inference cost.
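The merging step described above can be illustrated with a minimal sketch. It assumes the extra linear transformation is a plain affine layer (`nn.Linear`) placed directly after the FC layer, with no nonlinearity in between. The exact form used in RSAdapter may differ, and the names `fc`, `lt`, and `merge_linear` as well as the layer sizes are hypothetical, chosen only for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
in_features, out_features = 8, 4

fc = nn.Linear(in_features, out_features)    # FC layer inside the adapter
lt = nn.Linear(out_features, out_features)   # assumed extra linear transformation


def merge_linear(fc: nn.Linear, lt: nn.Linear) -> nn.Linear:
    """Fold `lt` into `fc` so that merged(x) equals lt(fc(x)).

    Since lt(fc(x)) = W_lt (W_fc x + b_fc) + b_lt, the merged layer uses
    weight W_lt @ W_fc and bias W_lt @ b_fc + b_lt.
    """
    merged = nn.Linear(fc.in_features, lt.out_features)
    with torch.no_grad():
        merged.weight.copy_(lt.weight @ fc.weight)
        merged.bias.copy_(lt.weight @ fc.bias + lt.bias)
    return merged


merged = merge_linear(fc, lt)

x = torch.randn(2, in_features)
with torch.no_grad():
    two_layer_out = lt(fc(x))   # training-time path: two layers
    merged_out = merged(x)      # inference-time path: one layer

# The two paths should agree up to floating-point error.
max_abs_diff = (two_layer_out - merged_out).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

After merging, only the single `merged` layer is evaluated at inference, so under these assumptions the extra transformation adds no additional matrix multiplication.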

<div align="center"> <img src=Figure/Flowchart.png width=80% /> </div>

Preparation

Training

python train_lr.py

python train_hr.py

python train_rsi.py

Comparison with SOTA

<div align="center"> <img src=Figure/Comp_1.png width=50% /> </div> <div align="center"> <img src=Figure/Comp_2.png width=50% /> </div>

TODO

Acknowledgement

The code is based on the transformers library. The authors would also like to thank the contributors to the RSVQA and RSIVQA datasets.