RSAdapter
The official PyTorch implementation of the paper "RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering".
If you find our work useful in your research, please cite:
@article{wang2024rsadapter,
title={RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering},
author={Wang, Yuduo and Ghamisi, Pedram},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2024},
publisher={IEEE}
}
Introduction
In this work, we introduce a novel method known as RSAdapter, which prioritizes runtime and parameter efficiency. RSAdapter comprises two key components: the Parallel Adapter and an additional linear transformation layer inserted after each fully connected (FC) layer within the Adapter. This approach not only improves adaptation to pretrained multimodal models but also allows the parameters of the linear transformation layer to be integrated into the preceding FC layers during inference, reducing inference costs.
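Below is a minimal, hedged sketch of this idea, not the exact module used in this repository; the class name ParallelAdapterSketch, its attributes, and the method merge_for_inference are hypothetical names used for illustration. It shows a bottleneck adapter with an extra linear transformation after each FC layer, and how those extra layers can be folded back into the preceding FC layers before inference.

```python
# Illustrative sketch only (assumed names, not the repository's implementation).
import torch
import torch.nn as nn


class ParallelAdapterSketch(nn.Module):
    def __init__(self, dim, bottleneck):
        super().__init__()
        # Bottleneck FC layers of the adapter.
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Extra linear transformation inserted after each FC layer; both are
        # square, so they can later be folded into the FC layer before them.
        self.post_down = nn.Linear(bottleneck, bottleneck)
        self.post_up = nn.Linear(dim, dim)
        self.act = nn.GELU()

    def forward(self, x):
        h = self.act(self.post_down(self.down(x)))
        return self.post_up(self.up(h))

    @torch.no_grad()
    def merge_for_inference(self):
        # S (W x + b) + c == (S W) x + (S b + c), so each extra linear layer
        # can be absorbed into the FC layer that precedes it.
        for fc, post in ((self.down, self.post_down), (self.up, self.post_up)):
            fc.bias.copy_(post.weight @ fc.bias + post.bias)
            fc.weight.copy_(post.weight @ fc.weight)
        # The extra layers are now redundant and add no cost at inference time.
        self.post_down = nn.Identity()
        self.post_up = nn.Identity()
```

After training, calling merge_for_inference() leaves the adapter's outputs unchanged while removing the extra linear layers from the inference path, which is the source of the reduced inference cost described above.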
<div align="center"> <img src="Figure/Flowchart.png" width="80%" /> </div>

Preparation
Training
- For the RSVQA-LR dataset
  - Change the default path of the image files
  - python train_lr.py
- For the RSVQA-HR dataset
  - Change the default path of the image files
  - python train_hr.py
- For the RSIVQA dataset
  - Change the default path of the image files
  - Since RSIVQA comprises multiple source datasets with varying image sizes, all images are first resized to a unified 256 × 256 before being fed into the model. Please resize the images before training on the RSIVQA dataset (a resizing sketch is given after this list).
  - python train_rsi.py
- The RSAdapter module is implemented in https://github.com/Y-D-Wang/RSAdapter/blob/6a7627833ca4daac00eab4b3fdf5ad0a543c5a79/src/t/src/transformers/models/vilt/modeling_vilt_test.py#L479
- RSAdapter is added to the ViLT model in https://github.com/Y-D-Wang/RSAdapter/blob/6a7627833ca4daac00eab4b3fdf5ad0a543c5a79/src/t/src/transformers/models/vilt/modeling_vilt_test.py#L546 and https://github.com/Y-D-Wang/RSAdapter/blob/6a7627833ca4daac00eab4b3fdf5ad0a543c5a79/src/t/src/transformers/models/vilt/modeling_vilt_test.py#L561
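For orientation, here is a hedged sketch (not the code from modeling_vilt_test.py) of how an adapter branch can be attached in parallel to a pretrained transformer layer such as a ViLT layer; the class name LayerWithParallelAdapter and its attributes are hypothetical.

```python
# Illustrative sketch only (assumed names, not the repository's implementation).
import torch.nn as nn


class LayerWithParallelAdapter(nn.Module):
    def __init__(self, pretrained_layer, dim, bottleneck=64):
        super().__init__()
        self.layer = pretrained_layer  # frozen pretrained ViLT sub-layer
        # Lightweight bottleneck adapter trained alongside the frozen layer.
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )

    def forward(self, hidden_states):
        # Parallel branch: the adapter receives the same input as the original
        # layer, and its output is added to the layer's output.
        return self.layer(hidden_states) + self.adapter(hidden_states)
```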
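The resizing step for RSIVQA mentioned above can be done with a short preprocessing script such as the following sketch; it is not part of this repository, and the directory paths are placeholders to be replaced with your own.

```python
# Illustrative preprocessing sketch (assumed paths, not part of this repo):
# resize all RSIVQA images to 256 x 256 before training.
import os
from PIL import Image

SRC_DIR = "path/to/rsivqa/images"      # original images (placeholder)
DST_DIR = "path/to/rsivqa/images_256"  # resized images (placeholder)

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    if not name.lower().endswith((".png", ".jpg", ".jpeg", ".tif", ".tiff")):
        continue
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    img.resize((256, 256), Image.BILINEAR).save(os.path.join(DST_DIR, name))
```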
Comparison with SOTA
<div align="center"> <img src="Figure/Comp_1.png" width="50%" /> </div>
<div align="center"> <img src="Figure/Comp_2.png" width="50%" /> </div>

TODO
- Add inference code
Acknowledgement
The code is based on the Hugging Face transformers library. The authors would also like to thank the contributors to the RSVQA and RSIVQA datasets.