# Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Authors: Yuan Yuan, Yang Zhan, Zhitong Xiong
This is the official repository for the paper "Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval". [Paper]
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
Please give this project a <font color='orange'>STAR ⭐</font> if it helps you.
## Latest Updates
- Jan-11-2024: The MRS-Adapter code is released.
- Aug-26-2023: The dataset is released.
- Aug-10-2023: The paper is accepted by IEEE TGRS.
## Introduction
This repository contains the PyTorch source code of the paper "Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval", a parameter-efficient transfer learning (PETL) framework for the RS image-text retrieval task. The framework consists of a pretrained CLIP model, the multimodal remote sensing adapter (MRS-Adapter), and a hybrid multi-modal contrastive (HMMC) learning objective.
- We design a simple yet effective loss function: the hybrid multi-modal contrastive (HMMC) loss for PETL-based RS image-text retrieval. Experimental results show that the HMMC loss further improves performance on top of the proposed MRS-Adapter (an illustrative sketch of such an objective follows this list).
- We provide comprehensive empirical studies for the PETL-based RS image-text retrieval task. Our qualitative and quantitative results demonstrate that the proposed method is promising and has great potential for practical applications.
- Extensive experiments show that, compared with full fine-tuning, our approach reduces the number of tuned parameters by 98.9% without sacrificing performance, and its retrieval performance exceeds traditional methods by 7-13%. The comprehensive benchmark results are insightful for future research.
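As a concrete illustration of what a hybrid multi-modal contrastive objective can look like, the sketch below combines a cross-modal (image-text) InfoNCE term with intra-modal terms computed against a second view of each modality. This is only a hedged sketch: the exact formulation, the second-view inputs, and the `lambda_intra` weight are assumptions for illustration, not the paper's definition.

```python
# Illustrative hybrid multi-modal contrastive objective (not the paper's exact HMMC loss).
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings (rows are matched pairs)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def hmmc_loss(img_emb, txt_emb, img_emb2, txt_emb2, lambda_intra=0.1):
    """Cross-modal term plus intra-modal terms against a second (e.g. augmented) view."""
    cross = info_nce(img_emb, txt_emb)                              # image <-> text
    intra = info_nce(img_emb, img_emb2) + info_nce(txt_emb, txt_emb2)
    return cross + lambda_intra * intra
```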
## Network Architecture
<p align="middle"> <img src="fig/model.jpg"> </p>
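To make the adapter idea in the figure above concrete, the following is a minimal sketch (not the exact MRS-Adapter) of a bottleneck adapter trained next to a frozen backbone; the stand-in backbone, bottleneck width, and layer names are assumptions for illustration. Freezing the backbone and updating only small projections like these is what drives the large reduction in tuned parameters reported above.

```python
# Minimal bottleneck-adapter sketch with a frozen backbone (illustrative only).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # down-projection
        self.up = nn.Linear(bottleneck, dim)     # up-projection
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual adapter branch

# Stand-in for a frozen CLIP block; only the adapter stays trainable.
backbone = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
adapter = BottleneckAdapter(dim=512)
for p in backbone.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable ratio: {100 * trainable / total:.1f}%")
```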
## Download Data

The RS image-text retrieval datasets (RSICD, RSITMD, and UCM) can be downloaded from our Google Drive:
https://drive.google.com/drive/folders/1F6WBQB-1PLqABh-uDv9m-KPdChakWcWY?usp=sharing
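If you prefer to fetch the shared folder programmatically, a minimal sketch using the third-party `gdown` package is shown below; the output directory name is an assumption, and downloading manually from the link works just as well.

```python
# Optional: download the shared Google Drive folder with gdown (illustrative).
import gdown

url = "https://drive.google.com/drive/folders/1F6WBQB-1PLqABh-uDv9m-KPdChakWcWY?usp=sharing"
gdown.download_folder(url, output="data", quiet=False)  # saves the folder under ./data
```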
We expect the directory and file structure to be the following:

    ./                              # current (project) directory
    ├── README.md
    └── data/                       # Dataset
        ├── rsicd_precomp/          # RSICD
        │   ├── rsicd_images/       # Remote sensing images
        │   ├── train_caps.txt      # Captions of training and validation set
        │   ├── train_filename.txt  # Image name of training and validation set
        │   ├── test_caps.txt       # Captions of test set
        │   └── test_filename.txt   # Image name of test set
        ├── rsitmd_precomp/         # RSITMD
        │   ├── rsitmd_images/      # Remote sensing images
        │   ├── train_caps.txt      # Captions of training and validation set
        │   ├── train_filename.txt  # Image name of training and validation set
        │   ├── test_caps.txt       # Captions of test set
        │   └── test_filename.txt   # Image name of test set
        └── ucm_precomp/            # UCM
            ├── ucm_images/         # Remote sensing images
            ├── train_caps.txt      # Captions of training and validation set
            ├── train_filename.txt  # Image name of training and validation set
            ├── test_caps.txt       # Captions of test set
            └── test_filename.txt   # Image name of test set
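The sketch below shows one way to pair captions with image names from these files. It assumes each image's captions appear consecutively in `train_caps.txt` and that `train_filename.txt` is aligned with it (either one line per image or one repeated line per caption); please verify against the downloaded files.

```python
# Illustrative pairing of captions with image filenames (assumptions noted above).
from pathlib import Path

root = Path("data/rsitmd_precomp")
captions = root.joinpath("train_caps.txt").read_text().splitlines()
filenames = root.joinpath("train_filename.txt").read_text().splitlines()

# If filenames are listed once per image, each image owns a fixed block of captions;
# if they are repeated per caption, the ratio below is simply 1.
caps_per_image = len(captions) // len(filenames)
pairs = [(filenames[i // caps_per_image], cap) for i, cap in enumerate(captions)]
print(f"{len(pairs)} image-caption pairs, e.g. {pairs[0]}")
```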
## Remote Sensing Image-Text Retrieval Visualization
<p align="middle"> <img src="fig/result.jpg"> </p>

## Results
<p align="middle"> <img src="fig/result_RSICD.png"> </p>
<p align="middle"> <img src="fig/result_RSITMD.png"> </p>
<p align="middle"> <img src="fig/result_UCM.png"> </p>

## Reference
If you find this code useful, please cite the paper. You are welcome to :+1:<big>Fork and Star</big>:+1: the repository, and we will keep you informed of updates.
@ARTICLE{10231134,
author={Yuan, Yuan and Zhan, Yang and Xiong, Zhitong},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval},
year={2023},
volume={61},
number={},
pages={1-14},
doi={10.1109/TGRS.2023.3308969}}
## Acknowledgments
We benchmark extensive state-of-the-art PETL methods on the PE-RSITR task. Our code is based on GaLR, and we sincerely appreciate its authors for releasing their source code. I would like to thank Zhitong Xiong and Yuan Yuan for their help with the manuscript, and the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University for supporting this work.