# Cross-Modal-Adapter

<p align="center"> <img src='imgs/figure1.png' align="center" width="650px"> </p>

This repository will host the official PyTorch implementation of Cross-Modal Adapter.

**Title:** Cross-Modal Adapter for Text-Video Retrieval
**Authors:** Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Jiwen Lu, Jie Zhou, Shiji Song, Gao Huang (Corresponding Author)
**Institutes:** Tsinghua University, BNRist, and Beijing Institute of Technology
**Publication:** arXiv preprint (arXiv:2211.09623)
**Contact:** jhj20 at mails dot tsinghua dot edu dot cn


## Overview

In this paper, we present a novel Cross-Modal Adapter for parameter-efficient fine-tuning. Despite its simplicity, our approach has three notable benefits: (1) it reduces the number of fine-tuned parameters by 99.6% and alleviates overfitting, (2) it saves approximately 30% of training time, and (3) it keeps all pre-trained parameters frozen, so a single copy of the pre-trained model can be shared across datasets.

<p align="center"> <img src='imgs/figure2.png' align="center" width="800px"> </p>

## Results

1. Text-to-video and video-to-text retrieval results on MSR-VTT.

<p align="center"> <img src='imgs/msrvtt.png' align="center" width="800px"> </p>

2. Text-to-video and video-to-text retrieval results on MSVD, VATEX, DiDeMo, and ActivityNet.

<p align="center"> <img src='imgs/other_four.png' align="center" width="800px"> </p>

3. Training efficiency.

<p align="center"> <img src='imgs/efficiency_8gpu.png' align="center" width="800px"> </p>

4. Visualizations.

<p align="center"> <img src='imgs/visualization.png' align="center" width="800px"> </p>

## Acknowledgment

Our implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful work.