<div align="center"> <h1> Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning </h1> <h5 align="center">


</h5> </div>

This repository is the official implementation of Side4Video, which significantly reduces the training memory cost for action recognition and text-video retrieval tasks.

<div align=center> <img width="500" alt="image" src="imgs/mem.png"> </div>

📰 News


🗺️ Overview

<div align=center> <img width="795" alt="image" src="imgs/Side4Video.png"> </div>

🚀 Training and Testing

To train and test our models, please refer to the `Recognition` and `Retrieval` folders.

📊 Results

<div align=center> <img width="800" alt="image" src="imgs/memory.png"> </div>

Our best models achieve an accuracy of 67.3% and 74.6% on Something-Something V1 and V2, respectively, and 88.6% on Kinetics-400, as well as a Recall@1 of 52.3% on MSR-VTT, 56.1% on MSVD, and 68.8% on VATEX.

🖇️ Citation

If you find this repository useful, please star 🌟 this repo and cite 🖇️ our paper.

```bibtex
@article{yao2023side4video,
  title={Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning},
  author={Yao, Huanjin and Wu, Wenhao and Li, Zhiheng},
  journal={arXiv preprint arXiv:2311.15769},
  year={2023}
}
```

👍 Acknowledgment

Our implementation is mainly based on the following codebases. We are sincerely grateful for their work.

📧 Contact

If you have any questions about this repository, please file an issue or contact Huanjin Yao or Wenhao Wu.

<!--``` Huanjin Yao: yaohj22@mails.tsinghua.edu.cn Wenhao Wu: wenhao.wu@sydney.edu.au ```-->