ECCV2022: LAFF for Text-to-Video Retrieval

This is the official source code of our LAFF paper: Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval.

Environment

We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following commands to install all the required packages.

conda create -n laff python==3.8 -y
conda activate laff
git clone https://github.com/ruc-aimc-lab/laff.git
cd laff
pip install -r requirements.txt
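After installation, a quick sanity check can confirm that the key packages are importable inside the `laff` environment. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the only package it assumes is `torch`, since the text above says the workspace supports PyTorch. Add whatever else `requirements.txt` lists.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found.

    Hypothetical helper for illustration; it only checks that each package
    is discoverable, it does not import it or verify its version.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "torch" is assumed from the PyTorch mention above; extend as needed.
    missing = missing_packages(["torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are available.")
```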

Downloads

Data

See the data page.

Trained Models

Provide model links here.

Code

The shell folder provides scripts that perform training from scratch.

Performance

MV-test3k

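| Model | R1 | R5 | R10 | Medr |
| --- | --- | --- | --- | --- |
| W2VV++ | 23.0 | 49.0 | 60.7 | 6 |
| SEA | 19.9 | 44.3 | 56.5 | 7 |
| CLIP-finetuned | 27.7 | 53.0 | 64.2 | 5 |
| LAFF | 28.0 | 53.8 | 64.9 | 4 |
| LAFF-ml | 29.1 | 54.9 | 65.8 | 4 |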

MV-test1k

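| Model | R1 | R5 | R10 | Medr |
| --- | --- | --- | --- | --- |
| W2VV++ | 39.4 | 68.1 | 78.1 | 2 |
| SEA | 37.2 | 67.1 | 78.3 | 2 |
| CLIP-finetuned | 39.7 | 67.8 | 78.4 | 2 |
| LAFF | 42.2 | 70.7 | 81.2 | 2 |
| LAFF-ml | 42.6 | 71.8 | 81 | 2 |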

MSVD

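| Model | R1 | R5 | R10 | Medr |
| --- | --- | --- | --- | --- |
| W2VV++ | 37.8 | 71.0 | 81.6 | 2 |
| SEA | 34.5 | 68.8 | 80.5 | 3 |
| CLIP-finetuned | 44.6 | 74.7 | 84.1 | 2 |
| LAFF | 45.2 | 75.8 | 84.3 | 2 |
| LAFF-ml | 45.4 | 76.0 | 84.6 | 2 |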

TGIF

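| Model | R1 | R5 | R10 | Medr |
| --- | --- | --- | --- | --- |
| W2VV++ | 22 | 42.8 | 52.7 | 9 |
| SEA | 16.4 | 33.6 | 42.5 | 17 |
| CLIP-finetuned | 21.5 | 40.6 | 49.9 | 11 |
| LAFF | 24.1 | 44.7 | 54.3 | 8 |
| LAFF-ml | 24.5 | 45.0 | 54.5 | 8 |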

VATEX

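| Model | R1 | R5 | R10 | Medr |
| --- | --- | --- | --- | --- |
| W2VV++ | 55.8 | 91.2 | 96 | 1 |
| SEA | 52.4 | 90.2 | 95.9 | 1 |
| CLIP-finetuned | 53.3 | 87.5 | 94.0 | 1 |
| LAFF | 57.7 | 91.3 | 95.9 | 1 |
| LAFF-ml | 59.1 | 91.7 | 96.3 | 1 |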
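
For readers unfamiliar with the column names, the sketch below shows how such metrics are commonly computed for text-to-video retrieval. It assumes R1, R5 and R10 denote recall at rank K (the percentage of text queries whose correct video appears in the top K) and Medr denotes the median rank of the correct video. The function `retrieval_metrics` and the toy similarity matrix are hypothetical illustrations, not the repository's evaluation code, which may treat ties or multiple captions per video differently.

```python
from statistics import median


def retrieval_metrics(sim, gt):
    """Compute R1/R5/R10 (in percent) and the median rank.

    sim[i][j] is the similarity between text query i and video j;
    gt[i] is the index of the video that matches query i.
    Assumes one ground-truth video per query (an illustrative simplification).
    """
    ranks = []
    for scores, target in zip(sim, gt):
        # Rank is 1 plus the number of videos scored strictly higher
        # than the ground-truth video.
        rank = 1 + sum(1 for s in scores if s > scores[target])
        ranks.append(rank)
    n = len(ranks)
    recall = {k: 100.0 * sum(r <= k for r in ranks) / n for k in (1, 5, 10)}
    return {"R1": recall[1], "R5": recall[5], "R10": recall[10], "Medr": median(ranks)}


# Toy example: 3 text queries scored against 3 videos.
sim = [
    [0.9, 0.1, 0.2],  # query 0, correct video 0
    [0.3, 0.2, 0.8],  # query 1, correct video 1
    [0.1, 0.5, 0.4],  # query 2, correct video 2
]
metrics = retrieval_metrics(sim, gt=[0, 1, 2])
print(metrics)
```

Higher R1, R5 and R10 are better, while a lower Medr is better.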

Citation

@inproceedings{eccv2022-laff,
title = {Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval},
author = {Fan Hu and Aozhu Chen and Ziyue Wang and Fangming Zhou and Jianfeng Dong and Xirong Li},
year = {2022},
booktitle = {ECCV},
}

Contact

If you encounter any issue when running the code, please feel free to reach us either by creating a new issue on GitHub or by emailing