EaTR (ICCV 2023)

This repository provides the official PyTorch implementation of the ICCV 2023 paper:

Knowing Where to Focus: Event-aware Transformer for Video Grounding [arXiv]<br> Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn<br> Yonsei University

<p align="center"> <img src="model_overview.png"/> </p>

Prerequisites

<b>0. Clone this repo.</b>
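
Assuming a standard GitHub checkout (the URL below is a placeholder; use this repository's actual address):

# clone the repository (replace the URL with this repo's address)
git clone https://github.com/<user>/EaTR.git
cd EaTR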

<b>1. Install dependencies.</b>

We trained and evaluated our models with Python 3.7 and PyTorch 1.12.1.

# create conda env
conda create --name eatr python=3.7
# activate env
conda activate eatr
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
# install other python packages
pip install tqdm ipython easydict tensorboard tabulate scikit-learn pandas
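
To verify the environment, you can run a quick sanity check (a minimal snippet; torch.cuda.is_available() will print False on CPU-only machines):

# check the installed PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"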

<b>2. Prepare datasets.</b>

Download the features for each dataset and extract them under the '../data/${dataset}/features/' directory.<br> The files are organized in the following manner:

EaTR
├── data
│   ├── qvhighlights
│   │   ├── *features
│   │   ├── highlight_{train,val,test}_release.jsonl
│   │   └── subs_train.jsonl
│   ├── charades
│   │   ├── *features
│   │   └── charades_sta_{train,test}_tvr_format.jsonl
│   └── activitynet
│       ├── *features
│       └── activitynet_{train,val_1,val_2}.jsonl
├── models
├── utils
├── scripts
├── README.md
├── train.py
└── ···
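
If the data directories do not exist yet, you can create the expected layout before extracting the features (a sketch based on the tree above; run it from the EaTR root):

# create the feature directories for each dataset
mkdir -p data/qvhighlights/features data/charades/features data/activitynet/features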

Training

Training can be launched by running the following command:

bash eatr/scripts/train.sh 
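
For example, to keep a record of the console output (a plain shell pattern, not something the script requires):

# train and save the console output to a log file
bash eatr/scripts/train.sh 2>&1 | tee train.log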

Inference

Once the model is trained, you can use the following command for inference:

bash eatr/scripts/inference.sh ${path-to-checkpoint} ${split-name}

${split-name} can be either val or test.
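
For instance, evaluating a trained model on the validation split might look like this (the checkpoint path below is hypothetical; substitute the one produced by your training run):

# example: run inference on the val split with a hypothetical checkpoint path
bash eatr/scripts/inference.sh results/eatr/model_best.ckpt val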

Citation

@inproceedings{Jang2023Knowing,
  title={Knowing Where to Focus: Event-aware Transformer for Video Grounding},
  author={Jang, Jinhyun and Park, Jungin and Kim, Jin and Kwon, Hyeongjun and Sohn, Kwanghoon},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}