SCDM

Code for the paper: "Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos"

Requirements

Introduction

Temporal sentence grounding (TSG) in videos aims to detect and localize the one target video segment that semantically corresponds to a given sentence query. We propose a semantic conditioned dynamic modulation (SCDM) mechanism for the TSG problem, which uses the sentence semantics to modulate the temporal convolution operations so that sentence-related video contents are better correlated and composed over time.
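To make the idea of sentence-driven modulation more concrete, here is a rough, dependency-free sketch. It is not the code in this repository: it assumes a scale-and-shift form of modulation, plain dot-product attention over the words, and word and clip features of the same dimension. The names `sentence_summary`, `modulate`, `w_gamma`, and `w_beta` are hypothetical. See the paper for the actual formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def sentence_summary(clip, words):
    """Attend from one clip feature over all word features.

    Assumption: dot-product scores stand in for a learned attention layer.
    """
    weights = softmax([sum(c * w for c, w in zip(clip, word)) for word in words])
    return [sum(wt * word[d] for wt, word in zip(weights, words))
            for d in range(len(clip))]

def modulate(clips, words, w_gamma, w_beta, eps=1e-5):
    """Scale and shift normalized clip features with sentence-derived factors.

    clips: one feature vector per temporal position.
    words: one feature vector per word (same dimension as the clips here).
    w_gamma, w_beta: hypothetical projection matrices (dim x dim).
    """
    steps, dim = len(clips), len(clips[0])
    # Per-channel statistics computed over the temporal axis.
    mean = [sum(c[d] for c in clips) / steps for d in range(dim)]
    std = [math.sqrt(sum((c[d] - mean[d]) ** 2 for c in clips) / steps + eps)
           for d in range(dim)]
    out = []
    for clip in clips:
        summary = sentence_summary(clip, words)
        gamma = [math.tanh(g) for g in matvec(w_gamma, summary)]
        beta = [math.tanh(b) for b in matvec(w_beta, summary)]
        out.append([gamma[d] * (clip[d] - mean[d]) / std[d] + beta[d]
                    for d in range(dim)])
    return out
```

Because the scale and shift are computed separately for each temporal position, the same sentence can emphasize some clips and suppress others, which is the intuition behind conditioning the temporal convolutions on the query.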

Download Features and Example Preprocessed Data

First, download the following files into the './data' folder:

Data Preprocessing

As described in our paper, we evaluate temporal sentence grounding on three datasets: Charades-STA, ActivityNet Captions, and TACoS. Before training or testing a model on any of these datasets, preprocess the data first.

python generate_charades_data.py

Preprocessed data for the Charades-STA dataset will be put into the './data/Charades/h5py/' folder.

python generate_tacos_data.py

Preprocessed data for the TACoS dataset will be put into the './data/TACOS/h5py/' folder.

python generate_anet_data.py

Preprocessed data for the ActivityNet Captions dataset will be put into the './data/ActivityNet/h5py/' folder.
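After running the three scripts, it can be useful to confirm that the output folders exist before starting training. The snippet below is a small convenience sketch, not part of this repository. The helper name `count_preprocessed_files` is hypothetical, and it only counts files without assuming any particular file names or extensions.

```python
from pathlib import Path

# Output folders named in this README, relative to the repository root.
EXPECTED_DIRS = {
    "Charades-STA": "data/Charades/h5py",
    "TACoS": "data/TACOS/h5py",
    "ActivityNet Captions": "data/ActivityNet/h5py",
}

def count_preprocessed_files(root="."):
    """Return {dataset: file count}, with None for a folder that is missing."""
    counts = {}
    for name, rel in EXPECTED_DIRS.items():
        folder = Path(root) / rel
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = None
    return counts

if __name__ == "__main__":
    for name, n in count_preprocessed_files().items():
        status = "missing" if n is None else f"{n} file(s)"
        print(f"{name}: {status}")
```

Run it from the repository root. A dataset reported as "missing" or with zero files suggests that the corresponding preprocessing script has not been run yet or did not finish.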

Model Training and Testing

For the Charades-STA dataset, run:

python run_charades_scdm.py --task train

to train the model, and run:

python run_charades_scdm.py --task test

to test it. The other model variants are trained and tested in the same way.
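If you prefer to launch both steps from one place, the two commands above can be wrapped in a short Python driver. This is only a sketch and not part of the repository. The helper names `build_command` and `train_then_test` are hypothetical, and the only script name and flag it uses are the ones shown in this README.

```python
import subprocess
import sys

def build_command(task, script="run_charades_scdm.py"):
    """Build the command line for one task, using the current Python interpreter."""
    if task not in ("train", "test"):
        raise ValueError(f"unknown task: {task!r}")
    return [sys.executable, script, "--task", task]

def train_then_test(script="run_charades_scdm.py"):
    """Run training, then testing; stop at the first command that fails."""
    for task in ("train", "test"):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(task, script), check=True)
```

Calling `train_then_test()` from the repository root runs training followed by testing. Stopping on the first failure avoids testing a model whose training did not complete.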

Citation

@inproceedings{yuan2019semantic,
  title={Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Liu, Wei and Zhu, Wenwu},
  booktitle={Advances in Neural Information Processing Systems},
  pages={534--544},
  year={2019}
}