STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization

This is our implementation for the paper:

Da Cao, Yawen Zeng, Meng Liu, Xiangnan He, Meng Wang, and Zheng Qin. 2020. STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization. In The ACM International Conference on Multimedia (ACM MM '20). ACM, Seattle, United States.

Environment Settings

We use the PyTorch framework.

STRONG

The released code consists of the following files:

--data
--log
--feature_all
--cal4log
--main
--MADDPG
--IMGDDPG
--model
--memory
--spp
--utils
--randomProcess

Example to run the codes

Run STRONG:

python main.py

Example to get the results

Process the logs:

python cal4log.py

There are many experimental records in ./log.
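
cal4log.py aggregates the records in ./log into evaluation metrics. As a rough illustration of the kind of computation involved, here is a minimal sketch of IoU-based recall over temporal segments; the log format assumed here (one "pred_start pred_end gt_start gt_end" tuple per line) and the file name are hypothetical and may differ from what this repository actually writes.

# Hypothetical sketch of metric computation from log records.
# Assumes each line holds "pred_start pred_end gt_start gt_end";
# the real format produced by this code base may differ.

def temporal_iou(pred, gt):
    """Intersection-over-union of two temporal segments (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_iou(log_path, threshold=0.5):
    """Fraction of records whose predicted moment reaches the IoU threshold."""
    hits, total = 0, 0
    with open(log_path) as f:
        for line in f:
            ps, pe, gs, ge = map(float, line.split())
            total += 1
            hits += temporal_iou((ps, pe), (gs, ge)) >= threshold
    return hits / total if total else 0.0

if __name__ == "__main__":
    print("R@1, IoU=0.5:", recall_at_iou("./log/example.log"))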

Dataset

We provide two processed datasets: Charades-STA and TACoS. Each video is segmented with multi-scale sliding windows of [64, 128, 256, 512] frames with 80% overlap, and we randomly select 80% of the segments for training and 20% for testing.
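
For reference, below is a minimal sketch of the multi-scale sliding-window segmentation described above. The window sizes and 80% overlap come from the description; the function name and frame-count input are illustrative and not the repository's actual preprocessing script.

# Illustrative sketch of multi-scale sliding-window segmentation.
# Window sizes and 80% overlap follow the description above;
# this is not the repository's actual preprocessing code.

def sliding_windows(num_frames, sizes=(64, 128, 256, 512), overlap=0.8):
    """Yield (start, end) frame indices for every window at every scale."""
    for size in sizes:
        stride = max(1, int(size * (1 - overlap)))  # 80% overlap -> stride = 20% of size
        start = 0
        while start + size <= num_frames:
            yield (start, start + size)
            start += stride

# Example: windows for a 300-frame video
print(list(sliding_windows(300)))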

All features are saved in ./feature_all_train and ./feature_all_test.

Workspace

./data/video_cut2image serves as a runtime workspace holding frame images that are extracted from the videos in advance.
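
As a rough illustration of preparing this workspace, the sketch below cuts a video into frame images with OpenCV. The output layout and file naming are assumptions, not necessarily what the released code expects.

# Hedged sketch: extract frames from a video into ./data/video_cut2image.
# Requires opencv-python; the directory layout and naming are assumptions.
import os
import cv2

def cut_video_to_images(video_path, out_dir="./data/video_cut2image"):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx  # number of frames written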

Last Update Date: Jul 28, 2020