STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization
This is our implementation for the paper:
Da Cao, Yawen Zeng, Meng Liu, Xiangnan He, Meng Wang, and Zheng Qin. 2020. STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization. In The ACM International Conference on Multimedia (ACM MM '20). ACM, Seattle, United States.
Environment Settings
We use the PyTorch framework.
- PyTorch version: 1.2.0
- Python version: 3.5
STRONG
The released code consists of the following files.
--data
--log
--feature_all
--cal4log
--main
--MADDPG
--IMGDDPG
--model
--memory  (an illustrative replay-buffer sketch follows this list)
--spp
--utils
--randomProcess
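For orientation, the memory module in DDPG-style agents such as MADDPG and IMGDDPG is typically an experience replay buffer. The snippet below is a minimal sketch of that standard pattern, not the released implementation; the class name, `Transition` layout, and default capacity are our assumptions.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout; the released memory module may differ.
Transition = namedtuple('Transition', ['state', 'action', 'reward', 'next_state', 'done'])

class ReplayMemory:
    """Minimal experience replay buffer, as used in DDPG-style training."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted automatically

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch of stored transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```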
Example to run the code
Run STRONG:
python main.py
Example to get the results
Run the log parser:
python cal4log.py
Numerous experimental records are stored in ./log.
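The exact record format is specific to this repository, but a script like cal4log.py typically scans the log directory and aggregates the reported metrics. The sketch below illustrates that idea; the regular expression and metric names are purely hypothetical.

```python
import os
import re

LOG_DIR = './log'
# Hypothetical line format: the actual records written by main.py may differ.
METRIC_RE = re.compile(r'IoU=(\d+\.\d+)\s+acc=(\d+\.\d+)')

def collect_metrics(log_dir=LOG_DIR):
    """Scan every log file and collect accuracy values per IoU threshold."""
    results = {}
    for name in os.listdir(log_dir):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                m = METRIC_RE.search(line)
                if m:
                    iou, acc = float(m.group(1)), float(m.group(2))
                    results.setdefault(iou, []).append(acc)
    return results

if __name__ == '__main__':
    for iou, accs in sorted(collect_metrics().items()):
        print(f'IoU>={iou}: best acc {max(accs):.4f} over {len(accs)} records')
```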
Dataset
We provide two processed datasets: Charades-STA and TACoS. A multi-scale sliding-window strategy is used to segment each video, with window sizes of [64, 128, 256, 512] frames and 80% overlap, and we randomly select 80% of the resulting samples for training and 20% for testing.
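For concreteness, here is a minimal sketch of that multi-scale sliding-window segmentation (80% overlap means a stride of 20% of the window size). The function name, boundary handling, and the illustrative 80/20 split below are our assumptions; the released preprocessing may differ in detail.

```python
import random

def sliding_windows(num_frames, sizes=(64, 128, 256, 512), overlap=0.8):
    """Enumerate candidate moments as (start, end) frame indices."""
    windows = []
    for size in sizes:
        if num_frames < size:
            continue  # skip scales longer than the video
        stride = max(1, int(size * (1 - overlap)))  # 80% overlap -> stride = 20% of size
        for start in range(0, num_frames - size + 1, stride):
            windows.append((start, start + size))
    return windows

# Illustrative random 80/20 train/test split of the candidates.
candidates = sliding_windows(1000)
random.shuffle(candidates)
split = int(0.8 * len(candidates))
train, test = candidates[:split], candidates[split:]
```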
All features are saved in ./feature_all_train and ./feature_all_test.
- These two sets of processed features are available for download here: https://drive.google.com/open?id=1-AMToMuTlPRY1C2n0ZoyKrwBsVPehbFK
- The original videos and their corresponding caption annotations/queries: https://github.com/jiyanggao/TALL and http://www.coli.uni-saarland.de/projects/smile/tacos
Workspace
./data/video_cut2image serves as the runtime workspace, where video frames are extracted as images in advance.
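As an illustration of that preprocessing step, the following sketch dumps a video's frames to images with OpenCV. The function name, output layout, and file naming are assumptions; the released video_cut2image workspace may organize files differently.

```python
import os
import cv2  # pip install opencv-python

def video_to_images(video_path, out_dir):
    """Decode a video and save every frame as a JPEG image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        cv2.imwrite(os.path.join(out_dir, f'{idx:06d}.jpg'), frame)
        idx += 1
    cap.release()
    return idx

# e.g. video_to_images('example.mp4', './data/video_cut2image/example')
```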
Last Update Date: Jul 28, 2020