Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
This is the source code for our paper Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction.
Requirements (Recommended)
- CUDA 9.0 and CUDNN v7.1
- Install Miniconda (either Miniconda2 or 3, version 4.6+). We recommend using a conda environment to install the required packages, including Python 3.6, PyTorch 0.4.0, etc.:
MINICONDA_ROOT=[to your Miniconda root directory]
conda env create -f tools/conda_env_yc2_bb.yml --prefix $MINICONDA_ROOT/envs/yc2-bb
conda activate yc2-bb
python -m spacy download en # to download spacy English model
Data Preparation
Download the following:
- The YouCook2-BB annotation pack from the official website. [06/22/2024] Due to requests and the inaccessibility of online videos, we are now sharing the raw video files for non-commercial research purposes only. They can be found on the Download pages of the official website.
- Region proposals [all-in-one] and feature files for each split [train (113GB), val (38GB), test (17GB)]. You can also extract features/proposals on your own using Faster RCNN PyTorch.
Place all the downloaded files under data/yc2 and uncompress them.
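The uncompress step can be sketched as follows. This is a minimal sketch that assumes the downloads are .tar.gz, .tgz, .tar, or .zip archives; the formats are not stated here, so adjust the patterns to match the actual files. The extract_all helper is hypothetical and not part of this repository.

```shell
# Hypothetical helper: extract every archive found directly inside a directory.
# Assumption: archives are .tar.gz/.tgz, .tar, or .zip (not confirmed by this README).
extract_all() {
  dir=$1
  for f in "$dir"/*; do
    case "$f" in
      *.tar.gz|*.tgz) tar -xzf "$f" -C "$dir" ;;       # gzip-compressed tarball
      *.tar)          tar -xf "$f" -C "$dir" ;;        # plain tarball
      *.zip)          unzip -q -o "$f" -d "$dir" ;;    # zip archive
    esac                                               # other files are left untouched
  done
}
```

After moving the downloads into place, it would be called as extract_all data/yc2. Given the feature file sizes listed above, check the available disk space before extracting.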
Running
Training
Example commands for running a 4-GPU distributed data parallel job:
CUDA_VISIBLE_DEVICES=0 python train.py --loss_weighting --obj_interact --checkpoint_path $checkpoint_path --cuda --world_size 4 &
CUDA_VISIBLE_DEVICES=1 python train.py --loss_weighting --obj_interact --checkpoint_path $checkpoint_path --cuda --world_size 4 &
CUDA_VISIBLE_DEVICES=2 python train.py --loss_weighting --obj_interact --checkpoint_path $checkpoint_path --cuda --world_size 4 &
CUDA_VISIBLE_DEVICES=3 python train.py --loss_weighting --obj_interact --checkpoint_path $checkpoint_path --cuda --world_size 4
(Optional) Set --world_size 1 to run in single-GPU mode.
(Optional) To visualize the training curves, we use visdom (install it with pip install visdom). Start the server in the background (for example, inside a tmux or screen session) with the command visdom, then add --enable_visdom to your training command.
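The per-GPU training commands in this section differ only in the CUDA_VISIBLE_DEVICES index, so they can be generated rather than typed out. The sketch below prints the commands for a given world size as a dry run. The print_train_cmds helper and the ./checkpoint value are illustrative assumptions, not part of this repository; the flags are copied from the example commands.

```shell
# Hypothetical helper: print one training command per GPU,
# mirroring the example commands in this section.
print_train_cmds() {
  n=$1        # world size (number of GPUs / processes)
  ckpt=$2     # value passed to --checkpoint_path
  gpu=0
  while [ "$gpu" -lt "$n" ]; do
    echo "CUDA_VISIBLE_DEVICES=$gpu python train.py --loss_weighting --obj_interact --checkpoint_path $ckpt --cuda --world_size $n"
    gpu=$((gpu + 1))
  done
}

# Dry run: inspect the commands before launching them.
print_train_cmds 4 ./checkpoint
```

Launching each printed line as a background job reproduces the 4-GPU example; with a world size of 1 the helper prints the single-GPU command.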
Testing
You can download the pre-trained model from here (model_checkpoint=full-model.pth) and place it under the checkpoint dir.
python test.py --start_from ./checkpoint/$model_checkpoint --val_split validation --cuda
The evaluation server on the test set is now available on Codalab!
Visualization
Visualization requires opencv2, which can be installed by running conda install -c menpo opencv.
python vis.py --start_from ./checkpoint/$model_checkpoint --cuda
Notes
After releasing the original version of YouCook2-BB, we added 149/4316=3.5% more annotations to the dataset. As a result, the overall model performance has changed slightly: 30.1% now vs. 30.3% before on the validation set, and 32.0% now vs. 31.7% before on the test set.
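The 3.5% figure follows directly from the two annotation counts. A quick check, using a hypothetical pct helper that is not part of this repository:

```shell
# pct PART WHOLE: print PART/WHOLE as a percentage with one decimal place.
pct() {
  # LC_ALL=C keeps the decimal separator a period regardless of locale.
  LC_ALL=C awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", part / whole * 100 }'
}

pct 149 4316   # added annotations relative to the original count; prints 3.5
```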
Citation
Please acknowledge the following paper if you use the code:
@inproceedings{ZhLoCoBMVC18,
author={Zhou, Luowei and Louis, Nathan and Corso, Jason J},
title={Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction},
booktitle = {British Machine Vision Conference},
year = {2018},
url = {http://bmvc2018.org/contents/papers/0070.pdf}
}