Joint Event Detection and Description in Continuous Video Streams
Code released by Huijuan Xu (Boston University).
Introduction
We present the Joint Event Detection and Description Network (JEDDi-Net) that solves the dense captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes the event proposals into captions with the consideration of visual and language context.
License
JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).
Citing JEDDi-Net
If you find JEDDi-Net useful in your research, please consider citing:
@article{xu2019joint,
title={Joint Event Detection and Description in Continuous Video Streams},
author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
journal={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2019}
}
Contents
Installation:
- Clone the JEDDi-Net repository.
git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git
- Build Caffe3d with pycaffe (see: Caffe installation instructions). Note: Caffe must be built with Python support!
cd ./caffe3d
# If you have all of the requirements installed and your Makefile.config in place, simply run:
make -j8 && make pycaffe
- Build the JEDDi-Net lib folder.
cd ./lib
make
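Since the build must include Python support, a quick import check can confirm that pycaffe is usable before moving on. The following is a minimal sketch, not part of the repository: the helper name `pycaffe_available` is hypothetical, and the `./caffe3d/python` location is an assumption based on the stock Caffe layout.

```python
import os
import sys


def pycaffe_available(caffe_root="./caffe3d"):
    """Return True if `import caffe` succeeds, False otherwise.

    Assumption: pycaffe is built into <caffe_root>/python, as in stock Caffe.
    """
    python_dir = os.path.join(caffe_root, "python")
    # Make the (assumed) pycaffe directory importable.
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    try:
        import caffe  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True
```

Calling `pycaffe_available()` from the JEDDi-Net root folder should return `True` after a successful `make pycaffe`; if it returns `False`, the Python bindings were probably not built or are not on the path.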
Preparation:
- Download the ground truth annotations and videos of the ActivityNet Captions dataset.
- Extract frames from the downloaded videos at 25 fps.
- Generate the pickle data for training and testing the JEDDi-Net model.
cd ./preprocess
# generate training data
python generate_train_roidb_sorted.py
# generate validation data
python generate_val_roidb.py
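The frame extraction step does not name a tool. One possible approach, sketched below, is to call ffmpeg with its `fps` filter. This assumes ffmpeg is installed; the function names and the `image_%05d.jpg` naming pattern are hypothetical, so check the scripts in ./preprocess for the directory layout and file names they actually expect.

```python
import os
import subprocess


def build_ffmpeg_command(video_path, output_dir, fps=25):
    """Build an ffmpeg command that samples a video at `fps` frames per second."""
    # Hypothetical naming scheme: image_00001.jpg, image_00002.jpg, ...
    pattern = os.path.join(output_dir, "image_%05d.jpg")
    return ["ffmpeg", "-i", video_path, "-vf", "fps={}".format(fps), pattern]


def extract_frames(video_path, output_dir, fps=25):
    """Create `output_dir` if needed and run ffmpeg to write the frames into it."""
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    # check_call raises CalledProcessError if ffmpeg exits with an error.
    subprocess.check_call(build_ffmpeg_command(video_path, output_dir, fps))
```

For example, `extract_frames("videos/v_example.mp4", "frames/v_example")` would write 25 frames per second of video into `frames/v_example/`; both paths here are placeholders.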
Training:
- Download the separately trained segment proposal network (SPN) and captioning models to ./pretrain/.
- In the JEDDi-Net root folder, run:
bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh
Testing:
- Download one sample JEDDi-Net model to ./snapshot/.
One JEDDi-Net model for the ActivityNet Captions dataset is provided in: caffemodel.
The provided JEDDi-Net model achieves a METEOR score of ~8.58% on the validation set.
- In the JEDDi-Net root folder, generate the prediction log file on the validation set.
bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh
- Generate the results.json file from the prediction log file.
cd ./experiments/denseCap_jeddiNet_end2end/test/
bash bash.sh
- Follow the evaluation code to get the evaluation results.
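Before running the evaluation, it can help to sanity-check the generated results.json. The sketch below assumes the file follows the layout commonly used for ActivityNet Captions submissions, that is, a top-level `"results"` dictionary mapping each video ID to a list of `{"sentence": ..., "timestamp": [start, end]}` entries. This layout is an assumption, not something stated in this README, and `count_valid_captions` is a hypothetical helper.

```python
import json


def count_valid_captions(data):
    """Count caption entries, raising ValueError on a malformed one.

    Assumed layout (not confirmed by this README):
    {"results": {video_id: [{"sentence": str, "timestamp": [start, end]}, ...]}}
    """
    total = 0
    for video_id, events in data["results"].items():
        for event in events:
            start, end = event["timestamp"]
            if start > end:
                raise ValueError("bad timestamp for {}: {}".format(video_id, event["timestamp"]))
            if not event["sentence"].strip():
                raise ValueError("empty sentence for {}".format(video_id))
            total += 1
    return total


def load_and_count(path="results.json"):
    """Load a results file from `path` and count its valid captions."""
    with open(path) as f:
        return count_valid_captions(json.load(f))
```

Running `load_and_count()` from the test folder returns the number of predicted captions, or raises an error that points at the first malformed entry.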