# Tracking by Natural Language Specification
This repository contains the code for the following paper:
- Z. Li, R. Tao, E. Gavves, C. G. M. Snoek, A. W. M. Smeulders, Tracking by Natural Language Specification, in Computer Vision and Pattern Recognition (CVPR), 2017 (PDF)
```bibtex
@article{li2017cvpr,
  title={Tracking by Natural Language Specification},
  author={Li, Zhenyang and Tao, Ran and Gavves, Efstratios and Snoek, Cees G. M. and Smeulders, Arnold W. M.},
  journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}
```
## Download Dataset
- Lingual OTB99 Sentences
- Lingual ImageNet Sentences
Please note that we use all frames from the original OTB100 dataset in our OTB99 videos, while for ImageNet videos we may select only a subsequence (see the start/end frames we selected for each video in `train.txt` or `test.txt`).
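Since only a subsequence of each ImageNet video is annotated, the split files must be parsed for the selected frame range. A hedged sketch of such a parser (the actual column layout of `train.txt`/`test.txt` is an assumption here; each line is assumed to hold a video name plus start and end frame indices):

```python
# ASSUMPTION: each split-file line looks like "video_name start_frame end_frame".
# Check the real train.txt / test.txt layout before relying on this.

def parse_split_line(line):
    """Parse one split-file line into (video_name, start_frame, end_frame)."""
    name, start, end = line.split()
    return name, int(start), int(end)

# Hypothetical example line:
print(parse_split_line("some_imagenet_video 1 120"))
```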
## How to use the demo code
### Download and set up Caffe (our own branch)
- Caffe branch here (note: use the `langtrackV3` branch, not the `master` branch)
- Compile Caffe with the option `WITH_PYTHON_LAYER = 1`
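The Python-layer option is normally set in Caffe's `Makefile.config` before building. A typical edit (assuming the standard Caffe Makefile build; adjust the job count to your machine) looks like:

```shell
# In Makefile.config, uncomment/set this line so Python-defined layers work:
WITH_PYTHON_LAYER := 1

# Then build Caffe and its Python bindings:
make -j8 && make pycaffe
```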
### Download pre-trained models
- Download the natural language segmentation model caffemodel and copy it to `MAIN_PATH/snapshots/lang_high_res_seg/_iter_25000.caffemodel`
- Download the tracking model caffemodel and copy it to `MAIN_PATH/VGG16.v2.caffemodel`
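A quick sanity check before running the notebooks is to confirm both caffemodels sit where the demos expect them. A minimal sketch (where `MAIN_PATH` is a placeholder for the repository root, as above):

```python
import os

# Expected model locations relative to the repository root (MAIN_PATH).
EXPECTED_MODELS = [
    "snapshots/lang_high_res_seg/_iter_25000.caffemodel",
    "VGG16.v2.caffemodel",
]

def missing_models(main_path):
    """Return the expected model files that are not present under main_path."""
    return [p for p in EXPECTED_MODELS
            if not os.path.isfile(os.path.join(main_path, p))]

if __name__ == "__main__":
    absent = missing_models(".")
    if absent:
        print("Missing model files:", absent)
    else:
        print("All pre-trained models in place.")
```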
### Run the demo code
The ipython notebooks below demonstrate how Model II in the paper works on example videos:
- Given an image and a natural language query, identify the target (applied to the first query frame of a video only): `demo/lang_seg_demo.ipynb`
- Given a visual target (the box identified in step 1) and a sequence of frames, track the object through all frames: `demo/lang_track_demo.ipynb`
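Putting the two notebooks together, the overall pipeline is: ground the language query to a box on the first frame, then track that box through the remaining frames. A minimal sketch of this control flow, where `ground_query` and `track_step` are hypothetical placeholders (the real models are Caffe networks loaded inside the notebooks):

```python
# Hedged sketch of the two-step pipeline; the stand-in functions below are
# HYPOTHETICAL placeholders, not functions from this repository.

def ground_query(first_frame, query):
    """Placeholder: return an (x, y, w, h) box for the described target."""
    return (10, 20, 50, 80)  # dummy box

def track_step(prev_box, frame):
    """Placeholder: return the box in the current frame given the previous one."""
    return prev_box  # dummy tracker keeps the box fixed

def track_video(frames, query):
    """Step 1: ground the query on frame 0; step 2: track through the rest."""
    box = ground_query(frames[0], query)
    boxes = [box]
    for frame in frames[1:]:
        box = track_step(box, frame)
        boxes.append(box)
    return boxes

print(track_video(["f0", "f1", "f2"], "the red car on the left"))
```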