Efficient One-stage Video Object Detection by Exploiting Temporal Consistency (ECCV22)

By Guanxiong Sun.

This repo contains the PyTorch implementation of the paper "Efficient One-stage Video Object Detection by Exploiting Temporal Consistency", published in ECCV 2022.

The code is based on two open-source toolboxes: mmtracking and mmdetection.

Main Results

Pretrained models and logs are available on Google Drive.

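| Model | Backbone | AP | AP50 | AP75 | AP small | AP medium | AP large | Model and Log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCOS+LPN | ResNet-101 | 54.0 | 79.7 | 59.3 | 9.8 | 26.6 | 60.4 | GoogleDrive |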

Installation

Requirements:

Installation

# create conda environment
conda create --name eovod -y python=3.7
conda activate eovod

# install PyTorch 1.8.0 with cuda 10.2
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch

# install mmcv-full 1.3.17
pip install mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html

# install other requirements
pip install -r requirements.txt

See here for the MMCV versions compatible with different PyTorch and CUDA versions.

Optionally, you can compile mmcv from source if you need to develop both mmcv and mmdet. Refer to the guide for details.
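To confirm that the environment matches the versions pinned in the commands above, a small check like the following may help. It is only a sketch: the helper names installed_version and version_report are hypothetical and not part of this repo, and it assumes each package exposes a __version__ attribute.

```python
import importlib

# Versions pinned by the installation commands above.
PINNED_VERSIONS = {"torch": "1.8.0", "torchvision": "0.9.0", "mmcv": "1.3.17"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_report(pinned=PINNED_VERSIONS):
    """Map each module name to (found_version, matches_pinned_version)."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        # Builds may append a local suffix such as "+cu102", so compare prefixes.
        report[name] = (found, found is not None and found.startswith(wanted))
    return report
```

Calling version_report() inside the eovod environment returns one entry per package; a False flag means the package is missing or a different version is installed.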

Data preparation

Download Datasets

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here. After that, we recommend symlinking the dataset root to data/. The path structure should be as follows:

./data/ILSVRC/
./data/ILSVRC/Annotations/DET
./data/ILSVRC/Annotations/VID
./data/ILSVRC/Data/DET
./data/ILSVRC/Data/VID
./data/ILSVRC/ImageSets

Note: The list txt files under the ImageSets folder can be obtained from here.
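Before converting annotations, it can be worth checking that the layout above is in place. The following is a minimal sketch; the function name missing_dataset_dirs is hypothetical, and it only checks that the listed directories exist, not their contents.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to the dataset root.
EXPECTED_SUBDIRS = (
    "Annotations/DET",
    "Annotations/VID",
    "Data/DET",
    "Data/VID",
    "ImageSets",
)


def missing_dataset_dirs(root="./data/ILSVRC"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

An empty list from missing_dataset_dirs() means every expected directory was found. Path.is_dir() follows symlinks, so a symlinked dataset root passes the check as well.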

Convert Annotations

We use CocoVID to maintain all datasets in this codebase, so you need to convert the official annotations to this style. We provide conversion scripts; their usage is as follows:

# ImageNet DET
python ./tools/convert_datasets/ilsvrc/imagenet2coco_det.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations

# ImageNet VID
python ./tools/convert_datasets/ilsvrc/imagenet2coco_vid.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations
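After conversion, a quick sanity check is to count the entries in a generated annotation file. The sketch below assumes the file is a JSON object whose list-valued top-level keys hold the records, as in COCO-style annotations. The function name is hypothetical, and the actual output file names depend on the conversion scripts.

```python
import json


def summarize_annotation_file(path):
    """Count the entries under each list-valued top-level key of a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Keep only list-valued keys; scalar metadata fields are skipped.
    return {key: len(value) for key, value in data.items() if isinstance(value, list)}
```

Pass the path of one of the JSON files written to ./data/ILSVRC/annotations. A count of zero for a key you expect to be populated suggests the input directory was incomplete.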

Usage

Training

Training on a single GPU

python tools/train.py ${CONFIG_FILE} [optional arguments]

Training on multiple GPUs

We provide tools/dist_train.sh to launch training on multiple GPUs. The basic usage is as follows.

bash ./tools/dist_train.sh \
    ${CONFIG_FILE} \
    ${GPU_NUM} \
    [optional arguments]

Optional arguments remain the same as stated above.

If you would like to launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify a different port (29500 by default) for each job to avoid communication conflicts.

If you use dist_train.sh to launch training jobs, you can set the port in the command:

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
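Instead of picking port numbers by hand, you can ask the operating system for an unused one. This is a sketch using only the Python standard library; find_free_port is a hypothetical helper, not part of this repo.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is unused at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]
```

The returned number can be passed as the PORT variable in the commands above. The port is released as soon as the function returns, so another process could claim it before training starts; launch the job right away.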

Example

  1. Train EOVOD (FCOS), then evaluate AP at the last epoch.

    ./tools/dist_train.sh configs/vid/fcos_att/fcos_att_r101_fpn_9x_vid_caffe_random_level2_imagenet.py 8
    

Inference

This section shows how to test existing models on supported datasets. The following testing environments are supported:

During testing, different tasks share the same API and we only support samples_per_gpu = 1.

You can use the following commands for testing:

# single-gpu testing
python tools/test.py ${CONFIG_FILE} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${GPU_NUM} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

Optional arguments:

Examples of testing a VID model

Assume that you have already downloaded the checkpoints to the directory checkpoints/.

  1. Test EOVOD on ImageNet VID, and evaluate the bbox mAP.

    python tools/test.py configs/vid/fcos_att/fcos_att_r101_fpn_9x_vid_caffe_random_level2_imagenet.py \
        --checkpoint checkpoints/$CHECKPOINT_FILE \
        --out results.pkl \
        --eval bbox
    
  2. Test EOVOD with 8 GPUs on ImageNet VID, and evaluate the bbox mAP.

    ./tools/dist_test.sh configs/vid/fcos_att/fcos_att_r101_fpn_9x_vid_caffe_random_level2_imagenet.py 8 \
        --checkpoint checkpoints/$CHECKPOINT_FILE \
        --out results.pkl \
        --eval bbox
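The file written by --out in the examples above (results.pkl) can be inspected afterwards. The sketch below assumes it is a standard pickle file and makes no assumption about the exact structure of the stored results; describe_results is a hypothetical helper. If the results contain NumPy arrays, NumPy must be installed for unpickling to succeed.

```python
import pickle


def _size(obj):
    """Return len(obj) when the object is sized, otherwise None."""
    return len(obj) if hasattr(obj, "__len__") else None


def describe_results(path):
    """Load a pickled results file and report the size of each top-level entry."""
    with open(path, "rb") as f:
        results = pickle.load(f)  # only unpickle files you produced yourself
    if isinstance(results, dict):
        return {key: _size(value) for key, value in results.items()}
    return {"results": _size(results)}
```

For example, describe_results("results.pkl") gives a quick view of how many entries were saved, which is useful for checking that the run covered the whole test set.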