Home

Awesome

A Faster Pytorch Implementation of Faster R-CNN

Introduction

This project is a faster pytorch implementation of faster R-CNN, aimed to accelerating the training of faster R-CNN object detection models. Recently, there are a number of good implementations:

During our implementing, we referred the above implementations, especailly longcw/faster_rcnn_pytorch. However, our implementation has several unique and new features compared with the above implementations:

Other Implementations

Tutorial

Benchmarking

We benchmark our code thoroughly on three datasets: pascal voc, coco and imagenet-200, using two different network architecture: vgg16 and resnet101. Below are the results:

1). PASCAL VOC 2007 (Train/Test: 07trainval/07test, scale=600, ROI Align)

model  #GPUsbatch sizelr      lr_decaymax_epoch    time/epochmem/GPUmAP
VGG-16    111e-35  6  0.76 hr3265MB  70.1
VGG-16    144e-39  0.50 hr9083MB  69.6
VGG-16    8161e-2 8  100.19 hr5291MB69.4
VGG-16    8241e-210110.16 hr11303MB69.2
Res-101111e-3570.88 hr3200 MB75.2
Res-101  144e-38  100.60 hr9700 MB74.9
Res-101  8161e-28  100.23 hr8400 MB75.2 
Res-101  8241e-210120.17 hr10327MB75.1  

2). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=800, max_size=1200, ROI Align)

model#GPUsbatch sizelrlr_decaymax_epochtime/epochmem/GPUmAP
VGG-16    816  1e-2464.9 hr7192 MB29.2
Res-101  816  1e-24  66.0 hr10956 MB36.2
Res-101  816  1e-24  106.0 hr10956 MB37.0

NOTE. Since the above models use scale=800, you need add "--ls" at the end of test command.

3). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=600, max_size=1000, ROI Align)

model#GPUsbatch sizelrlr_decaymax_epochtime/epochmem/GPUmAP
Res-101  824  1e-24  65.4 hr  10659 MB33.9
Res-101  824  1e-24  105.4 hr  10659 MB34.5

4). Visual Genome (Train/Test: vg_train/vg_test, scale=600, max_size=1000, ROI Align, category=2500)

model#GPUsbatch sizelrlr_decaymax_epochtime/epochmem/GPUmAP
VGG-16  1 P1004  1e-35  203.7 hr  12707 MB4.4

Thanks to Remi for providing the pretrained detection model on visual genome!

What we are going to do

Preparation

First of all, clone the code

git clone https://github.com/jwyang/faster-rcnn.pytorch.git

Then, create a folder:

cd faster-rcnn.pytorch && mkdir data

prerequisites

Data Preparation

Pretrained Model

We used two pretrained models in our experiments, VGG and ResNet101. You can download these two models from:

Download them and put them into the data/pretrained_model/.

NOTE. We compare the pretrained models from Pytorch and Caffe, and surprisingly find Caffe pretrained models have slightly better performance than Pytorch pretrained. We would suggest to use Caffe pretrained models from the above link to reproduce our results.

If you want to use pytorch pre-trained models, please remember to transpose images from BGR to RGB, and also use the same data transformer (minus mean and normalize) as used in pretrained model.

Compilation

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch in make.sh file, to compile the cuda code:

GPU modelArchitecture
TitanX (Maxwell/Pascal)sm_52
GTX 960Msm_50
GTX 1080 (Ti)sm_61
Grid K520 (AWS g2.2xlarge)sm_30
Tesla K80 (AWS p2.xlarge)sm_37

More details about setting the architecture can be found here or here

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the cuda dependencies using following simple commands:

cd lib
sh make.sh

It will compile all the modules you need, including NMS, ROI_Pooing, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7, please compile by yourself if you are using a different python version.

As pointed out in this issue, if you encounter some error during the compilation, you might miss to export the CUDA paths to your environment.

Train

Before training, set the right directory to save and load the trained models. Change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to adapt to your environment.

To train a faster R-CNN model with vgg16 on pascal_voc, simply run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net vgg16 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda

where 'bs' is the batch size with default 1. Alternatively, to train with resnet101 on pascal_voc, simple run:

 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                    --dataset pascal_voc --net res101 \
                    --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                    --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                    --cuda

Above, BATCH_SIZE and WORKER_NUMBER can be set adaptively according to your GPU memory size. On Titan Xp with 12G memory, it can be up to 4.

If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:

python trainval_net.py --dataset pascal_voc --net vgg16 \
                       --bs 24 --nw 8 \
                       --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                       --cuda --mGPUs

Change dataset to "coco" or 'vg' if you want to train on COCO or Visual Genome.

Test

If you want to evlauate the detection performance of a pre-trained vgg16 model on pascal_voc test set, simply run

python test_net.py --dataset pascal_voc --net vgg16 \
                   --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
                   --cuda

Specify the specific model session, chechepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.

Demo

If you want to run detection on your own images with a pre-trained model, download the pretrained model listed in above tables or train your own models at first, then add images to folder $ROOT/images, and then run

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directoy

Then you will find the detection results in folder $ROOT/images.

Note the default demo.py merely support pascal_voc categories. You need to change the line to adapt your own model.

Below are some detection results:

<div style="color:#0000FF" align="center"> <img src="images/img3_det_res101.jpg" width="430"/> <img src="images/img4_det_res101.jpg" width="430"/> </div>

Webcam Demo

You can use a webcam in a real-time demo by running

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directoy \
               --webcam $WEBCAM_ID

The demo is stopped by clicking the image window and then pressing the 'q' key.

Authorship

This project is equally contributed by Jianwei Yang and Jiasen Lu, and many others (thanks to them!).

Citation

@article{jjfaster2rcnn,
    Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
    Title = {A Faster Pytorch Implementation of Faster R-CNN},
    Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
    Year = {2017}
}

@inproceedings{renNIPS15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}
}