graph-rcnn.pytorch

PyTorch code for our ECCV 2018 paper "Graph R-CNN for Scene Graph Generation"

<div style="color:#0000FF" align="center"> <img src="figures/teaser_fig.png" width="850"/> </div> <!-- :balloon: 2019-06-04: Okaaay, time to reimplement Graph R-CNN on pytorch 1.0 and release a new benchmark for scene graph generation. It will also integrate other models like IMP, MSDN and Neural Motif Network. Stay tuned! :balloon: 2019-06-16: Plan is a bit delayed by ICCV rebuttal, but still on track. Stay tuned! -->

Introduction

This project provides reimplementations of representative scene graph generation models in PyTorch 1.0, including:

Our reimplementations are based on the following repositories:

Why do we need this repository?

The goal of gathering these representative methods in a single repo is to enable a fairer comparison across methods under the same settings. As you may have noticed in the recent literature, the reported numbers for IMP, MSDN, Graph R-CNN and Neural Motifs are often confusing, mainly because of the large gap between IMP-style methods (the first three) and Neural Motifs-style methods (the Neural Motifs paper and variants built on it). We hope this repo can serve as a good benchmark for scene graph generation methods and contribute to the research community!

Checklist

Benchmarking

Object Detection

source | backbone | model | bs | lr | lr_decay | mAP@0.5 | mAP@0.50:0.95
:--: | :--: | :--: | :--: | :--: | :--: | :--: | :--:
this repo | Res-101 | faster r-cnn | 6 | 5e-3 | 70k,90k | 24.8 | 12.8

Scene Graph Generation (Frequency Prior Only)

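source | backbone | model | bs | lr | lr_decay | sgdet@20 | sgdet@50 | sgdet@100
:--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--:
this repo | Res-101 | freq | 6 | 5e-3 | 70k,90k | 19.4 | 25.0 | 28.5
motifnet | VGG-16 | freq | - | - | - | 17.7 | 23.5 | 27.6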
<!-- Resnet-101 | freq-overlap | 6 | 5e-3 | (70k, 90k) | 100k | - | - | - -->

* freq = frequency prior baseline
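
A frequency prior baseline predicts the predicate for an object pair purely from training-set statistics, without looking at the image content of the relation. The sketch below is a minimal illustration of that idea, not the code used in this repo: the function names `build_freq_prior` and `predict_predicate`, the triple format, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict


def build_freq_prior(triples):
    """Estimate P(predicate | subject class, object class) from training triples.

    `triples` is an iterable of (subject_class, predicate, object_class).
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    # Normalize raw counts into conditional probabilities per class pair.
    prior = {}
    for pair, counter in counts.items():
        total = sum(counter.values())
        prior[pair] = {pred: n / total for pred, n in counter.items()}
    return prior


def predict_predicate(prior, subj, obj):
    """Return the most frequent predicate for a class pair, or None if unseen."""
    dist = prior.get((subj, obj))
    if not dist:
        return None
    return max(dist, key=dist.get)


# Toy training triples (hypothetical data, for illustration only).
toy_triples = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "near", "horse"),
    ("cup", "on", "table"),
]
prior = build_freq_prior(toy_triples)
print(predict_predicate(prior, "man", "horse"))  # riding
```

In this setting the detector supplies the subject and object classes, and the prior alone ranks the predicates.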

Scene Graph Generation (Joint training)

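source | backbone | model | bs | lr | lr_decay | sgdet@20 | sgdet@50 | sgdet@100
:--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--:
this repo | Res-101 | vanilla | 6 | 5e-3 | 70k,90k | 10.4 | 14.3 | 16.8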
<!---[this repo](https://drive.google.com/open?id=1Vb-gX3_OLhzgdNseXgS_2DiLmJ8qiG8P) | Res-101 | freq | 6 | 5e-3 | 70k,90k | 100k | 19.4 | 25.0 | 28.5-->

Scene Graph Generation (Step training)

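source | backbone | model | bs | lr | mAP@0.5 | sgdet@20 | sgdet@50 | sgdet@100
:--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--:
this repo | Res-101 | vanilla | 8 | 5e-3 | 24.2 | 10.5 | 13.8 | 16.1
this repo | Res-101 | imp | 8 | 5e-3 | 24.2 | 16.7 | 21.7 | 25.2
motifnet | VGG-16 | imp | - | - | - | 14.6 | 20.7 | 24.5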
<!--this repo | Res-101 | msdn | 8 | 5e-3 | 20k,30k | - | - | - | - --> <!--this repo | Res-101 | grcnn | 8 | 5e-3 | 20k,30k | - | - | - | - -->

* You can click 'this repo' in the table above to download the checkpoints.

The table above shows that our reimplementations of the baseline and imp algorithms match the performance reported in motifnet.

Comparisons with other Methods

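model | bs | lr | mAP@0.5 | sgdet@20 | sgdet@50 | sgdet@100
:--: | :--: | :--: | :--: | :--: | :--: | :--:
vanilla | 8 | 5e-3 | 24.2 | 10.5 | 13.8 | 16.1
imp | 8 | 5e-3 | 24.2 | 16.7 | 21.7 | 25.2
msdn | 8 | 5e-3 | 24.2 | 18.3 | 23.6 | 27.1
graph-rcnn(no att) | 8 | 5e-3 | 24.2 | 18.8 | 23.7 | 26.2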

* You can click 'model' in the table above to download the checkpoints.

As the table shows, all models achieve significantly better numbers than those reported in the original papers. The main reason for these consistent improvements is the per-class NMS applied to object proposals before they are sent to the relationship head. We also found that the gap between different methods is significantly reduced. Our model performs similarly to msdn and better than imp.
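
To make the per-class NMS step concrete, here is a minimal pure-Python sketch. It is an illustration only, not the implementation in this repo: the function names `iou` and `per_class_nms`, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(boxes, scores, labels, iou_thresh=0.5):
    """Greedy NMS run independently within each class; returns kept indices."""
    keep = []
    for cls in sorted(set(labels)):
        # Visit this class's boxes from highest to lowest score.
        idxs = sorted(
            (i for i, label in enumerate(labels) if label == cls),
            key=lambda i: scores[i],
            reverse=True,
        )
        kept_cls = []
        for i in idxs:
            # Keep a box only if it does not overlap a kept box of the same class.
            if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept_cls):
                kept_cls.append(i)
        keep.extend(kept_cls)
    return sorted(keep)


# Toy detections (hypothetical): two overlapping "person" boxes and two "horse" boxes.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 10, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = ["person", "person", "horse", "horse"]
print(per_class_nms(boxes, scores, labels))  # [0, 2, 3]
```

The second "person" box is suppressed by the first, while the "horse" box at the same location survives because suppression never crosses class boundaries.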

Adding RelPN to other Methods

We added our RelPN to various algorithms and compared with the original version.

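model | relpn | bs | lr | mAP@0.5 | sgdet@20 | sgdet@50 | sgdet@100
:--: | :--: | :--: | :--: | :--: | :--: | :--: | :--:
vanilla | no | 8 | 5e-3 | 24.2 | 10.5 | 13.8 | 16.1
vanilla | yes | 8 | 5e-3 | 24.2 | 12.3 | 15.8 | 17.7
imp | no | 8 | 5e-3 | 24.2 | 16.7 | 21.7 | 25.2
imp | yes | 8 | 5e-3 | 24.2 | 19.2 | 23.9 | 26.3
msdn | no | 8 | 5e-3 | 24.2 | 18.3 | 23.6 | 27.1
msdn | yes | 8 | 5e-3 | 24.2 | 19.2 | 23.8 | 26.2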

* You can click 'model' in the table above to download the checkpoints.

Above, we can see consistent improvements for different algorithms, which demonstrates the effectiveness of our proposed relation proposal network (RelPN).

Also, since far fewer object pairs (256, down from more than 1k originally) are fed to the relation head for predicate classification, inference for the models with RelPN is significantly faster (~2.5 times).
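
The pair-pruning idea can be sketched as follows: enumerate all ordered object pairs, score each pair, and keep only the top 256. This is a rough illustration under stated assumptions, not the RelPN implementation: `prune_pairs` and `toy_score` are hypothetical names, the count of 40 detections is made up, and the scoring function is a placeholder for the learned relatedness score.

```python
import heapq
from itertools import permutations


def prune_pairs(num_objects, score_fn, top_k=256):
    """Keep the top_k highest-scoring ordered (subject, object) index pairs."""
    # permutations(..., 2) yields every ordered pair without self-pairs.
    pairs = permutations(range(num_objects), 2)
    return heapq.nlargest(top_k, pairs, key=lambda p: score_fn(*p))


def toy_score(i, j):
    """Placeholder relatedness score; a real model would learn this."""
    return -abs(i - j)


num_objects = 40  # hypothetical number of detections in one image
all_pairs = num_objects * (num_objects - 1)
kept = prune_pairs(num_objects, toy_score)
print(all_pairs, len(kept))  # 1560 256
```

With 40 detections there are 1560 candidate pairs, so the relation head classifies roughly one sixth of them after pruning.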

Tips and Tricks

Some important observations based on the experiments:

Installation

Prerequisites

Dependencies

Install all the python dependencies using pip:

pip install -r requirements.txt

and libraries using apt-get:

apt-get update
apt-get install libglib2.0-0
apt-get install libsm6

Data Preparation

Annotations | Object | Predicate
:--: | :--: | :--:
#Categories | 150 | 50

First, make a folder in the root folder:

mkdir -p datasets/vg_bm

Here, the suffix 'bm' is short for "benchmark" and denotes the dataset used for benchmarking. We may add other formats of the vg dataset in the future, e.g., with more categories.

Then, download and preprocess the data by following this repo. Specifically, after downloading the Visual Genome dataset, you can follow these guidelines to get the following files:

datasets/vg_bm/imdb_1024.h5
datasets/vg_bm/bbox_distribution.npy
datasets/vg_bm/proposals.h5
datasets/vg_bm/VG-SGG-dicts.json
datasets/vg_bm/VG-SGG.h5

The above files will provide all the data needed for training the object detection models and scene graph generation models listed above.
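
Before training, it can help to confirm that all five files are in place. The helper below is a small convenience sketch and is not part of this repo: the name `missing_files` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "imdb_1024.h5",
    "bbox_distribution.npy",
    "proposals.h5",
    "VG-SGG-dicts.json",
    "VG-SGG.h5",
]


def missing_files(root="datasets/vg_bm"):
    """Return the required file names that are not present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```

Run it from the repository root so that the relative path `datasets/vg_bm` resolves correctly.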

Annotations | Object | Attribute | Predicate
:--: | :--: | :--: | :--:
#Categories | 1600 | 400 | 20

Soon, I will add this data loader to train the bottom-up and top-down model on more object/predicate/attribute categories.

Annotations | Object | Attribute | Predicate
:--: | :--: | :--: | :--:
#Categories | 2500 | ~600 | ~400

This data loader further increases the number of categories for training more fine-grained visual representations.

Compilation

Compile the CUDA dependencies using the following commands:

cd lib/scene_parser/rcnn
python setup.py build develop

After that, you should see that all the necessary components, including nms, roi_pool and roi_align, have compiled successfully.

Train

Train object detection model:

python main.py --config-file configs/faster_rcnn_res101.yaml

Multi-GPU training:

python -m torch.distributed.launch --nproc_per_node=$NGPUS main.py --config-file configs/faster_rcnn_res101.yaml

where NGPUS is the number of GPUs available.

Train scene graph generation model jointly (train detector and sgg as a whole):

python main.py --config-file configs/sgg_res101_joint.yaml --algorithm $ALGORITHM

Multi-GPU training:

python -m torch.distributed.launch --nproc_per_node=$NGPUS main.py --config-file configs/sgg_res101_joint.yaml --algorithm $ALGORITHM

where NGPUS is the number of GPUs available and ALGORITHM is the name of the scene graph generation model.

Train scene graph generation model stepwise (train detector first, and then sgg):

python main.py --config-file configs/sgg_res101_step.yaml --algorithm $ALGORITHM

Multi-GPU training:

python -m torch.distributed.launch --nproc_per_node=$NGPUS main.py --config-file configs/sgg_res101_step.yaml --algorithm $ALGORITHM

where NGPUS is the number of GPUs available and ALGORITHM is the name of the scene graph generation model.
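
If you launch many runs, the single-GPU and multi-GPU training commands above can be assembled programmatically. The helper below is a sketch and is not part of this repo: `build_train_command` is a hypothetical name, it uses only the flags shown above, and the algorithm value "imp" is an assumption based on the model names in the benchmark tables.

```python
import shlex


def build_train_command(config_file, algorithm=None, num_gpus=1):
    """Assemble a training command as an argument list."""
    cmd = ["python"]
    if num_gpus > 1:
        # Multi-GPU runs go through torch.distributed.launch, as shown above.
        cmd += ["-m", "torch.distributed.launch", f"--nproc_per_node={num_gpus}"]
    cmd += ["main.py", "--config-file", config_file]
    if algorithm is not None:
        cmd += ["--algorithm", algorithm]
    return cmd


# "imp" is an assumed value for --algorithm, used for illustration only.
cmd = build_train_command("configs/sgg_res101_step.yaml", "imp", num_gpus=4)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.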

Evaluate

Evaluate object detection model:

python main.py --config-file configs/faster_rcnn_res101.yaml --inference --resume $CHECKPOINT

where CHECKPOINT is the iteration number. By default, the whole validation/test set is evaluated. However, you can limit the number of images used for inference by appending the following argument:

--inference $YOUR_NUMBER

:warning: If you want to evaluate a model stored at your own path, just change MODEL.WEIGHT_DET to that path in faster_rcnn_res101.yaml.

Evaluate scene graph frequency baseline model:

In this case, you do not need any sgg model checkpoints. The object detector alone is enough to get the evaluation result. Run the following command:

python main.py --config-file configs/sgg_res101_{joint/step}.yaml --inference --use_freq_prior

In the yaml file, please specify the path MODEL.WEIGHT_DET for your object detector.

Evaluate scene graph generation model:

python main.py --config-file configs/sgg_res101_{joint/step}.yaml --inference --resume $CHECKPOINT --algorithm $ALGORITHM
python main.py --config-file configs/sgg_res101_{joint/step}.yaml --inference --resume $CHECKPOINT --algorithm $ALGORITHM --use_freq_prior

Similarly, you can also append "--inference $YOUR_NUMBER" to evaluate on only part of the set.

:warning: If you want to evaluate a model stored at your own path, just change MODEL.WEIGHT_SGG to that path in sgg_res101_{joint/step}.yaml.

Visualization

If you want to visualize some examples, simply append the following to the command:

--visualize

Citation

@inproceedings{yang2018graph,
    title={Graph r-cnn for scene graph generation},
    author={Yang, Jianwei and Lu, Jiasen and Lee, Stefan and Batra, Dhruv and Parikh, Devi},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    pages={670--685},
    year={2018}
}

Acknowledgement

We greatly appreciate the nicely organized code developed by maskrcnn-benchmark. Our codebase is built mostly on top of it.