Awesome

Factorizable Net (F-Net)

This is pytorch implementation of our ECCV-2018 paper: Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation. This project is based on our previous work: Multi-level Scene Description Network.

Progress

Guide for Project Setup
Guide for Model Evaluation with pretrained model
Guide for Model Training
Uploading pretrained model and format-compatible datasets.
Update the Model link for VG-DR-Net (We will upload a new model by Aug. 27).
Update the Dataset link for VG-DR-Net.
A demonstration of our Factorizable Net
Migrate to PyTorch 1.0.1
Multi-GPU support (beta version): one image per GPU

Updates

Feb 26, 2019: Now we release our beta [Multi-GPU] version of Factorizable Net. Find the stable version at branch 0.3.1
Aug 28, 2018: Bug fix for running the evaluation with "--use_gt_boxes". VG-DR-Net has some self-relations, e.g. A-relation-A. Previously, we assumed there is no such relation. This commit may influence the model performance on Scene Graph Generation.

Project Settings

Install the requirements (you can use pip or Anaconda):

conda install pip pyyaml sympy h5py cython numpy scipy click
conda install -c menpo opencv3
conda install pytorch torchvision cudatoolkit=8.0 -c pytorch 
pip install easydict

Clone the Factorizable Net repository

git clone git@github.com:yikang-li/FactorizableNet.git

Build the Cython modules for nms, roi pooling,roi align modules
```
cd lib
make all
cd ..
```
Download the three datasets VG-MSDN, VG-DR-Net, VRD to F-Net/data. And extract the folders with tar xzvf ${Dataset}.tgz. We have converted the original annotations to json version.
Download Visual Genome images and VRD images.
Link the image data folder to target folder: ln -s /path/to/images F-Net/data/${Dataset}/images
- p.s. You can change the default data directory by modifying dir in options/data_xxx.json.
[optional] Download the pretrained RPN for Visual Genome and VRD. Place them into output/.
[optional] Download the pretrained Factorizable Net on VG-MSDN, VG-DR-Net and VG-VRD, and place them to output/trained_models/

Project Organization

There are several subfolders contained:

lib: dataset Loader, NMS, ROI-Pooling, evaluation metrics, etc. are listed in the folder.
options: configurations for Data, RPN, F-Net and hyperparameters.
models: model definitions for RPN, Factorizable and related modules.
data: containing VG-DR-Net (svg/), VG-MSDN (visual_genome/) and VRD (VRD/).
output: storing the trained model, checkpoints and loggers.

Evaluation with our Pretrained Models

Pretrained models on VG-MSDN, VG-DR-Net and VG-VRD are provided. --evaluate is provided to enable evaluation mode. Additionally, we also provide --use_gt_boxes to fed the ground-truth object bounding boxes instead of RPN proposals.

Evaluation on VG-MSDN with pretrained. Scene Graph Generation results: Recall@50: 12.984%, Recall@100: 16.506%.

CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
	--path_opt options/models/VG-MSDN.yaml \
	--pretrained_model output/trained_models/Model-VG-MSDN.h5

Evaluation on VG-VRD with pretrained. : Scene Graph Generation results: Recall@50: 19.453%, Recall@100: 24.640%.

CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate \
	--path_opt options/models/VRD.yaml \
	--pretrained_model output/trained_models/Model-VRD.h5

Evaluation on VG-DR-Net with pretrained. Scene Graph Generation results: Recall@50: 19.807%, Recall@100: 25.488%.

CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
	--path_opt options/models/VG-DR-Net.yaml \
	--pretrained_model output/trained_models/Model-VG-DR-Net.h5

Training

Training Region Proposal Network (RPN). The shared conv layers are fixed. We also provide pretrained RPN on Visual Genome and VRD.

# Train RPN for VG-MSDN and VG-DR-Net
CUDA_VISIBLE_DEVICES=0 python train_rpn.py --dataset_option=normal 

# Train RPN for VRD
CUDA_VISIBLE_DEVICES=0 python train_rpn_VRD.py

Training Factorizable Net: detailed training options are included in options/models/.

# Train F-Net on VG-MSDN:
CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
	--path_opt options/models/VG-MSDN.yaml --rpn output/RPN.h5
	
# Train F-Net on VRD:
CUDA_VISIBLE_DEVICES=0 python train_FN.py  \
	--path_opt options/models/VRD.yaml --rpn output/RPN_VRD.h5
	
# Train F-Net on VG-DR-Net:
CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
	--path_opt options/models/VG-DR-Net.yaml --rpn output/RPN.h5

--rpn xxx.h5 can be ignored in end-to-end training from pretrained VGG16. Sometime, unexpected and confusing errors appear. Ignore it and restart to training.

For better results, we usually re-train the model with additional epochs by resuming the training from the checkpoint with --resume ckpt:

# Resume F-Net training on VG-MSDN:
CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
	--path_opt options/models/VG-MSDN.yaml --resume ckpt --epochs 30

Acknowledgement

We thank longcw for his generous release of the PyTorch Implementation of Faster R-CNN.

Reference

If you find our project helpful, your citations are highly appreciated:

@inproceedings{li2018fnet,
author={Li, Yikang and Ouyang, Wanli and Bolei, Zhou and Jianping, Shi and Chao, Zhang and Wang, Xiaogang},
title={Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation},
booktitle = {ECCV},
year = {2018}
}

We also have two papers regarding to scene graph generation / visual relationship detection:

@inproceedings{li2017msdn,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={Scene graph generation from objects, phrases and region captions},
booktitle = {ICCV},
year = {2017}
}

@inproceedings{li2017vip,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={ViP-CNN: Visual Phrase Guided Convolutional Neural Network},
booktitle = {CVPR},
year = {2017}
}

License:

The pre-trained models and the Factorizable Network technique are released for uncommercial use.

Contact Yikang LI if you have questions.