Multi-level Scene Description Network

This is our implementation of the Multi-level Scene Description Network from Scene Graph Generation from Objects, Phrases and Region Captions. The project is based on a PyTorch version of Faster R-CNN. (Update: the model links have been updated. Sorry for the inconvenience.)

*Updates*

We have released our newly proposed scene graph generation model in our ECCV-2018 paper:

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation.

Check the GitHub repo Factorizable Net if you are interested.

Progress

We are still working on the project. If you are interested, please follow our project.

Project Settings

  1. Install the requirements (you can use pip or Anaconda):

    conda install pip pyyaml sympy h5py cython numpy scipy
    conda install -c menpo opencv3
    conda install -c soumith pytorch torchvision cuda80 
    pip install easydict
    
  2. Clone this repository (it includes the Faster R-CNN code it builds on)

    git clone git@github.com:yikang-li/MSDN.git
    
  3. Build the Cython modules for nms and the roi_pooling layer

    cd MSDN/faster_rcnn
    ./make.sh
    cd ..
    
  4. Download the trained full model and the trained RPN, and place them in output/trained_model

  5. Download our cleansed Visual Genome dataset and unzip it:

    tar xzvf top_150_50.tgz

  6. Download the Visual Genome images

  7. Place the images and cleansed annotations in the corresponding folders:

    mkdir -p data/visual_genome
    cd data/visual_genome
    ln -s /path/to/VG_100K_images_folder VG_100K_images
    ln -s /path/to/downloaded_folder top_150_50
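
Before running training or evaluation, it can help to confirm that the setup above took effect. The following is a minimal sketch and not part of the repository: the helper names `missing_modules` and `missing_paths` are hypothetical, the module list assumes the usual import names for the packages installed above (for example `cv2` for OpenCV and `yaml` for pyyaml), and the path list simply mirrors the folders named in the setup steps. Run it from the MSDN repository root.

```python
import importlib.util
import os

# Import names assumed for the packages installed in the setup steps.
REQUIRED_MODULES = [
    "yaml", "sympy", "h5py", "Cython", "numpy", "scipy",
    "cv2", "torch", "torchvision", "easydict",
]

# Folders named in the setup steps, relative to the repository root.
EXPECTED_PATHS = [
    "output/trained_model",
    "data/visual_genome/VG_100K_images",
    "data/visual_genome/top_150_50",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`.

    os.path.exists follows symlinks, so a dangling `ln -s` target
    is reported as missing.
    """
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Missing paths:", missing_paths() or "none")
```

An empty result from both helpers only shows that the packages and folders are present; it does not verify that the Cython modules were built or that the downloaded files are complete.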

Training

Evaluation

Our pretrained full model is provided for evaluation and further development. (Please download the related files in advance.)

./eval.sh

Currently, the accuracy of our released version differs slightly from the results reported in the paper: Recall@50: 11.705%; Recall@100: 14.085%.
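
For readers unfamiliar with the metric, Recall@K for scene graphs is the fraction of ground-truth (subject, predicate, object) triplets that appear among the K highest-scoring predicted triplets. The sketch below is a simplified illustration, not the code behind `eval.sh`: the function name `recall_at_k` and the toy data are hypothetical, and triplets are matched on labels only, whereas a full evaluation typically also requires the predicted boxes to overlap the ground-truth boxes.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets found in the top-k predictions.

    predictions:  list of (score, (subject, predicate, object)) pairs
    ground_truth: iterable of (subject, predicate, object) triplets
    Simplifying assumption: a match is an exact label match; box
    overlap is ignored.
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0
    # Keep the k most confident predictions.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = gt & {triplet for _, triplet in top_k}
    return len(hits) / len(gt)


# Toy example with made-up scores and labels.
preds = [
    (0.9, ("man", "riding", "horse")),
    (0.8, ("man", "wearing", "hat")),
    (0.3, ("horse", "on", "grass")),
]
gt = [("man", "riding", "horse"), ("horse", "on", "grass")]

print(recall_at_k(preds, gt, 2))  # 0.5: only one ground-truth triplet is in the top 2
print(recall_at_k(preds, gt, 3))  # 1.0: both are found once all 3 predictions count
```

In practice the metric is averaged over all test images, with K set to 50 and 100 as in the figures above.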

Acknowledgement

We thank longcw for generously releasing the PyTorch implementation of Faster R-CNN.

Reference

@inproceedings{li2017msdn,
  author = {Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
  title = {Scene graph generation from objects, phrases and region captions},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year = {2017}
}

License

The pre-trained models and the MSDN technique are released for non-commercial use.

Contact Yikang Li if you have any questions.