Spatially Conditioned Graphs

NEW! Check out our most recent work on transformer-based HOI detection here.

<img src="./assets/scg.png" alt="graph" height="200" align="left"/> <img src="./assets/mbf.png" alt="multibranch_fusion" height="200" align="center"/>

This repository contains the official PyTorch implementation for the ICCV 2021 paper

Frederic Z. Zhang, Dylan Campbell and Stephen Gould. Spatially Conditioned Graphs for Detecting Human-Object Interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13319-13327, October 2021.

[paper] [supp] [preprint] [video]

Citation

If you find this repository useful for your research, please kindly cite our paper:

@inproceedings{zhang2021scg,
  author    = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
  title     = {Spatially Conditioned Graphs for Detecting Human–Object Interactions},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {13319-13327}
}

Table of Contents

  Prerequisites
  Demonstration
  Data Utilities
  Testing
  Training
  Contact

Prerequisites

  1. Download the repository with git clone https://github.com/fredzzhang/spatially-conditioned-graphs
  2. Install the lightweight deep learning library Pocket
  3. Make sure the environment you created for Pocket is activated. You are good to go!

Demonstration

To generate the qualitative results shown in the paper, please follow the instructions in the diagnosis package at spatially-conditioned-graphs/diagnosis/.

Data Utilities

The HICO-DET and V-COCO repos have been incorporated as submodules for convenience. To download relevant data utilities, run the following commands.

cd /path/to/spatially-conditioned-graphs
git submodule init
git submodule update

HICO-DET

  1. Download the HICO-DET dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/hicodet
    bash download.sh
    
    2. If you have previously downloaded the dataset, simply create a soft link
    cd /path/to/spatially-conditioned-graphs/hicodet
    ln -s /path/to/hico_20160224_det ./hico_20160224_det
    
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python preprocessing.py --partition train2015
python preprocessing.py --partition test2015
  3. Download fine-tuned detections
cd /path/to/spatially-conditioned-graphs/download
bash download_finetuned_detections.sh
  4. Generate ground truth detections (optional)
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python generate_gt_detections.py --partition test2015 

V-COCO

  1. Download the train2014 and val2014 partitions of the COCO dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/vcoco
    bash download.sh
    
    2. If you have previously downloaded the dataset, simply create a soft link. Note that the link should be named mscoco2014, as in the command below
    cd /path/to/spatially-conditioned-graphs/vcoco
    ln -s /path/to/coco ./mscoco2014
    
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/vcoco/detections
python preprocessing.py --partition trainval
python preprocessing.py --partition test

Testing

HICO-DET

  1. Download the checkpoint of our trained model
cd /path/to/spatially-conditioned-graphs/download
bash download_checkpoint.sh
  2. Test a model
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python test.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, detections from a pre-trained detector are used. To change the source of detections, use the argument --detection-dir, e.g. --detection-dir hicodet/detections/test2015_gt to select ground truth detections. Fine-tuned detections (if you downloaded them) are also available under hicodet/detections.
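For example, assuming the ground truth detections have been generated as described in the Data Utilities section, the provided model can be evaluated against them with a command along these lines:

cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python test.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt \
    --detection-dir hicodet/detections/test2015_gt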

  3. Cache detections for Matlab evaluation following HO-RCNN (optional)
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, 80 .mat files, one per object class, will be cached in a directory named matlab. Use the --cache-dir argument to change the cache directory. To change the source of detections, refer to the use of --detection-dir described above.
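For example, to write the cached .mat files to a custom location (the directory below is only a placeholder), a command along these lines should work:

cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt \
    --cache-dir /path/to/a/cache/directory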

As a reference, the performance of the provided model is shown in the table below.

| Detections | Default Setting | Known Object Setting |
|:-|:-:|:-:|
| Pre-trained on MS COCO | (21.85, 18.11, 22.97) | (25.53, 21.79, 26.64) |
| *Fine-tuned on HICO-DET (DRG) | (31.33, 24.72, 33.31) | (34.37, 27.18, 36.52) |
| Fine-tuned DETR-R101 (here) | (29.26, 24.61, 30.65) | (32.87, 27.89, 34.35) |
| Ground truth detections | (51.53, 41.02, 54.67) | (51.75, 41.40, 54.84) |

Each triplet reports the mAP on the full, rare and non-rare class splits respectively.

*The detections provided by the DRG repo were produced by a Cascade R-CNN with a ResNeXt-152 backbone, which is not directly comparable to the detectors commonly used in the literature.

V-COCO

We did not implement evaluation utilities for V-COCO; instead, we use the utilities provided by Gupta. To generate the required pickle file, run the following script, specifying the path to a trained model with --model-path.

cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --dataset vcoco --data-root vcoco \
    --detection-dir vcoco/detections/test \
    --cache-dir vcoco_cache --partition test \
    --model-path /path/to/a/model

This will generate a file named vcoco_results.pkl under vcoco_cache in the current directory. Please refer to the v-coco repo (not to be confused with vcoco, the submodule) for further instructions. Note that loading the pickle file requires a particular class CacheTemplate, which is shown below in its entirety.

from collections import defaultdict
class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]

You can either add it to the evaluation code or save it as a separate file to import from.
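As a minimal sketch of the latter option, suppose the class is saved in a file named cache_template.py (a hypothetical name); it can then be imported in the evaluation script before the cached results are loaded:

import pickle

# Hypothetical module containing the CacheTemplate class shown above;
# the class only needs to be importable when the pickle is loaded.
from cache_template import CacheTemplate

# vcoco_results.pkl is produced by cache.py as described above.
with open('vcoco_cache/vcoco_results.pkl', 'rb') as f:
    results = pickle.load(f)

print(type(results))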

Training

HICO-DET

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 --cache-dir checkpoints/hicodet &>log &

Specify the number of GPUs to use with the argument --world-size. The default sub-batch size is 4 (per GPU). The provided model was trained with 8 GPUs, giving an effective batch size of 32. Reducing the effective batch size could result in slightly inferior performance. The default learning rate for a batch size of 32 is 0.0001. As a rule of thumb, scale the learning rate proportionally when changing the batch size, e.g. 0.00005 for a batch size of 16.

It is recommended to redirect stdout and stderr to a file to save the training log (as indicated by &>log). To check the progress, run cat log | grep mAP, or go through the log with vim log. Note that the logged mAP follows a slightly different protocol and does NOT necessarily correlate with the mAP reported by the community; it only serves as a diagnostic tool. To obtain the true performance of the model, run a separate test as shown in the previous section.

By default, checkpoints will be saved under checkpoints in the current directory. For more arguments, run python main.py --help. We follow the early-stopping training strategy, and have concluded (using a validation set split from the training set) that the model at epoch 7 should be picked. Training on 8 GeForce GTX TITAN X devices takes about 5 hours.
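For example, a run on 4 GPUs (effective batch size 16) with a proportionally scaled learning rate might look like the sketch below, assuming the learning rate is exposed through an argument named --learning-rate (run python main.py --help to confirm the exact argument names):

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 4 --learning-rate 0.00005 --cache-dir checkpoints/hicodet &>log &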

V-COCO

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 \
    --dataset vcoco --partitions trainval val --data-root vcoco \
    --train-detection-dir vcoco/detections/trainval \
    --val-detection-dir vcoco/detections/trainval \
    --print-interval 20 --cache-dir checkpoints/vcoco &>log &

Contact

If you have any questions regarding our paper or the repo, please post them in discussions. If you run into issues related to the code, feel free to open an issue. Alternatively, you can contact me at frederic.zhang@anu.edu.au.