
Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection/Object Detection

by Xubin Zhong, Changxing Ding, Zijian Li and Shaoli Huang.

This repository contains the official implementation of the paper "Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection", which has been accepted to ECCV 2022.

<div align="center"> <img src=".github/overview.jpg" width="900px" /> </div>

To the best of our knowledge, HQM is the first approach that promotes the robustness of DETR-based models from the perspective of hard example mining. Moreover, HQM is plug-and-play and can be readily applied to many DETR-based HOI detection methods.

New results on CDN!

An efficient implementation of GBS on CDN is available at /code_path/CDN/exp/train_hico.sh. With GBS added, CDN-S achieves 32.29 mAP within 60 epochs. A rough sketch of the GBS idea is given after the figure below.

<div align="center"> <img src=".github/CDN_HQM.jpg" width="450px" /> </div>
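For intuition, GBS builds hard-positive queries by shifting ground-truth human/object boxes so that they still cover the target but no longer align with it exactly. The sketch below illustrates the idea only and is not the repo's implementation; the normalized (cx, cy, w, h) box format and the `max_shift` range are assumed values, not the paper's exact settings.

```python
import torch

def shift_gt_boxes(boxes: torch.Tensor, max_shift: float = 0.1) -> torch.Tensor:
    """Illustrative sketch of ground-truth bounding-box shifting (GBS).

    boxes: (N, 4) tensor of normalized (cx, cy, w, h) ground-truth boxes.
    max_shift: assumed jitter range, not the paper's exact setting.
    Returns boxes that still overlap the object but are slightly
    misaligned, so they can serve as hard-positive queries.
    """
    cx, cy, w, h = boxes.unbind(-1)
    # Shift each center by up to max_shift of the box size, in a random direction.
    cx = cx + (2 * torch.rand_like(cx) - 1) * max_shift * w
    cy = cy + (2 * torch.rand_like(cy) - 1) * max_shift * h
    # Randomly rescale width and height around their original values.
    w = w * (1 + (2 * torch.rand_like(w) - 1) * max_shift)
    h = h * (1 + (2 * torch.rand_like(h) - 1) * max_shift)
    return torch.stack([cx, cy, w, h], dim=-1).clamp(0, 1)
```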

Preparation

Dependencies

Our implementation uses external libraries such as NumPy and PyTorch, and training was conducted on 8 NVIDIA 2080 Ti GPUs. You can resolve the dependencies with the following commands.

pip install numpy
pip install -r requirements.txt
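Before launching distributed training, it may help to confirm that PyTorch sees all GPUs; the commands below assume 8. An optional check:

```python
import torch

# Optional sanity check: the training commands in this README assume 8 GPUs.
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```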

Dataset

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. The downloaded annotation files have to be placed as follows.

HQM
 |─ data
 |   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 |─ params
 |   └─ detr-r50-pre.pth
 :
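If you want to verify the layout before training, a small check like the following (paths taken directly from the tree above) will catch missing files early:

```python
from pathlib import Path

# Verify the dataset and pre-trained weights are where the code expects them.
annotations = Path("data/hico_20160224_det/annotations")
for name in ("trainval_hico.json", "test_hico.json", "corre_hico.npy"):
    assert (annotations / name).is_file(), f"missing {annotations / name}"
assert Path("params/detr-r50-pre.pth").is_file(), "missing params/detr-r50-pre.pth"
print("All dataset and parameter files are in place.")
```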

Pre-trained parameters

The annotation files and pre-trained weights can be downloaded here.

Training

After the preparation, you can start training with the following command.

python -m torch.distributed.launch \
    --nproc_per_node=8  \
    --use_env \
    main.py \
    --hoi \
    --dataset_file hico_gt \
    --model_name HQM \
    --hoi_path data/hico_20160224_det/ \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet50 \
    --set_cost_bbox 2.5 \
    --set_cost_giou 1 \
    --bbox_loss_coef 2.5 \
    --giou_loss_coef 1 \
    --find_unused_parameters \
    --AJL 

Evaluation

You can conduct the evaluation with trained parameters as follows. The trained parameters are available here.

python -m torch.distributed.launch \
    --nproc_per_node=8  \
    --use_env \
    main.py \
    --hoi \
    --dataset_file hico_gt \
    --model_name HQM \
    --hoi_path data/hico_20160224_det/ \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet50 \
    --set_cost_bbox 2.5 \
    --set_cost_giou 1 \
    --bbox_loss_coef 2.5 \
    --giou_loss_coef 1 \
    --find_unused_parameters \
    --AJL \
    --eval \
    --resume params/checkpoint_best.pth

The evaluation results should look like the following:

"test_mAP": 0.313470564574163, "test_mAP rare": 0.26546478777620686, "test_mAP non-rare": 0.32780995244887723

test_mAP, test_mAP rare, and test_mAP non-rare are the results for the Default Full, Rare, and Non-rare settings, respectively.
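To pull these numbers out of a saved evaluation log programmatically, something like the sketch below works; the log path and the one-JSON-object-per-line format are assumptions about your local setup, not fixed by the repo:

```python
import json

# Hypothetical log path; point this at wherever your run writes its stats.
with open("logs/log.txt") as f:
    for raw in f:
        if '"test_mAP"' not in raw:
            continue
        line = raw.strip()
        # Tolerate both a full JSON object and a brace-less fragment
        # like the line shown above.
        stats = json.loads(line if line.startswith("{") else "{" + line + "}")
        print(stats["test_mAP"], stats["test_mAP rare"], stats["test_mAP non-rare"])
```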

Results

HOI detection results on HICO-DET:

| Method | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO) |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| HOTR + HQM (ResNet50) | 25.69 | 24.70 | 25.98 | 28.24 | 27.35 | 28.51 |
| QPIC + HQM (ResNet50) | 31.34 | 26.54 | 32.78 | 34.09 | 29.63 | 35.42 |
| CDN-S + HQM (ResNet50) | 32.47 | 28.15 | 33.76 | 35.17 | 30.73 | 36.50 |

D: Default, KO: Known object

HOI detection results on V-COCO:

| Method | Scenario 1 |
| :-- | :-: |
| Ours (ResNet50) | 63.6 |

Object detection results on COCO:

| Method | AP | AP_0.5 | AP_0.75 | AP_S | AP_M | AP_L |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| SMCA | 35.08 | 56.47 | 35.91 | 15.14 | 38.01 | 54.51 |
| SMCA + HQM | 36.48 | 57.02 | 38.19 | 16.48 | 40.62 | 54.91 |

Citation

Please consider citing our papers if they help your research.

@inproceedings{zhong_eccv2022,
  author    = {Zhong, Xubin and Ding, Changxing and Li, Zijian and Huang, Shaoli},
  title     = {Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022}
}

@inproceedings{Qu_2022_CVPR,
  author    = {Qu, Xian and Ding, Changxing and Li, Xingao and Zhong, Xubin and Tao, Dacheng},
  title     = {Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022},
  pages     = {19558--19567}
}

@inproceedings{zhang2022accelerating,
  author    = {Zhang, Gongjie and Luo, Zhipeng and Yu, Yingchen and Cui, Kaiwen and Lu, Shijian},
  title     = {Accelerating DETR Convergence via Semantic-Aligned Matching},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022},
  pages     = {949--958}
}

Acknowledgement

DOQ, QPIC, CDN