HoughNet: Integrating near and long-range evidence for bottom-up object detection

Official PyTorch implementation of HoughNet.

HoughNet: Integrating near and long-range evidence for bottom-up object detection,
Nermin Samet, Samet Hicsonmez, Emre Akbas,
ECCV 2020. (arXiv pre-print)

Extended HoughNet with new tasks.

HoughNet: Integrating near and long-range evidence for visual detection,
Nermin Samet, Samet Hicsonmez, Emre Akbas,
TPAMI 2022. (arXiv pre-print)

Updates

(August, 2022) Our extended paper has been accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

(April, 2021) We extended HoughNet to other visual detection tasks: video object detection, instance segmentation, keypoint detection and 3D object detection.

More details can be found in the arXiv pre-print.

Summary

Object detection methods typically rely on only local evidence. For example, to detect the mouse in the image below, only the features extracted at/around the mouse are used. In contrast, HoughNet is able to utilize long-range (i.e. far away) evidence, too. Below, on the right, the votes that support the detection of the mouse are shown: in addition to the local evidence, far away but semantically relevant objects, the two keyboards, vote for the mouse.

<img src="/readme/teaser.png" width="550">

HoughNet is a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet achieves 46.4 AP (and 65.1 AP<sub>50</sub>), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of HoughNet on another task, namely "labels to photo" image generation, by integrating the voting module into two different GAN models and showing that the accuracy is significantly improved in both cases.
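The voting idea described above can be illustrated with a small, dependency-free sketch. It is not taken from this repository's code, which may implement voting quite differently. The function names `log_polar_field` and `accumulate_votes`, the ring radii, the number of angle bins, and the lack of any per-region normalization are all assumptions made for illustration. Each location holds one vote value per region of a log-polar field centered on it, and the presence score at a target location is the sum of the votes it receives.

```python
import math


def log_polar_field(ring_radii, num_angle_bins):
    """Map integer offsets (dy, dx) to region ids of a log-polar-style field.

    ring_radii: increasing outer radii of the rings (hypothetical values).
    Region 0 is the center cell; ring k, angle bin a maps to
    1 + k * num_angle_bins + a.
    """
    field = {(0, 0): 0}
    max_r = ring_radii[-1]
    for dy in range(-max_r, max_r + 1):
        for dx in range(-max_r, max_r + 1):
            dist = math.hypot(dy, dx)
            if dist == 0 or dist > max_r:
                continue
            # First ring whose outer radius contains this offset.
            ring = next(k for k, r in enumerate(ring_radii) if dist <= r)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            angle_bin = min(int(angle / (2 * math.pi) * num_angle_bins),
                            num_angle_bins - 1)
            field[(dy, dx)] = 1 + ring * num_angle_bins + angle_bin
    return field


def accumulate_votes(vote_maps, field):
    """Sum votes into a presence map for a single class.

    vote_maps[r][y][x] is the vote that location (y, x) casts on every
    target cell lying in region r of the field centered at (y, x).
    """
    height, width = len(vote_maps[0]), len(vote_maps[0][0])
    presence = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (dy, dx), region in field.items():
                ty, tx = y + dy, x + dx
                if 0 <= ty < height and 0 <= tx < width:
                    presence[ty][tx] += vote_maps[region][y][x]
    return presence


# Toy example: 2 rings x 4 angle bins + center = 9 regions on a 5x5 map.
field = log_polar_field(ring_radii=[1, 2], num_angle_bins=4)
num_regions = 1 + 2 * 4
vote_maps = [[[0.0] * 5 for _ in range(5)] for _ in range(num_regions)]

# The location (2, 2) casts a single vote on the region containing offset (0, 2).
vote_maps[field[(0, 2)]][2][2] = 1.0
presence = accumulate_votes(vote_maps, field)
```

In this toy setup the single vote lands on every in-bounds cell of the chosen region, so evidence from (2, 2) supports targets that are two cells away. Larger outer radii let a location vote for correspondingly more distant targets.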

Highlights

A step-by-step animation of the voting process is provided here.

Object Detection Results on COCO val2017

| Backbone | AP / AP<sub>50</sub> | Multi-scale AP / AP<sub>50</sub> |
|---|---|---|
| Hourglass-104 | 43.0 / 62.2 | 46.1 / 64.6 |
| ResNet-101 w DCN | 37.2 / 56.5 | 41.5 / 61.5 |
| ResNet-101 | 36.0 / 55.2 | 40.7 / 60.6 |

Instance Segmentation Results on COCO val2017

| Model | AP / AP<sub>50</sub> | Box AP / AP<sub>50</sub> |
|---|---|---|
| Baseline | 27.2 / 46.4 | 33.9 / 51.3 |
| HoughNet | 28.4 / 48.0 | 35.0 / 52.9 |

2D Keypoint Detection Results on COCO val2017

| Model | AP / AP<sub>50</sub> | Box AP / AP<sub>50</sub> |
|---|---|---|
| Voting for Person Class. | 56.9 / 81.6 | 50.1 / 71.4 |
| Voting for Keypoint Est. | 56.8 / 81.5 | 50.2 / 70.9 |
| Voting for Both | 56.9 / 81.6 | 50.4 / 71.7 |

All models can be found in the Model zoo.

Installation

Please refer to INSTALL.md for installation instructions.

Evaluation and Training

For evaluation and training details please refer to GETTING_STARTED.md.

Acknowledgement

This work was supported by the AWS Cloud Credits for Research program and by the Scientific and Technological Research Council of Turkey (TUBITAK) through the project titled "Object Detection in Videos with Deep Neural Networks" (grant number 117E054). The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). We also thank the authors of CenterNet for their clean code and inspiring work.

License

HoughNet is released under the MIT License (refer to the LICENSE file for details). We developed HoughNet on top of CenterNet. Please refer to the CenterNet license for more details.

Citation

If you find HoughNet useful for your research, please cite our papers as follows.

N. Samet, S. Hicsonmez, E. Akbas, "HoughNet: Integrating near and long-range evidence for bottom-up object detection", In European Conference on Computer Vision (ECCV), 2020.

N. Samet, S. Hicsonmez, E. Akbas, "HoughNet: Integrating near and long-range evidence for visual detection", arXiv, 2021.

BibTeX entry:

@inproceedings{HoughNet,
  author = {Nermin Samet and Samet Hicsonmez and Emre Akbas},
  title = {HoughNet: Integrating near and long-range evidence for bottom-up object detection},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
}
@misc{HoughNet2021,
  title = {HoughNet: Integrating near and long-range evidence for visual detection},
  author = {Nermin Samet and Samet Hicsonmez and Emre Akbas},
  year = {2021},
}