Bucketed Ranking-based Losses for Efficient Training of Object Detectors
The official implementation of Bucketed Ranking-based Losses. Our implementation is based on mmdetection.
Bucketed Ranking-based Losses for Efficient Training of Object Detectors,
Feyza Yavuz, Baris Can Cam, Adnan Harun Dogan, Kemal Oksuz, Emre Akbas, Sinan Kalkan, ECCV 2024. (arXiv pre-print)
Introduction
What are Bucketed Ranking-based (BR) Losses? Bucketing makes ranking-based losses efficient for object detection by grouping negative predictions into buckets, significantly reducing the number of pairwise comparisons required during training. Bucketing preserves the alignment of ranking-based loss functions with the evaluation criteria and their robustness against class imbalance, while drastically improving their time complexity.
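To illustrate the core idea (a toy sketch only, not the loss implementation shipped in this repository), the snippet below sorts predictions by score and collapses each run of consecutive negatives between positives into a bucket; ranking terms can then be computed against at most P+1 buckets instead of every individual negative:

```python
import torch

def bucket_negatives(scores, labels):
    """Toy sketch of bucketing: sort predictions by score and collapse runs of
    consecutive negatives between positives into buckets.

    scores: (N,) predicted confidences
    labels: (N,) 1 for positive predictions, 0 for negatives
    Returns per-bucket sizes and mean scores, so ranking terms can be computed
    against P positives and at most P + 1 buckets rather than all N negatives.
    """
    order = torch.argsort(scores, descending=True)   # O(N log N)
    sorted_labels = labels[order].tolist()
    sorted_scores = scores[order].tolist()

    bucket_sizes, bucket_means = [], []
    run_size, run_sum = 0, 0.0
    for lbl, s in zip(sorted_labels, sorted_scores):
        if lbl == 1:            # a positive closes the current bucket
            if run_size > 0:
                bucket_sizes.append(run_size)
                bucket_means.append(run_sum / run_size)
                run_size, run_sum = 0, 0.0
        else:                   # a negative extends the current bucket
            run_size += 1
            run_sum += s
    if run_size > 0:            # trailing bucket after the last positive
        bucket_sizes.append(run_size)
        bucket_means.append(run_sum / run_size)
    return torch.tensor(bucket_sizes), torch.tensor(bucket_means)

# Example: two positives and four negatives -> two buckets of two negatives each.
scores = torch.tensor([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])
labels = torch.tensor([1, 0, 0, 1, 0, 0])
sizes, means = bucket_negatives(scores, labels)   # sizes: [2, 2], means: [0.75, 0.35]
```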
<p align="center"> <img src="figures/ranking_comparison_2.png" width="600"> </p>BRS-DETR: Efficient and Robust Transformer-Based Object Detection with Bucketed Ranking-Based Losses BRS-DETR integrates Bucketed Ranking-Based Loss (BRS Loss) into Co-DETR, delivering superior performance and training efficiency on the COCO benchmark. (i) BRS-DETR achieves a 0.8 AP improvement on ResNet-50 and consistent gains across other transformer-based backbones. (ii) BRS-DETR provides faster training: cuts training time by 6×, optimizing the handling of positive examples and loss calculation of auxillary heads.
Benefits of BR Loss on Efficiency and Simplification of Training. With BR Loss, we achieve significant improvements in training efficiency: (i) the bucketed approach reduces the time complexity to O(max(N log(N), P²)), allowing faster training, (ii) BR Loss maintains the simplicity and robustness of ranking-based approaches without requiring complex sampling heuristics or additional auxiliary heads, and (iii) it enables efficient training of large-scale object detectors, including transformer-based models, with minimal tuning.
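As a back-of-the-envelope illustration (the numbers below are hypothetical, not taken from the paper), the gap between the unbucketed pairwise cost and the bucketed O(max(N log(N), P²)) cost is easy to see:

```python
import math

# Hypothetical counts for a single image, for illustration only.
P = 100        # positive predictions
N = 100_000    # total predictions (mostly negatives)

unbucketed_pairs = P * N                                 # every positive compared with every negative
bucketed_cost    = int(N * math.log2(N)) + P * (P + 1)   # one sort, then positives vs. at most P + 1 buckets

print(f"unbucketed pairwise terms: {unbucketed_pairs:,}")   # 10,000,000
print(f"bucketed cost (approx.):   {bucketed_cost:,}")      # ~1.7M, dominated by the sort
```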
Benefits of BR Loss on Improving Performance. Using BR Loss, we train seven diverse visual detectors and demonstrate consistent performance improvements: (i) BR Loss accelerates training by 2× on average while preserving the accuracy of the unbucketed versions, and (ii) for the first time, we successfully train transformer-based detectors such as Co-DETR with ranking-based losses, consistently outperforming their original configurations across multiple backbones.
<p align="center"> <img src="figures/performance_comparison.png" width="600"> </p>How to Cite
Please cite the paper if you find our paper or this repository useful:
@inproceedings{BRLoss,
title = {Bucketed Ranking-based Losses for Efficient Training of Object Detectors},
author = {Feyza Yavuz and Baris Can Cam and Adnan Harun Dogan and Kemal Oksuz and Emre Akbas and Sinan Kalkan},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2024}
}
Specifications of Dependencies and Preparation
- Please see get_started.md for requirements and installation of mmdetection.
- Please see introduction.md for dataset preparation and basic usage of mmdetection.
Please note that we implement our method on MMDetection V2.25.3 and MMCV V1.5.0. More specifically, we use python=3.7.11, pytorch=1.11.0, and cuda=11.3.
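After installation, a quick sanity check such as the one below (a minimal sketch; it only assumes the packages above are importable) can confirm that the environment matches these versions:

```python
# Minimal environment check for the versions this repository targets
# (MMDetection V2.25.3, MMCV V1.5.0, PyTorch 1.11.0, CUDA 11.3).
import torch
import mmcv
import mmdet

print("torch :", torch.__version__, "| cuda:", torch.version.cuda)
print("mmcv  :", mmcv.__version__)
print("mmdet :", mmdet.__version__)

assert mmcv.__version__.startswith("1.5."), "expected MMCV 1.5.x"
assert mmdet.__version__.startswith("2.25."), "expected MMDetection 2.25.x"
```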
Trained Models
Here, we report validation set results for the object detection and instance segmentation tasks. For object detection, we report results on the COCO validation set. For instance segmentation, we report results on the COCO, Cityscapes, and LVIS validation sets.
We refer to the RS Loss repository for models trained with RS Loss.
Transformer-based Object Detection
BRS-DETR vs. Co-DETR
Backbone | Epoch | Detector | box AP | Log | Config | Model |
---|---|---|---|---|---|---|
ResNet-50 | 12 | Co-DETR | 49.3 | log | config | model |
ResNet-50 | 12 | BRS-DETR | 50.1 | log | config | model |
Swin-T | 12 | Co-DETR | 51.7 | log | config | model |
Swin-T | 12 | BRS-DETR | 52.3 | log | config | model |
Swin-L | 12 | Co-DETR | 56.9 | log | config | model |
Swin-L | 12 | BRS-DETR | 57.2 | log | config | model |
Multi-stage Object Detection
Faster R-CNN
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.58 | 39.5 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (3.0x ↓) | 39.5 | log | config | model |
ResNet-101 | 36 | RS | 0.91 | 47.3 | log | config | model |
ResNet-101 | 36 | BRS | 0.47 (2.0x ↓) | 47.7 | log | config | model |
Cascade R-CNN
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 1.54 | 41.1 | log | config | model |
ResNet-50 | 12 | BRS | 0.29 (5.3x ↓) | 41.1 | log | config | model |
One-stage Object Detection
ATSS
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | AP | 0.34 | 38.3 | log | config | model |
ResNet-50 | 12 | BAP | 0.18 (1.9x ↓) | 38.5 | log | config | model |
ResNet-50 | 12 | RS | 0.44 | 39.8 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (2.4x ↓) | 39.8 | log | config | model |
PAA
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | AP | TODO | 37.3 | log | config | model |
ResNet-50 | 12 | BAP | TODO (1.5x ↓) | 37.2 | log | config | model |
ResNet-50 | 12 | RS | TODO | 40.8 | log | config | model |
ResNet-50 | 12 | BRS | 0.36 (1.9x ↓) | 40.8 | log | config | model |
Instance Segmentation
We use Mask R-CNN as the baseline model to experiment with our method in the instance segmentation task.
COCO Val
Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.68 | 36.3 | log | config | model |
ResNet-50 | 12 | BRS | 0.29 (2.3x ↓) | 36.2 | log | config | model |
ResNet-101 | 36 | RS | 0.71 | 40.2 | log | config | model |
ResNet-101 | 36 | BRS | 0.33 (2.2x ↓) | 40.3 | log | config | model |
Cityscapes
Backbone | Epoch | Loss Func. | Time | box AP | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.43 | 43.7 | 38.2 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (2.3x ↓) | 43.3 | 38.5 | log | config | model |
LVIS
Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.87 | 25.6 | log | config | model |
ResNet-50 | 12 | BRS | 0.35 (2.5x ↓) | 25.8 | log | config | model |