Bucketed Ranking-based Losses for Efficient Training of Object Detectors

The official implementation of Bucketed Ranking-based Losses. Our implementation is based on mmdetection.

Bucketed Ranking-based Losses for Efficient Training of Object Detectors,
Feyza Yavuz, Baris Can Cam, Adnan Harun Dogan, Kemal Oksuz, Emre Akbas, Sinan Kalkan, ECCV 2024. (arXiv pre-print)

Introduction

What are Bucketed Ranking-based (BR) Losses? Bucketing makes ranking-based losses more efficient for object detection by grouping negative predictions into buckets, which significantly reduces the number of pairwise comparisons required during training. Bucketing preserves the alignment with evaluation criteria and the robustness against class imbalance of ranking-based loss functions while drastically improving their time complexity.

<p align="center"> <img src="figures/ranking_comparison_2.png" width="600"> </p>

BRS-DETR: Efficient and Robust Transformer-Based Object Detection with Bucketed Ranking-Based Losses. BRS-DETR integrates Bucketed Ranking-Based Loss (BRS Loss) into Co-DETR, delivering superior performance and training efficiency on the COCO benchmark: (i) BRS-DETR achieves a 0.8 AP improvement on ResNet-50 and consistent gains across other transformer-based backbones, and (ii) BRS-DETR trains faster, cutting training time by 6× by optimizing the handling of positive examples and the loss calculation of auxiliary heads.

Benefits of BR Loss on Efficiency and Simplification of Training. With BR Loss, we achieve significant improvements in training efficiency: (i) The bucketed approach reduces the time complexity to O(max(N log(N),P²)), allowing faster training, (ii) BR Loss maintains the simplicity and robustness of ranking-based approaches without requiring complex sampling heuristics or additional auxiliary heads, and (iii) it enables efficient training of large-scale object detectors, including transformer-based models, with minimal tuning.
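To make the bucketing idea concrete, here is a minimal, illustrative sketch in plain Python. It is not the implementation in this repository: it uses hard comparisons, ignores smoothing and gradients, and assumes all scores are distinct. The function names `negatives_above_pairwise` and `negatives_above_bucketed` are hypothetical. The sketch only shows that, after one sort, the negatives lying between two consecutive positives can be treated as a single bucket, so the number of negatives ranked above each positive is obtained without comparing every positive to every negative.

```python
def negatives_above_pairwise(scores, labels):
    """Naive reference: compare every positive with every negative, O(P * N)."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return [sum(n > p for n in neg) for p in pos]


def negatives_above_bucketed(scores, labels):
    """Bucketed variant: one sort, O(N log N), then a single linear pass.

    Returns (counts, bucket_sizes), both ordered by descending positive score:
      counts[k]       = number of negatives ranked above the k-th positive
      bucket_sizes[k] = number of negatives between positive k-1 and positive k
    Negatives ranked below the last positive are not recorded.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    counts, bucket_sizes = [], []
    current_bucket = 0   # negatives seen since the previous positive
    negatives_seen = 0   # negatives seen so far in the ranking
    for _, label in ranked:
        if label == 1:
            bucket_sizes.append(current_bucket)
            negatives_seen += current_bucket
            counts.append(negatives_seen)
            current_bucket = 0
        else:
            current_bucket += 1
    return counts, bucket_sizes


# Toy example: 3 positives and 4 negatives with distinct scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 0, 0, 1, 0, 1, 0]
counts, buckets = negatives_above_bucketed(scores, labels)
print(counts)   # [0, 2, 3]
print(buckets)  # [0, 2, 1]
print(negatives_above_pairwise(scores, labels) == counts)  # True
```

In this toy setting both functions return the same counts, but the bucketed version touches each prediction once after sorting instead of forming all positive-negative pairs.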

Benefits of BR Loss on Improving Performance. Using BR Loss, we train seven diverse visual detectors and demonstrate consistent performance improvements: (i) BR Loss accelerates training by 2× on average while preserving the accuracy of the unbucketed versions, and (ii) for the first time, we successfully train transformer-based detectors such as Co-DETR using ranking-based losses, consistently outperforming their original configurations across multiple backbones.

<p align="center"> <img src="figures/performance_comparison.png" width="600"> </p>

How to Cite

Please cite the paper if you benefit from our paper or the repository:

@inproceedings{BRLoss,
       title = {Bucketed Ranking-based Losses for Efficient Training of Object Detectors},
       author = {Feyza Yavuz and Baris Can Cam and Adnan Harun Dogan and Kemal Oksuz and Emre Akbas and Sinan Kalkan},
       booktitle = {European Conference on Computer Vision (ECCV)},
       year = {2024}
}

Specifications of Dependencies and Preparation

Please note that we implement our method on MMDetection v2.25.3 and MMCV v1.5.0. More specifically, we use Python 3.7.11, PyTorch 1.11.0, and CUDA 11.3.

Trained Models

Here, we report validation set results for object detection and instance segmentation tasks. For object detection, we report results on the COCO validation set. For instance segmentation, we report results on the COCO, Cityscapes, and LVIS validation sets.

We refer to the RS Loss repository for models trained with RS Loss.

Transformer-based Object Detection

BRS-DETR vs. Co-DETR

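| Backbone | Epoch | Detector | box AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | Co-DETR | 49.3 | log | config | model |
| ResNet-50 | 12 | BRS-DETR | 50.1 | log | config | model |
| Swin-T | 12 | Co-DETR | 51.7 | log | config | model |
| Swin-T | 12 | BRS-DETR | 52.3 | log | config | model |
| Swin-L | 12 | Co-DETR | 56.9 | log | config | model |
| Swin-L | 12 | BRS-DETR | 57.2 | log | config | model |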

Multi-stage Object Detection

Faster R-CNN

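| Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | RS | 0.58 | 39.5 | log | config | model |
| ResNet-50 | 12 | BRS | 0.19 (3.0x ↓) | 39.5 | log | config | model |
| ResNet-101 | 36 | RS | 0.91 | 47.3 | log | config | model |
| ResNet-101 | 36 | BRS | 0.47 (2.0x ↓) | 47.7 | log | config | model |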

Cascade R-CNN

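| Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | RS | 1.54 | 41.1 | log | config | model |
| ResNet-50 | 12 | BRS | 0.29 (5.3x ↓) | 41.1 | log | config | model |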

One-stage Object Detection

ATSS

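| Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | AP | 0.34 | 38.3 | log | config | model |
| ResNet-50 | 12 | BAP | 0.18 (1.9x ↓) | 38.5 | log | config | model |
| ResNet-50 | 12 | RS | 0.44 | 39.8 | log | config | model |
| ResNet-50 | 12 | BRS | 0.19 (2.4x ↓) | 39.8 | log | config | model |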

PAA

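| Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | AP | TODO | 37.3 | log | config | model |
| ResNet-50 | 12 | BAP | TODO (1.5x ↓) | 37.2 | log | config | model |
| ResNet-50 | 12 | RS | TODO | 40.8 | log | config | model |
| ResNet-50 | 12 | BRS | 0.36 (1.9x ↓) | 40.8 | log | config | model |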

Instance Segmentation

We use Mask R-CNN as the baseline model to experiment with our method in the instance segmentation task.

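COCO Val

| Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | RS | 0.68 | 36.3 | log | config | model |
| ResNet-50 | 12 | BRS | 0.29 (2.3x ↓) | 36.2 | log | config | model |
| ResNet-101 | 36 | RS | 0.71 | 40.2 | log | config | model |
| ResNet-101 | 36 | BRS | 0.33 (2.2x ↓) | 40.3 | log | config | model |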

Cityscapes

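| Backbone | Epoch | Loss Func. | Time | box AP | mask AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | RS | 0.43 | 43.7 | 38.2 | log | config | model |
| ResNet-50 | 12 | BRS | 0.19 (2.3x ↓) | 43.3 | 38.5 | log | config | model |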

LVIS

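| Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 12 | RS | 0.87 | 25.6 | log | config | model |
| ResNet-50 | 12 | BRS | 0.35 (2.5x ↓) | 25.8 | log | config | model |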