Bucketed Ranking-based Losses for Efficient Training of Object Detectors
The official implementation of Bucketed Ranking-based Losses. Our implementation is based on mmdetection.
Bucketed Ranking-based Losses for Efficient Training of Object Detectors,
Feyza Yavuz, Baris Can Cam, Adnan Harun Dogan, Kemal Oksuz, Emre Akbas, Sinan Kalkan, ECCV 2024. (arXiv pre-print)
Introduction
What are Bucketed Ranking-based (BR) Losses? Bucketing makes ranking-based losses efficient for object detection by grouping negative predictions into buckets, significantly reducing the number of pairwise comparisons required during training. Bucketing preserves the alignment of ranking-based loss functions with the evaluation criteria and their robustness against class imbalance, while drastically improving their time complexity.
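To illustrate the core idea (a toy sketch only, not the loss implementation shipped in this repository), the snippet below sorts predictions by score and collapses each run of consecutive negatives between positives into a bucket; ranking terms can then be computed against at most P+1 buckets instead of every individual negative:

```python
import torch

def bucket_negatives(scores, labels):
    """Toy sketch of bucketing: sort predictions by score and collapse runs of
    consecutive negatives between positives into buckets.

    scores: (N,) predicted confidences
    labels: (N,) 1 for positive predictions, 0 for negatives
    Returns per-bucket sizes and mean scores, so ranking terms can be computed
    against P positives and at most P + 1 buckets rather than all N negatives.
    """
    order = torch.argsort(scores, descending=True)   # O(N log N)
    sorted_labels = labels[order].tolist()
    sorted_scores = scores[order].tolist()

    bucket_sizes, bucket_means = [], []
    run_size, run_sum = 0, 0.0
    for lbl, s in zip(sorted_labels, sorted_scores):
        if lbl == 1:            # a positive closes the current bucket
            if run_size > 0:
                bucket_sizes.append(run_size)
                bucket_means.append(run_sum / run_size)
                run_size, run_sum = 0, 0.0
        else:                   # a negative extends the current bucket
            run_size += 1
            run_sum += s
    if run_size > 0:            # trailing bucket after the last positive
        bucket_sizes.append(run_size)
        bucket_means.append(run_sum / run_size)
    return torch.tensor(bucket_sizes), torch.tensor(bucket_means)

# Example: two positives and four negatives -> two buckets of two negatives each.
scores = torch.tensor([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])
labels = torch.tensor([1, 0, 0, 1, 0, 0])
sizes, means = bucket_negatives(scores, labels)   # sizes: [2, 2], means: [0.75, 0.35]
```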
<p align="center"> <img src="figures/ranking_comparison_2.png" width="600"> </p>BRS-DETR: Efficient and Robust Transformer-Based Object Detection with Bucketed Ranking-Based Losses BRS-DETR integrates Bucketed Ranking-Based Loss (BRS Loss) into Co-DETR, delivering superior performance and training efficiency on the COCO benchmark. (i) BRS-DETR achieves a 0.8 AP improvement on ResNet-50 and consistent gains across other transformer-based backbones. (ii) BRS-DETR provides faster training: cuts training time by 6×, optimizing the handling of positive examples and loss calculation of auxillary heads.
Benefits of BR Loss on Efficiency and Simplification of Training. With BR Loss, we achieve significant improvements in training efficiency: (i) the bucketed approach reduces the time complexity to O(max(N log(N), P²)), allowing faster training, (ii) BR Loss maintains the simplicity and robustness of ranking-based approaches without requiring complex sampling heuristics or additional auxiliary heads, and (iii) it enables efficient training of large-scale object detectors, including transformer-based models, with minimal tuning.
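As a back-of-the-envelope illustration (the numbers below are hypothetical, not taken from the paper), the gap between the unbucketed pairwise cost and the bucketed O(max(N log(N), P²)) cost is easy to see:

```python
import math

# Hypothetical counts for a single image, for illustration only.
P = 100        # positive predictions
N = 100_000    # total predictions (mostly negatives)

unbucketed_pairs = P * N                                 # every positive compared with every negative
bucketed_cost    = int(N * math.log2(N)) + P * (P + 1)   # one sort, then positives vs. at most P + 1 buckets

print(f"unbucketed pairwise terms: {unbucketed_pairs:,}")   # 10,000,000
print(f"bucketed cost (approx.):   {bucketed_cost:,}")      # ~1.7M, dominated by the sort
```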
Benefits of BR Loss on Improving Performance. Using BR Loss, we train seven diverse visual detectors and demonstrate consistent performance improvements: (i) BR Loss accelerates training by 2× on average while preserving the accuracy of the unbucketed versions, and (ii) for the first time, we successfully train transformer-based detectors such as Co-DETR with ranking-based losses, consistently outperforming their original configurations across multiple backbones.
<p align="center"> <img src="figures/performance_comparison.png" width="600"> </p>How to Cite
Please cite the paper if you find our paper or this repository useful:
@inproceedings{BRLoss,
title = {Bucketed Ranking-based Losses for Efficient Training of Object Detectors},
author = {Feyza Yavuz and Baris Can Cam and Adnan Harun Dogan and Kemal Oksuz and Emre Akbas and Sinan Kalkan},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2024}
}
Specifications of Dependencies and Preparation
- Please see get_started.md for requirements and installation of mmdetection.
- Please see introduction.md for dataset preparation and basic usage of mmdetection.
Please note that we implement our method on MMDetection V2.25.3 and MMCV V1.5.0. More specifically, we use python=3.7.11, pytorch=1.11.0, and cuda=11.3.
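After installation, a quick sanity check such as the one below (a minimal sketch; it only assumes the packages above are importable) can confirm that the environment matches these versions:

```python
# Minimal environment check for the versions this repository targets
# (MMDetection V2.25.3, MMCV V1.5.0, PyTorch 1.11.0, CUDA 11.3).
import torch
import mmcv
import mmdet

print("torch :", torch.__version__, "| cuda:", torch.version.cuda)
print("mmcv  :", mmcv.__version__)
print("mmdet :", mmdet.__version__)

assert mmcv.__version__.startswith("1.5."), "expected MMCV 1.5.x"
assert mmdet.__version__.startswith("2.25."), "expected MMDetection 2.25.x"
```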
Trained Models
Here, we report validation set results for the object detection and instance segmentation tasks. For object detection, we report results on the COCO validation set. For instance segmentation, we report results on the COCO, Cityscapes, and LVIS validation sets.
We refer to the RS Loss repository for models trained with RS Loss.
Transformer-based Object Detection
BRS-DETR vs. Co-DETR
Backbone | Epoch | Detector | box AP | Log | Config | Model |
---|---|---|---|---|---|---|
ResNet-50 | 12 | Co-DETR | 49.3 | log | config | model |
ResNet-50 | 12 | BRS-DETR | 50.1 | log | config | model |
Swin-T | 12 | Co-DETR | 51.7 | log | config | model |
Swin-T | 12 | BRS-DETR | 52.3 | log | config | model |
Swin-L | 12 | Co-DETR | 56.9 | log | config | model |
Swin-L | 12 | BRS-DETR | 57.2 | log | config | model |
Multi-stage Object Detection
Faster R-CNN
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.58 | 39.5 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (3.0x ↓) | 39.5 | log | config | model |
ResNet-101 | 36 | RS | 0.91 | 47.3 | log | config | model |
ResNet-101 | 36 | BRS | 0.47 (2.0x ↓) | 47.7 | log | config | model |
Cascade R-CNN
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 1.54 | 41.1 | log | config | model |
ResNet-50 | 12 | BRS | 0.29 (5.3x ↓) | 41.1 | log | config | model |
One-stage Object Detection
ATSS
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | AP | 0.34 | 38.3 | log | config | model |
ResNet-50 | 12 | BAP | 0.18 (1.9x ↓) | 38.5 | log | config | model |
ResNet-50 | 12 | RS | 0.44 | 39.8 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (2.4x ↓) | 39.8 | log | config | model |
PAA
Backbone | Epoch | Loss Func. | Time | box AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | AP | TODO | 37.3 | log | config | model |
ResNet-50 | 12 | BAP | TODO (1.5x ↓) | 37.2 | log | config | model |
ResNet-50 | 12 | RS | TODO | 40.8 | log | config | model |
ResNet-50 | 12 | BRS | 0.36 (1.9x ↓) | 40.8 | log | config | model |
Instance Segmentation
We use Mask R-CNN as the baseline model to experiment with our method in the instance segmentation task.
COCO Val
Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.68 | 36.3 | log | config | model |
ResNet-50 | 12 | BRS | 0.29 (2.3x ↓) | 36.2 | log | config | model |
ResNet-101 | 36 | RS | 0.71 | 40.2 | log | config | model |
ResNet-101 | 36 | BRS | 0.33 (2.2x ↓) | 40.3 | log | config | model |
Cityscapes
Backbone | Epoch | Loss Func. | Time | box AP | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.43 | 43.7 | 38.2 | log | config | model |
ResNet-50 | 12 | BRS | 0.19 (2.3x ↓) | 43.3 | 38.5 | log | config | model |
LVIS
Backbone | Epoch | Loss Func. | Time | mask AP | Log | Config | Model |
---|---|---|---|---|---|---|---|
ResNet-50 | 12 | RS | 0.87 | 25.6 | log | config | model |
ResNet-50 | 12 | BRS | 0.35 (2.5x ↓) | 25.8 | log | config | model |