

SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022)

This is the official repository for SpatialDETR, which will be published at ECCV 2022.


Authors: Simon Doll, Richard Schulz, Lukas Schneider, Viviane Benzin, Markus Enzweiler, Hendrik P.A. Lensch


Based on the key idea of DETR, this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. After image feature extraction, a decoder-only transformer architecture is trained with a set-based loss. SpatialDETR infers the classification and bounding box estimates using attention both spatially within each image and across the different views. To fuse the multi-view information in the attention block, we introduce a novel geometric positional encoding that incorporates the view ray geometry to explicitly account for the extrinsic and intrinsic camera setup. This way, the spatially-aware cross-view attention exploits arbitrary receptive fields to integrate cross-sensor data and therefore global context. Extensive experiments on the nuScenes benchmark demonstrate the potential of global attention and result in state-of-the-art performance.
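To make the phrase "view ray geometry" more concrete, the following is a minimal sketch of how a unit view ray for a single pixel could be derived from pinhole intrinsics and an extrinsic rotation. It is not taken from this repository and does not reproduce the positional encoding of the paper; the function name `view_ray`, the parameter names, and the example camera values are all hypothetical.

```python
import math


def view_ray(u, v, fx, fy, cx, cy, cam_to_ref):
    """Unit view ray through pixel (u, v), expressed in a shared reference frame.

    fx, fy, cx, cy are assumed pinhole intrinsics; cam_to_ref is an assumed
    3x3 rotation (nested lists) from the camera frame to the reference frame.
    """
    # Back-project the pixel with the intrinsics, fixing the depth to 1.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into the shared frame with the extrinsic rotation.
    d_ref = [sum(cam_to_ref[r][c] * d_cam[c] for c in range(3)) for r in range(3)]
    # Normalize so that only the direction is kept.
    norm = math.sqrt(sum(x * x for x in d_ref))
    return [x / norm for x in d_ref]


# Two hypothetical cameras with identical intrinsics but different orientation.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_y_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]  # 90 degree rotation about the y axis

front_ray = view_ray(800, 450, 1000.0, 1000.0, 800.0, 450.0, identity)
side_ray = view_ray(800, 450, 1000.0, 1000.0, 800.0, 450.0, rot_y_90)
```

In this sketch the same pixel location yields different rays for the two cameras, which illustrates why an encoding built on view rays can distinguish features from different views even when their image coordinates coincide.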

If you find this repository useful, please cite

  author = {Doll, Simon and Schulz, Richard and Schneider, Lukas and Benzin, Viviane and Enzweiler, Markus and Lensch, Hendrik P.A.},
  title = {SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}

You can find the paper here.


To set up the repository and run training, see getting_started.md.



Experimental results

The baseline models were trained on 4xV100 GPUs and the submission models on 8xA100 GPUs. For more details, see the corresponding configuration and log files. Keep in mind that performance can vary between runs and that the current codebase uses mmdetection3d@rc1.0.

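| config | logfile / model | set | #GPUs | mmdet3d version | mAP | mATE | mASE | mAOE | mAVE | mAAE | NDS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| query_proj_value_proj.py (baseline) | log / model | val | 4 | rc1.0 | 0.315 | 0.843 | 0.279 | 0.497 | 0.787 | 0.208 | 0.396 |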
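As a sanity check on the baseline row, the sketch below recomputes the nuScenes detection score (NDS) from the other reported numbers. It assumes that the seven trailing values follow the usual nuScenes order (mAP, mATE, mASE, mAOE, mAVE, mAAE, NDS) and uses the standard NDS definition, which weights mAP by 5 and adds the five true-positive scores `1 - min(1, error)`. The function name is hypothetical.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10."""
    # Each true-positive error is clipped to 1 and converted into a score.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Baseline values from the row above, assuming the column order
# mATE, mASE, mAOE, mAVE, mAAE for the five error terms.
baseline_errors = [0.843, 0.279, 0.497, 0.787, 0.208]
nds = nuscenes_detection_score(0.315, baseline_errors)
print(f"{nds:.3f}")  # prints 0.396
```

The recomputed value matches the last number in the row, which supports reading it as the NDS of the baseline on the validation set.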

Qualitative results


See license_infos.md for details.


This repo contains the implementation of SpatialDETR. Our implementation is a plugin to MMDetection3D and also uses a fork of DETR3D. Full credit belongs to the contributors of those frameworks, and we sincerely thank them for enabling our research!