🌠 MonoTTA

This is the official project repository for "MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection".

News

To do

Demo

https://github.com/Hongbin98/MonoTTA/assets/23754668/1abee3cc-8620-4974-bdf5-d7b83f05512b

https://github.com/Hongbin98/MonoTTA/assets/23754668/6fa98b3c-5772-4466-8aa7-f4fba1435158

https://github.com/Hongbin98/MonoTTA/assets/23754668/4f675ff8-7abf-42da-96b3-fe73d0fde758

Note

Abstract

Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume that training and test data follow the same distribution, which may not hold in real-world test scenarios. To address such out-of-distribution (OOD) problems, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-trained model to unlabeled test data by handling potential data distribution shifts at test time, without access to training data or test labels. However, applying this paradigm to Mono 3Det poses significant challenges, because OOD test data cause a remarkable decline in object detection scores. This decline conflicts with the pre-defined score thresholds of existing detection methods, leading to severe object omissions (i.e., rare positive detections and many false negatives). Consequently, the limited positive detections and the many noisy predictions cause test-time adaptation to fail in Mono 3Det.
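
The conflict between declining scores and a fixed score threshold can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the score values, the threshold of 0.35, and the helper name `keep_detections` are hypothetical and are not taken from the paper or from this repository.

```python
from typing import List


def keep_detections(scores: List[float], threshold: float) -> List[float]:
    """Return the confidence scores that pass a fixed threshold."""
    return [s for s in scores if s >= threshold]


# Hypothetical confidence scores for the same five objects (illustrative values).
in_distribution_scores = [0.82, 0.74, 0.61, 0.55, 0.33]
# Under a distribution shift every score drops (again, made-up values).
shifted_scores = [0.41, 0.30, 0.22, 0.15, 0.08]

SCORE_THRESHOLD = 0.35  # assumed pre-defined threshold, not a value from the repo

kept_clean = keep_detections(in_distribution_scores, SCORE_THRESHOLD)
kept_shifted = keep_detections(shifted_scores, SCORE_THRESHOLD)

# With the same threshold, far fewer objects survive on the shifted data,
# so most of the objects become false negatives.
print(f"kept on clean data:   {len(kept_clean)} of {len(in_distribution_scores)}")
print(f"kept on shifted data: {len(kept_shifted)} of {len(shifted_scores)}")
```

In this toy setting the threshold keeps four of five objects on clean data but only one of five after the shift, which is the kind of object omission the abstract describes.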

MonoTTA consists of:

Data Preparation

Tianyi Cloud: https://cloud.189.cn/t/aYZzumiUZrY3 (password: s9dc)

Google Drive: https://drive.google.com/file/d/1Se_0wpGCV4-pEEmLVY59J_H1VkkjfWTX/view?usp=sharing

Installation

We recommend reproducing the experiments with MonoFlex (https://github.com/zhangyp15/MonoFlex) because of its relatively stable performance.

  1. Install PyTorch 1.7.1 with CUDA 11.0 (torch 1.7.1+cu110) and the remaining requirements:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt
  2. Then build DCNv2 and install the project in development mode:
cd model/backbone/DCNv2

. make.sh

cd ../../..

python setup.py develop
  3. For the source-only setting, evaluate the official checkpoint (https://drive.google.com/drive/folders/1U60gUYp4JFOkG0VMefc4aVEMxtGM-AMu?usp=sharing). Remember to configure the data path in './config/paths_catalog.py' first:
CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --config runs/monoflex.yaml --ckpt YOUR_CKPT  --eval
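
After installing, it can help to confirm that the pinned versions are the ones actually present. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, it assumes Python 3.8+ (for `importlib.metadata`), and it compares version strings exactly, so a build that reports its version in a slightly different form would show up as a mismatch.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins copied from the pip command in step 1.
EXPECTED: Dict[str, str] = {
    "torch": "1.7.1+cu110",
    "torchvision": "0.8.2+cu110",
    "torchaudio": "0.7.2",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs to (expected, found)."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


installed = {name: installed_version(name) for name in EXPECTED}
mismatches = find_mismatches(EXPECTED, installed)

if mismatches:
    for name, (want, found) in mismatches.items():
        print(f"{name}: expected {want}, found {found}")
else:
    print("All pinned packages match.")
```

If mismatches are reported, rerunning the pip command from step 1 or switching to the Docker setup below are the two options this README offers.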

--Docker Env--

If you find that the environment above conflicts with your GPU server (e.g., conflicting cuda or gcc versions), it is recommended to use Docker to build the environment:

cd docker/pytorch1.7.1_cu110
docker build -t monotta:pytorch1.7.1-cuda11.0 .

Then, you can initialize a Docker container with the built image "monotta:pytorch1.7.1-cuda11.0" by

docker run --gpus all -it --shm-size=64g monotta:pytorch1.7.1-cuda11.0

or (with directory mounting)

docker run --gpus all -it --shm-size=64g -v /dir_local:/dir_docker monotta:pytorch1.7.1-cuda11.0

Next, you can continue configuring the environment in the Docker container mentioned above.

Usage

You can run MonoTTA by

CUDA_VISIBLE_DEVICES=0 python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt model_moderate_best_soft.pth --eval --output kitti-c/gaussian1
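
If you prefer to launch the command above from a Python script, one possible wrapper is sketched here. It only reassembles the flags shown above; the function names `build_tta_command` and `run_tta` are hypothetical helpers and do not exist in the repository, and the script is assumed to be started from the repository root.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_tta_command(
    ckpt: str,
    output: str,
    config: str = "runs/monoflex.yaml",
    gpu: str = "0",
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argument list and environment for the MonoTTA command."""
    argv = [
        "python", "tools/tta_monotta.py",
        "--config", config,
        "--ckpt", ckpt,
        "--eval",
        "--output", output,
    ]
    # Copy the current environment and select the GPU, as in the shell command.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return argv, env


def run_tta(ckpt: str, output: str) -> None:
    """Launch the run and raise CalledProcessError if it exits with an error."""
    argv, env = build_tta_command(ckpt, output)
    subprocess.run(argv, env=env, check=True)


argv, env = build_tta_command("model_moderate_best_soft.pth", "kitti-c/gaussian1")
print(" ".join(argv))
# To actually start the run: run_tta("model_moderate_best_soft.pth", "kitti-c/gaussian1")
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed run visible to the calling script.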

Results

| MonoFlex (Gaussian_Noise_1) | Car | Pedestrian | Cyclist | Avg. |
|---|---|---|---|---|
| No adapt | 4.82 | 0.23 | 0.34 | 1.8 |
| MonoTTA (ours) | 21.15 | 6.54 | 3.01 | 10.27 |

Note: The small difference between this repo and the main paper is due to the randomness in generating perturbed data.
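
To read the results as absolute gains, the per-column differences can be computed as in the sketch below. It assumes the two result rows read as 4.82, 0.23, 0.34, 1.8 (No adapt) and 21.15, 6.54, 3.01, 10.27 (MonoTTA); the metric behind these numbers is not stated in this section, so the sketch only subtracts the reported values.

```python
# Column names and values as reported in the results above.
COLUMNS = ["Car", "Pedestrian", "Cyclist", "Avg."]
no_adapt = [4.82, 0.23, 0.34, 1.8]
monotta = [21.15, 6.54, 3.01, 10.27]

# Absolute gain of MonoTTA over the unadapted model, rounded to two decimals.
gains = {
    name: round(after - before, 2)
    for name, before, after in zip(COLUMNS, no_adapt, monotta)
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```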

Correspondence

If you have any questions, please contact Hongbin Lin at linhongbinanthem@gmail.com. 📬

Citation

If our MonoTTA method or the fully test-time adaptation setting for monocular 3D object detection is helpful in your research, please consider citing our paper:

@inproceedings{lin2025monotta,
  title={MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection},
  author={Lin, Hongbin and Zhang, Yifan and Niu, Shuaicheng and Cui, Shuguang and Li, Zhen},
  booktitle={European Conference on Computer Vision},
  pages={96--114},
  year={2025},
  organization={Springer}
}

Acknowledgment

The code is inspired by MonoFlex, Tent, and EATA.