Home

Awesome

Multi-branch Collaborative Learning Network for 3D Visual Grounding

:tada::tada::tada: This is a PyTorch implementation of MCLN proposed by our paper ["Multi-branch Collaborative Learning Network for 3D Visual Grounding"].(ECCV2024) image

0. Installation

1. Quick visualization demo

We showing visualization via wandb for superpoints, kps points, bad case analyse, predict/ground_truth masks and box.

self.visualization_superpoint = False
self.visualization_pred = False
self.visualization_gt = False
self.bad_case_visualization = False
self.kps_points_visualization = False
self.bad_case_threshold = 0.15

2. Data preparation

The final required files are as follows:

├── [DATA_ROOT]
│	├── [1] train_v3scans.pkl # Packaged ScanNet training set
│	├── [2] val_v3scans.pkl   # Packaged ScanNet validation set
│	├── [3] ScanRefer/        # ScanRefer utterance data
│	│	│	├── ScanRefer_filtered_train.json
│	│	│	├── ScanRefer_filtered_val.json
│	│	│	└── ...
│	├── [4] ReferIt3D/        # NR3D/SR3D utterance data
│	│	│	├── nr3d.csv
│	│	│	├── sr3d.csv
│	│	│	└── ...
│	├── [5] group_free_pred_bboxes/  # detected boxes (optional)
│	├── [6] gf_detector_l6o256.pth   # pointnet++ checkpoint (optional)
│	├── [7] roberta-base/     # roberta pretrained language model
│	├── [8] checkpoints/      # mcln pretrained models
ScanNetv2
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── scans_test
│   │   ├── train
│   │   ├── val
│   │   ├── test
│   │   ├── val_gt

3. Models

Dataset/ModelREC mAP@0.25REC mAP@0.5RES mIoUModel
ScanRefer/mcln57.1745.5344.72GoogleDrive

4. Training

5. Evaluation

6. Acknowledgements

This repository is built on reusing codes of EDA and 3DRefTR. We recommend using their code repository in your research and reading the related article. We are also quite grateful for SPFormer, BUTD-DETR, GroupFree, ScanRefer, and SceneGraphParser.

7. Citation

If you find our work useful in your research, please consider citing:

@misc{qian2024multibranchcollaborativelearningnetwork,
      title={Multi-branch Collaborative Learning Network for 3D Visual Grounding}, 
      author={Zhipeng Qian and Yiwei Ma and Zhekai Lin and Jiayi Ji and Xiawu Zheng and Xiaoshuai Sun and Rongrong Ji},
      year={2024},
      eprint={2407.05363},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.05363}}