
Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection

Project Page | Paper

Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Yiming Xie, Huaizu Jiang, Georgia Gkioxari*, Julian Straub*
ICCV 2023

real-time video

How to use

Installation

conda env create -f environment.yml

Pretrained Model on ScanNet

Download the pretrained weights and put them under PROJECT_PATH/checkpoint/. You can also use gdown to download them from the command line:

gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz
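The gdown command above saves the file into the current directory, so it still has to end up under PROJECT_PATH/checkpoint/. The following is a minimal Python sketch of doing both steps at once, not part of the repository. It assumes the gdown Python package is installed and that the checkpoint should be named parq_release.ckpt (the name used in the evaluation command in this README); the helper names `checkpoint_path` and `download_checkpoint` are hypothetical.

```python
from pathlib import Path

# File ID copied from the gdown command in this README.
FILE_ID = "1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz"
# Assumption: target file name, matching the path used by the eval command.
CKPT_NAME = "parq_release.ckpt"


def checkpoint_path(project_path="."):
    """Create PROJECT_PATH/checkpoint/ if needed and return the target file path."""
    ckpt_dir = Path(project_path) / "checkpoint"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return ckpt_dir / CKPT_NAME


def download_checkpoint(project_path="."):
    """Download the weights with gdown unless the file is already present."""
    import gdown  # third-party package, imported lazily so the helper above works without it

    target = checkpoint_path(project_path)
    if not target.exists():
        gdown.download(id=FILE_ID, output=str(target), quiet=False)
    return target
```

Calling `download_checkpoint("/path/to/PARQ")` would then leave the weights at /path/to/PARQ/checkpoint/parq_release.ckpt, where the path is a placeholder for your own checkout.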

Data Preparation for ScanNet

Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.

<details> <summary>[Expected directory structure of ScanNet (click to expand)]</summary>

You can obtain the train/val/test split information from here.

PROJECT_PATH
└───data
|   └───scannet
|   │   └───scans
|   │   |   └───scene0000_00
|   │   |       └───color
|   │   |       │   │   0.jpg
|   │   |       │   │   1.jpg
|   │   |       │   │   ...
|   │   |       │   ...
|   │   └───scans_raw
|   │   |   └───scene0000_00
|   │   |       └───scene0000_00.aggregation.json
|   │   |       └───scene0000_00_vh_clean_2.labels.ply
|   │   |       └───scene0000_00_vh_clean_2.0.010000.segs.json
|   │   |       │   ...
|   |   └───scannetv2_test.txt
|   |   └───scannetv2_train.txt
|   |   └───scannetv2_val.txt
|   |   └───scannetv2-labels.combined.tsv

</details>
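Before running evaluation or training, it can help to confirm that the data matches the layout above. The following is a small standard-library sketch, not part of the repository, that checks only the entries explicitly listed in the tree for one scene. The function name `missing_paths` is hypothetical, and the default scene ID is simply the example scene from the tree.

```python
from pathlib import Path

# Split and label files that the tree places directly under data/scannet/.
TOP_LEVEL_FILES = [
    "scannetv2_train.txt",
    "scannetv2_val.txt",
    "scannetv2_test.txt",
    "scannetv2-labels.combined.tsv",
]


def missing_paths(project_path, scene_id="scene0000_00"):
    """Return the expected paths (relative to project_path) that do not exist."""
    root = Path(project_path)
    scannet = root / "data" / "scannet"
    raw = scannet / "scans_raw" / scene_id

    expected = [scannet / name for name in TOP_LEVEL_FILES]
    # Per-scene entries listed in the tree; files elided with "..." are not checked.
    expected.append(scannet / "scans" / scene_id / "color")
    expected += [
        raw / f"{scene_id}.aggregation.json",
        raw / f"{scene_id}_vh_clean_2.labels.ply",
        raw / f"{scene_id}_vh_clean_2.0.010000.segs.json",
    ]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]


if __name__ == "__main__":
    for path in missing_paths("."):
        print("missing:", path)
```

Run from PROJECT_PATH, the script prints one line per missing entry and prints nothing when every listed entry is present.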

Next, download the generated oriented box annotations and put them under PROJECT_PATH/data/scannet/.

Alternatively, you can run the data preparation script yourself.

Inference on ScanNet val-set

python eval.py --cfg ./config/eval.yaml CHECKPOINT_PATH ./checkpoint/parq_release.ckpt

Training on ScanNet

Training with 8 GPUs:

python train.py --cfg ./config/train.yaml TRAINER.GPUS 8
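Both the evaluation and training commands pass a config file with `--cfg`, followed by KEY VALUE pairs such as `CHECKPOINT_PATH ./checkpoint/parq_release.ckpt` or `TRAINER.GPUS 8`. How the repository handles these pairs is not shown in this README. The sketch below illustrates one common pattern for this style of command line, under the assumption that trailing pairs override config entries and that dots separate nested keys. The functions `parse_args` and `apply_overrides` are hypothetical and use only the standard library.

```python
import argparse
import ast


def parse_args(argv=None):
    """Parse --cfg plus any trailing KEY VALUE override pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg", required=True, help="path to a YAML config file")
    parser.add_argument("opts", nargs=argparse.REMAINDER, help="KEY VALUE overrides")
    return parser.parse_args(argv)


def _coerce(value):
    """Turn '8' into 8 and leave non-literal strings such as paths unchanged."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def apply_overrides(config, opts):
    """Write each KEY VALUE pair into a nested dict, splitting keys on dots."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = _coerce(value)
    return config
```

With this sketch, `apply_overrides({"TRAINER": {"GPUS": 1}}, ["TRAINER.GPUS", "8"])` sets the nested GPUS entry to the integer 8, while a path value such as ./checkpoint/parq_release.ckpt is kept as a string.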

Real-time Demo on Custom Data with Camera Poses from ARKit

We provide a demo of PARQ running on self-captured ARKit data. Please refer to DEMO.md for details about capturing and processing the data. We also provide example data captured with an iPhone XR.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{xie2023parq,
  title={Pixel-Aligned Recurrent Queries for Multi-View {3D} Object Detection},
  author={Xie, Yiming and Jiang, Huaizu and Gkioxari, Georgia and Straub, Julian},
  booktitle={ICCV},
  year={2023}
}

License

The majority of PARQ is released under the MIT License. The LICENSE-MIT file applies to model/transformer_parq.py; the LICENSE file applies to all other files.

Acknowledgment

We thank the authors of the following projects, on which our code is based: DETR, VoteNet, RotationContinuity, and Pixloc.