Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection

Project Page | Paper

Yiming Xie, Huaizu Jiang, Georgia Gkioxari*, Julian Straub*
ICCV 2023



How to use

Installation

conda env create -f environment.yml

Pretrained Model on ScanNet

Download the pretrained weights and put them under PROJECT_PATH/checkpoint/. You can also use gdown to download them from the command line:

gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz
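If you prefer to script the download, the following is a minimal sketch using gdown's Python interface. It assumes a gdown version whose `gdown.download` accepts the `id` keyword, and it assumes the weights are a single file that should be saved as `parq_release.ckpt`, the name used by the inference command in this README. The helper names `checkpoint_target` and `download_checkpoint` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Same file ID as in the gdown command above.
FILE_ID = "1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz"
# Assumption: the file name expected by the inference command in this README.
CKPT_NAME = "parq_release.ckpt"


def checkpoint_target(project_path):
    """Return PROJECT_PATH/checkpoint/parq_release.ckpt as a Path."""
    return Path(project_path) / "checkpoint" / CKPT_NAME


def download_checkpoint(project_path="."):
    """Download the weights into PROJECT_PATH/checkpoint/ unless already present."""
    import gdown  # Imported lazily so checkpoint_target works without gdown.

    target = checkpoint_target(project_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        gdown.download(id=FILE_ID, output=str(target), quiet=False)
    return target
```

Calling `download_checkpoint(".")` from the project root creates the `checkpoint` directory if needed and skips the download when the file is already there.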

Data Preparation for ScanNet

Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.

<details> <summary>[Expected directory structure of ScanNet (click to expand)]</summary>

You can obtain the train/val/test split information from here.

PROJECT_PATH
└───data
|   └───scannet
|   │   └───scans
|   │   |   └───scene0000_00
|   │   |       └───color
|   │   |       │   │   0.jpg
|   │   |       │   │   1.jpg
|   │   |       │   │   ...
|   │   |       │   ...
|   │   └───scans_raw
|   │   |   └───scene0000_00
|   │   |       └───scene0000_00.aggregation.json
|   │   |       └───scene0000_00_vh_clean_2.labels.ply
|   │   |       └───scene0000_00_vh_clean_2.0.010000.segs.json
|   │   |       │   ...
|   |   └───scannetv2_test.txt
|   |   └───scannetv2_train.txt
|   |   └───scannetv2_val.txt
|   |   └───scannetv2-labels.combined.tsv
</details>
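Before running inference or training, it can help to confirm that the layout matches the tree above. The sketch below is one way to do that; the helper name `missing_paths` is hypothetical, and `scene0000_00` is only the example scene shown in the tree, so other scenes are not checked.

```python
from pathlib import Path

# Relative paths copied from the expected directory structure above.
# scene0000_00 is the example scene from the tree, not an exhaustive list.
EXPECTED = [
    "data/scannet/scans/scene0000_00/color",
    "data/scannet/scans_raw/scene0000_00/scene0000_00.aggregation.json",
    "data/scannet/scans_raw/scene0000_00/scene0000_00_vh_clean_2.labels.ply",
    "data/scannet/scans_raw/scene0000_00/scene0000_00_vh_clean_2.0.010000.segs.json",
    "data/scannet/scannetv2_test.txt",
    "data/scannet/scannetv2_train.txt",
    "data/scannet/scannetv2_val.txt",
    "data/scannet/scannetv2-labels.combined.tsv",
]


def missing_paths(project_path, expected=EXPECTED):
    """Return the expected relative paths that do not exist under project_path."""
    root = Path(project_path)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from PROJECT_PATH; prints nothing when every expected path exists.
    for rel in missing_paths("."):
        print("missing:", rel)
```

An empty result means every listed path exists; it does not verify file contents or the presence of the remaining scenes.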

Next, download the generated oriented box annotations and put them under PROJECT_PATH/data/scannet/.

Alternatively, you can run the data preparation script yourself.

Inference on ScanNet val-set

python eval.py --cfg ./config/eval.yaml CHECKPOINT_PATH ./checkpoint/parq_release.ckpt

Training on ScanNet

Training with 8 GPUs:

python train.py --cfg ./config/train.yaml TRAINER.GPUS 8

Real-time Demo on Custom Data with Camera Poses from ARKit

We provide a demo of PARQ running on self-captured ARKit data. Please refer to DEMO.md for details about capturing and processing the data. We also provide example data captured with an iPhone XR.

<!-- ## Coordinates illustration for ScanNet World coordinate: ScanNet world coordinate Camera coordinate: Camera coordinate in OpenCV format (forward: +z, right: +x) PseudoCam coordinate: gravity-aligned camera coordinate (the camera coordinate rotated to be gravity-aligned) Local coordinate: the PseudoCam coordinate of the middle frame -->

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{xie2023parq,
  title={Pixel-Aligned Recurrent Queries for Multi-View {3D} Object Detection},
  author={Xie, Yiming and Jiang, Huaizu and Gkioxari, Georgia and Straub, Julian},
  booktitle={ICCV},
  year={2023}
}

License

The majority of PARQ is released under the MIT License. The LICENSE-MIT file applies to model/transformer_parq.py. The LICENSE file applies to all other files.

Acknowledgment

We thank the authors of the following projects, on which our code is based: DETR, VoteNet, RotationContinuity, and Pixloc.