Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Project Page | Paper
Yiming Xie, Huaizu Jiang, Georgia Gkioxari*, Julian Straub*
ICCV 2023
How to use
Installation
conda env create -f environment.yml
Pretrained Model on ScanNet
Download the pretrained weights and put them under PROJECT_PATH/checkpoint/.
You can also use gdown to download them from the command line:
gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz
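Before running inference, it can help to confirm that the weights ended up where the evaluation command expects them. The following is a minimal sketch and not part of the repository: the helper name `checkpoint_path` is hypothetical, and the file name `parq_release.ckpt` is the one used in the inference command of this README.

```python
from pathlib import Path


def checkpoint_path(project_path, name="parq_release.ckpt"):
    """Return PROJECT_PATH/checkpoint/<name>, or raise if the file is missing.

    Hypothetical helper for illustration only; the default file name is the
    one passed as CHECKPOINT_PATH in the inference command.
    """
    path = Path(project_path) / "checkpoint" / name
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected pretrained weights at {path}; download them first."
        )
    return path
```

Calling `checkpoint_path(".")` from the project root returns the path if the download succeeded and raises `FileNotFoundError` otherwise.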
Data Preparation for ScanNet
Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.
<details> <summary>[Expected directory structure of ScanNet (click to expand)]</summary>You can obtain the train/val/test split information from here.
PROJECT_PATH
└───data
|   └───scannet
|   │   └───scans
|   │   |   └───scene0000_00
|   │   |       └───color
|   │   |       │   │   0.jpg
|   │   |       │   │   1.jpg
|   │   |       │   │   ...
|   │   |   │   ...
|   │   └───scans_raw
|   │   |   └───scene0000_00
|   │   |       └───scene0000_00.aggregation.json
|   │   |       └───scene0000_00_vh_clean_2.labels.ply
|   │   |       └───scene0000_00_vh_clean_2.0.010000.segs.json
|   │   |   │   ...
|   |   └───scannetv2_test.txt
|   |   └───scannetv2_train.txt
|   |   └───scannetv2_val.txt
|   |   └───scannetv2-labels.combined.tsv
</details>
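A quick way to check an extracted dataset against the expected directory structure is to list which of the paths shown above are missing. This is a minimal sketch rather than a script shipped with the repository: the function name `missing_scannet_paths` is hypothetical, and it checks only the entries named in the tree for a single scene.

```python
from pathlib import Path

# Split and label files listed directly under data/scannet/ in the tree.
SPLIT_FILES = (
    "scannetv2_test.txt",
    "scannetv2_train.txt",
    "scannetv2_val.txt",
    "scannetv2-labels.combined.tsv",
)

# Per-scene annotation files expected under scans_raw/<scene>/.
RAW_SUFFIXES = (
    ".aggregation.json",
    "_vh_clean_2.labels.ply",
    "_vh_clean_2.0.010000.segs.json",
)


def missing_scannet_paths(project_path, scene="scene0000_00"):
    """Return the expected paths for one scene that do not exist on disk."""
    root = Path(project_path) / "data" / "scannet"
    expected = [root / name for name in SPLIT_FILES]
    # Extracted color frames live in scans/<scene>/color/.
    expected.append(root / "scans" / scene / "color")
    expected += [
        root / "scans_raw" / scene / f"{scene}{suffix}" for suffix in RAW_SUFFIXES
    ]
    return [path for path in expected if not path.exists()]
```

An empty list means every checked path is present; otherwise the returned paths show what still needs to be downloaded or extracted.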
Next, download the generated oriented box annotations and put them under PROJECT_PATH/data/scannet/.
Alternatively, you can run the data preparation script yourself.
Inference on ScanNet val-set
python eval.py --cfg ./config/eval.yaml CHECKPOINT_PATH ./checkpoint/parq_release.ckpt
Training on ScanNet
Training with 8 GPUs:
python train.py --cfg ./config/train.yaml TRAINER.GPUS 8
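Both the inference and training commands pass settings as trailing KEY VALUE pairs after --cfg (CHECKPOINT_PATH ./checkpoint/parq_release.ckpt, TRAINER.GPUS 8), where a dotted key appears to address a nested config entry. How the repository parses these pairs is not shown here; the sketch below only illustrates one common way to handle this pattern with argparse and a plain dict. The function names `parse_args` and `apply_overrides` are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse `--cfg FILE KEY VALUE [KEY VALUE ...]` style arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg", required=True, help="path to a YAML config")
    # All remaining tokens are collected as override pairs.
    parser.add_argument("opts", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)


def apply_overrides(config, opts):
    """Merge KEY VALUE pairs into a nested dict; dotted keys index sub-dicts."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return config
```

In this sketch, `TRAINER.GPUS 8` sets `config["TRAINER"]["GPUS"]` to the string `"8"`. A real config library would typically also convert the value to the type of the existing entry.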
Real-time Demo on Custom Data with Camera Poses from ARKit.
We provide a demo of PARQ running on self-captured ARKit data. Please refer to DEMO.md for details on capturing and processing the data. We also provide example data captured with an iPhone XR.
Citation
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{xie2023parq,
title={Pixel-Aligned Recurrent Queries for Multi-View {3D} Object Detection},
author={Xie, Yiming and Jiang, Huaizu and Gkioxari, Georgia and Straub, Julian},
booktitle={ICCV},
year={2023}
}
License
The majority of PARQ is released under the MIT License.
The LICENSE-MIT file applies to model/transformer_parq.py.
The LICENSE file applies to all other files.
Acknowledgment
We thank the authors of the following projects, on which our code is based: DETR, VoteNet, RotationContinuity, and Pixloc.