
PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View (ICCV 2023)

Jingjia Shi, Shuaifeng Zhi, Kai Xu

[arXiv] [Page] [BibTeX]

Updates


PlaneRecTR consists of three main modules: (1) a pixel-level module that extracts dense pixel-wise image features; (2) a Transformer module that jointly predicts four plane-related properties from each plane query: plane classification probability, plane parameters, and mask and depth embeddings; (3) a plane-level module that computes dense plane-level binary masks/depths, filters out non-plane predictions, and produces the final 3D plane recovery.
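The plane-level filtering step can be illustrated with a toy NumPy sketch (hypothetical shapes and a hypothetical 0.5 threshold; the actual model works with learned mask/depth embeddings rather than raw mask scores):

```python
import numpy as np

# Toy stand-ins for per-query predictions: 5 plane queries over a 4x4 image.
num_queries, h, w = 5, 4, 4
rng = np.random.default_rng(0)

plane_prob = np.array([0.9, 0.2, 0.8, 0.1, 0.7])  # plane classification probability
plane_params = rng.normal(size=(num_queries, 3))  # plane parameters per query
masks = rng.random((num_queries, h, w))           # dense per-query mask scores

# Plane-level module: filter out non-plane queries...
keep = plane_prob > 0.5          # hypothetical confidence threshold
kept_params = plane_params[keep]
kept_masks = masks[keep]

# ...then assign each pixel to its highest-scoring surviving plane.
scores = plane_prob[keep, None, None] * kept_masks
segmentation = scores.argmax(axis=0)  # per-pixel index into kept planes

print(keep.sum(), kept_params.shape, segmentation.shape)
```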

Usage Instructions

Installation

This repository requires Python 3.7 and depends on several external libraries (e.g., PyTorch, Detectron2). The script below shows an example conda environment setup.

# 1. Create a conda environment.
conda create --name planerectr python=3.7
conda activate planerectr

# 2. Install PyTorch. Note: refer to https://pytorch.org/get-started/locally/ to select the appropriate version for your system.
pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# 3. Install dependencies.
pip install -r requirements.txt

# 4. Install Detectron2. Note: make sure your PyTorch version matches the one required by Detectron2.
git clone -b v0.6 git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .

# 5. Compile the CUDA kernel for MSDeformAttn.
cd PlaneRecTR/modeling/pixel_decoder/ops
sh make.sh

Data preparation

Following Mask2Former, all datasets are placed in the directory specified by the environment variable DETECTRON2_DATASETS. You can set the location of the builtin datasets with export DETECTRON2_DATASETS=YOUR_DATASETS_FOLDER/, and detectron2 will then look for datasets in the following directory structure:

$DETECTRON2_DATASETS/
  scannetv1_plane/
  nyuv2_plane/

If left unset, the default is ./datasets relative to your current working directory.
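This lookup follows detectron2's usual dataset-root convention, roughly equivalent to:

```python
import os

# detectron2 reads the dataset root from the environment and falls back
# to ./datasets relative to the current working directory.
root = os.environ.get("DETECTRON2_DATASETS", "./datasets")

scannet_dir = os.path.join(root, "scannetv1_plane")
nyu_dir = os.path.join(root, "nyuv2_plane")
print(scannet_dir, nyu_dir)
```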

1) ScanNetv1-Plane

Please download the tfrecords data from PlaneNet, then run the following commands to convert them to .npz files and generate the JSON files required by detectron2:

# YOUR_DOWNLOADED_TFRECORDS_FOLDER: folder to host tfrecords files of PlaneNet.
python tools/generate_scannetv1_plane_dataset.py --input-folder YOUR_DOWNLOADED_TFRECORDS_FOLDER/ --output-folder $DETECTRON2_DATASETS
2) NYUv2-Plane

Please download the NYUv2-Plane dataset generated by PlaneAE from here, and the original NYUv2 data from here and here. The data folder should be structured as follows:

YOUR_DOWNLOADED_NYUv2_FOLDER/
  nyu_depth_v2_plane_gt/ # from PlaneAE
    *.png
    *.npz
  nyu_depth_v2_labeled.mat # original NYUv2 data
  splits.mat # original NYUv2 data
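Before running the conversion script, it can help to verify this layout. A small sanity-check helper (hypothetical, not part of the repo):

```python
import os

def check_nyuv2_layout(folder):
    """Return the expected entries that are missing under `folder`."""
    expected = [
        "nyu_depth_v2_plane_gt",     # *.png / *.npz plane GT from PlaneAE
        "nyu_depth_v2_labeled.mat",  # original NYUv2 images and depths
        "splits.mat",                # original NYUv2 train/test split
    ]
    return [e for e in expected if not os.path.exists(os.path.join(folder, e))]

missing = check_nyuv2_layout("YOUR_DOWNLOADED_NYUv2_FOLDER/")
if missing:
    print("Missing entries:", missing)
```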

Run the following command to process the downloaded files and generate the JSON file required by detectron2:

python tools/generate_nyuv2_plane_dataset.py --input-folder YOUR_DOWNLOADED_NYUv2_FOLDER/ --output-folder $DETECTRON2_DATASETS

Training

Please download one of the pretrained backbone models (ResNet-50, HRNet-32, Swin-B) from here and place it under the checkpoint/ folder. You can then run the following commands to train on the ScanNet dataset with the corresponding config file:

# ResNet-50:
python train_net.py --num-gpus 1 --config-file configs/PlaneRecTRScanNetV1/PlaneRecTR_R50_bs16_50ep.yaml
# HRNet-32: 
python train_net.py --num-gpus 1 --config-file configs/PlaneRecTRScanNetV1/hrnet/PlaneRecTR_hrnet_w32_imagenet_pretrained.yaml
# Swin-B: 
python train_net.py --num-gpus 1 --config-file configs/PlaneRecTRScanNetV1/swin/PlaneRecTR_swin_base_384_bs16_50ep.yaml

Evaluation

Please first download our trained models from here and place them under the checkpoint/ folder. Run the following commands to evaluate on the ScanNetv1-Plane or NYUv2-Plane dataset (pick the matching config directory):

# In each command, replace {PlaneRecTRScanNetV1, PlaneRecTRNYUV2} with one of the
# two config directories, depending on which dataset you want to evaluate on.
# PlaneRecTR (ResNet-50):
python train_net.py --eval-only --num-gpus 1 --config-file configs/{PlaneRecTRScanNetV1, PlaneRecTRNYUV2}/PlaneRecTR_R50_bs16_50ep.yaml MODEL.WEIGHTS checkpoint/PlaneRecTR_r50_pretrained.pth
# PlaneRecTR (HRNet-32):
python train_net.py --eval-only --num-gpus 1 --config-file configs/{PlaneRecTRScanNetV1, PlaneRecTRNYUV2}/hrnet/PlaneRecTR_hrnet_w32_imagenet_pretrained.yaml MODEL.WEIGHTS checkpoint/PlaneRecTR_hrnet32_pretrained.pth
# PlaneRecTR (Swin-B):
python train_net.py --eval-only --num-gpus 1 --config-file configs/{PlaneRecTRScanNetV1, PlaneRecTRNYUV2}/swin/PlaneRecTR_swin_base_384_bs16_50ep.yaml MODEL.WEIGHTS checkpoint/PlaneRecTR_swinb_pretrained.pth

<a name="InferenceDemo"></a>Inference demo

During inference, PLANE_MASK_THRESHOLD in the config file can be adjusted to tune plane segmentation on unseen scenes. Set the camera intrinsics K (--fx, --fy, --ox, --oy) and the original image size (--original-w, --original-h) to match your input.

# PlaneRecTR (ResNet-50): 
python demo/demo.py --config-file configs/PlaneRecTRScanNetV1/PlaneRecTR_R50_demo.yaml --input demo/359_d2_image.png --output demo/test_result/ --fx 517.97 --fy 517.97 --ox 320 --oy 240
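If your input image is resized relative to the resolution the camera was calibrated at, the intrinsics passed via --fx/--fy/--ox/--oy should be scaled with it. A small helper (hypothetical, not part of the repo) sketches the scaling:

```python
def scale_intrinsics(fx, fy, ox, oy, orig_w, orig_h, new_w, new_h):
    """Scale pinhole-camera intrinsics when an image is resized.

    Focal lengths and the principal point both scale linearly
    with each image dimension.
    """
    sx, sy = new_w / orig_w, new_h / orig_h
    return fx * sx, fy * sy, ox * sx, oy * sy

# E.g. halving a 640x480 image halves all four values from the demo above.
print(scale_intrinsics(517.97, 517.97, 320, 240, 640, 480, 320, 240))
```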

<a name="CitingPlaneRecTR"></a>Citing PlaneRecTR

If you use PlaneRecTR in your research, please use the following BibTeX entry.

@InProceedings{shi2023planerectr,
  author={Shi, Jingjia and Zhi, Shuaifeng and Xu, Kai},
  title={PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View}, 
  booktitle={ICCV},
  year={2023}
}

Acknowledgements

PlaneRecTR is built on top of these great works: Mask2Former, PlaneTR, PlaneAE, and PlaneNet.