DProST: Dynamic Projective Spatial Transformer Network

PyTorch implementation of "DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation" (ECCV 2022).
DProST is a 6D object pose estimation method based on grid matching in object space.

[ECCV][ArXiv]

Overview of DProST

Predicting an object's 6D pose from a single RGB image is a fundamental computer vision task. Most pose estimation methods use the distance between transformed object vertices as their objective function. However, these methods do not account for the projective geometry of the camera space, which degrades performance. We therefore propose a pose estimation system based on a projective grid instead of object vertices. Our method, the dynamic projective spatial transformer network (DProST), localizes a region-of-interest grid on the rays in camera space and transforms the grid to object space by the estimated pose. The transformed grid serves as both a sampling grid and a new criterion for the estimated pose. Moreover, because DProST does not require object vertices, it can operate in a mesh-less setting by replacing the mesh with a reconstructed feature. Experiments show that mesh-less DProST outperforms state-of-the-art mesh-based methods on the LINEMOD and LINEMOD-OCCLUSION datasets and performs competitively on the YCBV dataset when mesh data is used.
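
To make the grid idea concrete, below is a minimal PyTorch sketch of a projective grid: points are sampled along the back-projected rays through a bounding box, then mapped to object space by the inverse of an estimated pose. This is only an illustration of the concept, not the repository's implementation; the function names, grid resolution, and sampling scheme are hypothetical, and the intrinsics are the standard LINEMOD camera parameters (rounded).

import torch

def projective_grid(K, bbox, depth_center, n=8, depth_range=0.2):
    """Sample an n^3 grid of 3D points along the camera rays through a 2D bbox."""
    x1, y1, x2, y2 = bbox
    us = torch.linspace(x1, x2, n)
    vs = torch.linspace(y1, y2, n)
    zs = torch.linspace(depth_center - depth_range, depth_center + depth_range, n)
    v, u, z = torch.meshgrid(vs, us, zs, indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # homogeneous pixel coords
    rays = pix @ torch.inverse(K).T                          # back-project pixels to rays
    return rays * z.unsqueeze(-1)                            # 3D points in camera space

def to_object_space(grid_cam, R, t):
    """Invert the estimated pose: X_obj = R^T (X_cam - t), applied row-wise."""
    return (grid_cam - t) @ R

K = torch.tensor([[572.4, 0.0, 325.3],                       # LINEMOD intrinsics (rounded)
                  [0.0, 573.6, 242.0],
                  [0.0, 0.0, 1.0]])
R, t = torch.eye(3), torch.tensor([0.0, 0.0, 0.6])           # dummy pose, 0.6 m in front
grid = projective_grid(K, bbox=(200.0, 150.0, 300.0, 250.0), depth_center=0.6)
print(to_object_space(grid, R, t).shape)                     # torch.Size([8, 8, 8, 3])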

Installation

git clone https://github.com/parkjaewoo0611/DProST.git
cd DProST
source source_install.sh
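
After the script finishes, a quick check that the core dependency imports and sees your GPU (this assumes the script installs PyTorch; adjust to your environment):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"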

Dataset

./source_download.sh
Dataset
├──LINEMOD
│   ├──dataset_info.md
│   ├──train.pickle
│   ├──test.pickle
│   ├──train_pbr.pickle
│   ├──train_syn.pickle
│   ├──index
│   │   ├──ape_test.txt
│   │   ...
│   ├──models
│   │   ├──models_info.json
│   │   ├──obj_000001.ply
│   │   ...
│   ├──pbr
│   │   ├──000000
│   │   ...
│   ├──test
│   │   ├──000001
│   │   ...
│   ├──test_bboxes
│   │   ├──bbox_faster_all.json
│   │   └──bbox_yolov3_all.json
│   ├──train
│   │   ├──000001
│   │   ...
│   ├──syn (optional)
│   │   ├──ape
│   │   ...
│   └──backgrounds (optional)
│       ├──2007_000027.jpg
│       ...
├──OCCLUSION
└──YCBV
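
If the download succeeds, the layout above should exist on disk. A small, hypothetical Python check for the required (non-optional) LINEMOD entries:

import os

required = ["dataset_info.md", "train.pickle", "test.pickle", "train_pbr.pickle",
            "train_syn.pickle", "index", "models", "pbr", "test", "test_bboxes", "train"]
root = os.path.join("Dataset", "LINEMOD")
missing = [name for name in required if not os.path.exists(os.path.join(root, name))]
print("missing entries:", missing or "none")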

Train

To verify the setup, you can first run a quick toy training:
python train.py --gpu_id 0 --data_dir Dataset/LINEMOD --is_toy true --batch_size 1
Ex)
# LINEMOD ape object on gpu 0
python train.py --gpu_id 0 --data_dir Dataset/LINEMOD --use_mesh false --obj_list 1 --mode train_pbr 

# LINEMOD each object on gpu 0,1,2 (with gpu_scheduling; see the example gpu_commands.txt below)
simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt

# OCCLUSION all objects on gpu 0,1,2 (with DDP)
python train.py --gpu_id 0 1 2 --data_dir Dataset/OCCLUSION --use_mesh false --obj_list 1 5 6 8 9 10 11 12 --mode train_pbr

# train YCBV 002_master_chef_can object with mesh on gpu 0 
python train.py --gpu_id 0 --data_dir Dataset/YCBV --use_mesh true --obj_list 1 --mode train_pbr --epochs 300 --save_period 10 --early_stop 100 --lr_step_size 200 --valid_metrics ADD_S_AUC
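
simple_gpu_scheduler reads one shell command per line from stdin and dispatches each command to a free GPU via CUDA_VISIBLE_DEVICES, so gpu_commands.txt is simply a list of training commands. A hypothetical example file (object IDs and flags are illustrative, following the commands above):

python train.py --data_dir Dataset/LINEMOD --use_mesh false --obj_list 1 --mode train_pbr
python train.py --data_dir Dataset/LINEMOD --use_mesh false --obj_list 2 --mode train_pbr
python train.py --data_dir Dataset/LINEMOD --use_mesh false --obj_list 4 --mode train_pbr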

Test

Pre-trained Model

Ex)
# evaluate & visualize result
python test.py -r pretrained/LMO_all/all/model/model_best.pth -p pretrained/LMO_all/all/result -v true --gpu_scheduler false --data_dir Dataset/OCCLUSION
-r: path to the pretrained model and its config.json.
-p: path where the qualitative visualization results are saved.
-v: boolean flag that enables visualization.
--gpu_scheduler: set to false to disable the GPU scheduler during testing.
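
To inspect a downloaded checkpoint before running the full evaluation, a minimal sketch (this assumes model_best.pth is a standard torch.save dictionary; the keys are repo-specific and not guaranteed):

import torch

ckpt = torch.load("pretrained/LMO_all/all/model/model_best.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # e.g. state_dict / epoch / config, depending on the repo
else:
    print(type(ckpt))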

Qualitative Results

For each gray cell, the first row shows the ground truth, the second row shows the initial pose, and the third and fourth rows show the results of the first and second iterations, respectively. The first column shows the contour of the projection, the second column shows the projection of the reference feature, and the third column shows the object-space grid. Blue contours show the predicted (or initial) pose, and green contours show the ground-truth pose.

Acknowledgements

Our project builds on the following projects. We thank the authors for sharing their code and datasets.

Citation

@inproceedings{park2022dprost,
  title={DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation},
  author={Park, Jaewoo and Cho, Nam Ik},
  booktitle={European Conference on Computer Vision},
  pages={363--379},
  year={2022},
  organization={Springer}
}