Pix2Pose

Original implementation of the paper: Kiru Park, Timothy Patten and Markus Vincze, "Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation", ICCV 2019, https://arxiv.org/abs/1908.07433

Notice

A ResNet-50 backbone, which can be initialized with ImageNet weights, is now supported as an alternative to the original encoder network. The ResNet-50 backbone performs better than the original encoder in terms of accuracy.

For the YCB-Video dataset, the improvements are (in terms of the BOP score):

You can download the weights for the YCB-Video dataset using Resnet-50 here

To use the resnet-50 backbone, add

"backbone":"resnet50"

in the config JSON file (e.g., cfg/cfg_bop_2019.json or ros_config.json). Please make sure the repository is up to date.
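As a minimal sketch, the key can also be added programmatically. The helper name `enable_resnet50` is hypothetical and not part of this repository; only the `"backbone": "resnet50"` entry comes from the instructions above.

```python
import json
from pathlib import Path


def enable_resnet50(cfg_path):
    """Set "backbone" to "resnet50" in a JSON config file and return the updated dict."""
    path = Path(cfg_path)
    cfg = json.loads(path.read_text())
    cfg["backbone"] = "resnet50"  # key/value taken from the instructions above
    path.write_text(json.dumps(cfg, indent=4))
    return cfg
```

All other keys in the file are left untouched, so the helper can be run on an existing config without losing settings.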

Requirements:

For detection pipelines,

git clone https://github.com/matterport/Mask_RCNN.git
git clone https://github.com/fizyr/keras-retinanet.git

Citation

If you use this code, please cite the following

@InProceedings{Park_2019_ICCV,
author = {Park, Kiru and Patten, Timothy and Vincze, Markus},
title = {Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}

Run the recognition for BOP datasets

The original code has been updated to support the format of the most recent 6D pose benchmark, BOP: Benchmark for 6D Object Pose Estimation.

  1. Download a dataset from the BOP website and extract the files into a folder
    • e.g., <path_to_dataset>/<dataset_name>
    • For the recognition, at least "Base archive", "Object models", and "Test images" have to be downloaded.
  2. Download and extract weights in the same dataset folder used in 1.
  3. Make sure the directories follow the structure below.
    • <path_to_dataset>/<dataset_name>/models or model_eval or model_recont..: model directory that contains .ply files of models
    • <path_to_dataset>/<dataset_name>/models_xyz: norm_factor.json and .ply files of colorized 3d models
    • <path_to_dataset>/<dataset_name>/weight_detection: weight files for the detection
    • <path_to_dataset>/<dataset_name>/pix2pose_weights/<obj_name>/inference.hdf5 : weight file for each object
  4. Set config file
    1. Set directories properly based on your environment
    2. For the bop challenge dataset: <path_to_src>/cfg/cfg_bop2019.json
    3. Use trained weights for the paper: <path_to_src>/cfg/cfg_<dataset_name>_paper.json (e.g., cfg_tless_paper.json)
    4. score_type: 1 - scores from a 2D detection pipeline are used (used for the paper), 2 - scores are calculated using the detection score + overlapped mask (only supported for Mask RCNN, used for the BOP challenge)
    5. task_type : 1 - SiSo task (2017 BOP Challenge), 2 - ViVo task (2019 BOP challenge format)
    6. cand_factor: a factor for the number of detection candidates
  5. Execute the script
python3 tools/5_evaluation_bop_basic.py <gpu_id> <cfg_path> <dataset_name>

To run with the 3D-ICP refinement:

python3 tools/5_evaluation_bop_icp3d.py <gpu_id> <path_cfg_json> <dataset_name>
  6. The output will be stored in 'path_to_output' in CSV format, which can be used to calculate metrics using bop_toolkit.
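Before running the evaluation, the directory layout from step 3 can be checked with a short script. This is only a sketch: the function name `check_dataset_layout` is hypothetical, and the model directory (models, model_eval, ...) is not checked because its name depends on the dataset and config.

```python
from pathlib import Path


def check_dataset_layout(dataset_dir):
    """Return a list of human-readable problems found under dataset_dir."""
    root = Path(dataset_dir)
    problems = []
    if not (root / "models_xyz" / "norm_factor.json").is_file():
        problems.append("missing models_xyz/norm_factor.json")
    if not (root / "weight_detection").is_dir():
        problems.append("missing weight_detection/")
    weights_root = root / "pix2pose_weights"
    if not weights_root.is_dir():
        problems.append("missing pix2pose_weights/")
    else:
        # Every object sub-directory is expected to hold an inference.hdf5 file.
        for obj_dir in sorted(p for p in weights_root.iterdir() if p.is_dir()):
            if not (obj_dir / "inference.hdf5").is_file():
                problems.append(
                    "missing pix2pose_weights/{}/inference.hdf5".format(obj_dir.name)
                )
    return problems
```

An empty list means that the checked parts of the layout are in place; for example, `check_dataset_layout("<path_to_dataset>/<dataset_name>")`.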

Important Note: Unlike in the paper, we used multiple outlier thresholds in the second stage for the BOP challenge, since it is not allowed to have different parameters for each object or each dataset. This can be done by setting "outlier_th" to a 1D array (refer to cfg_bop2019.json). In this setup, all values are applied in the second stage during estimation, and the result with the largest number of inlier points is selected. To reproduce the results in the paper with fixed outlier threshold values, a 2D array should be given, as in "cfg_tless_paper.json".
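The selection over multiple outlier thresholds can be pictured roughly as follows. This is a sketch of the idea, not the repository's code: `best_over_thresholds`, `fake_estimate`, and the threshold and inlier values are all hypothetical, and the real second-stage estimation is replaced by a toy function.

```python
def best_over_thresholds(thresholds, estimate):
    """Run estimate(th) for every threshold and keep the result with the most inliers.

    estimate is a hypothetical callable returning (pose, n_inliers).
    """
    best = None
    for th in thresholds:
        pose, n_inliers = estimate(th)
        if best is None or n_inliers > best["n_inliers"]:
            best = {"pose": pose, "n_inliers": n_inliers, "outlier_th": th}
    return best


# Toy stand-in for the second-stage estimation (not the real estimator).
def fake_estimate(th):
    inliers = {0.1: 120, 0.2: 310, 0.3: 250}
    return "pose@{}".format(th), inliers[th]


result = best_over_thresholds([0.1, 0.2, 0.3], fake_estimate)
print(result["outlier_th"], result["n_inliers"])  # prints: 0.2 310
```

With a 1D array, the same list of thresholds is tried for every object, which is what keeps the parameters identical across objects and datasets.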

(Optional) Environment setup using Docker

  1. Build the image from the Dockerfile: docker build -t <container_name> .
  2. Start the container with
nvidia-docker run -it -v <dataset_dir>:/bop -v <detection_repo>:<detection_dir> -v <other_dir>:<other_dir> <container_name> bash

ROS interface (tested with ROS-Kinetic)

export PYTHONPATH=/usr/local/lib/python3.5/dist-packages:$PYTHONPATH

(Include other ROS-related paths as well.)

Training for a new dataset

We assume the dataset is organized in the BOP 2019 format. For a new dataset (not in the BOP), modify bop_io.py to provide the proper directories for training. This training code was used to prepare and train the networks for BOP 2019.

1. Convert 3D models to colored coordinate models

python3 tools/2_1_ply_file_to_3d_coord_model.py <cfg_path> <dataset_name>

The script converts the 3D models and saves them to the target folder, together with their dimension information in the file "norm_factor.json".
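The idea behind a colored coordinate model is that each vertex's 3D coordinates are normalized and stored as its color. The sketch below illustrates this under assumptions: the function name, the per-axis min/span normalization, and the layout of the returned factors are hypothetical and may differ from what the script writes to "norm_factor.json".

```python
def vertices_to_colors(vertices):
    """Map (x, y, z) vertices to (r, g, b) colors in [0, 255] by per-axis normalization."""
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    # Fall back to 1.0 for a flat axis to avoid division by zero.
    spans = [(maxs[i] - mins[i]) or 1.0 for i in range(3)]
    colors = [
        tuple(int(round(255 * (v[i] - mins[i]) / spans[i])) for i in range(3))
        for v in vertices
    ]
    # Hypothetical factor layout; needed later to map colors back to coordinates.
    factors = {"min": mins, "span": spans}
    return colors, factors
```

Keeping the normalization factors next to the models is what makes it possible to convert predicted colors back into metric 3D coordinates at inference time.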

2. Render and generate training pairs

python3 tools/2_2_render_pix2pose_training.py <cfg_path> <dataset_name>

3. Train pix2pose network for each object

python3 tools/3_train_pix2pose.py <cfg_path> <dataset_name> <obj_name> [background_img_folder]

4. Convert the last weight file to an inference file.

python3 tools/4_convert_weights_inference.py <pix2pose_weights folder>

This program looks for the last weight file in each object directory.
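A rough sketch of such a lookup is shown below. It assumes that "last" means last by sorted file name and that weight files use the .hdf5 suffix; both are assumptions, the function name is hypothetical, and the actual script may use a different rule.

```python
from pathlib import Path


def last_weight_files(weights_root, suffix=".hdf5"):
    """Map each object sub-directory to its last weight file (by sorted file name)."""
    result = {}
    for obj_dir in sorted(p for p in Path(weights_root).iterdir() if p.is_dir()):
        candidates = sorted(
            f for f in obj_dir.iterdir()
            # Skip an already converted inference file (assumed name).
            if f.suffix == suffix and f.name != "inference.hdf5"
        )
        if candidates:
            result[obj_dir.name] = candidates[-1]
    return result
```

Sorting by name only matches training order when the file names are zero-padded or otherwise sort chronologically.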

5. [Optional] Training of 2D detection pipelines (skip this if you have your own 2D detection pipeline)

(1) Generation of images for 2D detection training
python3 tools/1_1_scene_gen_for_detection.py <cfg_path> <dataset_name> <mask=1(true)/0(false)>

Output files

(2) Train Mask-RCNN or Keras-Retinanet

To train Mask-RCNN, the pre-trained weights for the MS-COCO dataset should be placed in <path/to/Mask-RCNN>/mask_rcnn_coco.h5.

python3 tools/1_2_train_maskrcnn.py <cfg_path> <dataset_name>

Alternatively, train Keras-RetinaNet using the script in its repository. It is highly recommended to initialize the network using the weights trained for the MS-COCO dataset (link).

keras_retinanet/bin/train.py csv <path_to_dataset>/gt.csv <path_to_dataset>/label.csv --freeze-backbone --weights resnet50_coco_best_v2.1.0.h5

After training, convert the weights into an inference model with:

keras_retinanet/bin/convert_model.py /path/to/training/model.h5 /path/to/save/inference/model.h5
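For reference, the `csv` mode of keras-retinanet reads an annotation file with one `path,x1,y1,x2,y2,class_name` row per bounding box and a class file with `class_name,id` rows. The sketch below writes files in that format; the helper name and the sample values are hypothetical, and whether the generation script in this repository produces exactly these rows is not confirmed here.

```python
import csv


def write_retinanet_csv(annotations, class_names, gt_path, label_path):
    """Write annotation and class-mapping CSV files in keras-retinanet's CSV format."""
    with open(gt_path, "w", newline="") as f:
        writer = csv.writer(f)
        # One row per box: image path, corner coordinates, class name.
        for image_path, x1, y1, x2, y2, class_name in annotations:
            writer.writerow([image_path, x1, y1, x2, y2, class_name])
    with open(label_path, "w", newline="") as f:
        writer = csv.writer(f)
        # One row per class: class name, zero-based id.
        for class_id, class_name in enumerate(class_names):
            writer.writerow([class_name, class_id])
```

For example, `write_retinanet_csv([("img/0001.png", 10, 20, 110, 220, "obj_01")], ["obj_01"], "gt.csv", "label.csv")` produces one annotation row and one class row.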

Disclaimers:


Download pre-trained weights


Download: trained weights for the BOP challenge 2019

For the BOP challenge, we used Mask-RCNN to compute score values for the current estimations using the overlap ratios between the masks from Mask-RCNN and the masks from the Pix2Pose estimation. All hyperparameters, including augmentation, are set to the same values for all datasets during training and testing (33K iterations using 50 images in a mini-batch).
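One common way to measure how much two masks overlap is intersection over union. The sketch below uses that measure as an assumption; the exact ratio and the way it is combined with the detection score in this repository are not specified here, and the function name is hypothetical.

```python
def mask_overlap_ratio(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (nested lists)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a) and bool(b)
            union += bool(a) or bool(b)
    # Two empty masks have no overlap by convention here.
    return inter / union if union else 0.0
```

A ratio close to 1 means that the detector's mask and the mask derived from the pose estimate agree, which suggests a more reliable estimate.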

These trained weights were used to submit the results for the core datasets in the BOP Challenge 2019. Due to the large number of objects for training, the number of training epochs was reduced (200 epochs --> 100 epochs).

Download the zip files and extract them to the BOP dataset folder, e.g., for hb, the extracted files should be placed in

Contributors: