ZS6D
<img src="./assets/overview.png" width="500" alt="teaser"/>

We demonstrate the effectiveness of deep features extracted from a self-supervised, pre-trained Vision Transformer (ViT) for zero-shot 6D pose estimation. For more detailed information, check out the corresponding [paper].
Overview of the Pipeline:
Note that this repo only covers 6D pose estimation; segmentation masks are required as input. These can be obtained with supervised methods or with zero-shot methods. For a zero-shot option we refer to cnos.
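The ground truth files in this repo store masks in COCO RLE encoding (see the structure under Evaluation on BOP Datasets). If your detector also outputs RLE masks, a minimal decoding sketch with pycocotools could look like the following; the file name and keys are assumptions for illustration, not part of this repo's API:

```python
import json

from pycocotools import mask as mask_utils  # pip install pycocotools

# Hypothetical example: load one RLE-encoded mask produced by an external
# detector (e.g. cnos) and turn it into a binary numpy array.
with open("detections.json") as f:        # assumed file name
    detection = json.load(f)[0]           # assumed layout: a list of detections

rle = detection["segmentation"]           # assumed key; COCO RLE dict {"size": [h, w], "counts": ...}
if isinstance(rle["counts"], str):
    rle["counts"] = rle["counts"].encode("utf-8")

binary_mask = mask_utils.decode(rle)      # (h, w) uint8 array with values {0, 1}
print(binary_mask.shape, int(binary_mask.sum()))
```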
Installation:
To set up the environment to run the code locally, follow these steps:
conda env create -f environment.yml
conda activate zs6d
git submodule update --init --recursive
Alternatively, set up the environment manually with the following commands:
conda create --name zs6d python=3.9
conda activate zs6d
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tqdm==4.65.0
pip install timm==0.9.16
pip install matplotlib==3.8.3
pip install scikit-learn==1.4.1.post1
pip install opencv-python==4.9.0
pip install git+https://github.com/lucasb-eyer/pydensecrf.git@dd070546eda51e21ab772ee6f14807c7f5b1548b
pip install transforms3d==0.4.1
pip install pillow==9.4.0
pip install plyfile==1.0.3
pip install trimesh==4.1.4
pip install imageio==2.34.0
pip install pypng==0.20220715.0
pip install vispy==0.12.2
pip install pyopengl==3.1.1a1
pip install pyglet==2.0.10
pip install numba==0.59.0
pip install jupyter==1.0.0
git submodule update --init --recursive
Docker setup:
ROS integration:
To run the ROS wrapper, do the following:
- set up the NVIDIA container toolkit
- download ycbv templates from this link and put the ycbv folder into ./templates
- edit the camera intrinsics and object name mappings in zs6d_configs/bop_eval_configs/cfg_ros_ycbv_inference_bop.json. The keys in the object name mapping are the names that are passed to the pose estimator and the values are the BOP object ids (as strings); a sketch of this mapping is shown after this list.
- set the ROS_IP and ROS_MASTER_URI in ros_entrypoint.sh
- update the submodules with git submodule update --init --recursive
- build the docker image with docker build -t zs6d .
- allow the docker container to access the display by running xhost local:docker
- run the docker container with the following command:
docker run -it --rm --runtime nvidia --privileged -e DISPLAY=${DISPLAY} -e NVIDIA_DRIVER_CAPABILITIES=all -v /PATH_TO_REPOSITORY:/code -v /PATH_TO_REPOSITORY/torch_cache:/root/.cache/torch -v /tmp/.X11-unix:/tmp/.X11-unix --net host zs6d /bin/bash
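For reference, here is a minimal sketch of what the object name mapping and camera intrinsics edited in the list above could look like. The field names (cam_K, obj_name_mapping) and all values are placeholders chosen for illustration; the shipped cfg_ros_ycbv_inference_bop.json defines the actual schema:

```python
import json

# Hypothetical config snippet; field names and values are placeholders, the
# shipped cfg_ros_ycbv_inference_bop.json defines the actual schema.
config = {
    # flattened 3x3 camera intrinsics (fx, 0, cx, 0, fy, cy, 0, 0, 1), placeholder values
    "cam_K": [615.0, 0.0, 320.0, 0.0, 615.0, 240.0, 0.0, 0.0, 1.0],
    # keys: names passed to the pose estimator, values: BOP object ids as strings
    "obj_name_mapping": {
        "mustard_bottle": "5",
        "power_drill": "15",
    },
}

print(json.dumps(config, indent=2))
```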
Once you are inside the docker container, run the following command to check whether it has access to the GPU:
glxinfo | grep "OpenGL version string"
The output should report an OpenGL version string greater than 2.xx and show the NVIDIA driver version.
Afterwards, run the following command to calculate the descriptors for the templates and prepare the ground truth:
python prepare_templates_and_gt.py
All of the previous steps only have to be done once per machine.
The following commands have to be run every time you want to start the ROS container:
xhost local:docker
docker run -it --rm --runtime nvidia --privileged -e DISPLAY=${DISPLAY} -e NVIDIA_DRIVER_CAPABILITIES=all -v /PATH_TO_REPOSITORY:/code -v /PATH_TO_REPOSITORY/torch_cache:/root/.cache/torch -v /tmp/.X11-unix:/tmp/.X11-unix --net host zs6d
Template rendering:
To generate templates from an object model for inference, we refer to the ZS6D_template_rendering repository.
Template preparation:
- set up a config file for template preparation
zs6d_configs/template_gt_preparation_configs/your_template_config.json
- run the preparation script with your config_file to generate your_template_gt_file.json and prepare the template descriptors and template uv maps
python3 prepare_templates_and_gt.py --config_file zs6d_configs/template_gt_preparation_configs/your_template_config.json
Inference:
After setting up your_template_config.json, you can instantiate your ZS6D module and perform inference. An example is provided in:
test_zs6d.ipynb
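For orientation, here is a hedged sketch of what such an inference call could look like. The constructor arguments and the get_pose signature below are assumptions based on the description above; test_zs6d.ipynb is the authoritative reference for the actual API:

```python
import numpy as np
from PIL import Image

from zs6d import ZS6D  # assumed import path, as used in test_zs6d.ipynb

# Assumed constructor argument: the template ground truth file generated by
# prepare_templates_and_gt.py (the path is a placeholder).
pose_estimator = ZS6D(templates_gt_path="templates/your_template_gt_file.json")

# Inputs: RGB image, binary segmentation mask (e.g. from cnos), camera intrinsics.
img = np.array(Image.open("path/to/test_image.png").convert("RGB"))
mask = np.load("path/to/segmentation_mask.npy")
cam_K = np.array(
    [[615.0, 0.0, 320.0], [0.0, 615.0, 240.0], [0.0, 0.0, 1.0]]
)  # placeholder intrinsics

# Assumed signature: image, object id as string, mask and intrinsics;
# assumed to return an estimated rotation matrix and translation vector.
R_est, t_est = pose_estimator.get_pose(img, "5", mask, cam_K)
print("R:\n", R_est, "\nt:\n", t_est)
```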
Evaluation on BOP Datasets:
- set up a config file for BOP evaluation
zs6d_configs/bop_eval_configs/your_eval_config.json
- Create a ground truth file for testing; the files for the BOP'19-23 test images are provided for lmo, tless and ycbv. For example, for lmo:
gts/test_gts/lmo_bop_test_gt_sam.json
Additionally, you have to download the corresponding BOP test images. If you want to test a dataset other than the provided ones, you have to generate a ground truth file with the following structure (a Python sketch for assembling such a file is shown at the end of this section):
{
  "object_id": [
    {
      "scene_id": "00001",
      "img_name": "relative_path_to_image/image_name.png",
      "obj_id": "..",
      "bbox_obj": [],
      "cam_t_m2c": [],
      "cam_R_m2c": [],
      "cam_K": [],
      "mask_sam": []  // mask in RLE encoding
    },
    ...
  ]
}
- run the evaluation script with your_eval_config.json
python3 prepare_templates_and_gt.py --config_file zs6d_configs/bop_eval_configs/your_eval_config.json
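As a rough illustration of how such a ground truth file could be assembled for a custom dataset, here is a hedged sketch that encodes a binary mask to COCO RLE with pycocotools and writes a single entry in the structure shown above. All values are placeholders, and the exact storage format of mask_sam is an assumption; cross-check against the provided files in gts/test_gts:

```python
import json

import numpy as np
from pycocotools import mask as mask_utils  # pip install pycocotools

# Placeholder binary mask, for illustration only.
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

# COCO RLE encoding; counts comes back as bytes and must be decoded for JSON.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")

gt = {
    "5": [  # object_id as string
        {
            "scene_id": "00001",
            "img_name": "test/000048/rgb/000001.png",
            "obj_id": "5",
            "bbox_obj": [150, 100, 150, 100],          # assumed x, y, w, h convention
            "cam_t_m2c": [0.0, 0.0, 500.0],            # translation, assumed to be in mm
            "cam_R_m2c": [1, 0, 0, 0, 1, 0, 0, 0, 1],  # flattened 3x3 rotation
            "cam_K": [615.0, 0.0, 320.0, 0.0, 615.0, 240.0, 0.0, 0.0, 1.0],
            "mask_sam": rle,                           # mask in RLE encoding (exact format: assumption)
        }
    ]
}

with open("gts/test_gts/my_dataset_gt_sam.json", "w") as f:
    json.dump(gt, f, indent=2)
```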
Acknowledgements
This project is built upon dino-vit-features, which presents a comprehensive study of the features of self-supervised, pre-trained Vision Transformers and their applications, including local correspondence matching. Here is a link to their paper. We thank the authors for their great work and repo.
Citation
If you found this repository useful, please consider starring ⭐ and citing:
@inproceedings{ausserlechner2024zs6d,
title={{ZS6D}: Zero-shot {6D} object pose estimation using vision transformers},
author={Ausserlechner, Philipp and Haberger, David and Thalhammer, Stefan and Weibel, Jean-Baptiste and Vincze, Markus},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
pages={463--469},
year={2024},
organization={IEEE}
}