Oryon: Open-Vocabulary Object 6D Pose Estimation [CVPR 2024 Highlight]
🔥🔥🔥 Check out the new Oryon version technical report!
This repository contains the source code for the Oryon website and the implementation of Oryon. This work was featured as a highlight paper at CVPR'24.
Roadmap
- 25/06/24: New Oryon version released
- 26/03/24: Code released
- 07/12/23: Added test and train splits
- 04/12/23: Website and arxiv released
Installation
First of all, download `oryon_data.zip` and `pretrained_models.zip` from the release of this repository.
The first archive contains the ground-truth information and the specification of the image pairs used; the second contains the third-party checkpoints used in Oryon (i.e., the tokenizer and PointDSC).
Run `setup.sh` to install the environment and download the external checkpoints.
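For reference, a minimal sketch of the installation, assuming the two archives have been downloaded into the repository root (the extraction targets are assumptions; check `setup.sh` and the config files for the paths actually expected):

```bash
# Assumption: both archives sit in the repository root.
unzip oryon_data.zip          # ground-truth data and image-pair specifications
unzip pretrained_models.zip   # third-party checkpoints (tokenizer, PointDSC)
bash setup.sh                 # installs the environment and downloads the external checkpoints
```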
Running Oryon
By default, all experiment folders are created in `exp_data/`. This can be changed in the config file.
Training with default settings:
python run_train.py exp_name=baseline
Run the following to obtain results with the 4 basic configurations. By default, the last checkpoint is used.
python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=predicted
python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=oracle
python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=predicted
python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=oracle
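The four runs differ only in the dataset and mask source, so they can also be launched in a single loop (a convenience sketch that just combines the commands above):

```bash
# Evaluate the baseline on both test sets, with predicted and oracle masks.
for ds in nocs toyl; do
  for mask in predicted oracle; do
    python run_test.py -cp exp_data/baseline/ dataset.test.name=$ds test.mask=$mask
  done
done
```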
Dataset preparation
Our data is based on three publicly available datasets:
- REAL275, used for testing. We sample from the real test partition.
- Toyota-Light (TOYL), used for testing. We sample from the real test partition of the BOP challenge.
- ShapeNet6D (SN6D), used for training. Note that SN6D itself does not provide textual annotations, but it uses object models from ShapeNetSem, which does provide object names and synsets for each object model.
We sample scenes from each dataset to build the training and testing partitions (20,000 image pairs for SN6D and 2,000 each for REAL275 and TOYL); the scene ids and image ids used for each partition are provided in `oryon_data.zip`.
REAL275 (referred to as NOCS)
From the repository, download the test ground truth, the object models, and the data of the `real_test` partition. This should result in three files: `obj_models.zip`, `gts.zip` and `real_test.zip`.
Run the `prepare_nocs.sh` script to unzip the archives and run the preprocessing.
By default this will create the `nocs` folder in `data/`; this can be changed by modifying the script.
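Once the three archives are in place, the preparation reduces to a single call (a sketch; where the script expects the archives is an assumption, so check `prepare_nocs.sh` before running):

```bash
# Assumption: obj_models.zip, gts.zip and real_test.zip have been downloaded
# to the location expected by the script.
bash prepare_nocs.sh   # unzips the archives and preprocesses into data/nocs by default
```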
Toyota-Light
Download the object models and the test partition from the official BOP website:
wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_models.zip
wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_test_bop19.zip
Run the `prepare_toyl.sh` script to unzip the archives and run the preprocessing.
By default this will create the `toyl` folder in `data/`; this can be changed by modifying the script.
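Putting the downloads and the preprocessing together (the download location is an assumption; check `prepare_toyl.sh` for the paths it expects):

```bash
wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_models.zip
wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_test_bop19.zip
bash prepare_toyl.sh   # unzips the archives and preprocesses into data/toyl by default
```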
ShapeNet6D
Download the images from the official repository of ShapeNet6D, and the object models of ShapeNet from HuggingFace.
Run the `prepare_sn6d.sh` script to unzip the archives and run the preprocessing.
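As for the other datasets, the preparation is a single script call once the downloads are in place (where the script expects the archives is an assumption; check `prepare_sn6d.sh`):

```bash
# Assumption: the ShapeNet6D images and the ShapeNet object models have been
# downloaded to the location expected by the script.
bash prepare_sn6d.sh   # unzips the archives and runs the preprocessing
```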
Note that each image of ShapeNet6D shows a different random background, so we consider each image as belonging to a different scene. ShapeNet6D provides a map from its object ids to the object ids of the original ShapeNetSem: we use this map to associate the object names and synonym sets of ShapeNetSem with each object model in ShapeNet6D.
NB: ShapeNet6D is not currently supported for evaluation (i.e., the symmetry annotations needed by the BOP toolkit are missing).
Acknowledgements
This work was supported by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101058589 (AI-PRISM), and made use of time on Tier 2 HPC facility JADE2, funded by EPSRC (EP/T022205/1).
We thank the authors of the following repositories for open-sourcing their code, on which we relied for this project:
Citing Oryon
@inproceedings{corsetti2024oryon,
  title = {Open-vocabulary object 6D pose estimation},
  author = {Corsetti, Jaime and Boscaini, Davide and Oh, Changjae and Cavallaro, Andrea and Poiesi, Fabio},
  booktitle = {IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
  year = {2024}
}
Website License
The website template is from Nerfies.
This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).