
Oryon: Open-Vocabulary Object 6D Pose Estimation [CVPR2024 highlight]

🔥🔥🔥Check out the new Oryon version technical report!

This repository contains the source code for the Oryon website and its implementation. This work was featured as a highlight paper at CVPR'24.

Roadmap

25/06/24: New Oryon version released
26/03/24: Code released
07/12/23: Added test and train splits
04/12/23: Website and arxiv released

Installation

First, download oryon_data.zip and pretrained_models.zip from the release of this repository. The first contains the ground-truth information and the specification of the image pairs used; the second contains the third-party checkpoints used in Oryon (i.e., the tokenizer and PointDSC).

Run setup.sh to install the environment and download the external checkpoints.

Running Oryon

By default, all experiment folders are created in exp_data/. This can be changed in the config file. Training with default settings:

python run_train.py exp_name=baseline

Run the following to obtain results with the four basic configurations. By default, the last checkpoint is used.

python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=predicted

python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=oracle

python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=predicted

python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=oracle
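The four commands above are the combinations of two test datasets (nocs, toyl) and two mask types (predicted, oracle). As a minimal sketch, assuming the experiment folder is exp_data/baseline/ as in the commands above, they can be generated rather than typed out. The helper name build_test_commands is hypothetical and not part of this repository.

```python
import itertools
import shlex

# Values taken from the four commands listed above.
DATASETS = ["nocs", "toyl"]
MASKS = ["predicted", "oracle"]


def build_test_commands(exp_dir="exp_data/baseline/"):
    """Return one run_test.py argument list per (dataset, mask) pair.

    Hypothetical helper: it only assembles the command lines shown above.
    """
    commands = []
    for dataset, mask in itertools.product(DATASETS, MASKS):
        commands.append([
            "python", "run_test.py",
            "-cp", exp_dir,
            f"dataset.test.name={dataset}",
            f"test.mask={mask}",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_test_commands():
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before launching the runs one after another.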

Dataset preparation

Our data is based on three publicly available datasets:

REAL275, used for testing. We sample from the real test partition.
Toyota-Light (TOYL), used for testing. We sample from the real test partition of the BOP challenge.
ShapeNet6D (SN6D), used for training. Note that SN6D itself does not provide textual annotations, but it uses object models from ShapeNetSem, which do provide object names and synsets for each object model.

We sample scenes from each dataset to build the training and testing partitions (20000 image pairs for SN6D and 2000 for REAL275 and TOYL), and report the scene ids and image ids used for each partition in the following folder.

REAL275 (referred to as NOCS)

From the repository, download the test ground truth, the object models and the data of the real_test partition. This should result in three files: obj_models.zip, gts.zip and real_test.zip.

Run the prepare_nocs.sh script to unzip and run the preprocessing.

By default, this creates the nocs folder in data; the location can be changed by modifying the above script.
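Before running prepare_nocs.sh, it can help to confirm that the three archives named above are present. The following is a minimal sketch: the helper missing_archives is hypothetical, and the download directory ("." here) is an assumption, since the location expected by the script is not stated in this section.

```python
from pathlib import Path

# Archive names as listed in the REAL275 instructions above.
NOCS_ARCHIVES = ("obj_models.zip", "gts.zip", "real_test.zip")


def missing_archives(download_dir, expected=NOCS_ARCHIVES):
    """Return the expected archive names that are not files in download_dir."""
    root = Path(download_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "." is an assumption: point this at wherever the archives were saved.
    missing = missing_archives(".")
    if missing:
        print("Missing archives:", ", ".join(missing))
    else:
        print("All archives found; prepare_nocs.sh can be run.")
```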

Toyota-Light

Download the object models and the test partition from the official BOP website:

wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_models.zip

wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_test_bop19.zip
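Where wget is not available, the same two downloads can be scripted with the Python standard library. This is a sketch rather than part of the repository: the function names are hypothetical, the URLs are the ones given above, and saving into the current directory is an assumption about where prepare_toyl.sh looks for the archives.

```python
import urllib.request
from pathlib import Path

# Base URL and archive names copied from the wget commands above.
BOP_BASE = "https://bop.felk.cvut.cz/media/data/bop_datasets/"
TOYL_ARCHIVES = ("tyol_models.zip", "tyol_test_bop19.zip")


def archive_urls(base=BOP_BASE, names=TOYL_ARCHIVES):
    """Return the full download URL of each Toyota-Light archive."""
    return [base + name for name in names]


def download_all(dest_dir=".", fetch=urllib.request.urlretrieve):
    """Download each archive into dest_dir, skipping files already present.

    dest_dir="." is an assumption; fetch is injectable so the logic can be
    exercised without network access.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in archive_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            fetch(url, str(target))
        saved.append(target)
    return saved


if __name__ == "__main__":
    # Only lists the URLs; call download_all() explicitly to fetch them.
    for url in archive_urls():
        print(url)
```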

Run the prepare_toyl.sh script to unzip and run the preprocessing.

By default, this creates the toyl folder in data; the location can be changed by modifying the above script.

ShapeNet6D

Download the images from the official repository of ShapeNet6D, and the object models of ShapeNet from HuggingFace.

Run the prepare_sn6d.sh script to unzip and run the preprocessing.

Note that each image of ShapeNet6D shows a different random background, so we consider each image as belonging to a different scene. ShapeNet6D provides a map from its object ids to the object ids of the original ShapeNetSem: we use this map to associate the object names and synonym sets of ShapeNetSem with each object model in ShapeNet6D.
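The id mapping described above amounts to a two-step lookup. The sketch below is illustrative only: the dictionary layouts, the field names ("name", "synset"), the placeholder ids and the function attach_text_annotations are all assumptions, and do not reflect the actual file formats of ShapeNet6D or ShapeNetSem or the code in prepare_sn6d.sh.

```python
def attach_text_annotations(sn6d_to_sem, sem_metadata):
    """Attach ShapeNetSem names and synsets to ShapeNet6D object ids.

    sn6d_to_sem:  {ShapeNet6D object id -> ShapeNetSem object id} (assumed layout)
    sem_metadata: {ShapeNetSem object id -> {"name": str, "synset": [str, ...]}}
                  (assumed layout)
    Objects without ShapeNetSem metadata are skipped.
    """
    annotations = {}
    for sn6d_id, sem_id in sn6d_to_sem.items():
        meta = sem_metadata.get(sem_id)
        if meta is None:
            continue  # no textual annotation available for this model
        annotations[sn6d_id] = {
            "name": meta["name"],
            "synset": list(meta["synset"]),
        }
    return annotations


if __name__ == "__main__":
    # Placeholder ids and names, made up for illustration.
    sn6d_to_sem = {"obj_000001": "sem_a", "obj_000002": "sem_b"}
    sem_metadata = {"sem_a": {"name": "mug", "synset": ["mug", "cup"]}}
    print(attach_text_annotations(sn6d_to_sem, sem_metadata))
```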

NB: ShapeNet6D is not currently supported for evaluation (i.e., the symmetry annotations needed by the BOP toolkit are missing).

Acknowledgements

This work was supported by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101058589 (AI-PRISM), and made use of time on Tier 2 HPC facility JADE2, funded by EPSRC (EP/T022205/1).

We thank the authors of the following repositories for open-sourcing the code, on which we relied for this project:

Citing Oryon

@inproceedings{corsetti2024oryon,
  title = {Open-vocabulary object 6D pose estimation},
  author = {Corsetti, Jaime and Boscaini, Davide and Oh, Changjae and Cavallaro, Andrea and Poiesi, Fabio},
  booktitle = {IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
  year = {2024}
}

Website License

The website template is from Nerfies.

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.