<img align="left" style="padding: 5px" src="assets/pegasus_eye.png" height="120px"/> <em>PEGASUS</em>: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation

Lukas Meyer*, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae<br> <span style="font-size:0.5em;">*This work was conducted during an internship at the National Institute of Advanced Industrial Science and Technology.</span> <br>

| Webpage | Full Paper | Ramen Dataset (~50 GB) | PEGASET (~50 GB) |

We introduce the Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DoF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Preparation starts with separately scanning environments and objects. PEGASUS allows the composition of new scenes by merging the underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene through interaction with the objects' extracted meshes. Consequently, an extensive number of new scenes, static or dynamic, can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data. Moreover, we introduce the CupNoodle dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images of both object hemispheres as well as the Gaussian Splatting reconstructions, making them compatible with PEGASUS.

<p align="center"> <a href="https://unit.aist.go.jp/icps/icps-am/en/"><img style="padding: 10px" height="150px" src="assets/aist.png"> </a> <a href="https://www.fau.eu/"><img style="padding: 10px" height="150px" src="assets/FAU.png"> </a> <a href="https://www.lgdv.tf.fau.de/"><img style="padding: 10px" height="150px" src="assets/vce.svg"> </a> </p>

Funding and Acknowledgments

This paper is one of the achievements of joint research with, and is copyrighted material owned by, the ROBOT Industrial Basic Technology Collaborative Innovation Partnership. This research has been supported by the New Energy and Industrial Technology Development Organization (NEDO) under project ID JPNP20016.

Cloning the Repository

The repository contains submodules, so please check it out with:

git clone https://github.com/meyerls/PEGASUS.git --recursive # HTTPS
git submodule update --init --recursive

Requirements

The code has been tested with the following dependencies:

Setup

Our default install method is based on Conda packages and is provided by the following script, which has to be executed in the top level of the repository. Currently, the setup script has only been tested on Ubuntu 20. An installation on Windows should be possible but is not provided in this repo.

./setup.sh

Overview


PEGASUS consists of three main components:

GS Base Environment Reconstruction

<details> <summary>Click me</summary>

Will be updated soon

</details>

GS Object Reconstruction

<details> <summary>Click me</summary>

Will be updated soon! Not yet complete

For object reconstruction we provide two different processing workflows. The first is scanning objects in the wild by taking videos from both sides of the object; the second uses a camera rig to scan the object on a turntable. The in-the-wild approach uses XMEM to create a segmentation mask of the selected object; to obtain the correct scale, one only has to place an ArUco marker into the scene. The turntable approach uses an arbitrary calibration object (we used a texture-rich sheet of paper with an ArUco marker) to reuse its precomputed camera poses. A detailed workflow is provided in the following section.

In the Wild scanning

The workflow for scanning objects in the wild is:

1. Select Object

Currently, this approach does not work for texture-poor objects; for those, the camera rig is more suitable. The reason is that computing the poses and registering images from the bottom view simply does not work with COLMAP. Place the object onto a planar scene such as a table and make sure to move all around the object.

2. Aruco Marker

Print out an ArUco marker and place it next to the object. To scale the object, measure and note down the side length of the square ArUco marker. A website to create ArUco markers can be found here.
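
The measured marker size is later stored in the Dataset-Object definition (see ARUCO_SIZE in the Bouillon example below). As a rough illustration only, the sketch below detects the marker in a single frame and compares its known side length with its size in pixels. It assumes opencv-contrib-python with the legacy cv2.aruco.detectMarkers API (newer OpenCV releases use cv2.aruco.ArucoDetector instead); the dictionary and file name are placeholders, and this is not part of the PEGASUS pipeline itself.

# Illustration only, not part of the PEGASUS pipeline: detect the printed
# ArUco marker in one frame and compare its known physical side length with
# its size in pixels. Dictionary choice and file name are placeholders.
import cv2
import numpy as np

ARUCO_SIZE = 0.037  # measured side length of the printed marker in meters

image = cv2.imread("frame_0001.png")
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)

if ids is not None:
    marker = corners[0].reshape(4, 2)
    # Average side length of the detected marker in pixels
    side_px = np.mean([np.linalg.norm(marker[i] - marker[(i + 1) % 4]) for i in range(4)])
    print(f"Marker side: {side_px:.1f} px, measured size: {ARUCO_SIZE} m")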

3. Scanning

Record two videos with your phone camera or DSLR camera (we used an iPhone 12 in our example). The first video contains a hemispherical scan of the top view of the object; try to cover a 360 degree view at 2-3 different height levels. For the second video, repeat this process with the object flipped.

4. Segmentation Mask

For extracting the semantic masks of the video we used XMEM.

XMEM can be started from the root directory of PEGASUS:

python submodules/XMem/interactive_demo.py --video [path to the video] --num_objects 1 --size -1
<img align="right" style="padding: 10px" src="assets/xmem.png" alt="drawing" width="400px"/>

In the XMEM GUI, select the object you want to extract (the object should be highlighted in red). Afterward, click the Forward Propagate button (<font color='red'>1</font>) to extract the masks. Depending on the video length, this takes around 1-2 minutes. To save the detected masks, click Export Overlays as Video (<font color='red'>2</font>) to save the binary masks as images. More info on how to use XMEM can be found here.

Note: please select the image size according to your GPU memory or the quality you want to get. -1 uses the original image size. If you set a value, the image is resized according to its shorter side.

6. Dataset Integration

First, both the extracted images and masks have to be put into a common folder. This folder should be placed in a dataset folder in which multiple reconstructed objects can be stored, following the layout below.

.
└── bouillon 
    ├── down
    │   ├── images
    │   ├── masks
    └── up
        ├── images
        └── masks
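
A small helper along these lines can be used to build that layout. This is only a sketch: the source paths and object name are placeholders, and only the target layout (up/down, each with images and masks) is taken from the tree above.

# Hypothetical helper to build the layout above. Source paths and the object
# name are placeholders; only the target layout comes from the tree shown above.
import shutil
from pathlib import Path

def organize(dataset_root, object_name, side, images_src, masks_src):
    # side is 'up' or 'down'
    target = Path(dataset_root) / object_name / side
    for sub, src in (("images", images_src), ("masks", masks_src)):
        dst = target / sub
        dst.mkdir(parents=True, exist_ok=True)
        for f in sorted(Path(src).glob("*.png")):
            shutil.copy2(f, dst / f.name)

organize("dataset", "bouillon", "up", "xmem_output/up/images", "xmem_output/up/masks")
organize("dataset", "bouillon", "down", "xmem_output/down/images", "xmem_output/down/masks")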

To use the scanned object and include it in PEGASUS, one has to define the object as a Dataset-Object in in_the_wild_dataset.py. The class name (here Bouillon) takes the name of the object.

class Bouillon(InTheWild):
    OBJECT_NAME = 'bouillon'
    ID = 201
    TYPE = 'object'
    RECORDING_TYPE = 'spherical'  # 'spherical' or 'hemispherical'
    ALPHA = 0.3
    DATASET_TYPE = 'wild'
    ARUCO_SIZE = 0.037  # in meter

    def __init__(self, dataset_path):
        super().__init__(dataset_path=Path(dataset_path))
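
A minimal usage sketch for the class above. Whether dataset_path should point at the dataset root or at the object folder is an assumption here; check how InTheWild resolves OBJECT_NAME in in_the_wild_dataset.py.

# Hypothetical usage of the Dataset-Object defined above.
from in_the_wild_dataset import Bouillon

bouillon = Bouillon(dataset_path="./dataset")  # path assumption, see note above
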
<img align="right" style="padding: 10px" src="assets/reconstruction.png" alt="drawing" width="400px"/>
7. GS Reconstruction
python src/reconstruction/in_the_wild_object_reconstruction.py
8. Integrate into PEGASUS
</details>

Available Objects (Ramen Dataset and PEGASET)

We provide two different datasets. The IDs for the Ramen dataset are between 101 and 130. The YCB-V IDs are identical to the original YCB-V IDs.

Ramen Dataset

The Ramen Dataset consists of 30 cup noodle objects and 9 environments.

<p align="center"> <img style="padding: 5px" src="assets/konbini_dataset.jpg" alt="drawing" width="500px"/> </p> <p align="center"> <img style="padding: 5px" src="assets/cup_noodle_04.gif" alt="drawing" width="250px"/> <img style="padding: 5px" src="assets/cup_noodle_15.gif" alt="drawing" width="250px"/> </p>
.
└── Dataset 
    ├── calibration
    │   ├── ...
    ├── environment
    │   ├── ...
    ├── object
    │   ├── ...
    └── urdf
        └── ...

PEGASET

PEGASET consists of the well-known 21 YCB-V objects and 9 environments.

<p align="center"> <img style="padding: 5px" src="assets/appendix_ycb.png" alt="drawing" width="500px"/> </p> <p align="center"> <img style="padding: 5px" src="assets/cracker_box.gif" alt="drawing" width="250px"/> <img style="padding: 5px" src="assets/yellow_mustard.gif" alt="drawing" width="250px"/> </p>
.
└── Dataset 
    ├── calibration
    │   ├── ...
    ├── environment
    │   ├── ...
    ├── object
    │   ├── ...
    └── urdf
        └── ...

PEGASUS Dataset Extraction

Before rendering a dataset, the data provided for PEGASUS must be downloaded from the Ramen Dataset or PEGASET. If you use both datasets, you should merge them into one folder structure, for example as sketched below.
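
One simple way to merge the two downloads is the following sketch; the paths are placeholders, and shutil.copytree with dirs_exist_ok requires Python 3.8 or newer.

# Hypothetical merge of both downloads into one folder structure.
# Paths are placeholders.
import shutil

for src in ("ramen_dataset/Dataset", "pegaset/Dataset"):
    shutil.copytree(src, "pegasus_data/Dataset", dirs_exist_ok=True)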

All objects and environments that are relevant for dataset generation should be added to obj_list and env_list, for example as in the sketch below.
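
A hedged configuration sketch: only the names obj_list and env_list come from this README, and the entries are placeholders that must be replaced with the object and environment classes you actually defined or downloaded.

# Placeholder configuration sketch. Replace the entries with your own
# object and environment classes.
from in_the_wild_dataset import Bouillon  # example object class from above

obj_list = [Bouillon]  # objects to place in the simulated scenes
env_list = []          # add the environment classes you want to render in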

Parameters:

BibTex

@Article{PEGASUS2024,
      author       = {Meyer, Lukas and Erich, Floris and Yoshiyasu, Yusuke and Stamminger, Marc and Ando, Noriaki and Domae, Yukiyasu },
      title        = {PEGASUS: Physical Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation},
      journal      = {IROS},
      month        = {October},
      year         = {2024},
      url          = {https://meyerls.github.io/pegasus_web}
}

Thanks to the authors of 3D Gaussian Splatting for their excellent code; please consider also citing their work:

@Article{kerbl3Dgaussians,
      author       = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
      title        = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
      journal      = {ACM Transactions on Graphics},
      number       = {4},
      volume       = {42},
      month        = {July},
      year         = {2023},
      url          = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}

And thanks to the authors of the BOP Toolkit for their 6D object pose estimation benchmark interface.