

DTTD: Digital-Twin Tracking Dataset Official Repository


This repository contains the implementation code for the paper "Digital-Twin Tracking Dataset (DTTD): A Time-of-Flight 3D Object Tracking Dataset for High-Quality AR Applications".

In this work, we create a novel RGB-D dataset, the Digital-Twin Tracking Dataset (DTTD), to enable further research on the digital-twin tracking problem in pursuit of a digital-twin solution. We use two time-of-flight (ToF) depth sensors, Microsoft Azure Kinect and Apple iPhone 12 Pro, to record 100 scenes each of 16 common purchasable objects, with every frame annotated with per-pixel semantic segmentation and ground-truth object poses. This repository also provides the source code for the data generation and annotation pipeline described in our paper.

Dataset File Structure

├── train_data_list.txt
├── test_data_list.txt
├── classes.txt
├── cameras
│   └── iphone14pro_camera1 (to be released...)
├── data
│   ├── scene_1
│   │   ├── data
│   │   │   ├── 00001_color.jpg
│   │   │   ├── 00001_depth.png
│   │   │   ├── 00001_label_debug.png
│   │   │   ├── 00001_label.png
│   │   │   ├── 00001_meta.json
│   │   │   └── ...
│   │   └── scene_meta.yaml
│   ├── scene_2
│   │   ├── data
│   │   └── scene_meta.yaml
│   ...
└── objects
    ├── apple
    │   ├── apple.mtl
    │   ├── apple.obj
    │   ├── front.xyz
    │   ├── points.xyz
    │   ├── textured_0_etZloZLC.jpg
    │   ├── textured_0_norm_etZloZLC.jpg
    │   ├── textured_0_occl_etZloZLC.jpg
    │   ├── textured_0_roughness_etZloZLC.jpg
    │   └── textured.obj.mtl
    ├── black_expo_marker
    ├── blue_expo_marker
    ├── cereal_box_modified
    ├── cheezit_box_modified
    ├── chicken_can_modified
    ├── clam_can_modified
    ├── hammer_modified
    ├── itoen_green_tea
    ├── mac_cheese_modified
    ├── mustard_modified
    ├── pear
    ├── pink_expo_marker
    ├── pocky_pink_modified
    ├── pocky_red_modified
    ├── pocky_white_modified
    ├── pop_tarts_modified
    ├── spam_modified
    ├── tomato_can_modified
    └── tuna_can_modified
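
The per-frame naming in the tree above can be turned into a small path helper. This is a minimal sketch rather than part of the repository: the function name `frame_paths` and the dictionary keys are hypothetical, and it assumes every frame follows the five-digit, zero-padded pattern shown for `00001`.

```python
from pathlib import Path

# Per-frame file suffixes, copied from the directory tree above.
FRAME_SUFFIXES = {
    "color": "_color.jpg",
    "depth": "_depth.png",
    "label": "_label.png",
    "label_debug": "_label_debug.png",
    "meta": "_meta.json",
}


def frame_paths(dataset_root, scene_name, frame_index):
    """Return the expected file paths for one frame (hypothetical helper)."""
    frame_dir = Path(dataset_root) / "data" / scene_name / "data"
    # Assumption: frame indices are zero-padded to five digits, as in 00001.
    stem = f"{frame_index:05d}"
    return {key: frame_dir / (stem + suffix) for key, suffix in FRAME_SUFFIXES.items()}
```

For example, `frame_paths("DTTD", "scene_1", 1)["color"]` points at `DTTD/data/scene_1/data/00001_color.jpg`. The helper only builds paths; it does not check that the files exist.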


Before running our data generation and annotation pipeline, create and activate a conda environment with Python >= 3.7:

conda create --name [YOUR ENVIR NAME] python=[PYTHON VERSION]
conda activate [YOUR ENVIR NAME]

Then install all required packages:

pip install -r requirements.txt

Code Structure

Dataset Structure

Final dataset output:

What You Need to Collect Your Own Data

  1. OptiTrack Motion Capture system with Motive tracking software
    • This doesn't have to be running on the same computer as the other sensors. We will export the tracked poses to a CSV file.
    • Create a rigid body to track the camera's OptiTrack markers, and give it the same name that is passed to tools/capture_data.py.
  2. Microsoft Azure Kinect
  3. iPhone 14 Pro
    • Build the project in iphone_app/ in Xcode and install it on the mobile device.

iPhone Data Collection Pipeline

Configuration & Setup

  1. Place an ArUco marker somewhere visible.
  2. Put 5 markers on the body of the iPhone and create a rigid body named iPhone14Pro_camera in the OptiTrack software.

Calculate Extrinsic Process (recalculate the extrinsic at the start of each collection day)

Data Collection Step

  1. Place OptiTrack markers on the corners of the ArUco marker in this order: bottom-left, bottom-right, top-right, top-left. These positions are used to compute the (aruco -> opti) transform.
  2. Record the marker positions in calculate_extrinsic/aruco_corners.yaml under the keys quad1, quad2, quad3, and quad4.
  3. Start the OptiTrack recording.
  4. Synchronization Phase
    1. Press start calibration on the iPhone to begin recording data.
    2. Observe the ArUco marker in the scene and move the camera along different trajectories to build synchronization data (back and forth 2 to 3 times, slowly).
    3. Press stop calibration when finished.
  5. Data Capturing Phase
    1. Press start collection to begin recording data.
    2. Observe the ArUco marker while moving around it. (Perform a one-way revolution of 90-180 degrees around the marker.)
    3. Press stop collection when finished.
  6. Stop OptiTrack recording.
  7. Export the OptiTrack recording to a CSV file with a 60 Hz report rate.
  8. Move the tracking CSV file to /extrinsics_scenes/<scene name>/camera_poses/camera_poses.csv.
  9. Export the app_data to /extrinsics_scenes/<scene name>/iphone_data.
  10. Move timestamps.csv to /extrinsics_scenes/<scene name>.
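
Step 1 above says the four corner positions are used to compute the (aruco -> opti) transform. One way this can be done is sketched below in plain Python. It is an illustration under stated assumptions, not the repository's implementation: the function name `aruco_to_opti` is hypothetical, quad1 to quad4 are assumed to map to bottom-left, bottom-right, top-right, and top-left, and the marker frame convention (origin at the marker center, x along the bottom edge, y toward the top edge, z out of the marker plane) is chosen for illustration.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [c / norm for c in v]


def aruco_to_opti(quad1, quad2, quad3, quad4):
    """Build a 4x4 marker-to-OptiTrack transform from four corner positions.

    Assumed corner order: bottom-left, bottom-right, top-right, top-left.
    """
    # Origin: centroid of the four measured corners.
    origin = [(quad1[i] + quad2[i] + quad3[i] + quad4[i]) / 4.0 for i in range(3)]
    # x axis: along the bottom edge.
    x_axis = _normalize(_sub(quad2, quad1))
    # y axis: along the left edge, with its x component removed (Gram-Schmidt)
    # so that measurement noise does not break orthogonality.
    y_raw = _sub(quad4, quad1)
    proj = _dot(y_raw, x_axis)
    y_axis = _normalize([y_raw[i] - proj * x_axis[i] for i in range(3)])
    # z axis: completes the right-handed frame.
    z_axis = _cross(x_axis, y_axis)
    # Columns of the rotation are the marker axes expressed in OptiTrack coordinates.
    return [
        [x_axis[0], y_axis[0], z_axis[0], origin[0]],
        [x_axis[1], y_axis[1], z_axis[1], origin[1]],
        [x_axis[2], y_axis[2], z_axis[2], origin[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

The result maps points expressed in the marker frame to OptiTrack coordinates. If the actual scripts use a different origin or axis convention, the construction is the same but the axes change.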

Process Data and Calculate Extrinsic

  1. Convert iPhone data formats to Kinect data formats (tools/process_iphone_data.py)
    • This tool converts everything to common image names and formats, and fits the distortion parameters.
    • Code: python tools/process_iphone_data.py <camera_name> --depth_type <depth_type> --scene_name <scene_name> --extrinsic
  2. Clean the raw OptiTrack poses and sync them with the frames (tools/process_data.py --extrinsic)
    • Code: python tools/process_data.py --scene_name <scene_name> --extrinsic
  3. Calculate the camera extrinsic (tools/calculate_camera_extrinsic.py)
    • Code: python tools/calculate_camera_extrinsic.py --scene_name <scene_name>
  4. The output will be placed in cameras/<camera name>/extrinsic.txt
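
The three processing steps above can be chained from a small driver script. This is a sketch, not a tool shipped with the repository: the function names are hypothetical, the script paths follow the step descriptions above, and the `--extrinsic` spelling is assumed from step 2, so confirm it against each script's argument parser before use.

```python
import subprocess
import sys


def extrinsic_commands(camera_name, depth_type, scene_name):
    """Return the three command lines for the extrinsic pipeline, in order."""
    py = sys.executable  # run the tools with the current interpreter
    return [
        [py, "tools/process_iphone_data.py", camera_name,
         "--depth_type", depth_type, "--scene_name", scene_name, "--extrinsic"],
        [py, "tools/process_data.py", "--scene_name", scene_name, "--extrinsic"],
        [py, "tools/calculate_camera_extrinsic.py", "--scene_name", scene_name],
    ]


def run_extrinsic_pipeline(camera_name, depth_type, scene_name):
    """Run each step from the repository root, stopping at the first failure."""
    for command in extrinsic_commands(camera_name, depth_type, scene_name):
        subprocess.run(command, check=True)
```

Building the commands as argument lists rather than shell strings avoids quoting problems with scene names that contain spaces.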

Scene Collection Process (objects 6DoF pose collections)

Data Collection Step

  1. Set up the LiDARDepth app (ARKit version) using Xcode (it needs to be reinstalled before each scene).
  2. Start the OptiTrack recording.
  3. Synchronization Phase.
    1. Press start calibration to begin recording data.
    2. Observe the ArUco marker in the scene and move the camera along different trajectories to build synchronization data (back and forth 2 to 3 times, slowly).
    3. Press end calibration when finished.
  4. Data Capturing Phase
    1. Cover the ArUco marker.
    2. Press Start collection to begin recording data.
    3. Observe the objects while moving around them. (Perform a one-way revolution of 90-180 degrees around the objects.)
    4. Press End collection when finished.
  5. Stop the OptiTrack recording.
  6. Export the OptiTrack recording to a CSV file with a 60 Hz report rate.
  7. Move the tracking CSV file to scenes/<scene name>/camera_poses/camera_poses.csv.
  8. Export the app_data to scenes/<scene name>/iphone_data.
  9. Move timestamps.csv to scenes/<scene name>.
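
Before processing, it can help to confirm that steps 7 to 9 left the scene folder in the expected shape. The check below is a minimal sketch: the function name `missing_scene_inputs` is hypothetical, and the expected entries are taken directly from the paths listed above.

```python
from pathlib import Path

# Entries that steps 7-9 above place under scenes/<scene name>.
EXPECTED_ENTRIES = (
    "camera_poses/camera_poses.csv",
    "iphone_data",
    "timestamps.csv",
)


def missing_scene_inputs(scenes_root, scene_name):
    """Return the expected entries that are absent from the scene folder."""
    scene_dir = Path(scenes_root) / scene_name
    return [entry for entry in EXPECTED_ENTRIES if not (scene_dir / entry).exists()]
```

An empty list means all three inputs are present. The check only looks at paths and does not validate the file contents.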

Process Data

  1. Convert iPhone data formats to Kinect data formats (tools/process_iphone_data.py)
    • This tool converts everything to common image names and formats, and fits the distortion parameters.
    • Code: python tools/process_iphone_data.py <camera_name> --depth_type <depth_type> --scene_name <scene_name>
  2. Clean the raw OptiTrack poses and sync them with the frames (tools/process_data.py)
    • Code: python tools/process_data.py --scene_name [SCENE_NAME]
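
Step 2 above pairs each camera frame with an OptiTrack pose. A common way to do this once both streams share a clock is nearest-timestamp matching, sketched below. This is an assumption about the general technique, not a description of tools/process_data.py: the function name `match_nearest` and the `max_gap` parameter are hypothetical, and any clock offset between the devices (which the synchronization phase presumably helps estimate) is assumed to have been removed already.

```python
from bisect import bisect_left


def match_nearest(frame_times, pose_times, max_gap):
    """For each frame time, return the index of the nearest pose time.

    pose_times must be sorted in ascending order. A frame with no pose
    within max_gap (same units as the timestamps) is matched to None.
    """
    matches = []
    for t in frame_times:
        i = bisect_left(pose_times, t)
        # The nearest pose is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pose_times)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(pose_times[j] - t))
        matches.append(best if abs(pose_times[best] - t) <= max_gap else None)
    return matches
```

With poses exported at 60 Hz, a `max_gap` of about one pose period would reject frames that fall in a tracking dropout. A real pipeline might interpolate between neighboring poses instead of picking the nearest one.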

Annotation Process

  1. Manually annotate the object poses in the first few frames (tools/manual_annotate_poses.py).
    • Edit [SCENE_NAME]/scene_meta.yml and add an objects field listing the objects in the scene and their corresponding ids.
    • Code: python tools/manual_annotate_poses.py [SCENE_NAME]
    • Check the control instructions in pose_refinement/README.md.
  2. Recover the object poses for all frames and verify their correctness (tools/generate_scene_labeling.py).
    • Generate the semantic labeling and adjust the per-frame object poses (tools/generate_scene_labeling.py).
    • Code: python tools/generate_scene_labeling.py [SCENE_NAME]
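
The idea behind recovering all frame poses from a few annotated frames can be sketched with rigid transforms. The code below is an illustration, not the logic of tools/generate_scene_labeling.py: it assumes the objects stay still during a scene, that camera poses are given as 4x4 camera-to-world matrices, and that the annotated pose is object-to-camera in the annotated frame. All function names are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def propagate_object_pose(obj_in_cam0, cam0_in_world, camk_in_world):
    """Object pose in frame k, given its annotated pose in frame 0.

    Assumes the object does not move between frame 0 and frame k.
    """
    # Lift the annotated pose into the fixed world frame ...
    obj_in_world = matmul4(cam0_in_world, obj_in_cam0)
    # ... then express it in the camera frame of frame k.
    return matmul4(invert_rigid(camk_in_world), obj_in_world)
```

Because tracking and annotation errors accumulate, poses propagated this way still need the per-frame adjustment and verification described in step 2.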


Best Scene Collection Practices