DTTD: Digital Twin Tracking Dataset Official Repository

Overview

This repository contains the official implementation of the paper "Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications" (arXiv, paper, video). Our work was accepted to the CVPR 2023 Workshop on Vision Datasets Understanding.

In this work, we create an RGB-D dataset, called the Digital Twin Tracking Dataset (DTTD), to enable further research on digital twin tracking and to extend potential solutions to longer, meter-scale ranges. We select the Microsoft Azure Kinect as a state-of-the-art time-of-flight (ToF) camera. In total, 103 scenes of 10 common off-the-shelf objects with rich textures are recorded, with each frame annotated with a per-pixel semantic segmentation and ground-truth object poses provided by a commercial motion capture system. This repository also provides the source code of the data generation and annotation pipeline described in our paper, for reference.

Recent Update

Dataset File Structure

DTTD_Dataset
├── train_data_list.txt
├── test_data_list.txt
├── classes.txt
├── cameras
│   ├── az_camera1
│   └── iphone12pro_camera1 (to be released...)
├── data
│   ├── az_new_night_1
│   │   ├── data
│   │   │   ├── 00001_color.jpg
│   │   │   ├── 00001_depth.png
│   │   │   ├── 00001_label_debug.png
│   │   │   ├── 00001_label.png
│   │   │   ├── 00001_meta.json
│   │   │   └── ...
│   │   └── scene_meta.yaml
│   ├── az_new_night_2
│   │   ├── data
│   │   └── scene_meta.yaml
│   └── ...
│
└── objects
    ├── apple
    │   ├── apple.mtl
    │   ├── apple.obj
    │   ├── front.xyz
    │   ├── points.xyz
    │   ├── textured_0_etZloZLC.jpg
    │   ├── textured_0_norm_etZloZLC.jpg
    │   ├── textured_0_occl_etZloZLC.jpg
    │   ├── textured_0_roughness_etZloZLC.jpg
    │   └── textured.obj.mtl
    ├── black_expo_marker
    ├── blue_expo_marker
    ├── cereal_box_modified
    ├── cheezit_box_modified
    ├── chicken_can_modified
    ├── clam_can_modified
    ├── hammer_modified
    ├── itoen_green_tea
    ├── mac_cheese_modified
    ├── mustard_modified
    ├── pear
    ├── pink_expo_marker
    ├── pocky_pink_modified
    ├── pocky_red_modified
    ├── pocky_white_modified
    ├── pop_tarts_modified
    ├── spam_modified
    ├── tomato_can_modified
    └── tuna_can_modified
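
As a quick orientation, here is a minimal Python sketch of how one might load a single frame from this layout. It assumes OpenCV and PyYAML are installed; the internal fields of the metadata files are not documented here, so they are treated as opaque dictionaries:

import json

import cv2
import yaml

scene_dir = "DTTD_Dataset/data/az_new_night_1"
frame = "00001"

# RGB image, depth map, and per-pixel semantic label
# (IMREAD_UNCHANGED preserves the depth PNG's full bit depth)
color = cv2.imread(f"{scene_dir}/data/{frame}_color.jpg")
depth = cv2.imread(f"{scene_dir}/data/{frame}_depth.png", cv2.IMREAD_UNCHANGED)
label = cv2.imread(f"{scene_dir}/data/{frame}_label.png", cv2.IMREAD_UNCHANGED)

# Per-frame metadata (e.g., object poses) and scene-level metadata
with open(f"{scene_dir}/data/{frame}_meta.json") as f:
    frame_meta = json.load(f)
with open(f"{scene_dir}/scene_meta.yaml") as f:
    scene_meta = yaml.safe_load(f)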

Requirements

Before running our data generation and annotation pipeline, create and activate a conda environment with Python >= 3.7:

conda create --name [YOUR ENV NAME] python=[PYTHON VERSION]
conda activate [YOUR ENV NAME]

then install all necessary packages:

pip install -r requirements.txt
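
For example, a complete environment setup might look like this (the environment name dttd and Python 3.8 are arbitrary illustrative choices):

conda create --name dttd python=3.8
conda activate dttd
pip install -r requirements.txt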

Code Structure

What You Need to Collect Your Own Data

  1. OptiTrack Motion Capture system with Motive tracking software
    • This doesn't have to be running on the same computer as the other sensors. We will export the tracked poses to a CSV file.
    • Create a rigid body to track the camera's OptiTrack markers, and give the rigid body the same name that is passed into tools/capture_data.py
  2. Microsoft Azure Kinect
  3. iPhone 12 Pro / iPhone 13 (to be released...)
    • Please build the project in iphone_app/ in XCode and install on the mobile device.

Data Collection Pipeline (for Azure Kinect)

Link to tutorial video: https://youtu.be/ioKmeriW650.

Configuration & Setup

  1. Place an ArUco marker somewhere visible
  2. Place OptiTrack markers on the corners of the ArUco marker; we use these to compute the (ArUco -> OptiTrack) transform
  3. Enter the marker positions into calculate_extrinsic/aruco_corners.yaml, labeled under the keys quad1, quad2, quad3, and quad4
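
As a quick sanity check, the snippet below is a minimal Python sketch (assuming PyYAML) that verifies the four corner entries are present; the value format of each entry depends on how you recorded the positions:

import yaml

# Load the measured ArUco corner positions (in the OptiTrack frame).
with open("calculate_extrinsic/aruco_corners.yaml") as f:
    corners = yaml.safe_load(f)

# The extrinsic pipeline expects one entry per corner of the marker.
for key in ("quad1", "quad2", "quad3", "quad4"):
    assert key in corners, f"missing corner entry: {key}"
    print(key, corners[key])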

Record Data (tools/capture_data.py)

  1. Data collection
    • For an extrinsic scene, the data collection phase should be spent observing the ArUco marker; run tools/capture_data.py --extrinsic (see the example after this list)
  2. Example data collection scene (not extrinsic): python tools/capture_data.py --scene_name test az_camera1
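
For an extrinsic scene, the invocation presumably combines the flag with the same positional arguments; the exact command below is an assumption, not verified against the script:

python tools/capture_data.py --extrinsic --scene_name [SCENE_NAME] az_camera1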

Data Recording Process

  1. Start the OptiTrack recording
  2. Synchronization Phase
    1. Press c to begin recording data
    2. Observe the ArUco marker in the scene and move the camera in different trajectories to build synchronization data
    3. Press p when finished
  3. Data Capturing Phase
    1. Press d to begin recording data
    2. If it is an extrinsic scene, observe the ArUco marker
    3. If it is a data collection scene, observe the objects to track
    4. Press q when finished
  4. Stop OptiTrack recording
  5. Export the OptiTrack recording to a CSV file with a 60 Hz report rate
  6. Move the tracking CSV file to <scene name>/camera_poses/camera_pose.csv
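
For example, if the export was saved as camera_pose.csv (paths are illustrative):

mkdir -p <scene name>/camera_poses
mv camera_pose.csv <scene name>/camera_poses/camera_pose.csv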

Process Extrinsic Data to Calculate the Camera Extrinsic (if extrinsic scene)

  1. Clean raw opti poses (tools/process_data.py --extrinsic)
  2. Sync opti poses with frames (tools/process_data.py --extrinsic)
  3. Calculate camera extrinsic (tools/calculate_camera_extrinsic.py)
  4. Output will be placed in cameras/<camera name>/extrinsic.txt
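
Put together, a plausible invocation sequence looks like the following; whether --extrinsic is combined with --scene_name, and whether calculate_camera_extrinsic.py takes further arguments, are assumptions here:

python tools/process_data.py --extrinsic --scene_name [SCENE_NAME]
python tools/calculate_camera_extrinsic.py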

Process Data (if data scene)

  1. Clean raw opti poses (tools/process_data.py) <br> Example: <code>python tools/process_data.py --scene_name [SCENE_NAME]</code>
  2. Sync opti poses with frames (tools/process_data.py) <br> Example: <code>python tools/process_data.py --scene_name [SCENE_NAME]</code>
  3. Manually annotate first frame object poses (tools/manual_annotate_poses.py)
    1. Modify [SCENE_NAME]/scene_meta.yml by adding an (objects) field listing the objects in the scene and their corresponding ids (see the sketch after this list).<br> Example: <code>python tools/manual_annotate_poses.py [SCENE_NAME]</code>
  4. Recover all frame object poses and verify correctness (tools/generate_scene_labeling.py) <br> Example: <code>python tools/generate_scene_labeling.py --fast [SCENE_NAME]</code>
    1. Generate semantic labeling (tools/generate_scene_labeling.py)<br> Example: <code>python tools/generate_scene_labeling.py [SCENE_NAME]</code>
    2. Generate per frame object poses (tools/generate_scene_labeling.py)<br> Example: <code>python tools/generate_scene_labeling.py [SCENE_NAME]</code>
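
For step 3 above, a hypothetical sketch of the (objects) field added to [SCENE_NAME]/scene_meta.yml; the object names, ids, and exact field layout are illustrative only, and the real ids must match classes.txt:

objects:
  apple: 1
  spam_modified: 17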

Citation

If DTTD is useful or relevant to your research, please recognize our contributions by citing our papers:

@InProceedings{DTTDv1,
    author    = {Feng, Weiyu and Zhao, Seth Z. and Pan, Chuanyu and Chang, Adam and Chen, Yichen and Wang, Zekun and Yang, Allen Y.},
    title     = {Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3288-3297}
}

@misc{DTTDv2,
    author        = {Zixun Huang and Keling Yao and Seth Z. Zhao and Chuanyu Pan and Tianjian Xu and Weiyu Feng and Allen Y. Yang},
    title         = {Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset},
    year          = {2023},
    eprint        = {2309.13570},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

Minutiae

Best Scene Collection Practices