
GTA-IM Dataset [Website]

Long-term Human Motion Prediction with Scene Context, ECCV 2020 (Oral) PDF Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik.

This repository maintains our GTA Indoor Motion dataset (GTA-IM), which emphasizes human-scene interactions in indoor environments. We collect HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.

Table of contents

A demo for playing with our dataset.
Instructions to request our full dataset.
Documentation on our dataset structure and contents.

Demo

(0) Getting Started

Clone this repository, and create a local environment: conda env create -f environment.yml.

For your convenience, we provide a fragment of our data in the demo directory. In this section, you can explore different parts of our data using the provided tool scripts.

(1) 3D skeleton & point cloud

$ python vis_skeleton_pcd.py -h
usage: vis_skeleton_pcd.py [-h] [-pa PATH] [-f FRAME] [-fw FUSION_WINDOW]

# now visualize demo 3d skeleton and point cloud!
$ python vis_skeleton_pcd.py -pa demo -f 2720 -fw 80

You should see an open3d viewer showing our 3D skeleton and point cloud data. Press 'h' in the viewer to see how to control the viewpoint: <img src="assets/vis_skeleton_pcd.gif" width=100%>

Note that we use open3d == 0.7.0; the visualization code is not compatible with newer versions of open3d.

(2) 2D skeleton & depth map

$ python vis_2d_pose_depth.py -h
usage: vis_2d_pose_depth.py [-h] [-pa PATH]

# now visualize 2d skeleton and depth map!
$ python vis_2d_pose_depth.py -pa demo

You should find a newly created demo/vis/ directory containing *_vis.jpg files that render to a movie strip like this: <img src="assets/vis_2d_pose_depth.gif" width=80%>

(3) RGB video

$ python vis_video.py -h
usage: vis_video.py [-h] [-pa PATH] [-s SCALE] [-fr FRAME_RATE]

# now visualize demo video!
$ python vis_video.py -pa demo -fr 15

You should find a newly created demo/vis/ directory containing video.mp4.

Requesting Dataset

To obtain the Dataset, please send an email to Zhe Cao (with the title "GTA-IM Dataset Download") stating:

Your name, title and affiliation
Your intended use of the data
The following statement:

With this email we declare that we will use the GTA-IM Dataset for non-commercial research purposes only. We also undertake to purchase a copy of Grand Theft Auto V. We will not redistribute the data in any form except in academic publications where necessary to present examples.

We will promptly reply with the download link.

Dataset Contents

After you download data from our link and unzip, each sequence folder will contain the following files:

images:
- color images: *.jpg
- depth images: *.jpg
- instance masks: *_id.png

info_frames.pickle: a pickle file that contains camera information, 3D human poses (98 joints) in the global coordinate frame, weather conditions, the character ID, and so on.
```
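import pickle
# data_path is the path to one sequence folder, ending with '/'
with open(data_path + 'info_frames.pickle', 'rb') as f:
    info = pickle.load(f)
print(info[0].keys())  # keys of the first entry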
```

info_frames.npz: it contains five arrays. 21 of the 98 human joints are extracted to form the minimal skeleton. Here is how we generate it from raw captures.
- joints_2d: 2d human poses on the HD image plane.
- joints_3d_cam: 3d human poses in the current frame's camera coordinate
- joints_3d_world: 3d human poses in the game/world coordinate
- world2cam_trans: the world to camera transformation matrix for each frame
- intrinsics: camera intrinsics
 
```
import numpy as np
# rec_idx is the path to one sequence folder, ending with '/'
info_npz = np.load(rec_idx + 'info_frames.npz')
print(info_npz.files)
# 2d poses for frame 0
print(info_npz['joints_2d'][0])
```
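
As a rough sketch of how these arrays could fit together, the snippet below maps world-frame joints into a camera frame with a 4x4 matrix and then projects them with a 3x3 pinhole matrix. The function names, the `row_vectors` flag, the matrix shapes, and the choice of +z as the depth axis are all assumptions made for illustration, and the data is synthetic. On real data, compare the output against joints_3d_cam and joints_2d to check which convention applies.

```python
import numpy as np


def world_to_camera(points_world, world2cam, row_vectors=True):
    """Map (N, 3) world points into a camera frame with a 4x4 matrix.

    row_vectors=True computes [x y z 1] @ M; False computes M @ [x y z 1]^T.
    Which convention the dataset uses is an assumption to verify.
    """
    points_world = np.asarray(points_world, dtype=float)
    ones = np.ones((points_world.shape[0], 1))
    homo = np.hstack([points_world, ones])  # (N, 4) homogeneous points
    if row_vectors:
        out = homo @ world2cam
    else:
        out = (world2cam @ homo.T).T
    return out[:, :3]


def project_pinhole(points_cam, K):
    """Project (N, 3) camera-frame points with a 3x3 pinhole matrix K.

    Assumes depth lies along +z, which may not match the dataset's axes.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    uvw = points_cam @ K.T  # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]


# Synthetic check: a pure translation written in column-vector form.
M = np.eye(4)
M[:3, 3] = [1.0, 2.0, 3.0]
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
cam_pts = world_to_camera(pts, M, row_vectors=False)

# Hypothetical intrinsics for a 1920x1080 image, not the dataset's values.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
pixels = project_pinhole(cam_pts, K)
print(cam_pts)
print(pixels)
```

If the transformed joints do not match joints_3d_cam under one setting of `row_vectors`, try the other before suspecting the data.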

realtimeinfo.pickle: a backup pickle file which contains all information from the data collection.

Joint Types

The human skeleton connections and joint index names:

LIMBS = [
    (0, 1),  # head_center -> neck
    (1, 2),  # neck -> right_clavicle
    (2, 3),  # right_clavicle -> right_shoulder
    (3, 4),  # right_shoulder -> right_elbow
    (4, 5),  # right_elbow -> right_wrist
    (1, 6),  # neck -> left_clavicle
    (6, 7),  # left_clavicle -> left_shoulder
    (7, 8),  # left_shoulder -> left_elbow
    (8, 9),  # left_elbow -> left_wrist
    (1, 10),  # neck -> spine0
    (10, 11),  # spine0 -> spine1
    (11, 12),  # spine1 -> spine2
    (12, 13),  # spine2 -> spine3
    (13, 14),  # spine3 -> spine4
    (14, 15),  # spine4 -> right_hip
    (15, 16),  # right_hip -> right_knee
    (16, 17),  # right_knee -> right_ankle
    (14, 18),  # spine4 -> left_hip
    (18, 19),  # left_hip -> left_knee
    (19, 20)  # left_knee -> left_ankle
]
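
One way to use the limb list, shown here as a minimal sketch: compute the length of every bone for a single 21-joint pose. `JOINT_NAMES` is reconstructed from the comments above, `bone_lengths` is a hypothetical helper rather than part of our tool scripts, and the pose is synthetic. A real pose would be one frame of joints_3d_world or joints_3d_cam, assumed here to have shape (21, 3).

```python
import numpy as np

# Same connections as the LIMBS list above.
LIMBS = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
    (1, 6), (6, 7), (7, 8), (8, 9),
    (1, 10), (10, 11), (11, 12), (12, 13), (13, 14),
    (14, 15), (15, 16), (16, 17),
    (14, 18), (18, 19), (19, 20),
]

# Joint names by index, read off the comments in the LIMBS list.
JOINT_NAMES = [
    "head_center", "neck",
    "right_clavicle", "right_shoulder", "right_elbow", "right_wrist",
    "left_clavicle", "left_shoulder", "left_elbow", "left_wrist",
    "spine0", "spine1", "spine2", "spine3", "spine4",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
]


def bone_lengths(joints_3d):
    """Return the Euclidean length of each limb for one (21, 3) pose."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    starts = np.array([a for a, _ in LIMBS])
    ends = np.array([b for _, b in LIMBS])
    return np.linalg.norm(joints_3d[ends] - joints_3d[starts], axis=1)


# Synthetic pose: joint i sits at (i, 0, 0), so each length is |b - a|.
pose = np.zeros((21, 3))
pose[:, 0] = np.arange(21)
lengths = bone_lengths(pose)
for (a, b), length in zip(LIMBS, lengths):
    print(f"{JOINT_NAMES[a]} -> {JOINT_NAMES[b]}: {length:.2f}")
```

Bone lengths that stay roughly constant across the frames of a sequence are a quick sanity check that the joints were loaded in the expected order.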

Important Note

This dataset is for non-commercial research purposes only. Due to public interest, I decided to reimplement the data generation pipeline from scratch to collect the GTA-IM dataset again. I do not use Facebook resources to reproduce the data.

Citation

We believe in open research and we will be happy if you find this data useful. If you use it, please consider citing our work.

@incollection{caoHMP2020,
  author = {Zhe Cao and
    Hang Gao and
    Karttikeya Mangalam and
    Qizhi Cai and
    Minh Vo and
    Jitendra Malik},
  title = {Long-term human motion prediction with scene context},
  booktitle = {ECCV},
  year = {2020},
  }

Acknowledgement

Our data collection pipeline was built upon this plugin and this tool.

LICENSE

Our project is released under CC-BY-NC 4.0.