The MTA Dataset

The MTA (Multi Camera Track Auto) dataset is a large multi-target multi-camera tracking dataset. It contains over 2,800 person identities, 6 cameras and more than 100 minutes of video per camera, covering both a day and a night period.

Take a look at a short timelapse video:

https://www.youtube.com/watch?v=tZ5UwpkcfmE

Or a short extracted part with annotations:

https://www.youtube.com/watch?v=e6WNgCQHcCc

Associated repositories

The dataset was recorded and created with MTA-Mod (https://github.com/koehlp/MTA-Mod). If you want to perform multi-camera tracking on the dataset, you can use the WDA-Tracker (https://github.com/koehlp/wda_tracker).

Getting the dataset

To obtain the dataset, you first need the MTA download URLs. Please send an email to philipp.koehl@gmail.com with the subject "MTA Dataset Download" that contains the following statement:

With this email we declare that we will use the MTA Dataset 
for research and educational purposes only, since we are aware that 
commercial use is prohibited. We also undertake to purchase a copy of Grand Theft Auto V.

We will send you the MTA download URLs as soon as possible. You can use them to download the following files:

Multi person multi camera tracking

MTA_videos.zip

MTA_videos_coords.zip

MTA_ext_short.zip

MTA_ext_short_coords.zip

Person re-identification

MTA_reid.zip

Contents

Multi person multi camera tracking

MTA_{videos,ext_short}.zip

MTA_{videos,ext_short}_coords.zip

Person re-identification

MTA_reid.zip

Annotations

Multi person multi camera tracking

Annotations in coords_fib_cam_{0-5}.csv

column name           description
frame_no_cam          frame number per camera, starting from 0
person_id             person id
x_top_left_BB         x coordinate of the top-left bounding box corner
y_top_left_BB         y coordinate of the top-left bounding box corner
x_bottom_right_BB     x coordinate of the bottom-right bounding box corner
y_bottom_right_BB     y coordinate of the bottom-right bounding box corner
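
A minimal sketch of reading one of these files with Python's standard csv module and grouping the boxes by frame. It assumes the CSV has a header row with the column names above; the function name load_boxes_per_frame and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_boxes_per_frame(csv_file):
    """Group the boxes of a coords_fib_cam_{0-5}.csv file by frame_no_cam.

    Returns {frame_no_cam: [(person_id, x1, y1, x2, y2), ...]}.
    Assumes a header row with the documented column names.
    """
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        boxes[int(row["frame_no_cam"])].append((
            int(row["person_id"]),
            float(row["x_top_left_BB"]),
            float(row["y_top_left_BB"]),
            float(row["x_bottom_right_BB"]),
            float(row["y_bottom_right_BB"]),
        ))
    return dict(boxes)


# Made-up sample standing in for a real file such as coords_fib_cam_0.csv
sample = io.StringIO(
    "frame_no_cam,person_id,x_top_left_BB,y_top_left_BB,"
    "x_bottom_right_BB,y_bottom_right_BB\n"
    "0,7,100,50,140,160\n"
    "0,9,300,80,330,170\n"
    "1,7,102,51,142,161\n"
)
boxes = load_boxes_per_frame(sample)
print(len(boxes[0]), "boxes in frame 0")
```

With a real file, pass open("coords_fib_cam_0.csv", newline="") instead of the in-memory sample.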

Annotations in coords_cam_{0-5}.csv

column name           description
frame_no_gta          in-game native frame number
frame_no_cam          frame number per camera, starting from 0
person_id             person id
appearance_id         id that allows recreating a person with the same appearance
joint_type            joint type (see Joint type below)
x_2D_joint            x 2D joint position
y_2D_joint            y 2D joint position
x_3D_joint            x 3D joint position
y_3D_joint            y 3D joint position
z_3D_joint            z 3D joint position
joint_occluded        1 if the joint is occluded; 0 otherwise
joint_self_occluded   1 if the joint is occluded by the person's own body; 0 otherwise
x_3D_cam              x 3D camera position
y_3D_cam              y 3D camera position
z_3D_cam              z 3D camera position
x_rot_cam             camera rotation around the x axis
y_rot_cam             camera rotation around the y axis
z_rot_cam             camera rotation around the z axis
fov                   camera field of view
x_3D_person           x 3D person position
y_3D_person           y 3D person position
z_3D_person           z 3D person position
x_2D_person           x 2D person position
y_2D_person           y 2D person position
ped_type              pedestrian type (see Pedestrian types below)
wears_glasses         1 if the person is wearing glasses; 0 otherwise
yaw_person            person yaw
hours_gta             in-game time, hours
minutes_gta           in-game time, minutes
seconds_gta           in-game time, seconds
x_top_left_BB         x coordinate of the top-left bounding box corner
y_top_left_BB         y coordinate of the top-left bounding box corner
x_bottom_right_BB     x coordinate of the bottom-right bounding box corner
y_bottom_right_BB     y coordinate of the bottom-right bounding box corner

Note that the camera 3D coordinates are in the GTA 3D coordinate system.
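
Since coords_cam_{0-5}.csv carries a joint_type column, it presumably holds one row per joint of a person in a frame. Under that assumption, the hypothetical helper load_skeletons below collects each person's 2D joints per frame, and gta_time_seconds (also hypothetical) turns hours_gta, minutes_gta and seconds_gta into seconds since in-game midnight, which can help separate the day and night periods. The sample data is made up and only contains the columns the sketch uses.

```python
import csv
import io
from collections import defaultdict


def load_skeletons(csv_file):
    """Collect 2D joints per (frame_no_cam, person_id).

    Assumes one row per joint. Returns
    {(frame_no_cam, person_id): {joint_type: (x_2D, y_2D, occluded)}}.
    """
    skeletons = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        key = (int(row["frame_no_cam"]), int(row["person_id"]))
        skeletons[key][int(row["joint_type"])] = (
            float(row["x_2D_joint"]),
            float(row["y_2D_joint"]),
            int(float(row["joint_occluded"])) == 1,
        )
    return dict(skeletons)


def gta_time_seconds(row):
    """In-game time of a row as seconds since in-game midnight."""
    return (int(row["hours_gta"]) * 3600
            + int(row["minutes_gta"]) * 60
            + int(row["seconds_gta"]))


# Made-up sample with only the columns used above
sample = io.StringIO(
    "frame_no_cam,person_id,joint_type,x_2D_joint,y_2D_joint,joint_occluded\n"
    "0,7,0,120,52,0\n"
    "0,7,1,121,60,1\n"
    "1,7,0,122,53,0\n"
)
skeletons = load_skeletons(sample)
print(skeletons[(0, 7)])
```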

Pedestrian types

ped_type   description
0          Player character Michael
1          Player character Franklin
2          Player character Trevor
29         Army
28         Animal
27         SWAT
21         Los Santos Fire Department
20         Paramedic
6          Cop
4          Male
5          Female
26         Human

Source: http://www.dev-c.com/nativedb/
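
When filtering annotations, the table can be kept as a lookup dictionary. The sketch below maps the codes to their descriptions and drops rows with ped_type 28 (Animal). It assumes the rows are dictionaries such as those produced by csv.DictReader on coords_cam_{0-5}.csv; the helper names describe_ped_type and without_animals are hypothetical.

```python
# ped_type codes from the table above
PED_TYPES = {
    0: "Player character Michael",
    1: "Player character Franklin",
    2: "Player character Trevor",
    4: "Male",
    5: "Female",
    6: "Cop",
    20: "Paramedic",
    21: "Los Santos Fire Department",
    26: "Human",
    27: "SWAT",
    28: "Animal",
    29: "Army",
}
ANIMAL = 28


def describe_ped_type(value):
    """Return the description for a ped_type value (int or CSV string)."""
    return PED_TYPES.get(int(value), "unknown ped_type")


def without_animals(rows):
    """Keep only rows whose ped_type is not Animal (28)."""
    return [row for row in rows if int(row["ped_type"]) != ANIMAL]


rows = [{"person_id": "7", "ped_type": "4"}, {"person_id": "9", "ped_type": "28"}]
print([describe_ped_type(r["ped_type"]) for r in without_animals(rows)])
```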

Joint type

 0: head_top
 1: head_center
 2: neck
 3: right_clavicle
 4: right_shoulder
 5: right_elbow
 6: right_wrist
 7: left_clavicle
 8: left_shoulder
 9: left_elbow
10: left_wrist
11: spine0
12: spine1
13: spine2
14: spine3
15: spine4
16: right_hip
17: right_knee
18: right_ankle
19: left_hip
20: left_knee
21: left_ankle
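
The joint indices can be turned into names with a plain list. The LIMBS pairs below are an assumption made for illustration, not the skeleton used by draw_full_annotations.py; they only connect head, arm and leg joints because the order of the spine joints is not documented here. The skeleton argument of the hypothetical visible_limbs helper is assumed to map joint_type to (x_2D_joint, y_2D_joint, joint_occluded).

```python
# Index i of this list is joint_type i
JOINT_NAMES = [
    "head_top", "head_center", "neck",
    "right_clavicle", "right_shoulder", "right_elbow", "right_wrist",
    "left_clavicle", "left_shoulder", "left_elbow", "left_wrist",
    "spine0", "spine1", "spine2", "spine3", "spine4",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
]
NAME_TO_ID = {name: i for i, name in enumerate(JOINT_NAMES)}

# Hypothetical limb list for drawing: head, arm and leg chains only
LIMBS = [
    ("head_top", "head_center"), ("head_center", "neck"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
]


def visible_limbs(skeleton):
    """Return segments for limbs whose two joints exist and are not occluded.

    skeleton maps joint_type -> (x_2D_joint, y_2D_joint, joint_occluded).
    """
    segments = []
    for a, b in LIMBS:
        ja = skeleton.get(NAME_TO_ID[a])
        jb = skeleton.get(NAME_TO_ID[b])
        if ja is not None and jb is not None and not ja[2] and not jb[2]:
            segments.append(((ja[0], ja[1]), (jb[0], jb[1])))
    return segments


# right_hip and right_knee visible, right_ankle occluded
skeleton = {16: (10.0, 10.0, False), 17: (10.0, 20.0, False), 18: (10.0, 30.0, True)}
print(visible_limbs(skeleton))
```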

Person re-identification

The annotations are encoded in the image filenames. The following annotations are available:

annotation name   description
framegta          in-game frame number, aka frame_no_gta
camid             camera id, aka cam_id
pid               person id, aka person_id

These annotations come from the multi person multi camera tracking annotations in coords_cam_{0-5}.csv. This means that further annotations for the re-id images, e.g. the joint positions, can be obtained by linking via frame_no_gta and person_id in the CSV file of the corresponding camera.
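
A minimal sketch of such a link, assuming framegta, camid and pid have already been extracted from the filename (the exact filename format is not reproduced here) and that camid selects the matching coords_cam_{camid}.csv file. The helper names index_camera_rows and joints_for_reid_image and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def index_camera_rows(csv_file):
    """Index the rows of one coords_cam_{camid}.csv by (frame_no_gta, person_id)."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[(int(row["frame_no_gta"]), int(row["person_id"]))].append(row)
    return dict(index)


def joints_for_reid_image(indexes, framegta, camid, pid):
    """Return {joint_type: (x_2D_joint, y_2D_joint)} for one re-id image.

    indexes maps camid -> index built from coords_cam_{camid}.csv;
    framegta, camid and pid are the values encoded in the image filename.
    """
    rows = indexes[camid].get((framegta, pid), [])
    return {
        int(r["joint_type"]): (float(r["x_2D_joint"]), float(r["y_2D_joint"]))
        for r in rows
    }


# Made-up sample standing in for coords_cam_0.csv
sample = io.StringIO(
    "frame_no_gta,person_id,joint_type,x_2D_joint,y_2D_joint\n"
    "1234,7,0,120,52\n"
    "1234,7,1,121,60\n"
    "1235,7,0,122,53\n"
)
indexes = {0: index_camera_rows(sample)}
print(joints_for_reid_image(indexes, framegta=1234, camid=0, pid=7))
```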

Scripts

To use the following scripts, install the Python requirements first:

pip install -r requirements.txt  

mta_to_coco.py

Converts the MTA videos and annotations into images and annotations in the COCO format.
Note that only the bounding box annotations are converted.

Example:

python mta_to_coco.py \
    --mta_dataset_folder /media/philipp/philippkoehl_ssd/MTA_ext_short/test \
    --coco_mta_output_folder /media/philipp/philippkoehl_ssd/coco_MTA_ext_short/test \
    --sampling_rate 41 \
    --camera_ids 0,1,2,3,4,5
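
For reference, COCO stores a box as [x, y, width, height] with (x, y) the top-left corner, whereas the MTA annotations give the top-left and bottom-right corners. The sketch below shows that conversion; it is not taken from mta_to_coco.py, and the helper names, the single person category with id 1, the width = x_bottom_right_BB - x_top_left_BB convention and the omitted segmentation field are assumptions.

```python
def mta_box_to_coco(x_top_left, y_top_left, x_bottom_right, y_bottom_right):
    """Convert MTA corner coordinates to a COCO [x, y, width, height] box."""
    return [
        x_top_left,
        y_top_left,
        x_bottom_right - x_top_left,  # assumed width convention (no +1)
        y_bottom_right - y_top_left,
    ]


def coco_annotation(ann_id, image_id, corners):
    """Build a minimal COCO-style annotation dict for one box (no segmentation)."""
    x, y, w, h = mta_box_to_coco(*corners)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,  # assumption: a single "person" category with id 1
        "bbox": [x, y, w, h],
        "area": w * h,
        "iscrowd": 0,
    }


print(coco_annotation(1, 0, (100, 50, 140, 160)))
```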

draw_full_annotations.py

Draws the joint annotations, bounding boxes and person ids into the frames of the MTA videos and outputs the result as a video.

Example:

python draw_full_annotations.py \
    --coords_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short_coords/test" \
    --video_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short/test" \
    --output_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short_annotation_videos" \
    --camera_ids "0,1,2,3,4,5"

Citation

If you use the dataset, please cite our work. The associated paper was published at the CVPR 2020 VUHCS Workshop (https://vuhcs.github.io/).

@InProceedings{Kohl_2020_CVPR_Workshops,
    author = {Kohl, Philipp and Specker, Andreas and Schumann, Arne and Beyerer, Jurgen},
    title = {The MTA Dataset for Multi-Target Multi-Camera Pedestrian Tracking by Weighted Distance Aggregation},
    booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2020}
}