Awesome

The MTA Dataset

The MTA (Multi Camera Track Auto) is a large multi target multi camera tracking dataset. It contains over 2,800 person identities, 6 cameras and a video length of over 100 minutes per camera. It contains a day and a night period.

Take a look at a short timelapse video:

https://www.youtube.com/watch?v=tZ5UwpkcfmE

Or a short extracted part with annotations:

https://www.youtube.com/watch?v=e6WNgCQHcCc

Associated repositories

The dataset was recorded and created by the MTA-Mod (https://github.com/koehlp/MTA-Mod). If you want to perform multi camera tracking on the dataset, you could use the WDA-Tracker (https://github.com/koehlp/wda_tracker).

Release Notes

Major changes will be listed here.

Getting the dataset

To obtain the Dataset, you first need the MTA-Download-URLs, so please send an email (to be determined) philipp.koehl@gmail.com (with object "MTA Dataset Download") stating:

Your name, title and affilation
Your intended use of the data

The following statement:

With this email we declare that we will use the MTA Dataset 
for research and educational purposes only, since we are aware that 
commercial use is prohibited. We also undertake to purchase a copy of Grand Theft Auto V.

We will send you the MTA-Download-URLs as fast as possible that you can use to download the following files:

Multi person multi camera tracking

MTA_videos.zip

Contains 12 videos (train and test videos for 6 cameras)
and multi camera tracking annotations (frame number, person id, bounding box).
41GB zipped and 42GB unzipped
Overall video length: 102min

MTA_videos_coords.zip

Full annotations e.g. body joints
28.6GB zipped and 235GB unzipped.

MTA_ext_short.zip

Extracted short part of MTA_videos (videos and annotations)
1.7GB zipped and 1.8GB unzipped
Overall video length: 4min

MTA_ext_short_coords.zip

Full annotations of the extracted short part
1.1GB zipped and 8.9GB unzipped

Person re-identification

MTA_reid.zip

Re-id dataset based on the MTA_videos
0.8GB zipped and 1.2GB unzipped

Multi person multi camera tracking

MTA_{videos,ext_short}.zip

6 train and 6 test set camera videos
- MTA_{videos,ext_short}/{train,test}/cam_{0-5}/cam_{0-5}.mp4
6 train and 6 test set camera tracking annotation csv files
- MTA_{videos,ext_short}/{train,test}/cam_{0-5}/coords_fib_cam_{0-5}.csv

MTA_{videos,ext_short}_coords.zip

6 train and 6 test set camera full annotation csv files
- MTA_{videos,ext_short}_coords/{train,test}/cam_{0-5}/coords_cam_{0-5}.csv

Person re-identification

MTA_reid.zip

72301 images in re-id train set
15165 images in the re-id query set
60448 images in the re-id test set
- MTA_reid/{train,query,test}/framegta_{int}_camid_{0-5}_pid_{int}.png
36100 overall excluded distractor images which don't show enough to recognize a person
- MTA_reid/distractors/{train,query,test}/framegta_{int}_camid_{0-5}_pid_{int}.png

Annotations

Multi person multi camera tracking

Annotations in `coords_fib_cam_{0-5}.csv`

column name	Description
frame_no_cam	frame number per camera starting from 0
person_id	person id
x_top_left_BB	x top left bounding box coordinate
y_top_left_BB	y top left bounding box coordinate
x_bottom_right_BB	x bottom right bounding box coordinate
y_bottom_right_BB	y bottom right bounding box coordinate

Annotations in `coords_cam_{0-5}.csv`

column name	Description
frame_no_gta	ingame native frame number
frame_no_cam	frame number per camera starting from 0
person_id	person id
appearance_id	allows to recreate a person with the same appearance
joint_type	joint type
x_2D_joint	x 2D joint position
y_2D_joint	y 2D joint position
x_3D_joint	x 3D joint position
y_3D_joint	y 3D joint position
z_3D_joint	z 3D joint position
joint_occluded	1 if the joint is occluded; 0 otherwise
joint_self_occluded	1 if the joint is occluded by its owner; 0 otherwise
x_3D_cam	x 3D camera position
y_3D_cam	y 3D camera position
z_3D_cam	z 3D camera position
x_rot_cam	x camera rotation
y_rot_cam	y camera rotation
z_rot_cam	z camera rotation
fov	camera field of view
x_3D_person	x 3D person position
y_3D_person	y 3D person position
z_3D_person	z 3D person position
x_2D_person	x 2D person position
y_2D_person	y 2D person position
ped_type	pedestrian type
wears_glasses	1 if the person is wearing glasses; 0 otherwise
yaw_person	person yaw
hours_gta	ingame time hours
minutes_gta	ingame time minutes
seconds_gta	ingame time seconds
x_top_left_BB	x top left bounding box coordinate
y_top_left_BB	y top left bounding box coordinate
x_bottom_right_BB	x bottom right bounding box coordinate
y_bottom_right_BB	y bottom right bounding box coordinate

Note that the camera 3D coordinates are in the GTA 3D coordinate system.

Pedestrian types

ped_type	Description
0	Player character Michael
1	Player character Franklin
2	Player character Trevor
29	Army
28	Animal
27	SWAT
21	Los Santos Fire Department
20	Paramedic
6	Cop
4	Male
5	Female
26	Human

Source: http://www.dev-c.com/nativedb/

Joint type

 0: head_top
 1: head_center
 2: neck
 3: right_clavicle
 4: right_shoulder
 5: right_elbow
 6: right_wrist
 7: left_clavicle
 8: left_shoulder
 9: left_elbow
10: left_wrist
11: spine0
12: spine1
13: spine2
14: spine3
15: spine4
16: right_hip
17: right_knee
18: right_ankle
19: left_hip
20: left_knee
21: left_ankle

Person re-identification

The annotations are encoded in the image filenames using the following format:

{annotation_name}_{value}.

As the image filename goes as follows:

framegta_{int}_camid_{0-5}_pid_{int}.png

The following annotations are available:

annotation name	description
framegta	ingame frame number aka frame_no_gta
camid	camera id aka cam_id
pid	person id aka person_id

These annotations come from the multi person multi camera tracking annotations in coords_cam_{0-5}.csv. This means it is possible to get more annotations for re-id images like e.g. the joint positions by linking via frame_no_gta and person_id.

Scripts

To use the following scripts it is necessary to install the python requirements:

pip install -r requirements.txt

mta_to_coco.py

Converts the mta videos and annotations into the coco annotation format and images.
Note that just the bounding box annotations are available.

Example:

python mta_to_coco.py \
    --mta_dataset_folder /media/philipp/philippkoehl_ssd/MTA_ext_short/test \
    --coco_mta_output_folder /media/philipp/philippkoehl_ssd/coco_MTA_ext_short/test \
    --sampling_rate 41 \
    --camera_ids 0,1,2,3,4,5

draw_full_annotations.py

Draws joint annotations, bounding box annotations and person ids into the frames of the MTA data and outputs a video.

Example:

python draw_full_annotations.py \
    --coords_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short_coords/test" \
    --video_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short/test" \
    --output_folder "/media/philipp/philippkoehl_ssd/MTA_ext_short_annotation_videos" \
    --camera_ids "0,1,2,3,4,5"

Citation

If you use it, please cite our work. The affiliated paper was published at the CVPR 2020 VUHCS Workshop (https://vuhcs.github.io/)

@InProceedings{Kohl_2020_CVPR_Workshops,
    author = {Kohl, Philipp and Specker, Andreas and Schumann, Arne and Beyerer, Jurgen},
    title = {The MTA Dataset for Multi-Target Multi-Camera Pedestrian Tracking by Weighted Distance Aggregation},
    booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2020}
}

Awesome

The MTA Dataset

Getting the dataset

Multi person multi camera tracking

Person re-identification

Contents

Multi person multi camera tracking

Person re-identification

Annotations

Multi person multi camera tracking

Annotations in coords_fib_cam_{0-5}.csv

Annotations in coords_cam_{0-5}.csv

Person re-identification

Scripts

Citation

Annotations in `coords_fib_cam_{0-5}.csv`

Annotations in `coords_cam_{0-5}.csv`