EPIC KITCHENS-55 Dataset

<!-- start badges -->

CircleCI GitHub release arXiv-1804.02748 Data

<!-- end badges -->

EPIC-KITCHENS-55 is the largest dataset in first-person (egocentric) vision: 55 hours of multi-faceted, non-scripted recordings in native environments (i.e. the wearers' homes), capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel 'live' audio commentary approach.

Authors

Dima Damen (1), Hazel Doughty (1), Giovanni Maria Farinella (3), Sanja Fidler (2), Antonino Furnari (3), Evangelos Kazakos (1), Davide Moltisanti (1), Jonathan Munro (1), Toby Perrett (1), Will Price (1), Michael Wray (1)

Contact: uob-epic-kitchens@bristol.ac.uk

Citing

When using the dataset, kindly reference:

@INPROCEEDINGS{Damen2018EPICKITCHENS,
   title={Scaling Egocentric Vision: The EPIC-KITCHENS Dataset},
   author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria  and Fidler, Sanja and 
           Furnari, Antonino and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan 
           and Perrett, Toby and Price, Will and Wray, Michael},
   booktitle={European Conference on Computer Vision (ECCV)},
   year={2018}
} 

(Check publication here)

Dataset Details

Ground Truth

We provide ground truth for action segments and object bounding boxes.

Dataset Splits

The dataset is comprised of three splits with the corresponding ground truth:

Initially we are only releasing the full ground truth for the training set in order to run action and object challenges.

Important Files

Additional Files

We direct the reader to RDSF for the videos and RGB/flow frames.

We provide auto-generated HTML and PDF alternatives to this README.

Files Structure

EPIC_train_action_labels.csv

CSV file containing 14 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `uid` | int | `6374` | Unique ID of the segment. |
| `video_id` | string | `P03_01` | Video the segment is in. |
| `narration` | string | `close fridge` | English description of the action provided by the participant. |
| `start_timestamp` | string | `00:23:43.847` | Start time in `HH:mm:ss.SSS` of the action. |
| `stop_timestamp` | string | `00:23:47.212` | End time in `HH:mm:ss.SSS` of the action. |
| `start_frame` | int | `85430` | Start frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `stop_frame` | int | `85643` | End frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `participant_id` | string | `P03` | ID of the participant. |
| `verb` | string | `close` | Parsed verb from the narration. |
| `noun` | string | `fridge` | First parsed noun from the narration. |
| `verb_class` | int | `3` | Numeric ID of the parsed verb's class. |
| `noun_class` | int | `10` | Numeric ID of the parsed noun's class. |
| `all_nouns` | list of string (1 or more) | `['fridge']` | List of all parsed nouns from the narration. |
| `all_noun_classes` | list of int (1 or more) | `[10]` | List of numeric IDs corresponding to all of the parsed nouns' classes from the narration. |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.
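If you prefer to read the CSV directly rather than the pickle, the list-valued columns need converting from text. The following is a minimal sketch using only the Python standard library. It assumes the columns appear in the order listed in the table above and that `all_nouns` and `all_noun_classes` are stored as Python-style list literals, as the example values suggest. `SAMPLE_CSV` and `parse_action_labels` are hypothetical names used for illustration, not part of the release.

```python
import ast
import csv
import io

# Hypothetical one-row excerpt built from the example values in the table above.
SAMPLE_CSV = (
    "uid,video_id,narration,start_timestamp,stop_timestamp,start_frame,"
    "stop_frame,participant_id,verb,noun,verb_class,noun_class,"
    "all_nouns,all_noun_classes\n"
    "6374,P03_01,close fridge,00:23:43.847,00:23:47.212,85430,85643,"
    "P03,close,fridge,3,10,['fridge'],[10]\n"
)

INT_COLUMNS = ("uid", "start_frame", "stop_frame", "verb_class", "noun_class")
LIST_COLUMNS = ("all_nouns", "all_noun_classes")


def parse_action_labels(handle):
    """Read action-label rows, converting int and list columns from text."""
    rows = []
    for row in csv.DictReader(handle):
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        for column in LIST_COLUMNS:
            # Assumption: the cell holds a Python literal such as ['fridge'].
            row[column] = ast.literal_eval(row[column])
        rows.append(row)
    return rows


labels = parse_action_labels(io.StringIO(SAMPLE_CSV))
```

To read the released file instead, pass an open file handle, for example `open("EPIC_train_action_labels.csv", newline="")`, in place of the `io.StringIO` wrapper.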

EPIC_train_invalid_labels.csv

CSV file containing 14 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `uid` | int | `6374` | Unique ID of the segment. |
| `video_id` | string | `P03_01` | Video the segment is in. |
| `narration` | string | `close fridge` | English description of the action provided by the participant. |
| `start_timestamp` | string | `00:23:43.847` | Start time in `HH:mm:ss.SSS` of the action. |
| `stop_timestamp` | string | `00:23:47.212` | End time in `HH:mm:ss.SSS` of the action. |
| `start_frame` | int | `85430` | Start frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `stop_frame` | int | `85643` | End frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `participant_id` | string | `P03` | ID of the participant. |
| `verb` | string | `close` | Parsed verb from the narration. |
| `noun` | string | `fridge` | First parsed noun from the narration. |
| `verb_class` | int | `3` | Numeric ID of the parsed verb's class. |
| `noun_class` | int | `10` | Numeric ID of the parsed noun's class. |
| `all_nouns` | list of string (1 or more) | `['fridge']` | List of all parsed nouns from the narration. |
| `all_noun_classes` | list of int (1 or more) | `[10]` | List of numeric IDs corresponding to all of the parsed nouns' classes from the narration. |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_train_action_narrations.csv

CSV file containing 5 columns:

Note: The start/end timestamp refers to the start/end time of the narration, not the action itself.

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `participant_id` | string | `P03` | ID of the participant. |
| `video_id` | string | `P03_01` | Video the segment is in. |
| `start_timestamp` | string | `00:23:43.847` | Start time in `HH:mm:ss.SSS` of the narration. |
| `stop_timestamp` | string | `00:23:47.212` | End time in `HH:mm:ss.SSS` of the narration. |
| `narration` | string | `close fridge` | English description of the action provided by the participant. |

EPIC_train_object_labels.csv

CSV file containing 6 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `noun_class` | int | `20` | Integer value representing the class in noun-classes.csv. |
| `noun` | string | `bag` | Original string name for the object. |
| `participant_id` | string | `P01` | ID of participant. |
| `video_id` | string | `P01_01` | Video the object was annotated in. |
| `frame` | int | `056581` | Frame number of the annotated object. |
| `bounding_boxes` | list of 4-tuple (0 or more) | `"[(76, 1260, 462, 186)]"` | Annotated boxes with format `(<top:int>,<left:int>,<height:int>,<width:int>)`. |
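Because the box format is `(top, left, height, width)` rather than the more common corner format, a small conversion step is often useful. The sketch below assumes the `bounding_boxes` cell is a Python-style list literal, as in the example above. `parse_bounding_boxes` and `to_corners` are hypothetical helper names, not part of the release.

```python
import ast


def parse_bounding_boxes(cell):
    """Parse a bounding_boxes cell into a list of (top, left, height, width) tuples."""
    # Assumption: the cell is a Python literal such as "[(76, 1260, 462, 186)]".
    return [tuple(box) for box in ast.literal_eval(cell)]


def to_corners(box):
    """Convert (top, left, height, width) to (x_min, y_min, x_max, y_max)."""
    top, left, height, width = box
    return (left, top, left + width, top + height)


boxes = parse_bounding_boxes("[(76, 1260, 462, 186)]")
corners = [to_corners(box) for box in boxes]
```

A frame with no annotated boxes is expected to parse to an empty list, matching the "0 or more" type in the table.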

EPIC_train_object_action_correspondence.csv

CSV file containing 5 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `participant_id` | string | `P01` | ID of participant. |
| `video_id` | string | `P01_01` | Video the frames are part of. |
| `object_frame` | int | `56581` | Frame number of the object detection image from object_detection_images. |
| `action_frame` | int | `56638` | Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow. |
| `timestamp` | string | `00:00:00.00` | Timestamp in `HH:mm:ss.SS` corresponding to the frame. |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.
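Since object detection frames and action recognition frames are numbered differently, this file is effectively a lookup table between the two. A minimal sketch of building that lookup with the standard library follows. The column order is assumed to match the table above, and `SAMPLE_CSV` and `build_frame_lookup` are hypothetical names used for illustration.

```python
import csv
import io

# Hypothetical one-row excerpt built from the example values in the table above.
SAMPLE_CSV = (
    "participant_id,video_id,object_frame,action_frame,timestamp\n"
    "P01,P01_01,56581,56638,00:00:00.00\n"
)


def build_frame_lookup(handle):
    """Map (video_id, object_frame) to the corresponding action_frame."""
    lookup = {}
    for row in csv.DictReader(handle):
        key = (row["video_id"], int(row["object_frame"]))
        lookup[key] = int(row["action_frame"])
    return lookup


frame_lookup = build_frame_lookup(io.StringIO(SAMPLE_CSV))
action_frame = frame_lookup[("P01_01", 56581)]
```

The same function should apply to the s1 and s2 test correspondence files, which share this layout.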

EPIC_test_s1_object_action_correspondence.csv

CSV file containing 5 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `participant_id` | string | `P01` | ID of participant. |
| `video_id` | string | `P01_11` | Video containing the object s1 test frames. |
| `object_frame` | int | `33601` | Frame number of the object detection image from object_detection_images. |
| `action_frame` | int | `33635` | Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow. |
| `timestamp` | string | `00:09:20.58` | Timestamp in `HH:mm:ss.SS` corresponding to the frames. |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s2_object_action_correspondence.csv

CSV file containing 5 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `participant_id` | string | `P09` | ID of participant. |
| `video_id` | string | `P09_05` | Video containing the object s2 test frames. |
| `object_frame` | int | `15991` | Frame number of the object detection image from object_detection_images. |
| `action_frame` | int | `16007` | Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow. |
| `timestamp` | string | `00:04:26.78` | Timestamp in `HH:mm:ss.SS` corresponding to the frames. |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s1_object_video_list.csv

CSV file listing the videos used to obtain the object s1 test frames. The frames can be obtained from RDSF under object_detection_images/test. Please test all frames from this folder for the videos listed in this csv.

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `video_id` | string | `P01_11` | Video containing the object s1 test frames. |
| `participant_id` | string | `P01` | ID of the participant. |

EPIC_test_s2_object_video_list.csv

CSV file listing the videos used to obtain the object s2 test frames. The frames can be obtained from RDSF under object_detection_images/test. Please test all frames from this folder for the videos listed in this csv.

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `video_id` | string | `P01_11` | Video containing the object s2 test frames. |
| `participant_id` | string | `P01` | ID of the participant. |

EPIC_test_s1_timestamps.csv

CSV file containing 7 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `uid` | int | `1924` | Unique ID of the segment. |
| `participant_id` | string | `P01` | ID of the participant. |
| `video_id` | string | `P01_11` | Video the segment is in. |
| `start_timestamp` | string | `00:00:00.000` | Start time in `HH:mm:ss.SSS` of the action. |
| `stop_timestamp` | string | `00:00:01.890` | End time in `HH:mm:ss.SSS` of the action. |
| `start_frame` | int | `1` | Start frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `stop_frame` | int | `93` | End frame of the action (WARNING only for frames extracted as detailed in Video Information). |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s2_timestamps.csv

CSV file containing 7 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `uid` | int | `15582` | Unique ID of the segment. |
| `participant_id` | string | `P09` | ID of the participant. |
| `video_id` | string | `P09_01` | Video the segment is in. |
| `start_timestamp` | string | `00:00:01.970` | Start time in `HH:mm:ss.SSS` of the action. |
| `stop_timestamp` | string | `00:00:03.090` | End time in `HH:mm:ss.SSS` of the action. |
| `start_frame` | int | `118` | Start frame of the action (WARNING only for frames extracted as detailed in Video Information). |
| `stop_frame` | int | `185` | End frame of the action (WARNING only for frames extracted as detailed in Video Information). |

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.
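The `HH:mm:ss.SSS` timestamps used throughout these files can be converted to seconds with a few lines of Python. The sketch below also includes a rough frame estimate that assumes frames were resampled at 60 FPS as described in Video Information and that truncation is the right rounding. Both are assumptions, so treat the estimate as approximate and prefer the released `start_frame` and `stop_frame` columns where they are available. The function names are hypothetical.

```python
def timestamp_to_seconds(timestamp):
    """Convert an HH:mm:ss.SSS string to a number of seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def approximate_frame(timestamp, fps=60):
    """Estimate a frame index from a timestamp.

    Assumption: frames were extracted at a constant `fps` and the index is
    the truncated product of time and frame rate. This is not guaranteed to
    reproduce the frame columns in the released CSV files.
    """
    return int(timestamp_to_seconds(timestamp) * fps)


segment_seconds = (
    timestamp_to_seconds("00:00:03.090") - timestamp_to_seconds("00:00:01.970")
)
```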

EPIC_noun_classes.csv

CSV file containing 3 columns:

Note: a colon represents a compound noun with the more generic noun first. So pan:dust should be read as dust pan.

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `noun_id` | int | `2` | ID of the noun class. |
| `class_key` | string | `pan:dust` | Key of the noun class. |
| `nouns` | list of string (1 or more) | `"['pan:dust', 'dustpan']"` | All nouns within the class (includes the key). |
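The colon convention can be undone programmatically when a human-readable label is needed. The sketch below reverses the colon-separated parts so that the more specific word comes first. Extending the rule to keys with more than one colon is an assumption, and `readable_noun` is a hypothetical helper name.

```python
def readable_noun(class_key):
    """Turn a colon-separated noun key into natural word order.

    'pan:dust' becomes 'dust pan'; keys without a colon are returned as-is.
    Assumption: keys with several colons list parts from most generic to
    most specific, so reversing them gives the natural reading.
    """
    return " ".join(reversed(class_key.split(":")))


label = readable_noun("pan:dust")
```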

EPIC_verb_classes.csv

CSV file containing 3 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `verb_id` | int | `3` | ID of the verb class. |
| `class_key` | string | `close` | Key of the verb class. |
| `verbs` | list of string (1 or more) | `"['close', 'close-off', 'shut']"` | All verbs within the class (includes the key). |

EPIC_descriptions.csv

CSV file containing 4 columns:

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `video_id` | string | `P01_01` | ID of the video. |
| `date` | string | `30/04/2017` | Date on which the video was shot. |
| `time` | string | `13:49:00` | Local recording time of the video. |
| `description` | string | `prepared breakfast with soy milk and cereals` | Description of the activities contained in the video. |

EPIC_many_shot_verbs.csv

CSV file containing the many shot verbs. A verb class is considered many shot if it appears more than 100 times in training. (NOTE: this file is derived from EPIC_train_action_labels.csv. Check out the accompanying notebook demonstrating how we compute these classes.)

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `verb_class` | int | `1` | Numeric ID of the verb class. |
| `verb` | string | `put` | Verb corresponding to the verb class. |

EPIC_many_shot_nouns.csv

CSV file containing the many shot nouns. A noun class is considered many shot if it appears more than 100 times in training. (NOTE: this file is derived from EPIC_train_action_labels.csv. Check out the accompanying notebook demonstrating how we compute these classes.)

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `noun_class` | int | `3` | Numeric ID of the noun class. |
| `noun` | string | `tap` | Noun corresponding to the noun class. |

EPIC_many_shot_actions.csv

CSV file containing the many shot actions. An action class (composed of a verb class and a noun class) is considered many shot if BOTH the verb class and the noun class are many shot AND the action class appears in training at least once. (NOTE: this file is derived from EPIC_train_action_labels.csv. Check out the accompanying notebook demonstrating how we compute these classes.)

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `action_class` | (int, int) | `(9, 84)` | Numeric pair of IDs, first the verb, then the noun. |
| `verb_class` | int | `9` | Numeric ID of the verb class. |
| `verb` | string | `move` | Verb corresponding to the verb class. |
| `noun_class` | int | `84` | Numeric ID of the noun class. |
| `noun` | string | `sausage` | Noun corresponding to the noun class. |
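The many-shot definitions above can be expressed compactly in code. The following is a sketch of those rules as stated in this README, not a copy of the accompanying notebook. It assumes the training labels have already been loaded as a list of dictionaries with integer `verb_class` and `noun_class` entries. The function names and the synthetic rows are hypothetical.

```python
from collections import Counter


def many_shot_classes(class_ids, threshold=100):
    """Return the class IDs that appear more than `threshold` times."""
    counts = Counter(class_ids)
    return {class_id for class_id, count in counts.items() if count > threshold}


def many_shot_actions(rows, threshold=100):
    """Return (verb_class, noun_class) pairs that are many shot.

    A pair qualifies if both its verb class and its noun class are many
    shot and the pair itself occurs at least once in `rows`.
    """
    verbs = many_shot_classes((row["verb_class"] for row in rows), threshold)
    nouns = many_shot_classes((row["noun_class"] for row in rows), threshold)
    seen = {(row["verb_class"], row["noun_class"]) for row in rows}
    return {(verb, noun) for verb, noun in seen if verb in verbs and noun in nouns}


# Synthetic rows for illustration only: 101 segments of action (1, 3)
# and a single segment of action (9, 84).
rows = [{"verb_class": 1, "noun_class": 3}] * 101
rows.append({"verb_class": 9, "noun_class": 84})

actions = many_shot_actions(rows)
```

In this synthetic example only `(1, 3)` qualifies, because verb class 9 and noun class 84 each appear just once.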

EPIC_video_info.csv

CSV file containing information for each video

| Column Name | Type | Example | Description |
| --- | --- | --- | --- |
| `video` | string | `P01_01` | Video ID. |
| `resolution` | string | `1920x1080` | Resolution of the video, format is `WIDTHxHEIGHT`. |
| `duration` | float | `1652.152817` | Duration of the video, in seconds. |
| `fps` | float | `59.9400599400599` | Frame rate of the video. |
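The `resolution` column is a single `WIDTHxHEIGHT` string, so it needs splitting before use. The sketch below does that and also gives a rough frame count from `duration` and `fps`. The count assumes a constant frame rate, which does not hold for every video (see Video Information), so it is only an estimate. Both function names are hypothetical.

```python
def parse_resolution(resolution):
    """Split a WIDTHxHEIGHT string into integer (width, height)."""
    width, height = resolution.split("x")
    return int(width), int(height)


def estimated_frame_count(duration_seconds, fps):
    """Estimate the number of frames, assuming a constant frame rate."""
    return round(duration_seconds * fps)


width, height = parse_resolution("1920x1080")
```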

File Downloads

Due to the size of the dataset we provide scripts for downloading parts of the dataset:

Note: These scripts will work for Linux and Mac. For Windows users a bash installation should work.

These scripts replicate the folder structure of the dataset release, found here.

If you wish to download part of the dataset instructions can be found here.

Video Information

Videos are recorded in 1080p at 59.94 FPS on a GoPro Hero 5 with linear field of view. A minority of videos were shot at different resolutions, fields of view, or frame rates due to participant or camera error. These videos, identified using ffprobe, are:

The GoPro Hero 5 was also set to drop the frame rate in low-light conditions to preserve exposure, leading to variable FPS in some videos. If you wish to extract frames, we suggest you resample at 60 FPS to mitigate issues with variable FPS. This can be achieved in a single step with FFmpeg:

ffmpeg -i "P##_**.MP4" -vf "scale=-2:256" -q:v 4 -r 60 "P##_**/frame_%010d.jpg"

where ## is the Participant ID and ** is the video ID.
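When extracting frames for many videos, it can help to generate the command above from the video ID. The sketch below only builds the argument list and does not run FFmpeg. It assumes the input file is named `<video_id>.MP4` in the current directory, mirroring the pattern above, and `build_ffmpeg_command` is a hypothetical helper name.

```python
def build_ffmpeg_command(video_id, height=256, fps=60, quality=4):
    """Build the FFmpeg argument list for one video, e.g. video_id='P01_01'.

    Assumption: the input is '<video_id>.MP4' and frames are written to a
    directory named after the video, as in the command shown above.
    """
    return [
        "ffmpeg",
        "-i", f"{video_id}.MP4",
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, set output height
        "-q:v", str(quality),         # JPEG quality setting
        "-r", str(fps),               # resample to a constant frame rate
        f"{video_id}/frame_%010d.jpg",
    ]


command = build_ffmpeg_command("P01_01")
```

The list can be passed to `subprocess.run(command, check=True)`. FFmpeg does not create the output directory, so it must exist before the command is run.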

Optical flow was extracted using a fork of gpu_flow made available on GitHub. We set the parameters stride = 2, dilation = 3, bound = 25 and size = 256.

License

All files in this dataset are copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License, found here. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.

Disclaimer

EPIC-KITCHENS-55 and EPIC-KITCHENS-100 were collected as tools for research in computer vision. However, it is worth noting that the datasets may have unintended biases (including those of a societal, gender or racial nature).

Changelog

See release history for changelog.