


<img src="./teaser.png"/>

This project introduces a novel video dataset named HACS (Human Action Clips and Segments). It provides two kinds of manual annotations: HACS Clips contains 1.55M annotations on 2-second clips, and HACS Segments contains complete action segments (from action start to end) on 50K videos. The large-scale dataset is effective for pretraining action recognition and localization models, and it also serves as a new benchmark for temporal action localization. (*The SLAC dataset is now part of the HACS dataset.)

Project Website: http://hacs.csail.mit.edu/

Paper: https://arxiv.org/abs/1712.09374

*Updates on v1.1.1

A minor change compared to v1.1: we removed some invalid videos from the dataset.

Download Annotation Files

  1. Clone this repository:
     git clone https://github.com/hangzhaomit/HACS-dataset.git
  2. Unzip annotation files:
     unzip HACS_v1.1.1.zip
  3. Check dataset statistics:
     python dataset_stats.py

   You should expect the following output:

====Parsing clips====
[training set]: 492748 videos, 1509478 clips
[validation set]: 5981 videos, 20245 clips
[testing set]: 5987 videos, 20293 clips
====Parsing segments====
[training set]: 37613 videos
[validation set]: 5981 videos
[testing set]: 5987 videos

Annotation File Format

  1. For HACS Clips, the annotation file is HACS_v1.1.1/HACS_clips_v1.1.1.csv. A "label" of 1 marks a positive sample and a "label" of -1 marks a negative sample. The format looks like the following:
  2. For HACS Segments, the annotation file is HACS_v1.1.1/HACS_segments_v1.1.1.json, which uses the same format as the ActivityNet dataset:
{
  "database": {
    "--0edUL8zmA": {
        "annotations": [
            {"label": "Dodgeball", "segment": [5.40, 11.60]},
            {"label": "Dodgeball", "segment": [12.60, 88.16]}
        ],
        "subset": "training",
        "duration": "92.166667",
        "url": "https://www.youtube.com/watch?v=--0edUL8zmA"
    }
  }
}
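The segments format shown above can be read with Python's standard json module. The following is a minimal sketch, not part of this repository: the function name summarize_segments and the inline SAMPLE string are illustrative assumptions, and the sample only mirrors the single entry shown above. It assumes that "duration" is stored as a string (as in the example) and that some entries may carry no "annotations" list.

```python
import json
from collections import Counter

# Inline sample mirroring the entry shown above. In practice, replace
# json.loads(SAMPLE) with json.load() on HACS_v1.1.1/HACS_segments_v1.1.1.json.
SAMPLE = """
{
  "database": {
    "--0edUL8zmA": {
      "annotations": [
        {"label": "Dodgeball", "segment": [5.40, 11.60]},
        {"label": "Dodgeball", "segment": [12.60, 88.16]}
      ],
      "subset": "training",
      "duration": "92.166667",
      "url": "https://www.youtube.com/watch?v=--0edUL8zmA"
    }
  }
}
"""


def summarize_segments(data):
    """Return (videos per subset, flat list of per-segment records)."""
    videos_per_subset = Counter()
    segments = []
    for video_id, entry in data["database"].items():
        videos_per_subset[entry["subset"]] += 1
        duration = float(entry["duration"])  # a string in the sample entry
        # Assumption: entries without annotations are treated as empty.
        for ann in entry.get("annotations", []):
            start, end = ann["segment"]
            segments.append({
                "video_id": video_id,
                "label": ann["label"],
                "start": start,
                "end": end,
                "length": end - start,  # seconds
                "coverage": (end - start) / duration,  # fraction of the video
            })
    return videos_per_subset, segments


if __name__ == "__main__":
    counts, segs = summarize_segments(json.loads(SAMPLE))
    print(dict(counts))
    for seg in segs:
        print(seg["video_id"], seg["label"], round(seg["length"], 2))
```

Run on the full annotation file, the per-subset video counts should be comparable to the "Parsing segments" statistics printed by dataset_stats.py.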

Download Videos

  1. Install the following libraries:
  2. Run the following command to download videos:

python download_videos.py --root_dir ROOT_DIR [--dataset {all,segments}] [--shortside SHORTSIDE]
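The --shortside option is not described further here. Assuming it rescales each downloaded video so that its shorter side equals SHORTSIDE while preserving the aspect ratio (an assumption about download_videos.py, not confirmed by this README), the resulting frame size can be sketched as follows. The helper shortside_size is hypothetical and is not part of the script.

```python
def shortside_size(width, height, shortside):
    """Scale (width, height) so the shorter side equals `shortside`.

    Hypothetical helper: it illustrates aspect-preserving short-side
    resizing and is not taken from download_videos.py.
    """
    if width <= 0 or height <= 0 or shortside <= 0:
        raise ValueError("width, height and shortside must be positive")
    if width <= height:
        # Portrait or square: width is the shorter side.
        return shortside, round(height * shortside / width)
    # Landscape: height is the shorter side.
    return round(width * shortside / height), shortside


if __name__ == "__main__":
    print(shortside_size(1280, 720, 256))  # (455, 256)
    print(shortside_size(720, 1280, 256))  # (256, 455)
```

Some video encoders require even frame dimensions, so a real pipeline may round the longer side to an even number instead.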

Request testing videos and missing videos: (NEW)


If you find the dataset helpful, please cite:

  title={Hacs: Human action clips and segments dataset for recognition and temporal localization},
  author={Zhao, Hang and Torralba, Antonio and Torresani, Lorenzo and Yan, Zhicheng},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},