The How-2 Dataset

How-2 is a multimodal dataset consisting of around 80,000 instructional videos (about 2,000 hours) with associated English subtitles and summaries. About 300 hours have also been translated into Portuguese through crowdsourcing and were used during the JSALT 2018 Workshop. The How-2 training data is split into a 300h set and a 2000h set, and only the former supports Portuguese machine translation. The 2000h set can be used for other tasks such as speech recognition, speech summarization, text summarization, and their multimodal extensions.

We have released the following packages pertaining to the How-2 data so that others can replicate our results and to encourage further research:

Please fill out the Data Request form.

Please cite the following paper in all academic work that uses this dataset:

@inproceedings{sanabria18how2,
  title = {{How2:} A Large-scale Dataset For Multimodal Language Understanding},
  author = {Sanabria, Ramon and Caglayan, Ozan and Palaskar, Shruti and Elliott, Desmond and Barrault, Lo\"ic and Specia, Lucia and Metze, Florian},
  booktitle = {Proceedings of the Workshop on Visually Grounded Interaction and Language (ViGIL)},
  year = {2018},
  organization={NeurIPS},
  url = {http://arxiv.org/abs/1811.00347}
}

More papers can be found in the bibliography.

To subscribe to the How2 mailing list click here.

Speech Summarization

How2 has been used for end-to-end speech summarization. To support this, we are releasing 43-dimensional fbank+pitch features. See our ESPnet recipe and paper. Please consider citing our paper on speech summarization if you use this data release.

@inproceedings{Sharma2022, 
author={Sharma, Roshan and Palaskar, Shruti and Black, Alan W and Metze, Florian},
booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={End-to-End Speech Summarization Using Restricted Self-Attention},
year={2022}, 
volume={},
number={},
pages={8072-8076},
doi={10.1109/ICASSP43922.2022.9747320}
}

How2 Get Help

Please use the issue tracker (https://github.com/srvk/how2-dataset/issues) to ask questions and request clarifications.

How2 License

License information for every video can be found in the .info.json file that is downloaded along with each video. At the time of release, all videos included in this dataset were made available by the original content providers under the standard YouTube License.
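As a rough illustration of how the per-video license information could be inspected, the sketch below scans a directory for .info.json files and tallies the license recorded in each one. The key name `license`, the flat directory layout, and the helper names `collect_licenses` and `summarize_licenses` are assumptions made for this example and are not part of the How2 tooling; check one of your downloaded .info.json files to confirm the actual field name.

```python
import json
from collections import Counter
from pathlib import Path


def collect_licenses(video_dir):
    """Map each *.info.json file name in video_dir to its license string."""
    licenses = {}
    for info_path in sorted(Path(video_dir).glob("*.info.json")):
        with info_path.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
        # "license" is an assumed key name; files without it (or with an
        # empty value) are recorded as "unknown".
        licenses[info_path.name] = metadata.get("license") or "unknown"
    return licenses


def summarize_licenses(licenses):
    """Count how many videos fall under each license string."""
    return Counter(licenses.values())
```

Calling `summarize_licenses(collect_licenses("path/to/videos"))` gives a quick overview of which licenses appear in a local copy, which can be useful for checking whether any video's terms differ from the standard YouTube License.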

Unless noted otherwise, we are providing the contents of this repository under the Creative Commons BY-SA 4.0 (Attribution-ShareAlike) License (for data-like content) and/or the BSD-2-Clause License (for software-type content).