<div align="center"> <h2><a href="https://arxiv.org/abs/2311.15964"><image src="./assets/eccv.png" alt="ECCV 2024" width=5%/> Efficient Pre-training for Localized Instruction Generation of Videos</a></h2>

Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller

</div> <h5 align="center">

arXiv Dataset <br>

</h5>

Abstract

In this work, we propose the Sieve & Swap technique to automatically generate high-quality pre-training data for the recipe domain: (i) Sieve filters irrelevant transcripts, and (ii) Swap acquires high-quality text by replacing transcripts with human-written instructions from a text-only recipe dataset. The resulting dataset is three orders of magnitude smaller than current web-scale datasets but enables efficient training of large-scale models. Alongside Sieve & Swap, we propose Procedure Transformer (ProcX), a model for end-to-end step localization and instruction generation for procedural videos. When pre-trained on our curated dataset, this model achieves state-of-the-art performance on YouCook2 and Tasty while using a fraction of the training data.

<p align="center"> <image src="./assets/sieve_n_swap.png" alt="sieve and swap approach" width=60%/> </p>
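
The two stages described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`tokenize`, `jaccard`, `sieve_and_swap`), the `threshold` value, and the use of token-overlap (Jaccard) similarity are all assumptions made for illustration, standing in for whatever similarity measure the paper actually uses. The sketch only shows the general idea: drop transcript segments that match no recipe instruction (Sieve) and replace the remaining ones with their best-matching human-written instruction (Swap).

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]; an assumed stand-in for a learned similarity."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def sieve_and_swap(transcripts, instructions, threshold=0.2):
    """Return (transcript_index, instruction) pairs for the segments that are kept.

    `threshold` is a hypothetical cut-off chosen for this toy example.
    """
    if not instructions:
        return []
    curated = []
    for idx, transcript in enumerate(transcripts):
        # Find the human-written instruction closest to this transcript segment.
        best = max(instructions, key=lambda ins: jaccard(transcript, ins))
        # Sieve: discard segments that match no instruction well enough.
        if jaccard(transcript, best) >= threshold:
            # Swap: keep the instruction text in place of the noisy transcript.
            curated.append((idx, best))
    return curated


# Toy data, invented for illustration only.
transcripts = [
    "hey guys welcome back to my channel",
    "so now we chop the onions and garlic real fine",
    "then you wanna boil the pasta in salted water",
]
instructions = [
    "Chop the onions and garlic.",
    "Boil the pasta in salted water.",
]
pairs = sieve_and_swap(transcripts, instructions)
```

In this toy run the greeting segment is sieved out, and the two cooking segments are swapped for their matching written instructions while keeping their original indices, which is what would let the text stay aligned with the video segment it came from.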

Dataset

Sieve & Swap: Our curated dataset, along with the processed features, can be downloaded from :hugs: Hugging Face. More details are available in data.md.

Raw Pre-Training: HowTo100M, RecipeNLG

Downstream Task: YouCook2, Tasty

Code

Coming Soon!

:page_facing_up: Citation

If you find this project useful in your research, please consider citing:


@inproceedings{batra2025efficient,
  title={Efficient Pre-training for Localized Instruction Generation of Procedural Videos},
  author={Batra, Anil and Moltisanti, Davide and Sevilla-Lara, Laura and Rohrbach, Marcus and Keller, Frank},
  booktitle={European Conference on Computer Vision},
  pages={347--363},
  year={2025},
  organization={Springer}
}

Licenses

This code is released under the MIT License. The licenses for datasets used in the paper are available at the following links: HowTo100M, YouCook2, and Tasty.

:dizzy: Acknowledgement

Thanks to the following open-source projects:

PDVC, VidChapter, Lightning-Hydra-Template.