
VideoMAE for Action Detection (NeurIPS 2022 Spotlight) [Arxiv]

VideoMAE Framework

License: CC BY-NC 4.0

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training<br> Zhan Tong, Yibing Song, Jue Wang, Limin Wang<br>Nanjing University, Tencent AI Lab

This repo contains the supported code and scripts to reproduce the action detection results of VideoMAE. The pre-training code is available in the original repo.

📰 News

[2023.1.16] Code and pre-trained models are available now!

🚀 Main Results

✨ AVA 2.2

| Method | Extra Data | Extra Label | Backbone | #Frame x Sample Rate | mAP |
| :---: | :---: | :---: | :---: | :---: | :---: |
| VideoMAE | Kinetics-400 | ✗ | ViT-S | 16x4 | 22.5 |
| VideoMAE | Kinetics-400 | ✓ | ViT-S | 16x4 | 28.4 |
| VideoMAE | Kinetics-400 | ✗ | ViT-B | 16x4 | 26.7 |
| VideoMAE | Kinetics-400 | ✓ | ViT-B | 16x4 | 31.8 |
| VideoMAE | Kinetics-400 | ✗ | ViT-L | 16x4 | 34.3 |
| VideoMAE | Kinetics-400 | ✓ | ViT-L | 16x4 | 37.0 |
| VideoMAE | Kinetics-400 | ✗ | ViT-H | 16x4 | 36.5 |
| VideoMAE | Kinetics-400 | ✓ | ViT-H | 16x4 | 39.5 |
| VideoMAE | Kinetics-700 | ✗ | ViT-L | 16x4 | 36.1 |
| VideoMAE | Kinetics-700 | ✓ | ViT-L | 16x4 | 39.3 |
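In the "#Frame x Sample Rate" column, "16x4" means each input clip contains 16 frames sampled every 4 frames, so one clip covers a window of 16 × 4 = 64 consecutive frames of the source video. A minimal sketch of this sparse sampling scheme (the function name is illustrative, not taken from the codebase):

```python
def sample_frame_indices(start, num_frames=16, sample_rate=4):
    """Indices of the frames making up one clip: num_frames frames,
    one every sample_rate frames, beginning at `start`."""
    return [start + i * sample_rate for i in range(num_frames)]

indices = sample_frame_indices(0)
print(indices)  # 16 indices, 0 through 60, drawn from a 64-frame window
```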

🔨 Installation

Please follow the instructions in INSTALL.md.

➡️ Data Preparation

Please follow the instructions in DATASET.md for data preparation.

⤴️ Fine-tuning with pre-trained models

The fine-tuning instructions are in FINETUNE.md.

πŸ“Model Zoo

We provide pre-trained and fine-tuned models in MODEL_ZOO.md.
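The checkpoints in MODEL_ZOO.md are standard PyTorch state dicts. When transferring weights from a pre-training model into a detection backbone, parameter-name prefixes often need remapping before `load_state_dict` accepts them. The sketch below shows the idea with a plain dictionary standing in for `torch.load(...)` output; the `"encoder."` prefix and the key names are assumptions for illustration, not the documented checkpoint layout.

```python
def strip_prefix(state_dict, prefix="encoder."):
    """Drop `prefix` from every matching key so the weights can be loaded
    into a model whose parameter names carry no such prefix."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Dummy stand-in for torch.load("checkpoint.pth", map_location="cpu")["model"];
# the key names below are hypothetical.
checkpoint = {
    "encoder.patch_embed.proj.weight": None,
    "encoder.blocks.0.attn.qkv.weight": None,
    "head.weight": None,
}
print(sorted(strip_prefix(checkpoint)))
```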

☎️ Contact

Zhan Tong: tongzhan@smail.nju.edu.cn

πŸ‘ Acknowledgements

Thanks to Lei Chen for support. This project is built upon MAE-pytorch, BEiT and AlphAction. Thanks to the contributors of these great codebases.

🔒 License

The majority of this project is released under the CC BY-NC 4.0 license, as found in the LICENSE file. Portions of the project are available under separate license terms: pytorch-image-models is licensed under the Apache 2.0 license, and BEiT is licensed under the MIT license.

✏️ Citation

If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:

    @inproceedings{tong2022videomae,
      title={Video{MAE}: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training},
      author={Tong, Zhan and Song, Yibing and Wang, Jue and Wang, Limin},
      booktitle={Advances in Neural Information Processing Systems},
      year={2022}
    }

    @article{videomae,
      title={VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training},
      author={Tong, Zhan and Song, Yibing and Wang, Jue and Wang, Limin},
      journal={arXiv preprint arXiv:2203.12602},
      year={2022}
    }