Masked Video Distillation (CVPR 2023)


Official PyTorch implementation of "Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning".

*Overview of the MVD framework (figure).*

News

[2023.5.21] Pretrained models have been released in MODEL_ZOO.md.

[2023.4.9] Code of MVD is available now!

[2023.2.28] MVD has been accepted to CVPR 2023.

Main Results

Something-Something V2

| Method | Pretrain Video Data | Backbone | Teacher | Epoch | Top-1 | Top-5 | Resolution | #Frames x Clips x Crops | Param |
| :----: | :-----------------: | :------: | :-----: | :---: | :---: | :---: | :--------: | :---------------------: | :---: |
| MVD | Kinetics-400 | ViT-S | ViT-B | 400 | 70.7 | 92.6 | 224 | 16x2x3 | 22M |
| MVD | Kinetics-400 | ViT-S | ViT-L | 400 | 70.9 | 92.8 | 224 | 16x2x3 | 22M |
| MVD | Kinetics-400 | ViT-B | ViT-B | 400 | 72.5 | 93.6 | 224 | 16x2x3 | 87M |
| MVD | Kinetics-400 | ViT-B | ViT-L | 400 | 73.7 | 94.0 | 224 | 16x2x3 | 87M |
| MVD | Kinetics-400 | ViT-L | ViT-L | 400 | 76.1 | 95.4 | 224 | 16x2x3 | 305M |
| MVD | Kinetics-400 | ViT-L | ViT-L | 800 | 76.7 | 95.5 | 224 | 16x2x3 | 305M |
| MVD | Kinetics-400 | ViT-H | ViT-H | 800 | 77.3 | 95.7 | 224 | 16x2x3 | 633M |

Kinetics-400

| Method | Pretrain Video Data | Backbone | Teacher | Epoch | Top-1 | Top-5 | Resolution | #Frames x Clips x Crops | Param |
| :----: | :-----------------: | :------: | :-----: | :---: | :---: | :---: | :--------: | :---------------------: | :---: |
| MVD | Kinetics-400 | ViT-S | ViT-B | 400 | 80.6 | 94.7 | 224 | 16x5x3 | 22M |
| MVD | Kinetics-400 | ViT-S | ViT-L | 400 | 81.0 | 94.8 | 224 | 16x5x3 | 22M |
| MVD | Kinetics-400 | ViT-B | ViT-B | 400 | 82.7 | 95.4 | 224 | 16x5x3 | 87M |
| MVD | Kinetics-400 | ViT-B | ViT-L | 400 | 83.4 | 95.8 | 224 | 16x5x3 | 87M |
| MVD | Kinetics-400 | ViT-L | ViT-L | 400 | 86.0 | 96.9 | 224 | 16x5x3 | 305M |
| MVD | Kinetics-400 | ViT-L | ViT-L | 800 | 86.4 | 97.0 | 224 | 16x5x3 | 305M |
| MVD | Kinetics-400 | ViT-H | ViT-H | 800 | 87.3 | 97.4 | 224 | 16x5x3 | 633M |
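The `#Frames x Clips x Crops` column describes the standard multi-view testing protocol: each video is evaluated as several temporal clips and spatial crops (e.g. 16 frames x 5 clips x 3 crops = 15 views on Kinetics-400), and the per-view predictions are averaged before taking the argmax. A minimal sketch of that averaging, with random logits standing in for actual model outputs:

```python
import numpy as np

# 16x5x3 protocol on Kinetics-400: 5 temporal clips x 3 spatial crops = 15 views.
num_clips, num_crops, num_classes = 5, 3, 400
rng = np.random.default_rng(0)

# One logit vector per (clip, crop) view; in practice each comes from a
# forward pass of the fine-tuned model on that view.
view_logits = rng.normal(size=(num_clips * num_crops, num_classes))

# The final prediction averages the 15 views, then takes the argmax.
avg_logits = view_logits.mean(axis=0)
prediction = int(avg_logits.argmax())
```

More views generally improve accuracy at proportional inference cost, which is why the reported numbers are tied to a specific clip/crop count.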

AVA v2.2

| Method | Pretrain Video Data | Extra Label | Backbone | Teacher | Epoch | mAP | #Frames x Sample Rate | Param |
| :----: | :-----------------: | :---------: | :------: | :-----: | :---: | :-: | :-------------------: | :---: |
| MVD | Kinetics-400 | ✗ | ViT-B | ViT-B | 400 | 29.3 | 16x4 | 87M |
| MVD | Kinetics-400 | ✓ | ViT-B | ViT-B | 400 | 33.6 | 16x4 | 87M |
| MVD | Kinetics-400 | ✗ | ViT-B | ViT-L | 400 | 31.1 | 16x4 | 87M |
| MVD | Kinetics-400 | ✓ | ViT-B | ViT-L | 400 | 34.2 | 16x4 | 87M |
| MVD | Kinetics-400 | ✗ | ViT-L | ViT-L | 800 | 37.7 | 16x4 | 305M |
| MVD | Kinetics-400 | ✓ | ViT-L | ViT-L | 800 | 38.7 | 16x4 | 305M |
| MVD | Kinetics-400 | ✗ | ViT-H | ViT-H | 800 | 40.1 | 16x4 | 633M |
| MVD | Kinetics-400 | ✓ | ViT-H | ViT-H | 800 | 41.1 | 16x4 | 633M |

UCF101 & HMDB51

| Method | Pretrain Video Data | Backbone | Teacher | Epoch | UCF101 Top-1 | HMDB51 Top-1 |
| :----: | :-----------------: | :------: | :-----: | :---: | :----------: | :----------: |
| MVD | Kinetics-400 | ViT-B | ViT-B | 400 | 97.0 | 76.4 |
| MVD | Kinetics-400 | ViT-B | ViT-L | 400 | 97.5 | 79.7 |

Installation

Please follow the instructions in INSTALL.md.

Data Preparation

Please follow the instructions in DATASET.md for data preparation.

Pre-training

The pre-training instruction is in PRETRAIN.md.

Fine-tuning with pre-trained models

The fine-tuning instruction is in FINETUNE.md.

Model Zoo

We provide pre-trained models in MODEL_ZOO.md.
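Checkpoints released by codebases in this family are often saved from a wrapped model, so the weight keys carry a prefix (e.g. `module.` from `DistributedDataParallel`) that must be stripped before loading into a bare backbone. A hedged sketch of that key remapping; the prefix and checkpoint layout here are assumptions, so check MODEL_ZOO.md for the actual format of the released files:

```python
def strip_prefix(state_dict, prefix="module."):
    """Remove a wrapper prefix from checkpoint keys; other keys pass through."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Toy state dict standing in for the result of torch.load(checkpoint_path).
ckpt = {"module.patch_embed.proj.weight": 1, "head.bias": 2}
clean = strip_prefix(ckpt)
# → {"patch_embed.proj.weight": 1, "head.bias": 2}
```

The cleaned dict can then be passed to `model.load_state_dict(clean, strict=False)` in PyTorch, with `strict=False` tolerating heads that differ between pre-training and fine-tuning.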

Acknowledgements

This project is built upon MAE and VideoMAE. Thanks to the contributors of these great codebases.

Citation

If this work is helpful for your research, please consider citing MVD:

```bibtex
@inproceedings{wang2022masked,
  title={Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning},
  author={Wang, Rui and Chen, Dongdong and Wu, Zuxuan and Chen, Yinpeng and Dai, Xiyang and Liu, Mengchen and Yuan, Lu and Jiang, Yu-Gang},
  booktitle={CVPR},
  year={2023}
}
```