# MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yiliang Lv, Changxin Gao, Nong Sang <br/> [[Paper](https://arxiv.org/abs/2207.11660)].
<br/> <div align="center"> <img src="framework.png" /> </div> <br/>

## Latest
[2022-11] Code is available!
This repo is a modification of the TAdaConv repo.
## Installation
Requirements:
- Python>=3.6
- torch>=1.5
- torchvision (version corresponding with torch)
- simplejson==3.11.1
- decord>=0.6.0
- pyyaml
- einops
- oss2
- psutil
- tqdm
- pandas
Optional requirements:
- fvcore (for flops calculation)
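
Assuming a fresh Python environment, the requirements above can be installed along these lines (a sketch only; the package names mirror the list above, and you should pick the torch/torchvision build that matches your CUDA version):

```shell
# Install PyTorch and torchvision first; choose the build matching your CUDA setup.
pip install "torch>=1.5" torchvision

# Remaining requirements from the list above.
pip install simplejson==3.11.1 "decord>=0.6.0" pyyaml einops oss2 psutil tqdm pandas

# Optional: needed only for FLOPs calculation.
pip install fvcore
```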
## Guidelines
### Installation, data preparation and running
The general pipeline for using this repo is: installation, data preparation, and running. See GUIDELINES.md for details.
### Getting Pre-trained Checkpoints
You can download the VideoMAE pre-trained checkpoints from here.
Next, use this simple Python script to convert the pre-trained checkpoints so that they are compatible with our code base.
Then modify TRAIN.CHECKPOINT_FILE_PATH to point to the converted checkpoints for fine-tuning.
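
The conversion script itself is not reproduced here; the snippet below is only a minimal sketch of what such a conversion typically does, assuming the VideoMAE checkpoint stores its weights under a `model` key and that the fine-tuning code expects a `backbone.`-prefixed `model_state` dict (both of these, like the file names, are assumptions, not the repo's actual mapping):

```python
import torch

# Hypothetical file names; replace with your downloaded VideoMAE checkpoint
# and the desired output path.
src_path = "videomae_vit_base_pretrain.pth"
dst_path = "videomae_vit_base_converted.pyth"

ckpt = torch.load(src_path, map_location="cpu")
# Assumption: VideoMAE checkpoints nest the weights under a "model" key.
state_dict = ckpt.get("model", ckpt)

converted = {}
for name, weight in state_dict.items():
    # Assumption: decoder weights are not needed for fine-tuning and can be dropped.
    if name.startswith("decoder."):
        continue
    # Assumption: the fine-tuning code expects a "backbone." prefix on encoder weights.
    converted["backbone." + name] = weight

torch.save({"model_state": converted}, dst_path)
print(f"Saved {len(converted)} tensors to {dst_path}")
```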
### Running instructions
For detailed explanations on the approach itself, please refer to the paper.
For an example run, set DATA_ROOT_DIR, ANNO_DIR, TRAIN.CHECKPOINT_FILE_PATH and OUTPUT_DIR in configs/projects/mar/ft-ssv2/vit_base_50%.yaml, and run the following command for training:
```shell
python tools/run_net.py --cfg configs/projects/mar/ft-ssv2/vit_base_50%.yaml
```
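
For reference, the fields mentioned above would sit in the YAML config roughly as follows (a hypothetical excerpt; the actual key layout in vit_base_50%.yaml may nest these entries differently, and the paths are placeholders):

```yaml
# Hypothetical excerpt of configs/projects/mar/ft-ssv2/vit_base_50%.yaml
DATA_ROOT_DIR: /path/to/ssv2/videos        # root directory containing the videos
ANNO_DIR: /path/to/ssv2/annotations        # directory containing the annotation files
TRAIN:
  CHECKPOINT_FILE_PATH: /path/to/converted_checkpoint.pyth  # converted VideoMAE weights
OUTPUT_DIR: /path/to/output                # where logs and checkpoints are written
```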
## Citing MAR
If you find MAR useful for your research, please consider citing the paper as follows:
```BibTeX
@article{qing2022mar,
  title={Mar: Masked autoencoders for efficient action recognition},
  author={Qing, Zhiwu and Zhang, Shiwei and Huang, Ziyuan and Wang, Xiang and Wang, Yuehuan and Lv, Yiliang and Gao, Changxin and Sang, Nong},
  journal={arXiv preprint arXiv:2207.11660},
  year={2022}
}
```