# MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yiliang Lv, Changxin Gao, Nong Sang <br/> [[Paper](https://arxiv.org/abs/2207.11660)].
<br/> <div align="center"> <img src="framework.png" /> </div> <br/>

## Latest
[2022-11] Code is available!
This repo is a modification of the TAdaConv repo.
## Installation
Requirements:
- Python>=3.6
- torch>=1.5
- torchvision (version corresponding with torch)
- simplejson==3.11.1
- decord>=0.6.0
- pyyaml
- einops
- oss2
- psutil
- tqdm
- pandas
Optional requirements:
- fvcore (for flops calculation)
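
Assuming a fresh Python environment, the requirements above can be installed along these lines (a sketch only; the package names mirror the list above, and you should pick the torch/torchvision build that matches your CUDA version):

```shell
# Install PyTorch and torchvision first; choose the build matching your CUDA setup.
pip install "torch>=1.5" torchvision

# Remaining requirements from the list above.
pip install simplejson==3.11.1 "decord>=0.6.0" pyyaml einops oss2 psutil tqdm pandas

# Optional: needed only for FLOPs calculation.
pip install fvcore
```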
## Guidelines
### Installation, data preparation and running
The general pipeline for using this repo is: installation, data preparation, and running. See GUIDELINES.md for details.
### Getting Pre-trained Checkpoints
You can download the VideoMAE pre-trained checkpoints from here.
Next, use this simple Python script to convert the pre-trained checkpoints so that they are compatible with our code base.
Then modify TRAIN.CHECKPOINT_FILE_PATH to point to the converted checkpoints for fine-tuning.
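
The conversion script itself is not reproduced here; the snippet below is only a minimal sketch of what such a conversion typically does, assuming the VideoMAE checkpoint stores its weights under a `model` key and that the fine-tuning code expects a `backbone.`-prefixed `model_state` dict (both of these, like the file names, are assumptions, not the repo's actual mapping):

```python
import torch

# Hypothetical file names; replace with your downloaded VideoMAE checkpoint
# and the desired output path.
src_path = "videomae_vit_base_pretrain.pth"
dst_path = "videomae_vit_base_converted.pyth"

ckpt = torch.load(src_path, map_location="cpu")
# Assumption: VideoMAE checkpoints nest the weights under a "model" key.
state_dict = ckpt.get("model", ckpt)

converted = {}
for name, weight in state_dict.items():
    # Assumption: decoder weights are not needed for fine-tuning and can be dropped.
    if name.startswith("decoder."):
        continue
    # Assumption: the fine-tuning code expects a "backbone." prefix on encoder weights.
    converted["backbone." + name] = weight

torch.save({"model_state": converted}, dst_path)
print(f"Saved {len(converted)} tensors to {dst_path}")
```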
### Running instructions
For detailed explanations on the approach itself, please refer to the paper.
For an example run, set DATA_ROOT_DIR, ANNO_DIR, TRAIN.CHECKPOINT_FILE_PATH and OUTPUT_DIR in configs/projects/mar/ft-ssv2/vit_base_50%.yaml, and run the following command for training:
```shell
python tools/run_net.py --cfg configs/projects/mar/ft-ssv2/vit_base_50%.yaml
```
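
For reference, the fields mentioned above would sit in the YAML config roughly as follows (a hypothetical excerpt; the actual key layout in vit_base_50%.yaml may nest these entries differently, and the paths are placeholders):

```yaml
# Hypothetical excerpt of configs/projects/mar/ft-ssv2/vit_base_50%.yaml
DATA_ROOT_DIR: /path/to/ssv2/videos        # root directory containing the videos
ANNO_DIR: /path/to/ssv2/annotations        # directory containing the annotation files
TRAIN:
  CHECKPOINT_FILE_PATH: /path/to/converted_checkpoint.pyth  # converted VideoMAE weights
OUTPUT_DIR: /path/to/output                # where logs and checkpoints are written
```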
## Citing MAR
If you find MAR useful for your research, please consider citing the paper as follows:
```BibTeX
@article{qing2022mar,
  title={Mar: Masked autoencoders for efficient action recognition},
  author={Qing, Zhiwu and Zhang, Shiwei and Huang, Ziyuan and Wang, Xiang and Wang, Yuehuan and Lv, Yiliang and Gao, Changxin and Sang, Nong},
  journal={arXiv preprint arXiv:2207.11660},
  year={2022}
}
```