# Official PyTorch Implementation of EVAD

**Efficient Video Action Detection with Token Dropout and Context Refinement**<br>
Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang<br>
## News

**[2023.07.14]** Our EVAD is accepted by ICCV 2023! <br>
**[2023.06.09]** Code and model weights have been released! <br>
## Installation
Please find installation instructions in INSTALL.md.
## Data Preparation
Please follow the instructions in DATASET.md to prepare the AVA dataset.
## Model Zoo
| method | keep rate | enhanced weight | config | backbone | pre-train | #frames x sample rate | GFLOPs | mAP | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EVAD | 1.0 | 1 | ViT_B_16x4 | ViT-B (VideoMAE) | K400 | 16x4 | 425 | 32.1 | link |
| EVAD | 0.7 | 1 | ViT_B_16x4_KTP | ViT-B (VideoMAE) | K400 | 16x4 | 243 | 32.3 | link |
| EVAD | 0.6 | 4 | ViT_B_16x4_KTP_EW | ViT-B (VideoMAE) | K400 | 16x4 | 209 | 31.8 | link |
| EVAD | 0.7 | 1 | ViT_B_16x4_KTP | ViT-B (VideoMAEv2) | K710+K400 | 16x4 | 243 | 37.7 | link |
| EVAD | 0.7 | 1 | ViT_L_16x4_KTP | ViT-L (VideoMAE) | K700 | 16x4 | 737 | 39.7 | link |
## Training

```shell
python -m torch.distributed.launch --nproc_per_node=8 projects/evad/run_net.py --cfg "projects/evad/configs/config_file.yaml" DATA.PATH_TO_DATA_DIR "path/to/ava" TRAIN.CHECKPOINT_FILE_PATH "path/to/pretrain.pth" OUTPUT_DIR "path/to/output"
```
## Validation

You can load a specific checkpoint file with `TEST.CHECKPOINT_FILE_PATH`, or automatically load the last checkpoint from the output folder.

```shell
python -m torch.distributed.launch --nproc_per_node=1 projects/evad/run_net.py --cfg "projects/evad/configs/config_file.yaml" DATA.PATH_TO_DATA_DIR "path/to/ava" TRAIN.ENABLE False TEST.ENABLE True NUM_GPUS 1 OUTPUT_DIR "path/to/output"
```
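The command above autoloads the latest checkpoint from `OUTPUT_DIR`. To evaluate one specific checkpoint instead, the same command can be extended with the `TEST.CHECKPOINT_FILE_PATH` override mentioned above (a sketch; all paths are placeholders to substitute with your own config, data, and weights):

```shell
# Evaluate a specific checkpoint instead of autoloading the latest one.
# "path/to/checkpoint.pth" is a placeholder for a downloaded or trained weight file.
python -m torch.distributed.launch --nproc_per_node=1 projects/evad/run_net.py \
  --cfg "projects/evad/configs/config_file.yaml" \
  DATA.PATH_TO_DATA_DIR "path/to/ava" \
  TRAIN.ENABLE False TEST.ENABLE True NUM_GPUS 1 \
  TEST.CHECKPOINT_FILE_PATH "path/to/checkpoint.pth" \
  OUTPUT_DIR "path/to/output"
```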
## Acknowledgements
This project is built upon SparseR-CNN and PySlowFast. We also reference and use some code from WOO and VideoMAE. Thanks to the contributors of these great codebases.
## License
The majority of this project is released under the CC-BY-NC 4.0 license as found in the LICENSE file. Portions of the project are available under separate license terms: SlowFast and pytorch-image-models are licensed under the Apache 2.0 license. SparseR-CNN is licensed under the MIT license.
## Citation
If you find this project useful, please feel free to leave a star and cite our paper:
```BibTeX
@inproceedings{chen2023efficient,
  author    = {Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
  title     = {Efficient Video Action Detection with Token Dropout and Context Refinement},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}

@article{chen2023efficient,
  title   = {Efficient Video Action Detection with Token Dropout and Context Refinement},
  author  = {Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
  journal = {arXiv preprint arXiv:2304.08451},
  year    = {2023}
}
```