TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting (CVPR 2022 Oral)

This is the official implementation of the CVPR 2022 paper "TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting".

🌱News

2024-11-14: We noticed that the authors of RepNet posted a note titled "A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos." We are writing a short paper with a detailed explanation of our experimental setting. In short, we evaluate different frameworks by retraining them on RepCount-A.
2023-07-13: We are planning to release the RepCount-B dataset within a week.
2023-04-10: We have updated the Chinese introduction of the paper. [Zhihu]
2022-07-18: The model checkpoint is available. [OneDrive(extraction code: transrac)] [BaiduDisk(extraction code: 2022)]
2022-06-24: We were invited to give an oral presentation, attending virtually.
2022-06-01: The oral presentation of our work is available. [Youtube] [Bilibili]
2022-04-05: The preprint of the paper is available. [Paper]
2022-03-22: The Repetition Action Counting Dataset Homepage is open for the community. [Homepage]
2022-03-02: This paper has been accepted by CVPR 2022 as an oral presentation.

Introduction

Repetitive actions are common in human activities such as physical exercise. Existing methods focus on counting repetitive actions in short videos and struggle with the longer videos found in more realistic scenarios. In the data-driven era, this limited generalization is mainly attributed to the lack of long-video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset that covers a wide range of video lengths and includes more realistic situations in which actions are interrupted or performed inconsistently. We also provide fine-grained annotations of the action cycles rather than only a numerical count per video. The dataset contains 1451 videos with about 20000 annotations, which makes it more challenging. For repetitive action counting in more realistic scenarios, we further propose encoding multi-scale temporal correlation with transformers, which takes both performance and efficiency into account. Furthermore, with the help of the fine-grained annotations of action cycles, we propose a density map regression-based method to predict the action period, which yields better performance with sufficient interpretability. Our proposed method outperforms state-of-the-art methods on all datasets and also achieves better performance on the unseen dataset without fine-tuning.

RepCount Dataset

The homepage of the RepCount dataset is now available.

Dataset introduction

We introduce a novel repetitive action counting dataset called RepCount, which contains videos with significant variation in length and allows for multiple kinds of anomalous cases. The videos come with fine-grained annotations that mark the beginning and end of each action period. The dataset consists of two subsets, Part-A and Part-B. The videos in Part-A are collected from YouTube, while those in Part-B record simulated physical examinations performed by junior school students and teachers.
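
To illustrate how start/end annotations of each action period could serve as a target for density map regression, here is a minimal sketch. It is not taken from this repository: the function name `cycles_to_density_map`, the `sigma_scale` parameter, and the choice of one Gaussian per cycle are assumptions made for illustration. The only property it aims to show is that each annotated cycle contributes a total mass of 1, so the map sums to the repetition count.

```python
import math


def cycles_to_density_map(cycles, num_frames, sigma_scale=1 / 6):
    """Turn (start, end) frame pairs into a per-frame density map.

    Hypothetical helper: each cycle adds a Gaussian bump centred on its
    midpoint, normalised to sum to 1 over the video, so the whole map
    sums to the number of cycles.
    """
    density = [0.0] * num_frames
    for start, end in cycles:
        mid = (start + end) / 2.0
        # Width grows with cycle length; the floor avoids division by zero.
        sigma = max((end - start) * sigma_scale, 1e-3)
        weights = [
            math.exp(-0.5 * ((t - mid) / sigma) ** 2) for t in range(num_frames)
        ]
        total = sum(weights)
        if total == 0.0:
            # Cycle lies entirely outside the frame range; nothing to add.
            continue
        for t, w in enumerate(weights):
            density[t] += w / total
    return density


# Three annotated cycles in a 64-frame clip (made-up frame numbers).
density = cycles_to_density_map([(0, 10), (10, 22), (30, 40)], num_frames=64)
estimated_count = sum(density)  # equals 3 up to floating-point error
```

Summing the map recovers the count, and the location of each bump indicates where a cycle occurs, which is one reason a density map is easier to interpret than a single predicted number.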

Video Presentation

<center><a href="https://www.bilibili.com/video/BV1B94y1S7oP?share_source=copy_web" target="_blank" style="color: #990000"> Bilibili </a></center> <br/> <center><a href="https://youtu.be/SFpUS9mHHpk" target="_blank" style="color: #990000"> YouTube </a></center>

Usage

Install

Please refer to install.md for installation.

Data preparation

First, download the pretrained Video Swin Transformer model (github) into the folder 'pretrained'.

Second, modify train.py to match your configuration.

Tips: The data can be in .mp4 or .npz format. We recommend using .npz data because it loads faster. We will upload the preprocessed data (.npz) soon. You can also refer to video2npz to convert the videos yourself.
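
One step that a video-to-array conversion typically involves is reducing a video of arbitrary length to a fixed number of frames. The sketch below shows uniform sampling of frame indices. It is an assumption for illustration only: `sample_frame_indices` is a hypothetical helper, and this README does not describe how video2npz actually selects frames. The default of 64 is taken from the Frame column of the Model Zoo table.

```python
def sample_frame_indices(total_frames, num_samples=64):
    """Pick `num_samples` frame indices spread evenly over a video.

    Hypothetical helper: always includes the first and last frame.
    Indices repeat when the video has fewer frames than requested.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [min(total_frames - 1, round(i * step)) for i in range(num_samples)]


# A 300-frame video reduced to 64 evenly spaced frame indices.
indices = sample_frame_indices(300, num_samples=64)
```

The selected frames could then be decoded once and stored as an array, so that training does not have to decode the .mp4 file in every epoch.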

Train

python train.py

Model Zoo

RepCount Dataset

Method	Backbone	Frame	Training Dataset	CheckPoint	MAE	OBO
Ours	Video Swin Transformer	64	RepCount-A	OneDrive(extraction code: transrac) / BaiduDisk(extraction code: 2022)	0.44	0.29
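
For readers unfamiliar with the two metrics in the table, the sketch below follows the definitions commonly used in the repetition counting literature: MAE is the absolute count error normalized by the ground-truth count, averaged over videos (lower is better), and OBO is the fraction of videos whose predicted count is within one of the ground truth (higher is better). The function name `mae_and_obo` and the `eps` parameter are assumptions, and the evaluation script in this repository may differ in details such as rounding.

```python
def mae_and_obo(predicted, actual, eps=1e-1):
    """Compute normalized MAE and off-by-one (OBO) accuracy.

    Hypothetical helper. `eps` is an assumed small constant that keeps
    the division defined when a ground-truth count is zero.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    # Mean of per-video absolute error relative to the true count.
    mae = sum(abs(p - a) / (a + eps) for p, a in zip(predicted, actual)) / n
    # Share of videos where the prediction is off by at most one repetition.
    obo = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 1) / n
    return mae, obo


# Made-up counts for four videos, for illustration only.
mae, obo = mae_and_obo([10, 5, 8, 20], [10, 6, 12, 20])
```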

We will upload more trained TransRAC models soon, which may help you.

If you have any questions, don't hesitate to contact us!

But please understand that the response may be delayed as we are working on other research.😖

Citation

If you find the project or the dataset useful, please consider citing the paper.

@inproceedings{hu2022transrac,
  title={TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting},
  author={Hu, Huazhang and Dong, Sixun and Zhao, Yiqun and Lian, Dongze and Li, Zhengxin and Gao, Shenghua},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19013--19022},
  year={2022}
}

@article{hu2022transrac_arxiv,
  title={TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting},
  author={Hu, Huazhang and Dong, Sixun and Zhao, Yiqun and Lian, Dongze and Li, Zhengxin and Gao, Shenghua},
  journal={arXiv preprint arXiv:2204.01018},
  year={2022}
}