

Repetitive Activity Counting by Sight and Sound (CVPR 2021)

Yunhua Zhang, Ling Shao, Cees G.M. Snoek

CVPR Presentation Video

<img width="400" alt="Screenshot 2021-04-09 at 00 27 31" src="https://user-images.githubusercontent.com/22721775/114104033-70e7fe80-98ca-11eb-9541-7268fc683ad9.png">

Demo video

Demo code


Run Demo

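import moviepy.editor as mp  # moviepy 1.x; the moviepy.editor module was removed in 2.0

# Load the video and keep only the segment between start_time and end_time (in seconds).
clip = mp.VideoFileClip(path_to_video).subclip(start_time, end_time)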

If you would like our extracted audio files, please send me an email or create an issue that includes your email address.
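The moviepy snippet above stops at the trimmed clip. If you prefer to extract the audio yourself, the following is a minimal sketch of one way to do it, assuming moviepy 1.x and WAV output. The helper names `audio_path_for` and `extract_audio`, the output directory layout, and the choice of WAV are assumptions made for illustration; they are not taken from this repository.

```python
from pathlib import Path


def audio_path_for(video_path, out_dir):
    """Hypothetical naming rule: <out_dir>/<video file stem>.wav"""
    return Path(out_dir) / (Path(video_path).stem + ".wav")


def extract_audio(video_path, start_time, end_time, out_dir):
    """Trim a video to [start_time, end_time] seconds and save its audio track.

    Returns the path of the written file, or None if the video has no audio.
    """
    # Imported lazily so the path helper can be used without moviepy installed.
    import moviepy.editor as mp  # moviepy 1.x API, as in the snippet above

    clip = mp.VideoFileClip(str(video_path)).subclip(start_time, end_time)
    try:
        if clip.audio is None:
            return None  # silent video: nothing to extract
        out_path = audio_path_for(video_path, out_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        clip.audio.write_audiofile(str(out_path))
        return out_path
    finally:
        clip.close()  # release the underlying ffmpeg readers
```

A call such as `extract_audio("video.mp4", 2.0, 10.0, "audio")` would write `audio/video.wav` if the clip has an audio track.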

Training on Countix & Countix-AV

The code below trains the modules separately, so two NVIDIA 1080Ti GPUs are enough for training. The visual model is trained on Countix, while the audio model and the cross-modal modules are trained on Countix-AV. The resulting overall model is intended to be tested on Countix-AV. To test on the Countix dataset, the reliability estimation should be retrained on Countix. The hyperparameters influence the performance of our model to some extent; see the supplementary material for details. Specifically, we search the number of branches from 20 to 50 and the margin of the temporal stride decision module from 1.0 to 3.0.
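The hyperparameter search described above can be organised as a small grid. The sketch below only enumerates candidate settings. The step sizes (10 for branches, 0.5 for the margin) and the key names `num_branches` and `margin` are assumptions for illustration, since only the ranges are stated here, and `train.py` is not known to accept these as arguments.

```python
from itertools import product


def hyperparameter_grid(branch_step=10, margin_step=0.5):
    """Enumerate (number of branches, margin) pairs over the stated ranges.

    Ranges come from the text above: branches 20 to 50, margin 1.0 to 3.0.
    The step sizes are assumed, not taken from the paper or the code.
    """
    branches = range(20, 51, branch_step)
    n_margins = int(round((3.0 - 1.0) / margin_step)) + 1
    margins = [round(1.0 + i * margin_step, 2) for i in range(n_margins)]
    return [
        {"num_branches": b, "margin": m}
        for b, m in product(branches, margins)
    ]


if __name__ == "__main__":
    for config in hyperparameter_grid():
        print(config)
```

Each dictionary would then be applied to one training run, and the setting with the best validation result kept.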

python train.py

Then generate counting predictions with the model at each sample rate from 1 to 7. After that, run this script to produce the CSV file used to train the temporal stride decision module:

python generate_csv4sr.py
python train_sr.py
python train_sr_audio.py
python train_audio.py
python train_conf.py
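To make the CSV step more concrete, here is a minimal sketch of how per-video predictions at sample rates 1 to 7 could be turned into training targets for the temporal stride decision module. It is not the content of `generate_csv4sr.py`. The column names, the record format, and the rule "pick the sample rate with the smallest absolute counting error" are assumptions for illustration.

```python
import csv
import io

SAMPLE_RATES = range(1, 8)  # sample rates 1 to 7, as stated above


def best_sample_rate(preds_by_rate, true_count):
    """Return the sample rate whose prediction is closest to the true count.

    Ties are resolved in favour of the smaller sample rate (an assumption).
    """
    return min(
        SAMPLE_RATES,
        key=lambda sr: (abs(preds_by_rate[sr] - true_count), sr),
    )


def write_stride_csv(records, out_file):
    """Write one row per video: id, true count, best rate, all predictions.

    `records` is an iterable of (video_id, true_count, {rate: prediction}).
    """
    writer = csv.writer(out_file)
    header = ["video_id", "true_count", "best_sample_rate"]
    header += [f"pred_sr{sr}" for sr in SAMPLE_RATES]
    writer.writerow(header)
    for video_id, true_count, preds in records:
        row = [video_id, true_count, best_sample_rate(preds, true_count)]
        row += [preds[sr] for sr in SAMPLE_RATES]
        writer.writerow(row)


if __name__ == "__main__":
    example = [("vid_a", 6, {1: 3.0, 2: 5.2, 3: 6.1, 4: 9.0, 5: 12.0, 6: 2.0, 7: 1.0})]
    buffer = io.StringIO()
    write_stride_csv(example, buffer)
    print(buffer.getvalue())
```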

We provide the train, validation, and test splits of the Countix-AV dataset in CountixAV_train.csv, CountixAV_val.csv, and CountixAV_test.csv.
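A minimal sketch for loading the three split files with the standard library is shown below. It assumes each CSV has a header row and makes no assumption about the column names. The function name `load_splits` and the `root` argument are hypothetical.

```python
import csv
from pathlib import Path

# File names as listed above.
SPLIT_FILES = {
    "train": "CountixAV_train.csv",
    "val": "CountixAV_val.csv",
    "test": "CountixAV_test.csv",
}


def load_splits(root="."):
    """Read each split into a list of dicts keyed by the CSV header."""
    splits = {}
    for name, filename in SPLIT_FILES.items():
        with open(Path(root) / filename, newline="") as f:
            splits[name] = list(csv.DictReader(f))
    return splits


if __name__ == "__main__":
    for name, rows in load_splits().items():
        print(name, len(rows), "clips")
```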

Extreme Countix-AV

The dataset can be downloaded at this link


If you have any problems with the code, feel free to send an email to me: y.zhang9@uva.nl or create an issue.