Repetitive Activity Counting by Sight and Sound (CVPR 2021)

Yunhua Zhang, Ling Shao, Cees G.M. Snoek

CVPR Presentation Video

<img width="400" alt="Screenshot 2021-04-09 at 00 27 31" src="https://user-images.githubusercontent.com/22721775/114104033-70e7fe80-98ca-11eb-9541-7268fc683ad9.png">

Demo video

Demo code

Requirements

Run Demo

Some Illustrations

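# Uses the moviepy 1.x API; moviepy 2.x drops moviepy.editor and renames subclip().
import moviepy.editor as mp
# Cut the video to the annotated segment, then save that segment's audio track.
# path_to_video, start_time, end_time, and path_for_save are placeholders to set yourself.
clip = mp.VideoFileClip(path_to_video).subclip(start_time, end_time)
clip.audio.write_audiofile(path_for_save)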

If you would like our extracted audio files, please send me an email or create an issue that includes your email address.

Training on Countix & Countix-AV

For the following code, we train the modules separately, so two NVIDIA 1080Ti GPUs are enough. The visual model is trained on Countix; the audio model and the cross-modal modules are trained on Countix-AV. The resulting overall model is intended to be tested on Countix-AV. To test on Countix instead, retrain the reliability estimation on Countix. The hyperparameters influence performance to some extent; see the supplementary material for details. Specifically, we tried between 20 and 50 branches and kept the best setting, and for the margin of the temporal stride decision module we tried values from 1.0 to 3.0.

python train.py
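
The hyperparameter search described above can be sketched as follows. This is only an illustration: the README gives the ranges (20 to 50 branches, margin 1.0 to 3.0) but not the step sizes, so the steps of 10 and 0.5 are assumptions, and `validation_error` is a hypothetical callback standing in for whatever train-and-evaluate routine you use. The two hyperparameters are searched independently here, which is also an assumption.

```python
def candidates(start, stop, step):
    """Inclusive list of candidate values, rounded to avoid float drift."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 6) for i in range(n)]


def pick_best(values, validation_error):
    """Return the candidate with the lowest validation error.

    `validation_error` is a hypothetical callable: value -> error.
    """
    return min(values, key=validation_error)


# Ranges come from the text above; the step sizes are assumed.
branch_candidates = candidates(20, 50, 10)
margin_candidates = candidates(1.0, 3.0, 0.5)
```

Calling `pick_best(margin_candidates, my_eval)` with your own evaluation function would return the margin to keep; the same call works for `branch_candidates`.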

Then, generate counting predictions with the model at each sample rate from 1 to 7. After that, run this script to produce the csv file used to train the temporal stride decision module:

python generate_csv4sr.py
python train_sr.py
python train_sr_audio.py
python train_audio.py
python train_conf.py
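
As a rough illustration of the csv step above, the sketch below turns per-sample-rate predictions into one row per video. The column layout, the function names, and the "closest prediction wins" rule are all assumptions made for this example; they are not taken from `generate_csv4sr.py`, which may store something different.

```python
import csv

SAMPLE_RATES = range(1, 8)  # sample rates 1..7, as in the text above


def best_sample_rate(predictions, true_count):
    """Return the sample rate whose predicted count is closest to the label.

    `predictions` maps sample rate -> predicted count. Ties go to the
    smallest rate, because min() keeps the first minimum it encounters.
    """
    return min(SAMPLE_RATES, key=lambda r: abs(predictions[r] - true_count))


def write_stride_csv(rows, path):
    """Write one line per video: id, per-rate absolute errors, best rate.

    `rows` is an iterable of (video_id, predictions, true_count) tuples.
    The column layout is hypothetical.
    """
    header = ["video_id"] + [f"err_sr{r}" for r in SAMPLE_RATES] + ["best_sr"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for video_id, predictions, true_count in rows:
            errors = [abs(predictions[r] - true_count) for r in SAMPLE_RATES]
            best = best_sample_rate(predictions, true_count)
            writer.writerow([video_id, *errors, best])
```

A row built this way gives a stride classifier or ranker both the target (`best_sr`) and the per-rate errors, which is one way a margin-based objective could be supervised.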

Some Tips for further improvement

Datasets

Countix-AV

We provide the train, validation, and test sets of the Countix-AV dataset in CountixAV_train.csv, CountixAV_val.csv, and CountixAV_test.csv.
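
A minimal way to inspect these split files is sketched below. It assumes only that each csv starts with a header row; no particular column names are assumed, and `load_split` is a helper introduced for this example.

```python
import csv

# File names as listed above.
SPLIT_FILES = {
    "train": "CountixAV_train.csv",
    "val": "CountixAV_val.csv",
    "test": "CountixAV_test.csv",
}


def load_split(path):
    """Read one split file into a list of dicts keyed by its header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `{name: len(load_split(path)) for name, path in SPLIT_FILES.items()}` reports the number of annotated clips per split when run from the directory containing the files.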

Extreme Countix-AV

The dataset can be downloaded at this link

Contact

If you have any problems with the code, feel free to send an email to me: y.zhang9@uva.nl or create an issue.