SEVERE Benchmark

📰 News

[2023.8.22] Code and pre-trained models of Tubelet Contrast will be released soon! Keep an eye on this repo!<br>
[2023.8.22] Code for evaluating Tubelet Contrast pretrained models has been added to this repo. 🎉<br>
[2023.7.13] Our [Tubelet Contrast](https://arxiv.org/abs/2303.11003) paper has been accepted to ICCV 2023! 🎉<br>

Official code for our ECCV 2022 paper "How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?"

TL;DR. We propose the SEVERE (<ins>SE</ins>nsitivity of <ins>V</ins>id<ins>E</ins>o <ins>RE</ins>presentations) benchmark for evaluating the generalizability of representations obtained by existing and future self-supervised video learning methods.

Overview of Experiments

We evaluate 9 video self-supervised learning (VSSL) methods on 7 video datasets for 6 video understanding tasks.

Evaluated VSSL models

Below are the video self-supervised methods that we evaluate.

| Model | URL |
|-------|-----|
| SeLaVi | https://github.com/facebookresearch/selavi |
| MoCo | https://github.com/tinapan-pt/VideoMoCo |
| VideoMoCo | https://github.com/tinapan-pt/VideoMoCo |
| Pretext-Contrast | https://github.com/BestJuly/Pretext-Contrastive-Learning |
| RSPNet | https://github.com/PeihaoChen/RSPNet |
| AVID-CMA | https://github.com/facebookresearch/AVID-CMA |
| CtP | https://github.com/microsoft/CtP |
| TCLR | https://github.com/DAVEISHAN/TCLR |
| GDT | https://github.com/facebookresearch/GDT |
| Supervised | https://pytorch.org/vision/0.8/_modules/torchvision/models/video/resnet.html#r2plus1d_18 |

Download the Kinetics-400 pretrained R(2+1)D-18 weights for each method from here. Unzipping the downloaded file creates a folder checkpoints_pretraining/ containing all the pretrained model weights.
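The unzip step can be scripted so that a missing or misplaced folder is caught early. Below is a minimal sketch using only the Python standard library. The function name `extract_checkpoints` and the archive path passed to it are hypothetical; only the folder name `checkpoints_pretraining/` comes from this README, and the layout of files inside it is not assumed.

```python
import zipfile
from pathlib import Path


def extract_checkpoints(archive_path, target_dir="."):
    """Unzip the downloaded archive and list the files it provides.

    `archive_path` is wherever you saved the download (hypothetical name).
    Returns the sorted paths of all files under checkpoints_pretraining/,
    relative to that folder.
    """
    target = Path(target_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)

    ckpt_dir = target / "checkpoints_pretraining"
    if not ckpt_dir.is_dir():
        raise FileNotFoundError(f"expected folder not found: {ckpt_dir}")

    # Collect every regular file, whatever the internal layout turns out to be.
    return sorted(
        p.relative_to(ckpt_dir).as_posix()
        for p in ckpt_dir.rglob("*")
        if p.is_file()
    )
```

Calling `extract_checkpoints("path/to/downloaded.zip")` from the repository root and printing the result is a quick way to confirm which method weights are present before launching any downstream run.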

Experiments

We divide these downstream evaluations across four axes:

I. Downstream domain-shift

We evaluate the sensitivity of self-supervised methods to the domain shift between the downstream dataset and the pre-training dataset, i.e., Kinetics-400.

Please refer to action_recognition/README.md for steps to reproduce the experiments with varying downstream domain datasets.

II. Downstream sample-sizes

We evaluate the sensitivity of self-supervised methods to the number of downstream samples available for finetuning.

Please refer to action_recognition/README.md for steps to reproduce the experiments with varying downstream samples.
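To make the idea of a reduced finetuning set concrete, here is a minimal sketch of drawing a smaller, seeded training subset from a list of labelled videos. It is an illustration only and not the protocol used in this repository; the function name `subsample_per_class`, the `(video_id, label)` input format, and the per-class cap are all assumptions. The actual splits are described in action_recognition/README.md.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, per_class, seed=0):
    """Keep at most `per_class` videos for every label.

    `samples` is a list of (video_id, label) pairs (assumed format).
    A fixed seed makes the subset reproducible across runs.
    """
    rng = random.Random(seed)

    # Group video ids by their label.
    by_label = defaultdict(list)
    for video_id, label in samples:
        by_label[label].append(video_id)

    subset = []
    for label in sorted(by_label):
        ids = by_label[label]
        k = min(per_class, len(ids))  # classes with few videos are kept whole
        subset.extend((video_id, label) for video_id in rng.sample(ids, k))
    return subset
```

Finetuning the same pretrained weights on subsets of increasing size, while keeping the test set fixed, gives one accuracy value per subset size that can then be compared across methods.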

III. Downstream action granularities

We investigate whether self-supervised methods can learn fine-grained features required for recognizing semantically similar actions.

<!--- We evaluate on various subsets defined for [Fine-Gym](https://sdolivia.github.io/FineGym/) dataset. -->

Please refer to action_recognition/README.md for steps to reproduce the experiments with varying downstream actions.

IV. Downstream task-shift

We study the sensitivity of video self-supervised methods to the nature of the downstream task.

In-domain task shift: For task shift within the same domain, we evaluate on the UCF dataset for the task of repetition counting. Please refer to Repetition-Counting/README.md for steps to reproduce the experiments.

Out-of-domain task shift: For task shift combined with domain shift, we evaluate multi-label action classification on Charades and action detection on AVA. Please refer to action_detection_multi_label_classification/README.md for steps to reproduce the experiments.

The SEVERE Benchmark

From our analysis, we distill the SEVERE benchmark, a subset of our experiments that can be useful for evaluating current and future video representations beyond standard benchmarks.

Citation

If you use our work or code, kindly consider citing our paper:

@inproceedings{thoker2022severe,
  author    = {Thoker, Fida Mohammad and Doughty, Hazel and Bagad, Piyush and Snoek, Cees},
  title     = {How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?},
  booktitle = {ECCV},
  year      = {2022},
}

Acknowledgements

Maintainers

:bell: If you face an issue or have suggestions, please create a GitHub issue and we will try our best to address it soon.