Delving into Details: Synopsis-to-Detail Networks for Video Recognition (S2DNet)

This repo is the official PyTorch implementation of our paper:

Delving into Details: Synopsis-to-Detail Networks for Video Recognition
Shuxian Liang, Xu Shen, Jianqiang Huang, Xian-Sheng Hua
ECCV 2022 Oral

(Figure: overview of the Synopsis-to-Detail Network, S2DNet)

Prerequisites

The code is built with the following libraries:

Data Preparation

Our work uses Kinetics and Something-Something V1&V2 for evaluation. We first extract the videos into frames for fast reading (see the sketch below). Please refer to the TSN and TSM repositories for detailed guides on data pre-processing.
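
For reference, a minimal frame-extraction sketch with ffmpeg (the paths, image-naming scheme, and video extension are illustrative assumptions; follow the TSN/TSM guides for the exact layout the dataloader expects):

### Extract every frame of each video into its own folder
for video in /path/to/videos/*.mp4; do
    name=$(basename "$video" .mp4)
    mkdir -p /path/to/frames/"$name"
    ffmpeg -i "$video" -q:v 2 /path/to/frames/"$name"/img_%05d.jpg
done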

For the Mini-Kinetics dataset used in our paper, you need the train/val split files from AR-Net.
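
If you build annotation lists yourself, TSN/TSM use a simple space-separated format with one video per line (shown here for illustration only; the folder names and labels are made up, so check the dataloader for the exact fields it expects):

### <frame_folder> <num_frames> <label>
video_0001 150 27
video_0002 96 3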

Training and Evaluation

(1) To train S2DNet (e.g., on Mini-Kinetics), run the two stages below. The first two arguments are the dataset and the config file; the trailing key-value pairs override entries in that config.

### Stage1: WARMUP
python s2d_main.py \
    mini-kinetics \
    configs/mini_kinetics/train_warmup.yaml \
    NUM_GPUS 2 \
    TRAIN.BASE_LR 1e-5 \
    TRAIN.BATCH_SIZE 32 \
    SNET.ARCH 'mobilenetv2'

### Stage2: SAMPLING
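### TRAIN.RESUME should point to the checkpoint saved by Stage 1 (warmup)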
python s2d_main.py \
    mini-kinetics \
    configs/mini_kinetics/train_sampling.yaml \
    NUM_GPUS 2 \
    TRAIN.RESUME path/to/warmup_ckpt \
    TRAIN.BASE_LR 1e-6 \
    TRAIN.BATCH_SIZE 32 \
    SNET.ARCH 'mobilenetv2'

(2) To test S2DNet (e.g., on Mini-Kinetics), point TRAIN.RESUME at the checkpoint to evaluate and run

python s2d_main.py \
    mini-kinetics \
    configs/mini_kinetics/train_sampling.yaml \
    NUM_GPUS 2 \
    TRAIN.RESUME path/to/test_ckpt \
    TRAIN.BATCH_SIZE 32 \
    SNET.ARCH 'mobilenetv2' \
    EVALUATE True

Acknowledgement

In this project we use parts of the implementations of the following works:

Citing

If you find our code or paper useful, please consider citing:

@inproceedings{liang2022delving,
    title={Delving into Details: Synopsis-to-Detail Networks for Video Recognition},
    author={Liang, Shuxian and Shen, Xu and Huang, Jianqiang and Hua, Xian-Sheng},
    booktitle={European Conference on Computer Vision},
    year={2022}
}