TAM: Temporal Adaptive Module for Video Recognition [arXiv]
@inproceedings{liu2021tam,
title={TAM: Temporal adaptive module for video recognition},
author={Liu, Zhaoyang and Wang, Limin and Wu, Wayne and Qian, Chen and Lu, Tong},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={13708--13718},
year={2021}
}
[NEW!] 2021/07/23 - Our paper has been accepted by ICCV 2021. More pretrained models will be released soon for research purposes. You are welcome to follow our work!
[NEW!] 2021/06/01 - Our temporal adaptive module has been integrated into MMAction2! We are glad to see that TAM achieves higher accuracy with MMAction2 on several datasets.
[NEW!] 2020/10/10 - We have released the code of TAM for research purposes.
Overview
We release the PyTorch code of the Temporal Adaptive Module.
<div align="center"> <img src="./visualization/full_arch.png" width = "600" alt="Architecture" align=center /> <br> <div style="color:orange; border-bottom: 2px solid #d9d9d9; display: inline-block; color: #999; padding: 10px;"> The overall architecture of TANet: ResNet-Block vs. TA-Block. </div> </div>
Prerequisites
The code is built with the following libraries:
- python 3.6 or higher
- PyTorch 1.0 or higher
- torchvision 0.2 or higher
- opencv-python 4.1 or higher
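A quick way to confirm that an environment meets these minimums is sketched below. It is not part of the repository: the helper names `version_tuple` and `check_environment` are hypothetical, and the sketch assumes each package exposes a `__version__` attribute (opencv-python is imported as `cv2`).

```python
import importlib
import re
import sys

# Minimum versions taken from the prerequisites list.
# Keys are import names (opencv-python is imported as cv2).
MINIMUMS = {"torch": (1, 0), "torchvision": (0, 2), "cv2": (4, 1)}


def version_tuple(text):
    """Turn '1.10.0+cu113' into (1, 10, 0), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_environment(minimums=MINIMUMS):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("python 3.6 or higher is required")
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("{} is not installed".format(name))
            continue
        # Assumption: the package reports its version via __version__.
        found = version_tuple(getattr(module, "__version__", ""))
        if found < minimum:
            problems.append("{} {} is older than {}".format(name, found, minimum))
    return problems
```

Calling `check_environment()` before training returns an empty list when everything is in place, or one message per missing or outdated package.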
Data Preparation
Following the TSN and TSM repos, we provide a set of tools (vidtools) to extract frames from videos.
For convenience, the processing of video data can be summarized as follows:
- Extract frames from videos.
  - First, clone vidtools:
    git clone https://github.com/liu-zhy/vidtools.git && cd vidtools
  - Extract frames by running:
    python extract_frames.py VIDEOS_PATH/ \
    -o DATASETS_PATH/frames/ \
    -j 16 --out_ext png
    We suggest using --out_ext jpg if disk storage is limited.
- Generate the annotation.
  The annotation usually includes train.txt, val.txt and test.txt (optional). Each line of a *.txt file has the following format:
    frames/video_1 num_frames label_1
    frames/video_2 num_frames label_2
    frames/video_3 num_frames label_3
    ...
    frames/video_N num_frames label_N
  The pre-processed dataset is organized with the following structure:
    datasets
      |_ Kinetics400
        |_ frames
        |  |_ [video_0]
        |  |  |_ img_00001.png
        |  |  |_ img_00002.png
        |  |  |_ ...
        |  |_ [video_1]
        |     |_ img_00001.png
        |     |_ img_00002.png
        |     |_ ...
        |_ annotations
           |_ train.txt
           |_ val.txt
           |_ test.txt (optional)
- Configure the dataset in ops/dataset_configs.py.
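The annotation step can be scripted once the frames are extracted. The following is a minimal sketch, not a tool shipped with this repo or with vidtools: `build_annotation_lines` and `write_annotation` are hypothetical helpers, and the `labels` mapping (video folder name to class id) is assumed to come from the dataset's own label files.

```python
import os


def build_annotation_lines(frames_root, labels, prefix="frames", ext=".png"):
    """Return one 'path num_frames label' line per video folder under frames_root.

    labels maps a video folder name to its integer class id (hypothetical input;
    how you obtain it depends on the dataset's own label files).
    """
    lines = []
    for video in sorted(os.listdir(frames_root)):
        video_dir = os.path.join(frames_root, video)
        if not os.path.isdir(video_dir) or video not in labels:
            continue  # skip stray files and videos without a label
        num_frames = sum(1 for name in os.listdir(video_dir) if name.endswith(ext))
        if num_frames > 0:
            lines.append("{}/{} {} {}".format(prefix, video, num_frames, labels[video]))
    return lines


def write_annotation(path, lines):
    """Write the annotation lines to a text file such as train.txt."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
```

Pass `ext=".jpg"` if the frames were extracted with `--out_ext jpg`, and run the helper once per split to produce train.txt and val.txt.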
Model Zoo
Here we provide some off-the-shelf pretrained models. The accuracy may differ slightly from the paper, since the raw Kinetics videos downloaded by different users may not be identical.
Models | Datasets | Resolution | Frames * Crops * Clips | Top-1 | Top-5 | Checkpoints |
---|---|---|---|---|---|---|
TAM-R50 | Kinetics-400 | 256 * 256 | 8 * 3 * 10 | 76.1% | 92.3% | ckpt |
TAM-R50 | Kinetics-400 | 256 * 256 | 16 * 3 * 10 | 76.9% | 92.9% | ckpt |
TAM-R50 | Sth-Sth v1 | 224 * 224 | 8 * 1 * 1 | 46.5% | 75.8% | ckpt |
TAM-R50 | Sth-Sth v1 | 224 * 224 | 16 * 1 * 1 | 47.6% | 77.7% | ckpt |
TAM-R50 | Sth-Sth v2 | 256 * 256 | 8 * 3 * 2 | 62.7% | 88.0% | ckpt |
TAM-R50 | Sth-Sth v2 | 256 * 256 | 16 * 3 * 2 | 64.6% | 89.5% | ckpt |
After downloading the checkpoints and putting them into the target path, you can test the TAM with these pretrained weights.
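To get a feel for the evaluation cost implied by the "Frames * Crops * Clips" column, the short sketch below multiplies the entries out. It assumes the usual reading of this notation (frames per clip, spatial crops per clip, temporal clips per video, with every crop of every clip counted as one view); the function names are hypothetical.

```python
def parse_protocol(text):
    """Parse a 'Frames * Crops * Clips' cell such as '8 * 3 * 10' into three ints."""
    frames, crops, clips = (int(part) for part in text.split("*"))
    return frames, crops, clips


def evaluation_cost(text):
    """Return (views per video, frames processed per video) for one table row."""
    frames, crops, clips = parse_protocol(text)
    views = crops * clips
    return views, views * frames


for cell in ("8 * 3 * 10", "16 * 1 * 1", "8 * 3 * 2"):
    views, total = evaluation_cost(cell)
    print("{:>10}: {} views, {} frames per video".format(cell, views, total))
```

Under that reading, the 8 * 3 * 10 Kinetics setting evaluates 30 views per video, while the 8 * 1 * 1 Something-Something V1 setting evaluates a single view.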
Testing
For example, to test the downloaded pretrained models on Kinetics, you can run scripts/test_tam_kinetics_rgb_8f.sh. The script tests TAM with the 8-frame setting:
# test TAM on Kinetics-400
python -u test_models.py kinetics \
--weights=./checkpoints/kinetics_RGB_resnet50_tam_avg_segment8_e100_dense/ckpt.best.pth.tar \
--test_segments=8 --test_crops=3 \
--full_res --sample dense-10 --batch_size 8
Note that --sample determines the sampling strategy used in testing. Specifically, --sample uniform-N means the model takes N clips uniformly sampled from the video as input, and --sample dense-N means the model takes N clips densely sampled from the video as input.
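The difference between the two strategies can be illustrated with a small index-selection sketch. This is an illustration of the general idea only, not the repository's data loader: the function names, the offset rule for uniform clips, and the `stride=8` default for dense clips are all assumptions.

```python
def uniform_clips(num_frames, num_segments, num_clips):
    """Each clip spans the whole video: one frame per equal-length segment.

    Clips differ only in the offset used inside every segment.
    """
    tick = num_frames / num_segments
    clips = []
    for c in range(num_clips):
        shift = (c + 0.5) / num_clips  # fraction of a segment, strictly inside (0, 1)
        clips.append([min(int(tick * (s + shift)), num_frames - 1)
                      for s in range(num_segments)])
    return clips


def dense_clips(num_frames, num_segments, num_clips, stride=8):
    """Each clip covers a short window: num_segments frames, `stride` frames apart.

    Window start positions are spread evenly over the video.
    """
    span = (num_segments - 1) * stride + 1  # frames covered by one window
    last_start = max(num_frames - span, 0)
    clips = []
    for c in range(num_clips):
        start = 0 if num_clips == 1 else round(c * last_start / (num_clips - 1))
        # Wrap around for videos shorter than one window.
        clips.append([(start + s * stride) % num_frames for s in range(num_segments)])
    return clips


print(uniform_clips(80, 8, 2))
print(dense_clips(80, 8, 2))
```

In this sketch every uniform clip touches the whole video, whereas each dense clip sees only a local window, which is why dense sampling is typically paired with more clips (for example dense-10 on Kinetics).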
You can also test TAM on Something-Something V2 by running scripts/test_tam_somethingv2_rgb_8f.sh:
# test TAM on Something-Something V2
python -u test_models.py somethingv2 \
--weights=./checkpoints/something_RGB_resnet50_tam_avg_segment8_e50/ckpt.best.pth.tar \
--test_segments=8 --test_crops=3 \
--full_res --sample uniform-2 --batch_size 32
Training
We provide several scripts to train TAM in this repo:
- To train on Kinetics from ImageNet pretrained models, you can run scripts/train_tam_kinetics_rgb_8f.sh, which contains:
  python -u main.py kinetics RGB --arch resnet50 \
  --num_segments 8 --gd 20 --lr 0.01 --lr_steps 50 75 90 --epochs 100 --batch-size 8 \
  -j 8 --dropout 0.5 --consensus_type=avg --root_log ./checkpoint/this_ckpt \
  --root_model ./checkpoint/this_ckpt --eval-freq=1 --npb \
  --self_conv --dense_sample --wd 0.0001
  After training, you should obtain a checkpoint comparable to the one downloaded above.
- To train on the Something-Something datasets (V1 & V2), you can run the following commands:
  # train TAM on Something-Something V1
  bash scripts/train_tam_something_rgb_8f.sh
  # train TAM on Something-Something V2
  bash scripts/train_tam_somethingv2_rgb_8f.sh
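The Kinetics command passes `--lr 0.01 --lr_steps 50 75 90 --epochs 100`, which describes a step learning-rate schedule. The sketch below shows what such a schedule looks like, assuming the rate is multiplied by 0.1 at each listed epoch; that decay factor (`gamma`) is an assumption common to TSN-style training code and is not stated in this README, so check main.py for the actual rule.

```python
def learning_rate(epoch, base_lr=0.01, lr_steps=(50, 75, 90), gamma=0.1):
    """Step schedule: scale base_lr by gamma once for every milestone already reached.

    gamma=0.1 is an assumed decay factor, not a value taken from the repository.
    """
    passed = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * gamma ** passed


for epoch in (0, 49, 50, 75, 90, 99):
    print("epoch {:>2}: lr = {:.0e}".format(epoch, learning_rate(epoch)))
```

Under this assumption the rate stays at 0.01 for the first 50 epochs and ends at roughly 1e-5 for the last 10.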