Video Feature Extraction with Video Swin Transformer
Installation
Please follow the installation instructions in the original repo.
Preparation
Video List
Create a video list file in which each line contains a video path and a dummy label, separated by a space. For example:
/PATH/TO/video1.mp4 0
/PATH/TO/video2.mp4 0
...
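A list in this format can be generated with a short script. The following is a minimal sketch, not part of the repo: the function name `write_video_list` and its parameters are hypothetical, and it assumes that all videos live under one directory and that their paths contain no spaces.

```python
from pathlib import Path


def write_video_list(video_dir, list_path, extensions=(".mp4",), dummy_label=0):
    """Write one "<absolute video path> <dummy label>" line per video file.

    Hypothetical helper for illustration; it is not part of tools/extract.py.
    """
    video_dir = Path(video_dir)
    # Collect matching files recursively and sort them for a stable order.
    videos = sorted(
        p.resolve()
        for p in video_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    with open(list_path, "w") as f:
        for path in videos:
            f.write(f"{path} {dummy_label}\n")
    return len(videos)
```

Calling `write_video_list("/PATH/TO", "vatex.txt")` would then produce a file that can be passed as `data.test.ann_file`.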
Checkpoints
Download checkpoints from the original repo.
Usage
python tools/extract.py \
CONFIG \
CHECKPOINT \
OUTPUT \
--cfg-options \
data.test.ann_file=FILE_LIST \
[OTHER_OPTIONS] \
[--dataset DATASET]
This implementation supports feature extraction only with Swin-B pre-trained on Kinetics 400 or Kinetics 600, and it runs only on a single GPU. For example, to extract features of VATEX with Swin-B pre-trained on Kinetics 600:
python tools/extract.py \
configs/recognition/swin/swin_base_patch244_window877_kinetics600_22k.py \
swin_base_patch244_window877_kinetics600_22k.pth \
vatex.h5 \
--cfg-options \
data.test.ann_file=vatex.txt \
data.test.pipeline.1.window_interval=32 \
model.test_cfg.max_testing_views=4 \
--dataset vatex
Set data.test.pipeline.1.window_interval to adjust the number of frames between two windows.
Set model.test_cfg.max_testing_views to fit your GPU memory size.
The features of all videos are collected in an HDF5 file, OUTPUT.
Specify --dataset if you need a custom key for mapping to each video's feature in the HDF5 file. In that case, you have to implement the key parser in the function get_key_parser in tools/extract.py; given a video path, the parser outputs the video's feature key.
The default feature key of a video is its file name without the path and extension.
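As an illustration of the key mapping described above, here is a minimal sketch of a default key and a dataset-specific parser. The names `default_key` and `sketch_key_parser`, the dataset name `my_dataset`, and its key scheme are all hypothetical; the actual signature of get_key_parser in tools/extract.py may differ.

```python
import os


def default_key(video_path):
    """Default key: the file name without the directory and extension."""
    return os.path.splitext(os.path.basename(video_path))[0]


def sketch_key_parser(dataset=None):
    """Return a function that maps a video path to a feature key.

    Hypothetical stand-in for get_key_parser; the real function may differ.
    """
    if dataset is None:
        return default_key
    if dataset == "my_dataset":
        # Assumed scheme for illustration: "<parent directory>_<file stem>".
        def parser(video_path):
            parent = os.path.basename(os.path.dirname(video_path))
            return f"{parent}_{default_key(video_path)}"

        return parser
    raise ValueError(f"No key parser defined for dataset: {dataset}")
```

With this sketch, `default_key("/PATH/TO/video1.mp4")` gives `video1`, while the `my_dataset` parser gives `TO_video1` for the same path.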