<p align="center">LOGO: A Long-Form Video Dataset for Group Action Quality Assessment (CVPR 2023)</p>
<p align="center">Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang</p>
<p align="center">[Paper] [Google Drive] [Baidu Drive] (extract number: v329)</p>
This repository contains the LOGO dataset and the PyTorch implementation of the paper "LOGO: A Long-Form Video Dataset for Group Action Quality Assessment" (CVPR 2023).
LOGO Dataset and GOAT Pipeline
LOGO is a multi-person long-form video dataset with frame-wise annotations of both action procedures (shown in the second row) and formations (shown in the third row, reflecting relations among actors), based on artistic swimming scenarios. It also contains score annotations for action quality assessment (AQA).
GOAT (short for GrOup-aware ATtention)
To-Do List
- Release the dataset
- The code of GOAT
- Pretrained features for LOGO
:books: Dataset
Lexicon
LOGO is organized by temporal structure and contains manual action and formation annotations. We designed the labeling system together with professional artistic swimming athletes to construct a lexicon for annotation, taking into account FINA rules and the actual conditions of competitions. In the Technical event, the group size is eight people, the video length is $170 \pm 15$ s, and the actions include Upper, Lower, Float, None, Acrobatic, Cadence, and five Required Elements. Each competition cycle must complete five Required Elements, at least two Acrobatic movements, and at least one Cadence action. In the Free event, the group size is also eight people, the video length is $240 \pm 15$ s, and the actions include Upper, Lower, Float, None, Acrobatic, Cadence, and Free elements. When performing Required, Upper, Lower, and Float actions, the athletes form neat polygons.
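The Technical-event constraints above can be written as a small check. This is only an illustrative sketch: the label strings (`"Required1"` to `"Required5"`, `"Acrobatic"`, `"Cadence"`) and the function name `check_technical_routine` are hypothetical and are not the identifiers used in the released annotations.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical label names, chosen for illustration only.
REQUIRED_ELEMENTS = {f"Required{i}" for i in range(1, 6)}


def check_technical_routine(instance_labels: Iterable[str]) -> List[str]:
    """Return the Technical-event rules a routine violates (empty if none)."""
    counts = Counter(instance_labels)
    problems = []

    # All five Required Elements must appear at least once.
    missing = sorted(REQUIRED_ELEMENTS - set(counts))
    if missing:
        problems.append(f"missing Required Elements: {missing}")

    # At least two Acrobatic movements and at least one Cadence action.
    if counts["Acrobatic"] < 2:
        problems.append("fewer than two Acrobatic movements")
    if counts["Cadence"] < 1:
        problems.append("no Cadence action")
    return problems
```

An empty return value means the routine satisfies all three rules stated in the lexicon.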
:pen: Annotation
Given an RGB artistic swimming video, the annotator utilizes our defined lexicon to label each frame with its action and formation. We accomplish the 25fps frame-wise action annotation stage utilizing the COIN Annotation Toolbox and the 1fps frame-wise formation labels using Labelme. Specifically, we set strict rules defining the boundaries between artistic swimming sequences and the formation marking position and employ eight workers with prior knowledge in the artistic swimming domain to label the dataset frame by frame following the rules. The annotation results of one worker are checked and adjusted by another, which ensures annotation results are double-checked.
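Because action labels are given at 25 fps and formation labels at 1 fps, the two streams need to be aligned before they can be used together. The sketch below assumes that both streams start at frame 0 and that each formation label covers the one-second window of frames that follows it; this alignment rule and the function name are assumptions, not something the annotation files are known to guarantee.

```python
def formation_index_for_frame(frame_idx: int,
                              action_fps: int = 25,
                              formation_fps: int = 1) -> int:
    """Map a 25 fps frame index to the index of its 1 fps formation label.

    Assumes both annotation streams start at frame 0 (an assumption).
    """
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    # Integer arithmetic avoids floating-point rounding at window boundaries.
    return frame_idx * formation_fps // action_fps
```

Under this assumption, frames 0 to 24 share formation label 0, frames 25 to 49 share label 1, and so on.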
The annotation information is saved in [Google Drive] or [Baidu Drive] (extract number: ojgf)
The annotation information contained in anno_dict.pkl for each sample is:
List Num. | Type | Description | Example |
---|---|---|---|
0 | string | Event type. | 'tech' |
1 | float | The score of the video. | 90.25 |
2 | float | / | / |
3 | list | End frame of the action instance. | [76, 141, 187, 246, 263, ···] |
4 | list | Action type of each frame. | [12, 12, 12, 12, 12, ···] |
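The table above can be read programmatically as follows. This is a minimal sketch under stated assumptions: the file is taken to be a pickled `dict` named `anno_dict.pkl` whose values are indexable in the order listed in the table; the helper names are hypothetical; and `end_frames_to_segments` assumes that the first instance starts at frame 0, that each end frame is inclusive, and that the next instance starts on the following frame.

```python
import pickle
from typing import Any, Dict, List, Sequence, Tuple


def load_annotations(path: str) -> Dict[Any, Sequence[Any]]:
    """Load the pickled annotation dictionary (key structure is not assumed)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def parse_sample(entry: Sequence[Any]) -> Dict[str, Any]:
    """Name the per-sample fields according to the table (index 2 is unused)."""
    return {
        "event_type": entry[0],            # e.g. 'tech'
        "score": float(entry[1]),          # e.g. 90.25
        "end_frames": list(entry[3]),      # end frame of each action instance
        "frame_actions": list(entry[4]),   # action type of each frame
    }


def end_frames_to_segments(end_frames: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn end frames into (start, end) pairs under the assumptions above."""
    segments = []
    start = 0
    for end in end_frames:
        segments.append((start, end))
        start = end + 1
    return segments
```

A typical call would be `parse_sample(next(iter(load_annotations("anno_dict.pkl").values())))`, which inspects one sample without relying on how the dictionary keys are structured.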
:chart_with_upwards_trend: Statistics
The LOGO dataset consists of 200 video samples from 26 events, with an average duration of 204.2 s and a total duration of over 11 h, covering 3 annotation types, 12 action types, and 17 formation types.
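As a quick consistency check, the total duration follows from the sample count and the average duration:

$$200 \times 204.2\,\text{s} = 40{,}840\,\text{s} \approx 11.3\,\text{h}$$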
Download
- Video_Frames: [Google Drive] or [Baidu Drive] (extract number: v329)
- Annotations and Split: [Google Drive] or [Baidu Drive] (extract number: ojgf)
:notebook: Data Preparation
- The prepared dataset ([Google Drive] or [Baidu Drive], extract number: v329) and annotations ([Google Drive] or [Baidu Drive], extract number: ojgf) are already provided in this repo.
- The data structure should be:
$DATASET_ROOT
├── LOGO
|  ├── WorldChampionship2019_free_final
|     ├── 0
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|     ...
|     └── 11
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|  ...
|  └── WorldChampionship2022_free_final
|     ├── 0
|     ...
|     └── 7
└──
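Given the layout above, frame paths can be built and enumerated as sketched below. The helper names `frame_path` and `list_frames` are hypothetical; the only things taken from the layout are the `LOGO/<event>/<sample_id>/` nesting and the five-digit, zero-padded `.jpg` file names.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def frame_path(dataset_root: PathLike, event: str,
               sample_id: int, frame_idx: int) -> Path:
    """Build the path of one frame, e.g. .../LOGO/<event>/0/00000.jpg."""
    return (Path(dataset_root) / "LOGO" / event / str(sample_id)
            / f"{frame_idx:05d}.jpg")


def list_frames(dataset_root: PathLike, event: str,
                sample_id: int) -> List[Path]:
    """Return all frames of one sample in temporal order.

    Zero-padded names sort lexicographically in the same order as
    numerically, so a plain sort is sufficient.
    """
    sample_dir = Path(dataset_root) / "LOGO" / event / str(sample_id)
    return sorted(sample_dir.glob("*.jpg"))
```

For example, `list_frames(root, "WorldChampionship2019_free_final", 0)` would return the frames of sample `0` from `00000.jpg` onward, where `root` stands for your `$DATASET_ROOT`.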
Code for Group-aware Attention (GOAT)
Performance
Pretrained Model
The Kinetics-pretrained I3D model is downloaded from the kinetics_i3d_pytorch repository:
model_rgb.pth
Requirements
- torch_videovision
pip install git+https://github.com/hassony2/torch_videovision
Training
USDL + GOAT + I3D
cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=7e-06 --weight_decay=0.001 --use_i3d_bb=1 --use_swin_bb=0
USDL + GOAT + Video Swin-Transformer
cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=1e-05 --weight_decay=0.0001 --use_i3d_bb=0 --use_swin_bb=1
CoRe + GOAT + I3D
cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=1e-06 --warmup=0 --use_i3d_bb=1 --use_swin_bb=0 --bs_train=2 --weight_decay=1e-5
CoRe + GOAT + Video Swin-Transformer
cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=3e-07 --warmup=0 --use_i3d_bb=0 --use_swin_bb=1 --bs_train=1 --weight_decay=1e-5
Contact
E-mail: sy-zhang23@mails.tsinghua.edu.cn
WeChat: ZSYi-408