<p align="center">LOGO: A Long-Form Video Dataset for Group Action Quality Assessment (CVPR 2023)</p>

<p align="center">Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang†</p>

<p align="center">[Paper] [Google Drive] [Baidu Drive] (extraction code: v329)</p>

This repository contains the LOGO dataset and the PyTorch implementation for the paper "LOGO: A Long-Form Video Dataset for Group Action Quality Assessment" (CVPR 2023).


💡 LOGO Dataset and GOAT Pipeline

LOGO is a multi-person long-form video dataset with frame-wise annotations on both action procedures (as shown in the second line) and formations (as shown in the third line, which reflects relations among actors) based on artistic swimming scenarios. It also contains score annotations for AQA.

GOAT (short for GrOup-aware ATtention)

📋 To-Do List

:books: Dataset

🗒️ Lexicon

LOGO is organized by temporal structure and contains manual annotations of both actions and formations. We designed the labeling system together with professional artistic swimming athletes, building a lexicon for annotation that reflects FINA rules and the actual conditions of competition. In the Technical event, the group size is eight, the video length is $170 \pm 15$ s, and the actions include Upper, Lower, Float, None, Acrobatic, Cadence, and five Required Elements. Each routine must complete the five Required Elements, at least two Acrobatic movements, and at least one Cadence action. In the Free event, the group size is also eight, the video length is $240 \pm 15$ s, and the actions include Upper, Lower, Float, None, Acrobatic, Cadence, and Free elements. When performing Required, Upper, Lower, and Float actions, the athletes form neat polygons.
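The composition rule for Technical routines can be expressed as a small check. This is a minimal sketch, not part of the released code: the string labels and the function name `missing_technical_elements` are illustrative assumptions (the released annotations use integer action IDs), and the five Required Elements are treated here as a minimum count.

```python
from collections import Counter

# Minimum counts per Technical routine, taken from the lexicon description.
# The string labels are illustrative; the released annotations use integer IDs.
TECHNICAL_MINIMUMS = {"Required": 5, "Acrobatic": 2, "Cadence": 1}


def missing_technical_elements(action_instances):
    """Return {label: shortfall} for a list of action-instance labels."""
    counts = Counter(action_instances)
    return {
        label: minimum - counts[label]
        for label, minimum in TECHNICAL_MINIMUMS.items()
        if counts[label] < minimum
    }


# Made-up routine with five Required, one Acrobatic, and one Cadence instance.
routine = ["Upper", "Required", "Acrobatic", "Required", "Lower",
           "Required", "Cadence", "Required", "Float", "Required"]
print(missing_technical_elements(routine))  # {'Acrobatic': 1}
```

An empty dictionary means the routine satisfies all three minimums.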

:pen: Annotation

Given an RGB artistic swimming video, annotators use our lexicon to label each frame with its action and formation. Action labels are annotated frame-wise at 25 fps with the COIN Annotation Toolbox, and formation labels are annotated at 1 fps with Labelme. We set strict rules that define the boundaries between artistic swimming sequences and the positions used for formation marking, and we employed eight workers with prior knowledge of artistic swimming to label the dataset frame by frame according to these rules. Each worker's annotations are checked and adjusted by a second worker, so every annotation is double-checked.
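Because action labels are sampled at 25 fps and formation labels at 1 fps, pairing the two requires an index conversion. The sketch below assumes that both label streams start at time zero and that formation label k covers the k-th second of video. That alignment is an assumption, not something stated here, and `formation_index` is a hypothetical helper name.

```python
ACTION_FPS = 25     # action labels: one per video frame
FORMATION_FPS = 1   # formation labels: one per second


def formation_index(frame_idx, action_fps=ACTION_FPS, formation_fps=FORMATION_FPS):
    """Map a 25 fps frame index to the index of the formation label covering it.

    Assumes both streams start at t = 0 (an assumption about the dataset).
    """
    return frame_idx * formation_fps // action_fps


print(formation_index(0))     # 0
print(formation_index(24))    # 0
print(formation_index(25))    # 1
print(formation_index(6249))  # 249
```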

The annotation information is available on [Google Drive] or [Baidu Drive] (extraction code: ojgf).

The annotation information contained in anno dict.pkl for each sample is:

| List Num. | Type | Description | Example |
| --- | --- | --- | --- |
| 0 | string | Event type. | 'tech' |
| 1 | float | The score of the video. | 90.25 |
| 2 | float | / | / |
| 3 | list | End frame of the action instance. | [76, 141, 187, 246, 263, ···] |
| 4 | list | Action type of each frame. | [12, 12, 12, 12, 12, ···] |
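The per-sample list above can be read with the standard `pickle` module. The following is a minimal sketch under stated assumptions: the helper names `load_annotations` and `action_segments` are hypothetical, the example entry is made up to match the table layout, and the key format of the dictionary is not assumed. Whether the run boundaries computed from field 4 coincide exactly with the end frames in field 3 depends on the dataset's indexing convention, which is not specified here.

```python
import pickle
from itertools import groupby


def load_annotations(path):
    """Load the annotation dictionary from a pickle file (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def action_segments(frame_labels):
    """Collapse per-frame action labels into (action_id, first_frame, last_frame) runs."""
    segments, start = [], 0
    for action_id, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((action_id, start, start + length - 1))
        start += length
    return segments


# Hypothetical entry laid out like the table above (values are made up).
entry = ["tech", 90.25, None, [2, 4], [12, 12, 12, 3, 3]]
event_type, score, _, end_frames, frame_labels = entry
print(event_type, score)              # tech 90.25
print(action_segments(frame_labels))  # [(12, 0, 2), (3, 3, 4)]
```

With the real file, `load_annotations` would be called on the downloaded pickle and each value unpacked in the same way as `entry`.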

:chart_with_upwards_trend: Statistics

The LOGO dataset consists of 200 video samples from 26 events, with an average duration of 204.2 s and a total duration of more than 11 h. It covers 3 annotation types, 12 action types, and 17 formation types.
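As a quick sanity check, the total duration follows from the sample count and the average duration quoted above. The variable names are illustrative.

```python
num_videos = 200
mean_duration_s = 204.2

# Total duration in hours = number of videos * mean duration (s) / 3600.
total_hours = num_videos * mean_duration_s / 3600
print(f"{total_hours:.2f} h")  # 11.34 h
```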

💾 Download

:notebook: Data Preparation

$DATASET_ROOT
├── LOGO
|  ├── WorldChampionship2019_free_final
|     ├── 0
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|     ...
|     └── 11
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|  ...
|  └── WorldChampionship2022_free_final
|     ├── 0
|     ...
|     └── 7
└──
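Given the layout above, frames for one sample can be enumerated with `pathlib`. This is a minimal sketch, not the repository's data loader: `list_samples` and `list_frames` are hypothetical helpers, and the code assumes only what the tree shows (numeric sample folders holding zero-padded `.jpg` frames).

```python
from pathlib import Path


def list_samples(dataset_root, event):
    """Return the numeric sample folder names of an event in ascending order."""
    event_dir = Path(dataset_root) / "LOGO" / event
    return sorted(
        int(p.name) for p in event_dir.iterdir() if p.is_dir() and p.name.isdigit()
    )


def list_frames(dataset_root, event, sample):
    """Return the sorted frame paths under LOGO/<event>/<sample>/."""
    sample_dir = Path(dataset_root) / "LOGO" / event / str(sample)
    # Zero-padded names (00000.jpg, 00001.jpg, ...) sort correctly as strings.
    return sorted(sample_dir.glob("*.jpg"))
```

For example, `list_frames(root, "WorldChampionship2019_free_final", 0)` would return the frames of the first sample, where `root` stands for your `$DATASET_ROOT`. Sample folders are converted to integers before sorting so that `11` comes after `2`.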

💻 Code for Group-aware Attention (GOAT)

⭐️ Performance

⚙️ Pretrained Model

The Kinetics-pretrained I3D weights can be downloaded from the kinetics_i3d_pytorch repository:

model_rgb.pth

🗂️ Requirement

pip install git+https://github.com/hassony2/torch_videovision

📊 Training

USDL + GOAT + I3D

cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=7e-06 --weight_decay=0.001 --use_i3d_bb=1 --use_swin_bb=0

USDL + GOAT + Video Swin-Transformer

cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=1e-05 --weight_decay=0.0001 --use_i3d_bb=0 --use_swin_bb=1

CoRe + GOAT + I3D

cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=1e-06 --warmup=0 --use_i3d_bb=1 --use_swin_bb=0 --bs_train=2 --weight_decay=1e-5

CoRe + GOAT + Video Swin-Transformer

cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=3e-07 --warmup=0 --use_i3d_bb=0 --use_swin_bb=1 --bs_train=1 --weight_decay=1e-5

📧 Contact

E-mail: sy-zhang23@mails.tsinghua.edu.cn

WeChat: ZSYi-408