<p align="center"> A Decoupled Spatio-Temporal Framework for <br>Skeleton-based Action Segmentation</p>

Authors: Yunheng Li, Zhongyu Li, Shanghua Gao, Qilong Wang, Qibin Hou*, Ming-Ming Cheng (*Corresponding author)

[paper] [github] [dataset] [pretrained models] [Parameter&Flops] [visualization]


Abstract: Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences. However, we observe that existing methods suffer from weak spatio-temporal modeling capability due to two forms of coupled modeling: (i) cascaded interaction couples spatial and temporal modeling, which over-smooths motion modeling over long sequences, and (ii) joint-shared temporal modeling uses shared weights to model every joint, ignoring the distinct motion patterns of different joints. We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues. First, we decouple the cascaded spatio-temporal interaction to avoid stacking multiple spatio-temporal blocks while still achieving sufficient spatio-temporal interaction. Specifically, DeST performs unified spatial modeling once and divides the spatial features into groups of sub-features, which then adaptively interact with temporal features from different layers. Since different sub-features contain distinct spatial semantics, the model can learn the optimal interaction pattern at each layer. Meanwhile, inspired by the fact that different joints move at different speeds, we propose joint-decoupled temporal modeling, which employs independent trainable weights to capture the distinctive temporal features of each joint. On four large-scale benchmarks of different scenes, DeST significantly outperforms current state-of-the-art methods with lower computational complexity.

<p align="center"> <img src="imgs/framework.png" width="900" width="1200"/> <br /> <em> Figure 1: Overview of the DeST. </em> </p>

Introduction

This repo is the official PyTorch implementation of "A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation".

Dependencies and Installation:

The code requires `python>=3.7`, `pytorch>=1.7`, and `torchvision>=0.8`.

For example:

1. Clone the repository:

   git clone https://github.com/lyhisme/DeST.git

2. Create a conda environment and install dependencies:

   conda env create -f environment.yml
   conda activate DeST
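
After activating the environment, you can optionally confirm that the installed versions meet the requirements above:

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect >=1.7 and >=0.8
print("CUDA available:", torch.cuda.is_available())
```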

Preparation

Datasets

All datasets (~4.3 GB) can be downloaded from GoogleDrive or BaiduNetdisk.

Pretrained models:

| Dataset | Model | F1@10 | F1@25 | F1@50 | Edit | Acc | Model Zoo |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :--- |
| MCFS-22 | DeST (tcn) | 86.6 | 83.5 | 73.2 | 82.3 | 78.7 | [Google Drive] [BaiduNetdisk] |
| MCFS-22 | DeST (linearformer) | 87.4 | 84.5 | 75.0 | 85.2 | 80.4 | [Google Drive] [BaiduNetdisk] |
| MCFS-130 | DeST (tcn) | 74.0 | 70.7 | 61.9 | 73.8 | 70.5 | [Google Drive] [BaiduNetdisk] |
| MCFS-130 | DeST (linearformer) | 76.3 | 72.9 | 63.5 | 76.2 | 72.0 | [Google Drive] [BaiduNetdisk] |
| PKU-MMD (sub) | DeST (tcn) | 71.7 | 68.0 | 55.5 | 66.3 | 67.6 | [Google Drive] [BaiduNetdisk] |
| PKU-MMD (sub) | DeST (linearformer) | 75.3 | 72.2 | 60.2 | 70.5 | 70.8 | [Google Drive] [BaiduNetdisk] |
| PKU-MMD (view) | DeST (tcn) | 63.2 | 59.2 | 47.6 | 58.2 | 62.4 | [Google Drive] [BaiduNetdisk] |
| PKU-MMD (view) | DeST (linearformer) | 69.3 | 65.6 | 52.0 | 64.7 | 67.3 | [Google Drive] [BaiduNetdisk] |
| LARA | DeST (tcn) | 69.7 | 66.7 | 55.8 | 63.7 | 72.6 | [Google Drive] [BaiduNetdisk] |
| LARA | DeST (linearformer) | 70.3 | 68.0 | 57.8 | 64.2 | 75.1 | [Google Drive] [BaiduNetdisk] |
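
For reference, F1@{10,25,50} are the standard segmental F1 scores: a predicted segment counts as correct when its temporal IoU with a same-class ground-truth segment exceeds the threshold (0.10/0.25/0.50). The sketch below is a minimal illustration of this standard metric, not the repo's evaluation code:

```python
# Minimal sketch of the standard segmental F1@k metric (Lea et al.),
# not the repo's evaluation code. Segments are (start, end, label) tuples.
def f1_at_k(pred_segs, gt_segs, k):
    matched = [False] * len(gt_segs)
    tp = 0
    for ps, pe, pl in pred_segs:
        best_iou, best_j = 0.0, -1
        for j, (gs, ge, gl) in enumerate(gt_segs):
            if gl != pl or matched[j]:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > k:  # e.g. k = 0.10 for F1@10
            tp += 1
            matched[best_j] = True
    fp, fn = len(pred_segs) - tp, len(gt_segs) - tp
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Toy example: the first prediction overlaps enough at k=0.5, the second does not
print(f1_at_k([(0, 10, "jump"), (10, 14, "spin")],
              [(0, 9, "jump"), (12, 20, "spin")], k=0.5))  # 0.5
```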

Organize the checkpoints and dataset folders in the following structure (note: please check it carefully):

|-- config
|   |-- DeST_linearformer
|   |   |-- MCFS-130
|   |   |   `-- config.yaml
|-- dataset
|   |-- MCFS-130
|   |   |-- features/
|   |   |-- groundTruth/
|   |   |-- gt_arr/
|   |   |-- gt_boundary_arr/
|   |   |-- splits/
|   |   `-- mapping.txt
|-- csv/
|-- result
|   |-- MCFS-130
|   |   |-- DeST_linearformer
|   |   |   `-- split1
|   |   |       `-- best_test_model.prm
|-- train.py
|-- evaluate.py
|-- libs/
|-- save_pred.py
`-- utils/
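
A quick way to verify the layout before training is to check that the expected paths exist. This is an optional convenience sketch (the paths mirror the tree above; extend the list as needed):

```python
from pathlib import Path

# Sanity-check that the expected checkpoint/dataset layout is in place.
required = [
    "config/DeST_linearformer/MCFS-130/config.yaml",
    "dataset/MCFS-130/features",
    "dataset/MCFS-130/groundTruth",
    "dataset/MCFS-130/splits",
    "dataset/MCFS-130/mapping.txt",
]
for rel in required:
    print(("ok      " if Path(rel).exists() else "MISSING ") + rel)
```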

Get Started

Training

You can train a model by adjusting the settings in the corresponding configuration file and running:

python train.py ./config/xxx/xxx/config.yaml

Example:

python train.py ./config/DeST_linearformer/MCFS-130/config.yaml

Evaluation

After training, you can evaluate the performance of the trained model:

python evaluate.py ./config/xxx/xxx/config.yaml

Example:

python evaluate.py ./config/DeST_linearformer/MCFS-130/config.yaml 

Average cross-validation results

You can average the evaluation results over all cross-validation splits:

python utils/average_cv_results.py [result_dir]

Example:

python utils/average_cv_results.py ./result/MCFS-130/DeST_linearformer/
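
Conceptually, this script averages each metric over the per-split results. A toy sketch of that averaging (placeholder numbers, not actual results; the real logic lives in utils/average_cv_results.py):

```python
# Average each metric across splits (values here are placeholders).
splits = [
    {"F1@10": 76.1, "F1@25": 72.5, "Acc": 71.8},
    {"F1@10": 76.5, "F1@25": 73.3, "Acc": 72.2},
]
avg = {k: sum(s[k] for s in splits) / len(splits) for k in splits[0]}
print(avg)  # {'F1@10': 76.3, 'F1@25': 72.9, 'Acc': 72.0}
```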

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{DeST,
  title={A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation},
  author={Li, Yunheng and Li, Zhongyu and Gao, Shanghua and Wang, Qilong and Hou, Qibin and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2312.05830},
  year={2023}
}

Acknowledgement

Our work is closely related to the following assets, which inspired our implementation; we gratefully thank the authors.

We appreciate [MS-TCN](https://github.com/yabufarha/ms-tcn) for the backbone network and evaluation code, and thank [Yuchi Ishikawa](https://github.com/yiskw713) for sharing his PyTorch re-implementation of [MS-TCN](https://github.com/yiskw713/ms-tcn).

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Any commercial use requires formal permission first.

Contact

If you have any questions, please email yunhengli AT mail.nankai.edu.cn; I typically respond within a few days.