assembly101-temporal-action-segmentation

This repository contains the code and model for the Temporal Action Segmentation benchmark of Assembly101.

If you use our dataset and model, kindly cite:

@inproceedings{sener2022assembly101,
  title={Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities},
  author={F. Sener and D. Chatterjee and D. Shelepov and K. He and D. Singhania and R. Wang and A. Yao},
  booktitle={CVPR},
  year={2022}
}

@article{singhania2021coarse,
  title={Coarse to fine multi-resolution temporal convolutional network},
  author={Singhania, Dipika and Rahaman, Rahul and Yao, Angela},
  journal={arXiv preprint arXiv:2105.10859},
  year={2021}
}

Contents

- Overview
- Data
- Training
- Evaluate
- License

Overview

This repository provides code to train and evaluate the C2F-TCN model for coarse temporal action segmentation on the Assembly101 dataset.
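For intuition, below is a minimal PyTorch sketch of the coarse-to-fine idea behind C2F-TCN: frame-wise predictions are produced at several temporal resolutions and merged at full frame rate. This is an illustration only, not the authors' implementation; the hidden width, depth, and number of classes are placeholders (only the 2048-D input matches the TSM features used here).

```python
# Sketch of the coarse-to-fine temporal convolutional idea; NOT the
# authors' implementation. Hidden width, depth, and num_classes are
# placeholders; the 2048-D input matches the TSM features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class C2FTCNSketch(nn.Module):
    def __init__(self, in_dim=2048, hidden=64, num_classes=25):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        # Encoder: temporal convs, each followed by 2x downsampling.
        self.enc = nn.ModuleList(nn.Conv1d(hidden, hidden, 3, padding=1)
                                 for _ in range(3))
        # Decoder: temporal convs, each preceded by 2x upsampling.
        self.dec = nn.ModuleList(nn.Conv1d(hidden, hidden, 3, padding=1)
                                 for _ in range(3))
        # One classification head per decoder resolution.
        self.heads = nn.ModuleList(nn.Conv1d(hidden, num_classes, 1)
                                   for _ in range(3))

    def forward(self, x):              # x: (batch, in_dim, num_frames)
        T = x.shape[-1]
        h = self.inp(x)
        for conv in self.enc:
            h = F.max_pool1d(F.relu(conv(h)), 2)
        outs = []
        for conv, head in zip(self.dec, self.heads):
            h = F.relu(conv(F.interpolate(h, scale_factor=2.0)))
            # Coarse-to-fine: classify at every resolution, upsample each
            # prediction to frame rate, and average for the final output.
            outs.append(F.interpolate(head(h), size=T))
        return torch.stack(outs).mean(0)  # (batch, num_classes, num_frames)
```

The output is frame-wise class logits, so a standard per-frame cross-entropy loss can train such a model.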

Data

Per-frame features are required as input. We use TSM (8-frame input) to extract 2048-D per-frame features, which can be downloaded from our Google Drive. Please follow this to request drive access and download the .lmdb TSM features.

The action segmentation annotations can be found here. Only the coarse annotations are used.
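As a quick orientation, here is a hedged sketch of turning one coarse annotation file into per-frame labels. The column layout (start_frame, end_frame, action) and the comma delimiter are assumptions for illustration; check the downloaded files for the actual format.

```python
# Hedged sketch: expand segment-level coarse annotations into per-frame labels.
# The (start_frame, end_frame, action) column layout is an assumption for
# illustration; inspect the downloaded annotation files for the real format.
import csv

def load_coarse_labels(path, num_frames):
    labels = ["background"] * num_frames
    with open(path) as f:
        for start, end, action in csv.reader(f):
            for t in range(int(start), min(int(end), num_frames)):
                labels[t] = action
    return labels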

Run data/data_stat.py to generate data statistics for each video after downloading the .lmdb features.

python data/data_stat.py lmdb_path
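To sanity-check the download before running the statistics script, a few entries can be listed directly. The key naming and the raw float32 value encoding below are assumptions for illustration; data/data_stat.py is the authoritative reader.

```python
# Quick sanity check on the downloaded .lmdb TSM features. The key naming
# and raw-float32 value encoding are assumptions; data/data_stat.py is the
# authoritative reader.
import sys
import lmdb
import numpy as np

def inspect(lmdb_path, max_entries=5):
    env = lmdb.open(lmdb_path, readonly=True, lock=False)
    with env.begin() as txn:
        print("total entries:", txn.stat()["entries"])
        for i, (key, value) in enumerate(txn.cursor()):
            if i >= max_entries:
                break
            feat = np.frombuffer(value, dtype=np.float32)
            print(key.decode(), "->", feat.shape)  # expect 2048-D per frame

if __name__ == "__main__":
    inspect(sys.argv[1])
```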

Training

To train our model, run

python main.py --action train --feature_path lmdb_path --split train

Set --split train_val to train on both the train and validation data.

Evaluate

To evaluate our model, run

python main.py --action predict --feature_path lmdb_path --test_aug 0

| Split | test_aug | F1@10 | F1@25 | F1@50 | Edit | MoF  |
|-------|----------|-------|-------|-------|------|------|
| Train | False    | 33.3  | 28.6  | 20.6  | 31.7 | 37.8 |

Set --test_aug 1 to use test-time data augmentation during evaluation.
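For reference, the reported numbers follow the standard temporal action segmentation metrics: segmental F1@{10,25,50}, segmental edit score, and mean-over-frames (MoF) accuracy. The sketch below shows the usual definitions; it is not necessarily the exact evaluation code used in this repository.

```python
# Standard temporal-action-segmentation metrics; a reference sketch, not
# necessarily the exact evaluation code in this repository.
import numpy as np

def segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) runs."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs

def f1_at(pred, gt, iou_thresh):
    """Segmental F1@k: a predicted segment is a true positive if it overlaps
    an unmatched ground-truth segment of the same label with IoU >= k."""
    p, g = segments(pred), segments(gt)
    matched = [False] * len(g)
    tp = 0
    for lab, s, e in p:
        best, best_j = 0.0, -1
        for j, (glab, gs, ge) in enumerate(g):
            if glab != lab or matched[j]:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            union = max(e, ge) - min(s, gs)
            if inter / union > best:
                best, best_j = inter / union, j
        if best >= iou_thresh and best_j >= 0:
            matched[best_j] = True
            tp += 1
    fp, fn = len(p) - tp, len(g) - tp
    return 100 * 2 * tp / max(2 * tp + fp + fn, 1)

def edit_score(pred, gt):
    """Segmental edit score: normalized Levenshtein distance between the
    predicted and ground-truth segment label sequences."""
    P = [s[0] for s in segments(pred)]
    G = [s[0] for s in segments(gt)]
    m, n = len(P), len(G)
    D = np.zeros((m + 1, n + 1))
    D[:, 0], D[0, :] = np.arange(m + 1), np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = min(D[i - 1, j] + 1, D[i, j - 1] + 1,
                          D[i - 1, j - 1] + (P[i - 1] != G[j - 1]))
    return 100 * (1 - D[m, n] / max(m, n, 1))

def mof(pred, gt):
    """Mean-over-frames: plain frame-wise accuracy."""
    return 100 * np.mean(np.array(pred) == np.array(gt))
```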

The pre-trained model can be found here.

License

We license Assembly101 under the Creative Commons Attribution-NonCommercial 4.0 International License, found here. The terms are: