[CVPR 2022] MS-TCT

[Paper Link]

In this repository, we provide an implementation of "MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection" on the Charades dataset (localization setting, i.e., Charades_v1_localize). To train and evaluate MS-TCT, follow the steps below. For MultiTHUMOS, you can follow the training process here.

Prepare the I3D feature

Like previous works (e.g., TGM, PDAN), MS-TCT is built on top of pre-trained I3D features, so feature extraction is needed before training the network.

  1. Please download the Charades dataset (24 fps version) from this link.
  2. Follow this repository to extract the snippet-level I3D features (a sanity-check sketch follows this list).
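
Before training, it can help to verify the extracted features. Below is a minimal sanity-check sketch, assuming each video's snippet-level RGB I3D features are saved as a .npy array of shape [T, 1024]; the directory and file name are placeholders, not paths defined by this repo.

  import numpy as np

  # Hypothetical sanity check on one extracted feature file; the path,
  # file layout, and the [T, 1024] snippet-feature shape are assumptions
  # about the I3D extraction output, not guarantees from this repo.
  feat = np.load("i3d_features/video_id.npy")
  num_snippets, dim = feat.shape
  assert dim == 1024, "RGB I3D snippet features are typically 1024-D"
  print(f"{num_snippets} snippets x {dim}-D features")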

Dependencies

Please satisfy the following dependencies to train MS-TCT correctly:

Quick Start

  1. Set rgb_root in train.py to the path of the extracted features.
  2. Use ./run_MSTCT_Charades.sh for training on Charades-RGB. The best logits will be saved automatically in ./save_logit.
  3. Use python Evaluation.py -pkl_path /best_logit_path/ to evaluate the model with per-frame mAP and the action-conditional metrics (see the sketch after this list).
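
For reference, here is a rough sketch of how per-frame mAP can be computed from saved logits. The pickle layout assumed here (per-video score and binary label arrays, each [T, num_classes]) is an illustrative assumption; Evaluation.py is the authoritative implementation of both metrics.

  import pickle
  import numpy as np
  from sklearn.metrics import average_precision_score

  # Minimal per-frame mAP sketch, assuming the saved pickle maps each
  # video id to per-frame class scores and binary ground-truth labels
  # (both [T, num_classes]). This only illustrates the idea; use
  # Evaluation.py for the reported numbers.
  with open("./save_logit/best_logit.pkl", "rb") as f:
      scores_by_video, labels_by_video = pickle.load(f)

  scores = np.concatenate(list(scores_by_video.values()))  # [sum(T), C]
  labels = np.concatenate(list(labels_by_video.values()))  # [sum(T), C]
  aps = [average_precision_score(labels[:, c], scores[:, c])
         for c in range(labels.shape[1]) if labels[:, c].any()]
  print(f"Per-frame mAP: {np.mean(aps):.4f}")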

Remarks

Reference

If you find our repo or paper useful, please cite us as:

  @inproceedings{dai2022mstct,
    title={{MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection}},
    author={Dai, Rui and Das, Srijan and Kahatapitiya, Kumara and Ryoo, Michael and Bremond, Francois},
    booktitle={CVPR},
    year={2022}
  }

Contact: rui.dai@inria.fr