

[CVPR 2022] MS-TCT

[Paper Link]

This repository provides an implementation of "MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection" on the Charades dataset (localization setting, i.e., Charades_v1_localize). To train and evaluate MS-TCT, follow the steps below. For MultiTHUMOS, follow the training process here.

Prepare the I3D feature

Like previous works (e.g., TGM, PDAN), MS-TCT is built on top of pre-trained I3D features, so the features must be extracted before training the network.

  1. Download the Charades dataset (24 fps version) from this link.
  2. Follow this repository to extract the snippet-level I3D features.
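
Before training, it can help to confirm that every video has an extracted feature file. The following is a minimal sketch and not part of this repository. It assumes one feature file per video, named `<video_id>.npy`, directly under the feature directory; the actual layout produced by the extraction repository may differ. The helper `find_missing_features`, the path, and the example IDs are hypothetical.

```python
from pathlib import Path


def find_missing_features(feature_root, video_ids, suffix=".npy"):
    """Return the video IDs that have no feature file under feature_root.

    Assumption: features are stored one file per video as <video_id><suffix>.
    Adjust `suffix` or the path pattern if your extraction output differs.
    """
    root = Path(feature_root)
    return [vid for vid in video_ids if not (root / f"{vid}{suffix}").is_file()]


if __name__ == "__main__":
    # Hypothetical path and IDs, for illustration only.
    missing = find_missing_features("/path/to/i3d_features", ["VIDEO_A", "VIDEO_B"])
    print(f"{len(missing)} feature file(s) missing:", missing)
```

If the returned list is empty, the same directory is a reasonable candidate for rgb_root in train.py.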


Please satisfy the following dependencies to train MS-TCT correctly:

Quick Start

  1. In train.py, set rgb_root to the path of the extracted features.
  2. Run ./run_MSTCT_Charades.sh to train on Charades-RGB. The best logits are saved automatically in ./save_logit.
  3. Run python Evaluation.py -pkl_path /best_logit_path/ to evaluate the model with the per-frame mAP and the action-conditional metrics.
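
To make the per-frame mAP in step 3 concrete, here is a minimal, dependency-free sketch of the general idea: compute average precision for each action class over all frames, then average across classes. It is an illustration under stated assumptions, not the code in Evaluation.py. The input layout (one row per frame, one column per class), the tie handling, and the choice to skip classes with no positive frames are all assumptions. The action-conditional metrics are not covered.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision at the rank of each positive frame."""
    # Rank frames from highest to lowest score (sorted() is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positive frames.
    return sum(precisions) / hits if hits else None


def per_frame_map(frame_scores, frame_labels):
    """Mean AP over classes; inputs are lists of per-frame rows, one column per class."""
    num_classes = len(frame_scores[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision([row[c] for row in frame_scores],
                               [row[c] for row in frame_labels])
        if ap is not None:  # assumption: skip classes that never occur
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive frame")
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy data: 3 frames, 2 classes (values are illustrative only).
    scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
    labels = [[1, 0], [0, 1], [1, 1]]
    print(f"per-frame mAP: {per_frame_map(scores, labels):.4f}")
```

Because action detection on Charades is multi-label, each class is scored independently per frame, which is why the sketch treats every column as its own binary ranking problem.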



If you find our repo or paper useful, please cite us as

    title={{MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection}},
    author={Dai, Rui and Das, Srijan and Kahatapitiya, Kumara and Ryoo, Michael and Bremond, Francois},

Contact: rui.dai@inria.fr