Proposal-based Multiple Instance Learning for Weakly-supervised Temporal Action Localization (CVPR 2023)

Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang (USTC)

Requirements

Required packages are listed in requirements.txt. You can install them by running:

conda create -n P-MIL python=3.8
conda activate P-MIL
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip3 install -r requirements.txt

Data Preparation

  1. Prepare THUMOS14 dataset.

    • We recommend using features and annotations provided by W-TALC or CO2-Net.
    • You can also download them from Google Drive.
  2. Prepare proposals generated from pre-trained S-MIL model.

    • We recommend using the official code of an S-MIL method (such as CO2-Net) to generate proposals.
    • Alternatively, you can download the proposals used in our paper from Google Drive.
  3. Place the features and annotations inside a data/Thumos14reduced/ folder and the proposals inside a proposals/ folder. Make sure the directory structure matches the layout below.

    ├── data
    │   └── Thumos14reduced
    │       ├── Thumos14reduced-I3D-JOINTFeatures.npy
    │       └── Thumos14reduced-Annotations
    │           ├── Ambiguous_test.txt
    │           ├── classlist.npy
    │           ├── duration.npy
    │           ├── extracted_fps.npy
    │           ├── labels_all.npy
    │           ├── labels.npy
    │           ├── original_fps.npy
    │           ├── segments.npy
    │           ├── subset.npy
    │           └── videoname.npy
    └── proposals
        ├── detection_result_base_test.json
        └── detection_result_base_train.json
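
Before training, it can help to confirm that every expected file is actually in place. The sketch below mirrors the tree above; `missing_files` is a hypothetical helper for illustration, not part of this repo.

```python
import os

# Paths mirror the directory layout described above.
DATA_ROOT = "data/Thumos14reduced"
ANNO_DIR = os.path.join(DATA_ROOT, "Thumos14reduced-Annotations")

EXPECTED_ANNOS = [
    "Ambiguous_test.txt", "classlist.npy", "duration.npy",
    "extracted_fps.npy", "labels_all.npy", "labels.npy",
    "original_fps.npy", "segments.npy", "subset.npy", "videoname.npy",
]

def missing_files():
    """Return the expected files that are not yet in place."""
    paths = [os.path.join(DATA_ROOT, "Thumos14reduced-I3D-JOINTFeatures.npy")]
    paths += [os.path.join(ANNO_DIR, name) for name in EXPECTED_ANNOS]
    paths += [os.path.join("proposals", f"detection_result_base_{split}.json")
              for split in ("train", "test")]
    return [p for p in paths if not os.path.exists(p)]

if __name__ == "__main__":
    absent = missing_files()
    print("all files in place" if not absent else f"missing: {absent}")
```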

Running

Training

CUDA_VISIBLE_DEVICES=0 python main.py --run_type train

Testing

The pre-trained model can be downloaded from Google Drive; place it inside a checkpoints folder.

CUDA_VISIBLE_DEVICES=0 python main.py --run_type test --pretrained_ckpt checkpoints/best_model.pkl

Results

The experimental results on THUMOS14 are shown below. Note that the performance of the provided checkpoint differs slightly from the original paper.

| Method \ mAP@IoU (%) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| P-MIL | 70.8 | 66.5 | 57.8 | 48.6 | 39.8 | 27.0 | 14.3 | 46.4 |
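
The AVG column is the arithmetic mean of mAP over the seven IoU thresholds, which can be reproduced from the per-threshold numbers:

```python
# Per-threshold mAP (%) on THUMOS14, copied from the results table.
map_at_iou = {0.1: 70.8, 0.2: 66.5, 0.3: 57.8, 0.4: 48.6,
              0.5: 39.8, 0.6: 27.0, 0.7: 14.3}

# AVG = mean of mAP over IoU thresholds 0.1:0.7.
avg = sum(map_at_iou.values()) / len(map_at_iou)
print(f"{avg:.1f}")  # → 46.4
```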

Citation

@InProceedings{Ren_2023_CVPR,
    author    = {Ren, Huan and Yang, Wenfei and Zhang, Tianzhu and Zhang, Yongdong},
    title     = {Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {2394-2404}
}

Acknowledgement

We referenced the repositories below when writing our code.