
Dual-Evidential Learning for Weakly-supervised Temporal Action Localization


Mengyuan Chen, Junyu Gao, Shicai Yang, Changsheng Xu

European Conference on Computer Vision (ECCV), 2022.

Update (2024/04/19):

We have further optimized the code, and the provided pre-trained model can now achieve the following performance on THUMOS14:

| Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | AVG (0.1-0.5) | AVG (0.1-0.7) |
|---|---|---|---|---|---|---|---|---|---|
| DELU (Paper) | 71.5 | 66.2 | 56.5 | 47.7 | 40.5 | 27.2 | 15.3 | 56.5 | 46.4 |
| DELU (Latest) | 72.1 | 66.5 | 57.0 | 48.1 | 40.8 | 27.8 | 15.6 | 56.9 | 46.8 |
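
The AVG columns are simply the arithmetic means of the per-IoU mAPs. A quick check of the latest numbers (illustrative Python, not part of the repository):

```python
# Verify the averaged mAP columns for DELU (Latest); values are copied from the table above.
maps = [72.1, 66.5, 57.0, 48.1, 40.8, 27.8, 15.6]  # mAP at IoU 0.1 ... 0.7

avg_01_05 = sum(maps[:5]) / 5  # -> 56.9
avg_01_07 = sum(maps) / 7      # -> 46.8 after rounding
print(round(avg_01_05, 1), round(avg_01_07, 1))
```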

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Testing
  4. Training
  5. Citation

Introduction

Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly arises from the background noise introduced by aggregation operations and the large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional EDL paradigm to the weakly-supervised multi-label classification setting. Specifically, to adaptively exclude undesirable background snippets, we utilize video-level uncertainty to measure how much background noise interferes with the video-level prediction. Snippet-level uncertainty is then induced for progressive learning, which gradually focuses on entire action instances in an "easy-to-hard" manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks.
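
For readers unfamiliar with evidential deep learning, the sketch below illustrates the standard Dirichlet-based uncertainty that DELU builds on: per-snippet evidence parameterizes a Dirichlet distribution, and uncertainty is inversely related to the total evidence. This is a minimal, illustrative example; the function and variable names are ours and do not reflect the repository's actual code.

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Standard EDL-style uncertainty for per-snippet class logits.

    logits: (T, C) snippet-level class scores for one video.
    Returns: (T,) uncertainty in (0, 1]; high when total evidence is low.
    """
    evidence = F.softplus(logits)      # non-negative evidence per class
    alpha = evidence + 1.0             # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1)       # total evidence S = sum_c alpha_c
    num_classes = logits.shape[-1]
    return num_classes / strength      # u = C / S

# Toy usage: 5 snippets, 20 action classes (shapes are illustrative).
snippet_logits = torch.randn(5, 20)
print(dirichlet_uncertainty(snippet_logits))
```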


Prerequisites

Requirements and Dependencies:

Here we list the requirements and dependencies we used.

THUMOS-14 Dataset:

We use the 2048-d features provided by the ACM MM 2021 paper Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization (CO2-Net). You can access the dataset via Google Drive or Baidu Disk. The annotations are included within this package.
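
If you want to inspect the features before training, a minimal loading sketch is shown below. The file name and directory layout are assumptions for illustration; adapt them to the actual structure of the downloaded package.

```python
import numpy as np

# Hypothetical path and file name; replace with a real feature file from the downloaded package.
feature_path = "path/to/CO2-THUMOS-14/features/some_video.npy"

features = np.load(feature_path)
# Each video is a sequence of snippet features; the 2048 dims are typically the
# concatenation of 1024-d RGB and 1024-d optical-flow (I3D) features.
print(features.shape)  # expected: (num_snippets, 2048)
```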

ActivityNet-v1.2 Dataset:

We also use the features provided in MM2021-CO2-Net. The features can be obtained from here. The annotations are included within this package.

Testing

Download the pretrained models from Google Drive, and put them into "./download_ckpt/".
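
As a quick sanity check that a downloaded checkpoint loads correctly, you can open it in Python. The file name below is an assumption; use whatever files actually appear in "./download_ckpt/".

```python
import torch

# Hypothetical checkpoint name; list ./download_ckpt/ to find the actual files.
ckpt = torch.load("./download_ckpt/thumos_best.pkl", map_location="cpu")
print(type(ckpt))
print(list(ckpt.keys())[:5] if isinstance(ckpt, dict) else ckpt)
```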

Test on THUMOS-14

Change "path/to/CO2-THUMOS-14" in the script into your own path to the dataset, and run:

cd scripts/
./test_thumos.sh

Test on ActivityNet-v1.2

Change "path/to/CO2-ActivityNet-12" in the script into your own path to the dataset, and run:

cd scripts/
./test_activitynet.sh

Training

Change the dataset paths as stated above, and run:

cd scripts/
./train_thumos.sh

or

cd scripts/
./train_activitynet.sh

Citation

If you find the code useful in your research, please cite:

@inproceedings{mengyuan2022ECCV_DELU,
  author = {Chen, Mengyuan and Gao, Junyu and Yang, Shicai and Xu, Changsheng},
  title = {Dual-Evidential Learning for Weakly-supervised Temporal Action Localization},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}

License

See MIT License

Acknowledgement

This repo contains modified code from:

We sincerely thank the owners of all these great repos!