# Dual-Evidential Learning for Weakly-supervised Temporal Action Localization

Mengyuan Chen, Junyu Gao, Shicai Yang, Changsheng Xu

European Conference on Computer Vision (ECCV), 2022.
**Update (2024/04/19):** We have further optimized the code, and the provided pre-trained model now achieves the following performance (mAP % at the given IoU thresholds) on THUMOS14:
| Method | @0.1 | @0.2 | @0.3 | @0.4 | @0.5 | @0.6 | @0.7 | AVG (0.1:0.5) | AVG (0.1:0.7) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DELU (Paper) | 71.5 | 66.2 | 56.5 | 47.7 | 40.5 | 27.2 | 15.3 | 56.5 | 46.4 |
| DELU (Latest) | 72.1 | 66.5 | 57.0 | 48.1 | 40.8 | 27.8 | 15.6 | 56.9 | 46.8 |
## Table of Contents

- [Introduction](#introduction)
- [Prerequisites](#prerequisites)
- [Testing](#testing)
- [Training](#training)
- [Citation](#citation)
- [License](#license)
- [Acknowledgement](#acknowledgement)
## Introduction
Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional EDL paradigm to the weakly-supervised multi-label classification goal. Specifically, to adaptively exclude undesirable background snippets, we utilize the video-level uncertainty to measure the interference of background noise on the video-level prediction. The snippet-level uncertainty is then induced for progressive learning, which gradually focuses on the entire action instances in an "easy-to-hard" manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks.
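To make the dual uncertainty modeling concrete, below is a minimal sketch of how uncertainty is typically derived under the EDL paradigm that DELU builds on: non-negative class evidence parameterizes a Dirichlet distribution, and the uncertainty is the belief mass not assigned to any class. This is an illustrative sketch only; the function and variable names are assumptions, not this repo's actual API.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Per-snippet EDL uncertainty from raw class logits (illustrative).

    Standard EDL maps non-negative evidence e to Dirichlet concentration
    parameters alpha = e + 1; the uncertainty mass is u = K / sum(alpha),
    where K is the number of classes.
    """
    evidence = F.softplus(logits)          # non-negative evidence, shape (T, K)
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1)           # total evidence per snippet, shape (T,)
    num_classes = logits.shape[-1]
    return num_classes / strength          # uncertainty in (0, 1], shape (T,)

# Example: snippet-level uncertainty for a video of T=750 snippets, K=20 classes
snippet_logits = torch.randn(750, 20)
u = evidential_uncertainty(snippet_logits)  # high u => ambiguous/background snippet
```

In DELU, the analogous video-level uncertainty measures how much background noise contaminates the aggregated video-level prediction, while snippet-level uncertainties rank snippets for the "easy-to-hard" progressive learning described above.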
## Prerequisites

### Requirements and Dependencies

We used the following requirements and dependencies:
- Linux: Ubuntu 20.04 LTS
- GPU: GeForce RTX 3090
- CUDA: 11.1
- Python: 3.7.11
- PyTorch: 1.11.0
- Numpy: 1.21.2
- Pandas: 1.3.5
- Scipy: 1.7.3
- Wandb: 0.12.11
- Tqdm: 4.64.0
### THUMOS-14 Dataset

We use the 2048-d features provided by the MM 2021 paper Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization (CO2-Net). You can access the dataset from Google Drive or Baidu Disk. The annotations are included in the package.
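As a quick sanity check after downloading, each video's feature matrix should have 2048 channels per snippet (the I3D RGB and optical-flow streams, 1024-d each, concatenated). The snippet below assumes per-video `.npy` files; the actual file names and layout inside the download may differ.

```python
import numpy as np

# Hypothetical path and file name; adjust to the layout of your download.
feat = np.load("path/to/CO2-THUMOS-14/features/video_test_0000004.npy")
print(feat.shape)  # expected (T, 2048): T snippets x (1024-d RGB + 1024-d flow)
assert feat.shape[-1] == 2048
```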
### ActivityNet-v1.2 Dataset

We also use the features provided by MM2021-CO2-Net, which can be obtained from here. The annotations are included in the package.
## Testing
Download the pretrained models from Google Drive and put them into `./download_ckpt/`.
### Test on THUMOS-14

Change `path/to/CO2-THUMOS-14` in the script to your own dataset path, and run:

```bash
cd scripts/
./test_thumos.sh
```
### Test on ActivityNet-v1.2

Change `path/to/CO2-ActivityNet-12` in the script to your own dataset path, and run:

```bash
cd scripts/
./test_activitynet.sh
```
## Training

Change the dataset paths as stated above, and run:

```bash
cd scripts/
./train_thumos.sh
```

or

```bash
cd scripts/
./train_activitynet.sh
```
## Citation
If you find the code useful in your research, please cite:
```bibtex
@inproceedings{mengyuan2022ECCV_DELU,
  author    = {Chen, Mengyuan and Gao, Junyu and Yang, Shicai and Xu, Changsheng},
  title     = {Dual-Evidential Learning for Weakly-supervised Temporal Action Localization},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```
## License

This project is released under the MIT License.
## Acknowledgement

This repo contains modified code from:

- MM2021-CO2-Net: implementation of the CO2-Net backbone (MM 2021).
- DEAR: implementation of the EDL loss.

We sincerely thank the owners of all these great repos!