

Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization (ACM MM 2021)

Fa-Ting Hong^, Jia-Chang Feng^, Dan Xu, Ying Shan, and Wei-Shi Zheng. ^Equal contribution

<img src='./misc/framework.png' width=800>

Project | Paper

We propose the CrOss-modal cOnsensus NETwork (CO2-Net), which introduces two identical cross-modal consensus modules (CCMs). Each CCM uses a cross-modal attention mechanism to filter out task-irrelevant, redundant information, combining global information from the main modality with cross-modal local information from the auxiliary modality.
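The idea of a consensus module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes snippet-level features of shape (T, D) for two modalities (taken here to be RGB and optical flow), and it replaces the learned layers with hypothetical linear projections `w_global` and `w_local`. The global context is the temporal average of the main modality, the local context comes from the auxiliary modality, and their combination produces a sigmoid gate that down-weights channels of the main features.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def cross_modal_consensus(main: np.ndarray, aux: np.ndarray,
                          w_global: np.ndarray, w_local: np.ndarray) -> np.ndarray:
    """Filter `main` (T x D) with a gate built from both modalities.

    Hypothetical simplification: plain matrix products stand in for the
    learned layers of the real CCM.
    """
    # Global descriptor of the main modality: average over the T snippets -> (D,)
    global_ctx = main.mean(axis=0) @ w_global
    # Local (per-snippet) descriptor from the auxiliary modality -> (T, D)
    local_ctx = aux @ w_local
    # Broadcast the global vector over time, fuse with local cues, squash to (0, 1)
    gate = sigmoid(local_ctx * global_ctx)
    # Attenuate channels the gate judges task-irrelevant
    return main * gate


# Toy usage: two modules with the same structure, each modality acting as "main" once.
rng = np.random.default_rng(0)
T, D = 8, 16
rgb = rng.standard_normal((T, D))
flow = rng.standard_normal((T, D))
# Separate (random, untrained) weights per module, for illustration only
w_rgb = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(2)]
w_flow = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(2)]

rgb_filtered = cross_modal_consensus(rgb, flow, *w_rgb)
flow_filtered = cross_modal_consensus(flow, rgb, *w_flow)
```

Because the gate lies in (0, 1), the filtered features can only be attenuated, never amplified, which matches the module's stated purpose of suppressing redundant information rather than adding new signal.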

Requirements

conda env create -f environment.yaml
pip install -r requirements.txt

Quick Start

python main.py --max-seqlen 500 --lr 0.00005 --k 7 --dataset-name Thumos14reduced --path-dataset path/to/Thumos14 --num-class 20 --use-model CO2  --max-iter 5000  --dataset SampleDataset --weight_decay 0.001 --model-name CO2_3552 --seed 3552 --AWM BWA_fusion_dropout_feat_v2
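The Quick Start command has many flags, and `--model-name CO2_3552` mirrors `--seed 3552`. One way to keep such values consistent across runs is to build the argument list in Python. The helper `build_co2_command` below is hypothetical (not part of the repository); it only reuses the flags shown in the Quick Start command, and the `CO2_<seed>` naming convention is an assumption inferred from that command.

```python
import shlex
import sys
from typing import List


def build_co2_command(dataset_path: str, seed: int = 3552,
                      max_iter: int = 5000) -> List[str]:
    """Assemble the Quick Start arguments for main.py (hypothetical helper)."""
    # Values are kept as strings so they appear exactly as in the README command
    flags = {
        "--max-seqlen": "500",
        "--lr": "0.00005",
        "--k": "7",
        "--dataset-name": "Thumos14reduced",
        "--path-dataset": dataset_path,
        "--num-class": "20",
        "--use-model": "CO2",
        "--max-iter": str(max_iter),
        "--dataset": "SampleDataset",
        "--weight_decay": "0.001",
        "--model-name": f"CO2_{seed}",  # assumed convention: model name tracks the seed
        "--seed": str(seed),
        "--AWM": "BWA_fusion_dropout_feat_v2",
    }
    cmd = [sys.executable, "main.py"]
    for flag, value in flags.items():
        cmd += [flag, value]
    return cmd


cmd = build_co2_command("path/to/Thumos14")
print(shlex.join(cmd))  # shell-quoted form of the Quick Start command
```

To actually launch a run, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root.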

Prepare Dataset

The features for the Thumos14 and ActivityNet1.2 datasets can be downloaded here. The annotations are included with this package.

Train Your Own Model

python main.py --max-seqlen 500 --lr 0.00005 --k 7 --dataset-name Thumos14reduced --num-class 20 --use-model CO2  --max-iter 20000  --dataset SampleDataset --weight_decay 0.001 --model-name CO2 --seed 3552 --AWM BWA_fusion_dropout_feat_v2

Citation

@InProceedings{hong2021cross,
author = {Hong, Fa-Ting and Feng, Jia-Chang and Xu, Dan and Shan, Ying and Zheng, Wei-Shi},
title = {Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization},
booktitle = {ACM International Conference on Multimedia (ACM MM)},
year = {2021}
}