DyFADet: Dynamic Feature Aggregation for Temporal Action Detection (ECCV2024)

This repository contains the implementation of the paper 'DyFADet: Dynamic Feature Aggregation for Temporal Action Detection'.

<div align=center><img width="900" height="280" src="https://github.com/yangle15/DyFADet-pytorch/blob/main/pics/fig1.png"/></div>

Installation

1. Please ensure that you have installed PyTorch and CUDA. (We use PyTorch 1.13.0 and CUDA 11.6 in our experiments.)

2. After you download the repo, install the required packages by running the following command:

```shell
pip install -r requirements.txt
```

3. Install NMS:

```shell
cd ./libs/utils
python setup.py install --user
cd ../..
```
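As an optional sanity check, you can confirm that the installed PyTorch build can see your CUDA device (this step is a suggestion, using only standard PyTorch calls):

```shell
# Prints the PyTorch version, its CUDA build version, and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```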

Data Preparation

- HACS (SF features)
- HACS (VideoMAEv2-g features)
- THUMOS14 (I3D features)
- THUMOS14 (VideoMAEv2-g features)
- ActivityNet 1.3
- FineAction

Training

You can train your own model with the provided config files. The command for training is:

```shell
CUDA_VISIBLE_DEVICES=0 python train.py ./configs/CONFIG_FILE --output OUTPUT_PATH
```

You need to select the config file corresponding to the dataset you want to train on. In the chosen config file, you further need to change the json_file variable to the path of your annotation file, and the feat_folder variable to the path of the downloaded features, as sketched below.
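For illustration, the relevant entries might look like the following sketch. Only the variable names json_file and feat_folder come from the instructions above; the nesting and paths are placeholder assumptions, so check the actual config file for the exact layout:

```yaml
# Illustrative sketch only; the exact structure may differ in the shipped configs.
dataset:
  json_file: ./data/thumos/annotations/thumos14.json   # path to your annotation file
  feat_folder: ./data/thumos/i3d_features              # path to the downloaded features
```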

All the models can be trained on a single NVIDIA RTX 4090 GPU (24 GB). For example, a THUMOS14 run with I3D features could be launched as shown below.
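Here, thumos_i3d.yaml is the config listed in the results table below, while the output directory name is an arbitrary choice of ours:

```shell
# Train DyFADet on THUMOS14 with I3D features; results go under the given output path
CUDA_VISIBLE_DEVICES=0 python train.py ./configs/thumos_i3d.yaml --output ckpt_thumos_i3d
```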

Evaluation

After training, you can evaluate the trained model with the following command:

```shell
CUDA_VISIBLE_DEVICES=0 python eval.py ./configs/CONFIG_FILE PATH_TO_CHECKPOINT
```
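For example, to evaluate the THUMOS14 model trained above (the checkpoint filename here is an assumption; substitute whichever checkpoint your training run actually produced):

```shell
# Evaluate a THUMOS14 (I3D) checkpoint produced by the training command above
CUDA_VISIBLE_DEVICES=0 python eval.py ./configs/thumos_i3d.yaml ./ckpt_thumos_i3d/model_best.pth.tar
```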

The mean average precision (mAP) results with the pre-trained models (BaiduYun Link) are listed below. The two mAP columns use dataset-specific tIoU thresholds (e.g., 0.3 and 0.7 for THUMOS14, 0.5 and 0.95 for ActivityNet):

| Dataset | 0.3 / 0.5 / 0.1 | 0.7 / 0.95 | Avg | Config |
| --- | --- | --- | --- | --- |
| THUMOS14-I3D | 84.0 | 47.9 | 69.2 | thumos_i3d.yaml |
| THUMOS14-VM2-g | 84.3 | 50.2 | 70.5 | thumos_mae.yaml |
| ActivityNet-TSP | 58.1 | 8.4 | 38.5 | anet_tsp.yaml |
| HACS-SF | 57.8 | 11.8 | 39.2 | hacs_slowfast.yaml |
| HACS-VM2-g | 64.0 | 14.1 | 44.3 | hacs_mae.yaml |
| FineAction-VM2-g | 37.1 | 5.9 | 23.8 | fineaction.yaml |
| EPIC-KITCHEN-n | 28.0 | 20.8 | 25.0 | epic_slowfast_noun.yaml |
| EPIC-KITCHEN-v | 26.8 | 18.5 | 23.4 | epic_slowfast_verb.yaml |

Citation

If you find this work useful or use our code in your own research, please cite it with the following BibTeX entry:

@inproceedings{yang2024dyfadet,
  title={DyFADet: Dynamic Feature Aggregation for Temporal Action Detection},
  author={Yang, Le and Zheng, Ziwei and Han, Yizeng and Cheng, Hao and Song, Shiji and Huang, Gao and Li, Fan},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

Contact

If you have any questions, please feel free to contact the authors.

Ziwei Zheng: ziwei.zheng@stu.xjtu.edu.cn

Le Yang: yangle15@xjtu.edu.cn

Acknowledgments

Our code is built upon the codebases of ActionFormer, TriDet, Detectron2, and many other great repositories; we would like to express our gratitude for their outstanding work.