[CVPR 2021] Actor-Context-Actor Relation Network for Spatio-temporal Action Localization

This repository contains the official PyTorch implementation of Actor-Context-Actor Relation Network for Spatio-temporal Action Localization (CVPR 2021), the 1st place solution of the AVA-Kinetics Crossover Challenge 2020. The codebase also provides a general pipeline for training and evaluation on AVA-style datasets, as well as state-of-the-art action detection models.

Junting Pan, Siyu Chen, Zheng Shou, Jing Shao, Hongsheng Li

Requirements

Some key dependencies are listed below, while others are given in requirements.txt.

Usage

Default values for arguments nproc_per_node, backend and master_port are 8, nccl and 31114 respectively.

python main.py --config CONFIG_FILE [--nproc_per_node N_PROCESSES] [--backend BACKEND] [--master_addr MASTER_ADDR] [--master_port MASTER_PORT]
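As an illustration of how the flags and defaults above fit together, here is a minimal sketch of an argument parser that mirrors the documented command line. It is not the repository's actual main.py: the function name build_parser is hypothetical, and the default of None for master_addr is an assumption, since only the defaults for nproc_per_node, backend and master_port are stated here.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; not the repository's code.
    parser = argparse.ArgumentParser(description="Launcher sketch")
    parser.add_argument("--config", type=str, required=True)
    # Documented defaults: 8 processes per node, nccl backend, port 31114.
    parser.add_argument("--nproc_per_node", type=int, default=8)
    parser.add_argument("--backend", type=str, default="nccl")
    # Assumption: no default address; it is only required for multi-machine runs.
    parser.add_argument("--master_addr", type=str, default=None)
    parser.add_argument("--master_port", type=int, default=31114)
    return parser


# Passing only the required --config leaves every other flag at its default.
args = build_parser().parse_args(["--config", "CONFIG_FILE"])
print(args.nproc_per_node, args.backend, args.master_port)
```

With only --config given, the optional flags fall back to the defaults listed above.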

Running with Multiple Machines

When running on multiple machines, the master_addr argument must be provided. The nnodes and node_rank arguments can also be specified (as with torch.distributed.launch); otherwise, the program tries to read their values from environment variables. See distributed_utils.py for details.
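The fallback from explicit arguments to environment variables can be sketched as follows. This is only an illustration, not the logic in distributed_utils.py: the variable names NNODES and NODE_RANK and the helper names are hypothetical. The rank arithmetic follows the usual torch.distributed.launch convention, where each node runs nproc_per_node processes.

```python
import os


def resolve_node_info(nnodes=None, node_rank=None, env=None):
    """Use explicit values when given, else fall back to environment variables.

    NNODES and NODE_RANK are hypothetical variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    if nnodes is None:
        nnodes = int(env.get("NNODES", 1))
    if node_rank is None:
        node_rank = int(env.get("NODE_RANK", 0))
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank {node_rank} is out of range for {nnodes} nodes")
    return nnodes, node_rank


def world_size(nnodes, nproc_per_node):
    # Total number of processes across all machines.
    return nnodes * nproc_per_node


def global_rank(node_rank, nproc_per_node, local_rank):
    # Processes are numbered node by node, then by local index within a node.
    return node_rank * nproc_per_node + local_rank


# Example: the second of two machines, each running 8 processes.
nnodes, node_rank = resolve_node_info(env={"NNODES": "2", "NODE_RANK": "1"})
print(world_size(nnodes, 8), global_rank(node_rank, 8, local_rank=3))
```

Explicit arguments take precedence over the environment, so a single launch script can be reused on every machine with only node_rank changed.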

Model Zoo

Trained models are provided in model_zoo/README.md.

To-Do List

License

ACAR-Net is released under the Apache 2.0 license.

CVPR 2020 AVA-Kinetics Challenge

Slides and a video presentation of our winning solution are available at [Google Slides] [YouTube Video] [Bilibili Video] (starting from 18:20).

About Our Paper

Find our work on arXiv.

[Figure: architecture]

Please cite our work with the following BibTeX entry:

@inproceedings{pan2021actor,
  title={Actor-context-actor relation network for spatio-temporal action localization},
  author={Pan, Junting and Chen, Siyu and Shou, Mike Zheng and Liu, Yu and Shao, Jing and Li, Hongsheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={464--474},
  year={2021}
}

You may also cite our publication in the more human-readable Chicago style:

Pan, Junting, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, and Hongsheng Li. "Actor-context-actor relation network for spatio-temporal action localization." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 464-474. 2021.

Contact

If you have general questions about our work or code that may be of interest to other researchers, please use the public issues section of this repository. Alternatively, e-mail us at siyuchen@pku.edu.cn and junting.pa@gmail.com.