Uni-AdaFocus (TPAMI'24 & ICCV'21/CVPR'22/ECCV'22)

This repo contains the official code and pre-trained models for "Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition".

Uni-AdaFocus is the latest version of the AdaFocus series.

Contents

- Introduction
- Get Started
- Results
- Reference
- Contact

Introduction

We explore spatial, temporal, and sample-wise redundancy in video understanding and propose Uni-AdaFocus, an efficient end-to-end video recognition framework. Uni-AdaFocus builds on AdaFocus, which employs a lightweight encoder and a policy network to identify and process only the most informative spatial regions of each video frame. Uni-AdaFocus extends AdaFocus by dynamically allocating computation to the most task-relevant frames and by minimizing the computation spent on easier videos. It is compatible with off-the-shelf efficient backbones (e.g., TSM and X3D) and markedly improves their inference efficiency. Extensive experiments on seven benchmark datasets (ActivityNet, FCVID, Mini-Kinetics, Something-Something V1 & V2, Jester, and Kinetics-400) and three real-world application scenarios (fine-grained diving action classification, diagnosis of Alzheimer's and Parkinson's diseases from brain magnetic resonance images (MRI), and violence recognition for online videos) substantiate that Uni-AdaFocus is considerably more efficient than competitive baselines.

<p align="center"> <img src="./figure/intro.png" height= "600"> </p>
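
To make the core idea concrete, below is a minimal PyTorch sketch of the spatial-focusing mechanism: a cheap global encoder scans downscaled frames, a policy head predicts a patch center per frame, and only the cropped patch is processed by a high-capacity local encoder. This is an illustrative toy under stated assumptions (the module sizes, the name AdaFocusSketch, the 96-pixel patch, and the 0.5x downscaling are all ours), not the repo's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaFocusSketch(nn.Module):
    """Toy illustration of AdaFocus-style spatial focusing (not the repo's code)."""

    def __init__(self, num_classes=200, patch_size=96):
        super().__init__()
        self.patch_size = patch_size
        # Cheap global encoder operating on low-resolution frames.
        self.global_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Policy head: predicts a normalized (x, y) patch center in [0, 1].
        self.policy = nn.Sequential(nn.Linear(16, 2), nn.Sigmoid())
        # Expensive local encoder applied only to the cropped patch.
        self.local_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_classes)

    def crop(self, frames, centers):
        # Differentiable crop via affine grid sampling.
        n = frames.size(0)
        scale = self.patch_size / frames.size(-1)
        theta = torch.zeros(n, 2, 3, device=frames.device)
        theta[:, 0, 0] = scale
        theta[:, 1, 1] = scale
        # Map centers from [0, 1] to [-1, 1] grid coordinates, keeping
        # the whole patch inside the frame.
        theta[:, :, 2] = (2 * centers - 1) * (1 - scale)
        grid = F.affine_grid(theta, (n, 3, self.patch_size, self.patch_size),
                             align_corners=False)
        return F.grid_sample(frames, grid, align_corners=False)

    def forward(self, frames):  # frames: (N, 3, H, W), a batch of video frames
        small = F.interpolate(frames, scale_factor=0.5, mode='bilinear',
                              align_corners=False)
        centers = self.policy(self.global_encoder(small))
        patches = self.crop(frames, centers)
        return self.classifier(self.local_encoder(patches))

Because the crop is implemented with affine grid sampling, the predicted patch locations stay differentiable, echoing the end-to-end trainable patch selection introduced in AdaFocus V2.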

Get Started

Setup environment:

conda create -n adafocus python=3.9
conda activate adafocus
conda install pytorch=1.12.1 torchvision=0.13.1 -c pytorch
pip install numpy==1.26.0 tensorboardX
# if you are using Uni-AdaFocus-X3D, additionally run the following line
pip install iopath simplejson fvcore pytorchvideo psutil matplotlib opencv-python scipy pandas
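
Optionally, you can sanity-check the environment from Python (a quick check we suggest here, not part of the repo's instructions):

import torch, torchvision
print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.cuda.is_available())  # True if a GPU build of PyTorch is active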

To reproduce our experimental results, please go to the following folders for dataset-specific instructions:

To apply Uni-AdaFocus to your own tasks, please check this tutorial:
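
In the meantime, the following hypothetical smoke test wires random tensors through the AdaFocusSketch toy model defined in the Introduction above; it is purely illustrative and does not use the repo's actual training code or APIs.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a real video dataset: replace with your own data.
frames = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 200, (32,))
loader = DataLoader(TensorDataset(frames, labels), batch_size=8)

# Assumes the AdaFocusSketch toy model defined in the Introduction sketch.
model = AdaFocusSketch(num_classes=200)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()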

Results

<p align="center"> <img src="./figure/table2.png" width="850"> </p>
<p align="center"> <img src="./figure/fig9.png" width="850"> </p>
<p align="center"> <img src="./figure/fig10.png" width="850"> </p>
<p align="center"> <img src="./figure/sthsth.png" width="850"> </p>
<p align="center"> <img src="./figure/sthsth2.png" width="850"> </p>
<p align="center"> <img src="./figure/k400.png" width="450"> </p>
<p align="center"> <img src="./figure/visual.png" width="450"> </p>
<p align="center"> <img src="./figure/fig15.png" width="450"> </p>
<p align="center"> <img src="./figure/fig14.png" width="450"> </p>

Reference

If you find our code or papers useful for your research, please cite:

@article{wang2024uniadafocus,
  title     = {Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition},
  author    = {Wang, Yulin and Zhang, Haoji and Yue, Yang and Song, Shiji and Deng, Chao and Feng, Junlan and Huang, Gao},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year      = {2024}
}

@inproceedings{wang2021adafocus,
  title     = {Adaptive Focus for Efficient Video Recognition},
  author    = {Wang, Yulin and Chen, Zhaoxi and Jiang, Haojun and Song, Shiji and Han, Yizeng and Huang, Gao},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2021}
}

@inproceedings{wang2022adafocusv2,
  title     = {AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition},
  author    = {Wang, Yulin and Yue, Yang and Lin, Yuanze and Jiang, Haojun and Lai, Zihang and Kulikov, Victor and Orlov, Nikita and Shi, Humphrey and Huang, Gao},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

@inproceedings{wang2022adafocusv3,
  title     = {AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition},
  author    = {Wang, Yulin and Yue, Yang and Xu, Xinhong and Hassani, Ali and Kulikov, Victor and Orlov, Nikita and Song, Shiji and Shi, Humphrey and Huang, Gao},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}

Contact

If you have any questions, feel free to contact the authors or raise an issue.

Yulin Wang: wang-yl19@mails.tsinghua.edu.cn

Haoji Zhang: zhj24@mails.tsinghua.edu.cn