ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

License: Apache 2.0

Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li and Yingyan Lin

Accepted by HPCA 2023. More Info: [ Paper | Slides | YouTube | GitHub ]


Why Do We Need ViTCoD Given Existing NLP Transformer Accelerators?

This is because ViTs differ substantially from Transformers for natural language processing (NLP) tasks: ViTs process a relatively fixed number of input tokens, so their attention maps can be pruned by up to 90% using fixed sparse patterns without severely hurting model accuracy (e.g., <=1.5% accuracy drop under a 90% pruning ratio). In contrast, NLP Transformers must handle input sequences of varying lengths and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve even a moderate sparsity (e.g., >=50%).

<p align="center"> <img src="./Figures/Vision_vs_NLP.png" width="420"> </p>

Overview of Our ViT Co-Design Framework

We propose a dedicated algorithm and accelerator co-design framework dubbed ViTCoD for accelerating ViTs, i.e., Vision Transformers.

<p align="center"> <img src="./Figures/overview.png" width="800"> </p>

Usage of the Provided Codebase

For reproducing the results, we provide three kinds of codebases:


Citation

If you find this codebase useful for your research, please cite:

@inproceedings{you2022vitcod,
  title={ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design},
  author={You, Haoran and Sun, Zhanyi and Shi, Huihong and Yu, Zhongzhi and Zhao, Yang and Zhang, Yongan and Li, Chaojian and Li, Baopu and Lin, Yingyan},
  booktitle={The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29)},
  year={2023}
}