ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

License: Apache 2.0

Jyotikrishna Dass*, Shang Wu*, Huihong Shi*, Chaojian Li, Zhifan Ye, Zhongfeng Wang and Yingyan Lin

(*Equal contribution)

Accepted by HPCA 2023. More Info: [ Paper | Slide | GitHub ]


Overview of the ViTALiTy Framework

We propose ViTALiTy, an algorithm and accelerator co-design framework that unifies low-rank and sparse approximation to accelerate Vision Transformers with a linear Taylor attention.

<p align="center"> <img src="./figures/ViTALiTY-workflow.png" width="800"> </p>
<p align = "center"> Fig. 1 - ViTALiTy workflow built around the proposed (low-rank) linear Taylor attention (order m = 1): (i) adding the higher-order Taylor terms (m > 1) recovers the vanilla softmax attention score; (ii) the training phase unifies low-rank and sparse approximation by approximating the higher-order Taylor terms as a sparse attention (computed using SANGER [28]); and (iii) the inference phase uses only the (low-rank) linear Taylor attention. </p>

<p align="center"> <img src="./figures/TaylorAttentionFlow2.png" width="400"> </p>
<p align = "center"> Fig. 2 - Computational steps of (a) vanilla softmax attention and (b) our Taylor attention (see Algorithm 1), where the global context matrix G provides linear computation and memory benefits over the quadratic QK^T of vanilla attention. </p>

<p align="center"> <img src="./figures/hardware_overall.png" width="800"> </p>
<p align = "center"> Fig. 3 - An illustration of our ViTALiTy accelerator, which adopts four memory hierarchies (i.e., DRAM, SRAM, NoC, and Regs) to enhance data locality, and multiple chunks/sub-processors consisting of a few pre/post-processors and a systolic array to accelerate dedicated operations. Specifically, the pre-processors include an accumulator array for performing column (token)-wise summations, plus a divider array and an adder array for conducting element-wise divisions and additions, respectively. In addition, the systolic array (SA) is partitioned into a smaller sub-array, dubbed SA-Diag, which handles multiplications involving diagonal matrices (these require far fewer multiplications), and a larger sub-array, dubbed SA-General, which processes the remaining matrix multiplications. </p>
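To make Fig. 2 concrete, below is a minimal PyTorch sketch of the first-order (m = 1) linear Taylor attention as we read it off the figure. The tensor names (`G`, `v_sum`, `k_sum`) and the single-head, unscaled formulation are our own simplifications for illustration, not the exact Algorithm 1 from the paper.

```python
import torch

def linear_taylor_attention(Q, K, V):
    """Sketch of first-order (m = 1) Taylor attention.

    Using exp(q.k) ~= 1 + q.k, the softmax numerator becomes (1 + Q K^T) V,
    which can be reordered so the quadratic Q K^T is never materialized:
    the global context matrix G = K^T V costs O(n * d^2) instead of O(n^2 * d).
    Q, K, V: (n_tokens, d_head). Multi-head handling and the 1/sqrt(d)
    scaling used in practice are omitted in this sketch.
    """
    n, _ = Q.shape
    G = K.transpose(0, 1) @ V             # global context matrix, (d, d)
    v_sum = V.sum(dim=0)                  # column(token)-wise sum of V, (d,)
    k_sum = K.sum(dim=0)                  # column(token)-wise sum of K, (d,)

    numerator = v_sum + Q @ G             # sum_j v_j + q_i (K^T V), shape (n, d)
    denominator = n + Q @ k_sum           # sum_j (1 + q_i . k_j), shape (n,)
    return numerator / denominator.unsqueeze(-1)


if __name__ == "__main__":
    # Rough DeiT-Tiny-like shapes: 197 tokens, 64-dim head (assumed for the demo)
    Q, K, V = (torch.randn(197, 64) for _ in range(3))
    out = linear_taylor_attention(Q, K, V)
    print(out.shape)  # torch.Size([197, 64])
```

As Fig. 1 describes, during training the discarded higher-order Taylor terms are approximated by a sparse attention branch (via SANGER [28]), while inference keeps only the linear branch above; the column-wise summations and element-wise divisions in this computation are exactly the operations mapped to the accumulator, adder, and divider arrays of the accelerator in Fig. 3.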

How to run?

Environment setup

pip install -r requirement.txt

Training (DeiT-Tiny with vanilla softmax)

cd src
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model deit_tiny_patch16_224 --lr 1e-4 --epochs 300 --batch-size 256 --data-path /path/to/imagenet --output_dir ''

Training (DeiT-Tiny with ViTALiTy)

cd src
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model deit_tiny_patch16_224 --lr 1e-4 --epochs 300 --batch-size 256 --data-path /path/to/imagenet --output_dir '' --vitality

Inference (DeiT-Tiny with vanilla softmax)

cd src
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model deit_tiny_patch16_224 --lr 1e-4 --batch-size 256 --data-path /path/to/imagenet --output_dir '' --eval

Inference (DeiT-Tiny with ViTALiTy)

cd src
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model deit_tiny_patch16_224 --lr 1e-4 --batch-size 256 --data-path /path/to/imagenet --output_dir '' --vitality --eval

Acknowledgment

This codebase is inspired by https://github.com/facebookresearch/deit.