# Vision Transformers with Hierarchical Attention
This work was first titled "Transformer in Convolutional Neural Networks".
## Installation
This repository exactly follows the code and training settings of PVT.
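Since the training settings are inherited from PVT, the sketch below summarizes PVT's published ImageNet-1K recipe (which itself follows DeiT) as a plain Python config. Every value here is an assumption taken from PVT's paper, not confirmed by this repository; check the actual training scripts before relying on it.

```python
# Sketch of the assumed training recipe, mirrored from PVT's published
# ImageNet-1K settings (which follow DeiT). None of these values are
# confirmed by this repository; verify against the training scripts.
train_config = dict(
    optimizer="AdamW",
    base_lr=1e-3,              # for a total batch size of 1024, scaled linearly
    weight_decay=0.05,
    epochs=300,
    warmup_epochs=5,
    lr_schedule="cosine",
    label_smoothing=0.1,
    input_size=224,            # matches the 224 x 224 size in the table below
    augmentation=["RandAugment", "Mixup", "CutMix", "RandomErasing"],
)
```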
## Image classification on the ImageNet-1K dataset
| Method | Input size | #Params | #FLOPs | Acc@1 (%) | Pretrained Models |
|---|---|---|---|---|---|
| HAT-Net-Tiny | 224 x 224 | 12.7M | 2.0G | 79.8 | Google / GitHub |
| HAT-Net-Small | 224 x 224 | 25.7M | 4.3G | 82.6 | Google / GitHub |
| HAT-Net-Medium | 224 x 224 | 42.9M | 8.3G | 84.0 | Google / GitHub |
| HAT-Net-Large | 224 x 224 | 63.1M | 11.5G | 84.2 | Google / GitHub |
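The checkpoints should drop into a standard ImageNet-1K evaluation loop. The sketch below shows one plausible way to reproduce the Acc@1 numbers in the table; the import path `hat_net.hat_net_small` and the plain-`state_dict` checkpoint format are hypothetical and must be adapted to the actual repository layout.

```python
# Minimal ImageNet-1K Acc@1 evaluation sketch. The constructor name
# `hat_net_small` and the checkpoint format are hypothetical, not
# confirmed by this repository.
import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

from hat_net import hat_net_small  # hypothetical import for HAT-Net-Small

# Standard ImageNet validation preprocessing: resize the shorter side to 256,
# then center-crop to the 224 x 224 input size listed in the table above.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_loader = DataLoader(
    ImageFolder("/path/to/imagenet/val", transform=transform),
    batch_size=64, num_workers=8,
)

model = hat_net_small(num_classes=1000)
model.load_state_dict(torch.load("HAT-Net-Small.pth", map_location="cpu"))
model.cuda().eval()

# Sanity check against the table: HAT-Net-Small should report ~25.7M params.
print(f"#Params: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")

correct = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        preds = model(images.cuda()).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"Acc@1: {100.0 * correct / total:.1f}%")  # expect ~82.6 for HAT-Net-Small
```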
## Citation
If you use the code or models provided here in a publication, please consider citing:
```bibtex
@article{liu2024vision,
  title={Vision Transformers with Hierarchical Attention},
  author={Liu, Yun and Wu, Yu-Huan and Sun, Guolei and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={Machine Intelligence Research},
  volume={21},
  pages={670--683},
  year={2024},
  publisher={Springer}
}

@article{liu2021transformer,
  title={Transformer in Convolutional Neural Networks},
  author={Liu, Yun and Sun, Guolei and Qiu, Yu and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={arXiv preprint arXiv:2106.03180},
  year={2021}
}
```