
HiViT (ICLR 2023, notable-top-25%)

<div align=center><img src="hivit.png" width="60%"></div>

This is the official implementation of the paper HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer.

Results

| Model | Pre-training data | ImageNet-1K (top-1 %) | COCO Det (box AP) | ADE Seg (mIoU) |
|---|---|---|---|---|
| MAE-base | ImageNet-1K | 83.6 | 51.2 | 48.1 |
| SimMIM-base | ImageNet-1K | 84.0 | 52.3 | 52.8 |
| HiViT-base | ImageNet-1K | 84.6 | 53.3 | 52.8 |

Pre-trained Models

mae_hivit_base_1600ep.pth

mae_hivit_base_1600ep_ft100ep.pth
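As a quick sanity check after downloading, a checkpoint can be opened and its weight names inspected with PyTorch. This is a hedged sketch, not part of the repo: the MAE-style layout (weights nested under a `"model"` key) is an assumption, and the tiny stand-in checkpoint below exists only so the snippet is self-contained; in practice, point `torch.load` at the downloaded `mae_hivit_base_1600ep.pth`.

```python
import torch

# Illustration only: build a tiny stand-in checkpoint in the assumed
# MAE-style layout (weights nested under a "model" key). Replace this
# path with the downloaded mae_hivit_base_1600ep.pth in practice.
dummy = {"model": {"patch_embed.proj.weight": torch.zeros(128, 3, 4, 4)}}
torch.save(dummy, "example_ckpt.pth")

ckpt = torch.load("example_ckpt.pth", map_location="cpu")
# Fall back to the top level in case the weights are not nested.
state_dict = ckpt.get("model", ckpt)
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

If the printed names do not match your model's expected keys, the checkpoint may need key remapping before calling `load_state_dict`.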

Usage

1. Supervised learning on ImageNet-1K: see supervised/get_started.md for a quick start.

2. Self-supervised learning on ImageNet-1K: see self_supervised/get_started.md.

3. Object detection: see detection/get_started.md.

4. Semantic segmentation: see segmentation/get_started.md.

Bibtex

If this project helps your research, please consider citing our paper in your publications.

@inproceedings{zhanghivit,
  title={HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer},
  author={Zhang, Xiaosong and Tian, Yunjie and Xie, Lingxi and Huang, Wei and Dai, Qi and Ye, Qixiang and Tian, Qi},
  booktitle={International Conference on Learning Representations},
  year={2023},
}