
HiViT (ICLR 2023, notable-top-25%)

<div align=center><img src="hivit.png" width="60%"></div>

This is the official implementation of the paper HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer.

Results

| Model | Pre-training data | ImageNet-1K (top-1 %) | COCO Det (box AP) | ADE Seg (mIoU) |
|---|---|---|---|---|
| MAE-base | ImageNet-1K | 83.6 | 51.2 | 48.1 |
| SimMIM-base | ImageNet-1K | 84.0 | 52.3 | 52.8 |
| HiViT-base | ImageNet-1K | 84.6 | 53.3 | 52.8 |

Pre-trained Models

mae_hivit_base_1600ep.pth

mae_hivit_base_1600ep_ft100ep.pth
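As a quick sanity check after downloading, a checkpoint can be opened and its weight names inspected with PyTorch. This is a hedged sketch, not part of the repo: the MAE-style layout (weights nested under a `"model"` key) is an assumption, and the tiny stand-in checkpoint below exists only so the snippet is self-contained; in practice, point `torch.load` at the downloaded `mae_hivit_base_1600ep.pth`.

```python
import torch

# Illustration only: build a tiny stand-in checkpoint in the assumed
# MAE-style layout (weights nested under a "model" key). Replace this
# path with the downloaded mae_hivit_base_1600ep.pth in practice.
dummy = {"model": {"patch_embed.proj.weight": torch.zeros(128, 3, 4, 4)}}
torch.save(dummy, "example_ckpt.pth")

ckpt = torch.load("example_ckpt.pth", map_location="cpu")
# Fall back to the top level in case the weights are not nested.
state_dict = ckpt.get("model", ckpt)
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

If the printed names do not match your model's expected keys, the checkpoint may need key remapping before calling `load_state_dict`.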

Usage

1. Supervised learning on ImageNet-1K: see supervised/get_started.md for a quick start.

2. Self-supervised learning on ImageNet-1K: see self_supervised/get_started.md.

3. Object detection: see detection/get_started.md.

4. Semantic segmentation: see segmentation/get_started.md.

Bibtex

If this project helps your research, please consider citing our paper in your publications.

@inproceedings{zhanghivit,
  title={HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer},
  author={Zhang, Xiaosong and Tian, Yunjie and Xie, Lingxi and Huang, Wei and Dai, Qi and Ye, Qixiang and Tian, Qi},
  booktitle={International Conference on Learning Representations},
  year={2023},
}