Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

<h5 align="center"><i>"Scaling up prompt learning on ImageNet-21K achieves SOTA on 21 downstream datasets."</i></h5>

Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition<br> Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun


:rocket: News

<hr />

Highlights

main figure

Main Contributions

  1. We introduce POMP, a prompt pre-training method that, for the first time, enables prompt learning on large-scale datasets like ImageNet-21K with over twenty thousand classes.
  2. POMP is memory- and computation-efficient. Compared with previous methods like CoOp, it achieves comparable accuracy on ImageNet-1K with only 19% of the GPU memory and 50% of the training time.
  3. POMP achieves new state-of-the-art results on a variety of open-vocabulary visual recognition datasets and tasks.
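The memory savings in point 2 come from sampling a small subset of classes per training step instead of contrasting each image against all ~21K class prompts. The sketch below illustrates that idea in PyTorch; the toy dimensions, the random stand-in embeddings, and the `sampled_logits` helper are hypothetical placeholders, not POMP's actual CLIP-based implementation (see the paper and RUN.md for the real training code).

```python
import torch
import torch.nn.functional as F

# Toy sizes; POMP uses ImageNet-21K classes and CLIP feature dims.
N_CLASSES, K, DIM, BATCH = 21_000, 16, 64, 8

# A shared, learnable prompt vector (conceptually prepended to every
# class name before text encoding); here added to frozen stand-in
# class-name embeddings instead of running a real text encoder.
prompt = torch.randn(DIM, requires_grad=True)
class_embed = torch.randn(N_CLASSES, DIM)              # frozen class embeddings
image_feat = F.normalize(torch.randn(BATCH, DIM), dim=-1)
labels = torch.randint(0, N_CLASSES, (BATCH,))

def sampled_logits(image_feat, labels):
    """Score each image against its gold class plus K-1 sampled negatives,
    so the softmax runs over K classes instead of all N_CLASSES."""
    neg = torch.randint(0, N_CLASSES, (image_feat.size(0), K - 1))
    cand = torch.cat([labels[:, None], neg], dim=1)        # (B, K), gold at slot 0
    text = F.normalize(class_embed[cand] + prompt, dim=-1) # (B, K, D)
    return (text @ image_feat[:, :, None]).squeeze(-1) * 100.0  # scaled cosine sims

logits = sampled_logits(image_feat, labels)                 # (BATCH, K)
# Cross-entropy against slot 0, where each image's gold class was placed.
loss = F.cross_entropy(logits, torch.zeros(BATCH, dtype=torch.long))
loss.backward()                                             # gradient flows only to the prompt
```

Because only K class prompts are materialized per step, peak memory no longer scales with the full class vocabulary, which is what makes pre-training on 21K classes feasible.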

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions at DATASETS.md to prepare all datasets.

Pre-trained Models

Please follow the instructions at MODELS.md to prepare all pre-trained models.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results.

<hr />

Contact

If you have any questions, please feel free to create an issue on this repository.

Citation

If you find this code useful for your research, please consider citing:

@article{ren2023pomp,
  title={Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition},
  author={Ren, Shuhuai and Zhang, Aston and Zhu, Yi and Zhang, Shuai and Zheng, Shuai and Li, Mu and Smola, Alex and Sun, Xu},
  journal={arXiv preprint arXiv:2304.04704},
  year={2023}
}

Acknowledgements

Our code builds on the CoOp, MaPLe, Dassl, Detic, and ZSSeg repositories. We thank the authors for releasing their code.