<div align="center">

# 【ICCV2023】ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

Authors: Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu

[![arXiv](https://img.shields.io/badge/arXiv-2308.08428-b31b1b.svg)](https://arxiv.org/abs/2308.08428) <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white">

</div>

## Introduction

Adaptive Language-Image Pre-training (ALIP) is a bi-path model that integrates supervision from both raw text and synthetic captions. As the core components of ALIP, the Language Consistency Gate (LCG) and the Description Consistency Gate (DCG) dynamically adjust the weights of samples and of image-text/caption pairs during training. Meanwhile, the adaptive contrastive loss effectively reduces the impact of noisy data and improves the efficiency of the pre-training data.
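For intuition, below is a minimal PyTorch sketch of a bi-path, weighted contrastive objective of this kind. It is an illustrative assumption rather than this repository's implementation: `weighted_clip_loss`, `alip_objective`, and the weight tensors standing in for the LCG/DCG outputs are hypothetical names; please refer to the source code for the actual adaptive loss.

```python
import torch
import torch.nn.functional as F

def weighted_clip_loss(img, txt, sample_w, pair_w, temperature=0.07):
    """CLIP-style InfoNCE loss with per-sample and per-pair weights (hypothetical)."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Per-example cross-entropy in both directions; the weights down-weight noisy pairs.
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")
    per_pair = 0.5 * (loss_i2t + loss_t2i)
    return (sample_w * pair_w * per_pair).mean()

def alip_objective(img, raw_txt, syn_cap, sample_w, text_w, caption_w):
    """Bi-path objective: supervision from raw text and synthetic captions.
    sample_w plays the role of LCG-style per-sample weights; text_w and
    caption_w play the role of DCG-style per-pair weights (all placeholders)."""
    loss_raw = weighted_clip_loss(img, raw_txt, sample_w, text_w)
    loss_cap = weighted_clip_loss(img, syn_cap, sample_w, caption_w)
    return 0.5 * (loss_raw + loss_cap)

# Illustrative usage with random features; in practice the gates would produce
# non-uniform weights that suppress noisy image-text/caption pairs.
B, D = 8, 512
img, raw_txt, syn_cap = (torch.randn(B, D) for _ in range(3))
uniform = torch.ones(B)
print(alip_objective(img, raw_txt, syn_cap, uniform, uniform, uniform))
```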

## 📣 News

## Instructions

## Acknowledgement

This project is based on open_clip and OFA; thanks for their excellent work.

## License

This project is released under the MIT license. Please see the LICENSE file for more information.

## Citation

If you find this repository useful, please cite it with the following BibTeX entry.

```bibtex
@misc{yang2023alip,
      title={ALIP: Adaptive Language-Image Pre-training with Synthetic Caption},
      author={Kaicheng Yang and Jiankang Deng and Xiang An and Jiawei Li and Ziyong Feng and Jia Guo and Jing Yang and Tongliang Liu},
      year={2023},
      eprint={2308.08428},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## 🌟 Star History

*Star History Chart*