Vary-tiny-600k

<p align="center"> <img src="asset/vary-600k.jpg" style="width: 200px" align=center> </p> <p align="center"> <a href="https://pan.baidu.com/s/18Rh53JxvbYYl9BPHoFvWcQ">Vary-600k</a> </p>

Background

Release

Contents

Usage and License Notices: The data, code, and checkpoint are intended and licensed for research use only.

Install

1. Clone this repository and navigate to the LAVIS-main folder:

   ```shell
   git clone https://github.com/Ucas-HaoranWei/Vary-tiny-600k.git
   cd Vary-tiny-600k/LAVIS-main
   ```

2. Install the package:

   ```shell
   pip install -e .
   ```

3. Prepare the pretrained weights and data:

   - Download the OPT-125M weights here and the SAM-b weights here.
   - Download Vary-600k here (extraction code: "vary").
   - Prepare the dirs as follows (the layout was shown as an image; see the sketch after this list):

     [Image: directory layout]
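The directory layout above was an image in the original README and is not reproduced here. Below is a minimal, hypothetical sketch of one way to place the downloads; every path and filename in it is an assumption, and the authoritative locations are the ones referenced in `lavis/projects/varytiny/train/pretrain.yaml` and the model configs.

```shell
# Hypothetical placement only; match the paths that
# lavis/projects/varytiny/train/pretrain.yaml actually expects.
cd Vary-tiny-600k/LAVIS-main
mkdir -p weights data
mv /path/to/opt-125m             weights/opt-125m   # OPT-125M weights
mv /path/to/sam_vit_b_01ec64.pth weights/           # SAM-b checkpoint
mv /path/to/vary-600k            data/vary-600k     # Vary-600k images + annotations
```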

Train

```shell
python -m torch.distributed.run --nproc_per_node=8 --master_port=29501 train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml
```

or, for multiple machines:

```shell
python -m torch.distributed.run --master_addr xxx --master_port xxx --node_rank xxx --nnodes xxx --nproc_per_node xxx train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml
```
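For concreteness, here is a hypothetical two-machine launch (2 nodes x 8 GPUs, matching the 2x8 H800 setup mentioned below); the address, port, and GPU counts are placeholders you must replace with your own:

```shell
# Node 0 (master; 10.0.0.1 is a made-up address):
python -m torch.distributed.run --master_addr 10.0.0.1 --master_port 29501 \
    --node_rank 0 --nnodes 2 --nproc_per_node 8 \
    train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml

# Node 1 (same command, different rank):
python -m torch.distributed.run --master_addr 10.0.0.1 --master_port 29501 \
    --node_rank 1 --nnodes 2 --nproc_per_node 8 \
    train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml
```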

If your training goes smoothly, your loss at the end of each epoch will be similar to the following (2×8 H800 GPUs):

[Image: end-of-epoch training loss]

Demo

1. Change the "pretrained" and "finetuned" paths to your checkpoints in `LAVIS-main/lavis/configs/models/varytiny/varytiny_inference.yaml`, such as:

   [Image: checkpoint paths in varytiny_inference.yaml]

2. Run the test script:

   ```shell
   python tests/models/test_varytiny.py --image-file xxx.jpg
   ```

3. We also provide the weights of Vary-tiny trained from scratch on Vary-600k: Vary-tiny-600k.pth (extraction code: "Vary"). You can download them and run the inference directly (a sketch follows this list).
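Putting the demo steps together, a minimal end-to-end sketch using the released checkpoint; the checkpoint location and image path below are illustrative, not prescribed by the repo:

```shell
# 1) Place the released checkpoint somewhere local (path is illustrative):
mkdir -p checkpoints
mv /path/to/Vary-tiny-600k.pth checkpoints/

# 2) Edit lavis/configs/models/varytiny/varytiny_inference.yaml by hand so that
#    "pretrained" and "finetuned" point at checkpoints/Vary-tiny-600k.pth.

# 3) Run inference on your own image:
python tests/models/test_varytiny.py --image-file /path/to/your_image.jpg
```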

Vary-600k

Acknowledgement

Citation

If you find our work useful in your research, please consider citing Vary:

```bibtex
@article{wei2023vary,
  title={Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yang, Jinrong and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2312.06109},
  year={2023}
}

@article{wei2024small,
  title={Small Language Model Meets with Reinforced Vision Vocabulary},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yu, En and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2401.12503},
  year={2024}
}
```