VisInContext
- VisInContext is an easy way to increase the in-context text length in multi-modality learning.
- This work is also complementary to existing approaches for increasing in-context text length, such as FlashAttention and Memory Transformer.
Install
pip install -r requirements.txt
For H100 GPUs, install the following dependencies:
pip install -r requirements_h100.txt
Dataset Preparation
See DATASET.md.
Pre-training
See PRETRAIN.md.
Few-shot Evaluation
See Evaluation.md.
Citation
If you find our work helpful, please consider citing:
@article{wang2024visincontext,
title={Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Yiqi and Li, Min and Wang, Lijuan and Shou, Mike Zheng},
journal={NeurIPS},
year={2024}
}
Contact
Email: awinyimgprocess at gmail dot com
Acknowledgement
Thanks to these great works: Open-Flamingo, Open-CLIP, and WebDataset.