# [NeurIPS'24] Q-VLM: Post-training Quantization for Large Vision-Language Models
An efficient and accurate memory-saving method for W4A4 (4-bit weight, 4-bit activation) large multi-modal models. [Paper]

**Q-VLM: Post-training Quantization for Large Vision-Language Models**
Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu
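For context, "W4A4" means both weights and activations are quantized to 4 bits. Below is a minimal, illustrative round-to-nearest simulation in PyTorch; it is not the Q-VLM algorithm from the paper, and the helper name `quantize_4bit` is hypothetical.

```python
import torch

def quantize_4bit(x: torch.Tensor):
    """Symmetric uniform quantization to the 4-bit integer range [-8, 7]."""
    scale = x.abs().max() / 7.0
    q = torch.clamp(torch.round(x / scale), min=-8, max=7)
    return q, scale

# Simulate a W4A4 linear layer: quantize both weights and activations,
# multiply in the quantized domain, then rescale the result.
w = torch.randn(128, 256)   # full-precision weights
a = torch.randn(1, 256)     # full-precision activations
qw, sw = quantize_4bit(w)
qa, sa = quantize_4bit(a)
y = (qa @ qw.t()) * (sa * sw)
err = (y - a @ w.t()).abs().mean().item()
print(f"mean absolute discretization error: {err:.4f}")
```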
## Install
- Clone this repository and navigate to the QVLM folder:
```Shell
git clone https://github.com/ChangyuanWang17/QVLM.git
cd QVLM
```
- Install the package:
```Shell
conda create -n QVLM python=3.10 -y
conda activate QVLM
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install the additional custom bitsandbytes package for Q-VLM (a sanity check follows this list):
```Shell
pip uninstall bitsandbytes
cd custom_bitsandbytes
python setup.py install
```
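As a quick sanity check that both the editable install and the custom bitsandbytes build succeeded, you can try importing them. The module name `llava` is an assumption based on this being a LLaVA-based repository; adjust if your checkout exposes a different package.

```python
# Sanity check: both imports should succeed without errors.
# `llava` is assumed to be the package installed by `pip install -e .`.
import bitsandbytes as bnb
import llava

print("bitsandbytes version:", bnb.__version__)
```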
## Generate and evaluate SQA response
The following experiments were performed on a GeForce RTX 3090 with 24GB of memory.
```Shell
sh scripts/generate_sqa_response.sh
sh scripts/evaluate_sqa_response.sh
```
### Generate and evaluate with multiple GPUs
```Shell
sh scripts/generate_sqa_response_multi.sh
sh scripts/evaluate_sqa_response_multi.sh
```
## Pretrained LVLM Weights
Please check out the Model Zoo for all public LLaVA checkpoints and instructions on how to use the weights.
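As a sketch of how a checkpoint might be loaded, assuming this repository keeps LLaVA's `load_pretrained_model` loader API (the checkpoint name below is an example public LLaVA model, not a Q-VLM release):

```python
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "liuhaotian/llava-v1.5-7b"  # example; substitute your checkpoint
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```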
## ScienceQA
Please check out the documentation here.
## Acknowledgement
We thank the authors of the following works for open-sourcing their excellent code.