LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

*Equal Contribution, ^Equal Advising

Note that the core of our proposed module is here in the CLIP image encoder.
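As a rough illustration of what "adaptive" token reduction can mean, the sketch below keeps a visual token only when its importance score is an upper outlier under the interquartile-range (IQR) rule, so the number of kept tokens varies per image. This is not taken from the repository: the per-token score (for example, attention from the class token), the IQR rule, and the name `select_tokens_iqr` are all assumptions made for illustration, and no merging step is shown.

```python
from statistics import quantiles


def select_tokens_iqr(scores):
    """Return indices of tokens whose score exceeds Q3 + 1.5 * IQR.

    `scores` is a hypothetical list with one importance value per visual token.
    """
    # Quartiles of the score distribution (inclusive method treats the
    # data as the full population rather than a sample).
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    threshold = q3 + 1.5 * (q3 - q1)
    # Keep only the upper outliers; how many survive depends on the input.
    return [i for i, s in enumerate(scores) if s > threshold]


# Illustrative scores: one token stands out from the rest.
kept = select_tokens_iqr([1, 2, 2, 3, 3, 3, 4, 20])
print(kept)  # [7]
```

Because the threshold is derived from the score distribution itself, an image with many salient tokens keeps more of them than an image with few.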

Download the checkpoints (LoRA version) from Yuzhang's Hugging Face page and place them in checkpoints/llava-v1.5-7b-lora-prunemerge.
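The download can also be scripted. The sketch below assumes the checkpoint is hosted as a regular Hugging Face Hub model repository and that the `huggingface_hub` package is installed. The repository id is not given here, so it is left as a parameter, and `download_checkpoint` and `is_populated` are hypothetical helper names.

```python
from pathlib import Path

# Target directory named in the instructions above.
CHECKPOINT_DIR = Path("checkpoints/llava-v1.5-7b-lora-prunemerge")


def download_checkpoint(repo_id: str, local_dir: Path = CHECKPOINT_DIR) -> Path:
    """Download all files of a Hub repository into `local_dir`."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir


def is_populated(local_dir: Path = CHECKPOINT_DIR) -> bool:
    """Return True if `local_dir` exists and contains at least one entry."""
    return local_dir.is_dir() and any(local_dir.iterdir())
```

Calling `download_checkpoint` with the repository id from the Hugging Face page, followed by `is_populated()`, gives a quick check that the files landed in the expected directory.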

To change which token reduction function is called, edit the call here in the CLIP image encoder.

For example, the evaluation for TextVQA is:

CUDA_VISIBLE_DEVICES=7 XDG_CACHE_HOME='/data/shangyuzhang/' bash scripts/v1_5/eval/textvqa.sh

For other inference scripts, refer to LLaVA Evaluation.
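The environment variables used in the TextVQA command above can also be set from Python, which may be convenient when running several evaluation scripts in sequence. This is a minimal sketch, assuming the scripts are plain bash files run from the repository root. `build_eval_env` and `run_eval` are hypothetical helper names and are not part of the repository.

```python
import os
import subprocess


def build_eval_env(gpu_ids: str, cache_home: str, base_env=None) -> dict:
    """Copy the base environment and set the GPU and cache variables."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids  # which GPU(s) the script may use
    env["XDG_CACHE_HOME"] = cache_home     # where downloaded caches are stored
    return env


def run_eval(script: str, gpu_ids: str, cache_home: str) -> int:
    """Run one evaluation script with bash and return its exit code."""
    result = subprocess.run(
        ["bash", script],
        env=build_eval_env(gpu_ids, cache_home),
        check=False,  # let the caller decide how to handle failures
    )
    return result.returncode
```

Calling `run_eval(script, "7", "/data/shangyuzhang/")`, with `script` set to the path from the command above, would mirror that invocation. The GPU index and cache directory are machine-specific and should be adjusted.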