

Transferable Visual Prompting for Multimodal Large Language Models


  1. Create the conda environment for the project and install the dependencies.
cd Transferable_VP_MLLM
conda create -n transvp python=3.11
conda activate transvp
pip install -r requirements.txt
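You can confirm that the commands above ran inside the intended environment (rather than, say, `base`) with the pre-flight check below. It is a hypothetical helper, not part of the repository. It relies on `conda activate` exporting the active environment name in `CONDA_DEFAULT_ENV`, and it takes the environment mapping and version as optional arguments so that it can be tested in isolation.

```python
import os
import sys


def env_problems(env=None, version=None, expected_env="transvp", expected_version=(3, 11)):
    """Return a list of problems with the current environment (empty if none).

    Hypothetical pre-flight check, not shipped with the repository.
    `env` and `version` default to the live process values.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    problems = []
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment's name.
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if version != expected_version:
        problems.append(
            f"Python {version[0]}.{version[1]} found, expected "
            f"{expected_version[0]}.{expected_version[1]}"
        )
    return problems


if __name__ == "__main__":
    for problem in env_problems():
        print("warning:", problem)
```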
  2. Prepare the model weights.

Put the model weights under ./model_weights.
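The README does not say which files or subfolders ./model_weights should contain. The check below therefore only confirms that the directory exists and is not empty before you launch a run. It is a hypothetical helper. The actual layout presumably depends on the weights for the models named in the commands below (minigpt-4, instructblip, blip2).

```python
from pathlib import Path


def weights_dir_ready(root="./model_weights"):
    """Return True if `root` is an existing directory with at least one entry.

    Hypothetical check: the expected file layout is not documented, so this
    only guards against a missing or empty weights directory.
    """
    path = Path(root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if not weights_dir_ready():
        print("warning: put the model weights under ./model_weights first")
```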

Reproducing the Results

  1. Training on CIFAR-10
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --target_models instructblip blip2 --learning_rate 10 --fca 0.005 --tse 0.001 --epochs 1
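The parser below is one way to read the flags in the training command above. It is a hypothetical mirror of that command line, not the actual parser in `transfer_cls.py`. The types are assumptions inferred from the example values: floats for `--learning_rate`, `--fca` and `--tse`, an integer for `--epochs`, and a multi-value `--target_models`. The README does not document what `--fca` and `--tse` control, so check the script before changing them.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical mirror of the transfer_cls.py flags used in this README."""
    p = argparse.ArgumentParser(description="Transferable visual prompting (sketch)")
    p.add_argument("--dataset", default="cifar10")
    p.add_argument("--model_name", default="minigpt-4",
                   help="model the prompt is trained with (assumed)")
    # nargs="+" consumes values until the next flag, e.g. "instructblip blip2".
    p.add_argument("--target_models", nargs="+", default=[],
                   help="models the prompt is transferred to (assumed)")
    p.add_argument("--learning_rate", type=float)
    p.add_argument("--fca", type=float)  # meaning not documented in the README
    p.add_argument("--tse", type=float)  # meaning not documented in the README
    p.add_argument("--epochs", type=int)
    p.add_argument("--evaluate", action="store_true",
                   help="run inference with a trained prompt instead of training")
    p.add_argument("--checkpoint", help="path to a trained prompt checkpoint")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(shlex.split(
        "--dataset cifar10 --model_name minigpt-4 --target_models instructblip blip2 "
        "--learning_rate 10 --fca 0.005 --tse 0.001 --epochs 1"))
    print(args.target_models, args.learning_rate, args.epochs)
```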
  2. Inference with a trained prompt. To evaluate on the dataset with a trained prompt, pass the path to its checkpoint via --checkpoint. A checkpoint for reproducing the results is provided at save/checkpoint_best.pth.
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --evaluate --checkpoint $PATH_TO_PROMPT
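You can wrap the evaluation command as shown below to avoid retyping it. This is a hypothetical convenience sketch, not a script shipped with the repository. It assumes that you run it from the repository root (where `transfer_cls.py` lives) inside the activated environment. It defaults `--checkpoint` to the provided save/checkpoint_best.pth and reuses the current interpreter via `sys.executable`.

```python
import shlex
import subprocess
import sys


def eval_command(checkpoint="save/checkpoint_best.pth", dataset="cifar10",
                 model_name="minigpt-4", python=sys.executable):
    """Build the README's evaluation command as an argument list."""
    return [python, "transfer_cls.py", "--dataset", dataset,
            "--model_name", model_name, "--evaluate", "--checkpoint", checkpoint]


def run_eval(**kwargs):
    """Run the evaluation; raises subprocess.CalledProcessError if it fails."""
    cmd = eval_command(**kwargs)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command only; call run_eval() to actually launch it.
    print(shlex.join(eval_command()))
```

Passing an argument list rather than a single shell string keeps checkpoint paths containing spaces intact without extra quoting.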