

<h1 align='center' style="text-align:center; font-weight:bold; font-size:2.0em;letter-spacing:2.0px;"> <i>Test-Time</i> Backdoor Attacks on <br> Multimodal Large Language Models </h1> <!-- <p align='center' style=font-size:1.2em;> <b> <em>arXiv-Preprint, 2023</em> <br> </b> </p> --> <!-- TODO --> <p align='left' style="text-align:left;font-size:1.2em;"> <b> [<a href="https://sail-sg.github.io/AnyDoor/" target="_blank" style="text-decoration: none;">Project Page</a>] | [<a href="https://arxiv.org/abs/2402.08577" target="_blank" style="text-decoration: none;">arXiv</a>] | [<a href="https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link" target="_blank" style="text-decoration: none;">Data Repository</a>]&nbsp; </b> </p>


We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.



We use DALL-E for dataset generation and for demonstrations. The attacked model is LLaVA-1.5, using the implementation provided by Hugging Face Transformers with weights from the huggingface.co model hub. Install the pinned Transformers commit with:

pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
<!-- A suitable base conda environment named `env_anydoor` can be created and activated with: ``` conda env create -f environment.yaml conda activate env_anydoor ``` -->
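As a rough orientation, the snippet below sketches how a LLaVA-1.5 checkpoint can be loaded and queried through Transformers. The checkpoint id `llava-hf/llava-1.5-7b-hf` and the helper names `build_prompt`, `load_llava` and `describe_image` are assumptions made for illustration; they are not taken from this repository, and `anydoor_llava.py` may load the model differently.

```python
def build_prompt(question: str) -> str:
    """Single-turn prompt in the chat style used by the llava-hf LLaVA-1.5 checkpoints."""
    return f"USER: <image>\n{question} ASSISTANT:"


def load_llava(model_id: str = "llava-hf/llava-1.5-7b-hf"):
    """Load processor and model (assumed checkpoint id; downloads weights on first use)."""
    # Imported lazily so this file can be parsed without the heavy dependency.
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def describe_image(processor, model, image, question: str, max_new_tokens: int = 64) -> str:
    """Run one image-question pair through the model and return the decoded text."""
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_llava()` once and passing a `PIL.Image` to `describe_image` is enough to reproduce a reference answer for a single question. Loading in full precision needs substantial memory for a 7B model.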

Dataset Generation


<!-- ![Teaser image](./assets/dalle_dataset_with_Q.jpg) -->

[Figure: DALL-E dataset examples with questions]

As detailed in our paper, the DALL-E dataset is built generatively. We first randomly sample textual descriptions from MS-COCO captions and use them as prompts to generate images with DALL-E. We then craft questions about the image contents with ChatGPT-4. Finally, we generate the original answers with LLaVA-1.5 to serve as references.

Because every step is generative, you can choose your own image-question combinations to attack.
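To make the idea of custom image-question combinations concrete, here is a minimal sketch of how such samples could be collected and serialised. The `AttackSample` class and its field names (`image`, `question`, `answer`) are hypothetical and may not match the schema of the processed files in this repository.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Tuple


@dataclass
class AttackSample:
    image: str     # path to a generated image (hypothetical field name)
    question: str  # question about the image content
    answer: str    # reference answer produced by the clean model


def make_samples(triples: Iterable[Tuple[str, str, str]]) -> List[AttackSample]:
    """Validate (image, question, answer) triples and wrap them as records."""
    samples = []
    for image, question, answer in triples:
        if not (image and question and answer):
            raise ValueError("image, question and answer must all be non-empty")
        samples.append(AttackSample(image, question, answer))
    return samples


def dump_samples(samples: List[AttackSample]) -> str:
    """Serialise the records as a JSON array."""
    return json.dumps([asdict(s) for s in samples], indent=2)
```

A list built this way can be written to disk and adapted to whatever layout the attack script expects.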


The SVIT dataset is curated by randomly selecting questions from its complex reasoning QA pairs. Images are sourced from Visual Genome. As reference answers, we use outputs generated by LLaVA-1.5.
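The random selection step can be sketched as a seeded filter-and-sample, which keeps the chosen subset reproducible. The `task` key, the value `"complex_reasoning"` and the function name `select_questions` are assumptions for illustration; the actual SVIT files may be organised differently.

```python
import random
from typing import Dict, List


def select_questions(
    qa_pairs: List[Dict[str, str]],
    k: int,
    task: str = "complex_reasoning",  # hypothetical label for the QA category
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Pick k QA pairs of the given task type, reproducibly for a fixed seed."""
    pool = [qa for qa in qa_pairs if qa.get("task") == task]
    if k > len(pool):
        raise ValueError(f"requested {k} pairs but only {len(pool)} match task={task!r}")
    # A private Random instance avoids touching the global RNG state.
    return random.Random(seed).sample(pool, k)
```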


We use the original image-question pairs directly from the VQAv2 dataset. Reference answers are produced by LLaVA-1.5.

Processed Files

Download our processed JSON files:


Test-Time Backdoor Attack

Overview of our AnyDoor

[Figure: overview of AnyDoor]

Quick Start

<b> Border Attack </b>

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode border \
    --patch_size 6 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
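The two loss-weight flags in the command above suggest an objective with one term per case: with the trigger in the question the model should emit the target answer, and without it the model should keep its original answer. The sketch below shows one way such a weighted sum could be computed from per-token probabilities. It is an assumption about the general shape of the objective, not the repository's implementation, and it does not model `--loss_type`.

```python
import math
from typing import Sequence


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of a sequence, given the probability of each target token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def two_term_objective(
    probs_target_with_trigger: Sequence[float],
    probs_original_without_trigger: Sequence[float],
    w_with: float = 1.0,     # plays the role of --loss_with_trigger_weight
    w_without: float = 1.0,  # plays the role of --loss_without_trigger_weight
) -> float:
    """Weighted sum of the 'trigger -> target answer' and 'no trigger -> original answer' losses."""
    loss_with = sequence_nll(probs_target_with_trigger)
    loss_without = sequence_nll(probs_original_without_trigger)
    return w_with * loss_with + w_without * loss_without
```

Raising `w_without` relative to `w_with` would favour preserving clean behaviour over hitting the target answer, and vice versa.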

<b> Corner Attack </b>

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode four_corner \
    --patch_size 32 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
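To compare the border and corner settings, the following sketch builds binary masks for both and counts the pixels the perturbation may touch. It assumes that `--patch_mode border` means a frame `patch_size` pixels wide around the image and that `--patch_mode four_corner` means one `patch_size` × `patch_size` square in each corner; the function names are hypothetical and the script may define the regions differently.

```python
from typing import List

Mask = List[List[int]]


def border_mask(size: int, width: int) -> Mask:
    """1 inside a frame of the given width around a size x size image, 0 elsewhere."""
    return [
        [1 if min(r, c, size - 1 - r, size - 1 - c) < width else 0 for c in range(size)]
        for r in range(size)
    ]


def four_corner_mask(size: int, patch: int) -> Mask:
    """1 inside a patch x patch square in each of the four corners, 0 elsewhere."""
    def near_edge(i: int) -> bool:
        return i < patch or i >= size - patch

    return [[1 if near_edge(r) and near_edge(c) else 0 for c in range(size)] for r in range(size)]


def count_pixels(mask: Mask) -> int:
    """Number of pixels the perturbation is allowed to modify."""
    return sum(sum(row) for row in mask)


def apply_patch(image: List[List[float]], patch: List[List[float]], mask: Mask) -> List[List[float]]:
    """Overwrite masked pixels of a single-channel image with the patch values."""
    return [
        [p if m else x for x, p, m in zip(img_row, patch_row, mask_row)]
        for img_row, patch_row, mask_row in zip(image, patch, mask)
    ]
```

Under these assumptions, a 336 × 336 image with a 6-pixel border exposes 7,920 pixels (about 7% of the image), while four 32 × 32 corners expose 4,096 pixels (about 3.6%).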

<b> Pixel Attack </b>

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --pixel_attack \
    --epsilon 32 \
    --alpha_weight 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
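For the pixel attack, `--epsilon 32` reads like an ℓ∞ budget. A common way to respect such a budget is a signed-gradient step followed by clipping, sketched below on a flat list of pixel values in [0, 1]. This is a generic PGD-style update under the assumption that epsilon is given on the 0-255 scale; the step size used here is a placeholder, and the exact role of `--alpha_weight` in the script is not specified in this README.

```python
from typing import List


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def linf_step(delta: List[float], grad: List[float], alpha: float, epsilon: float) -> List[float]:
    """One signed-gradient descent step on the loss, projected back onto the l_inf ball."""
    return [max(-epsilon, min(epsilon, d - alpha * sign(g))) for d, g in zip(delta, grad)]


def apply_perturbation(pixels: List[float], delta: List[float]) -> List[float]:
    """Add the universal perturbation and keep every pixel in the valid [0, 1] range."""
    return [max(0.0, min(1.0, p + d)) for p, d in zip(pixels, delta)]


EPSILON = 32 / 255  # assumes --epsilon is expressed on the 0-255 pixel scale
ALPHA = 5 / 255     # placeholder step size, not confirmed to match --alpha_weight
```

Because the same `delta` is added to every image, it acts as a universal perturbation; only the clipping to [0, 1] depends on the individual image.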



Under continuously changing scenes

[Figure: attack under continuously changing scenes]


If you find this project useful in your research, please consider citing our paper:

      title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
      author={Lu, Dong and Pang, Tianyu 
        and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
      journal={arXiv preprint arXiv:2402.08577},