<h1 align='center' style="text-align:center; font-weight:bold; font-size:2.0em;letter-spacing:2.0px;"> <i>Test-Time</i> Backdoor Attacks on <br> Multimodal Large Language Models </h1>

<!-- <p align='center' style=font-size:1.2em;> <b> <em>arXiv-Preprint, 2023</em> <br> </b> </p> -->

<p align='left' style="text-align:left;font-size:1.2em;"> <b> [<a href="https://sail-sg.github.io/AnyDoor/" target="_blank" style="text-decoration: none;">Project Page</a>] | [<a href="https://arxiv.org/abs/2402.08577" target="_blank" style="text-decoration: none;">arXiv</a>] | [<a href="https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link" target="_blank" style="text-decoration: none;">Data Repository</a>] </b> </p>

TL;DR:
We propose test-time backdoor attacks against multimodal large language models, which inject the backdoor into the textual modality via a universal adversarial image perturbation, without any access to or modification of the training data.
## Requirements
- Platform: Linux
- Hardware: A100 PCIe 40G
In our work, we use DALL-E for dataset generation and for our demonstrations. We employ the LLaVA-1.5 architecture provided by Transformers, which is seamlessly integrated from the huggingface.co model hub. Install the pinned Transformers commit with:
```
pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
```
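With that install in place, a minimal sanity check for loading LLaVA-1.5 through Transformers could look like the sketch below (the `llava-hf/llava-1.5-7b-hf` checkpoint and the local image path are assumptions on our part, not requirements of this repo):

```
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed public checkpoint name; swap in the weights you actually evaluate.
model_id = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
image = Image.open("example.jpg")  # hypothetical local image

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```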
<!-- A suitable base conda environment named `env_anydoor` can be created and activated with:
```
conda env create -f environment.yaml
conda activate env_anydoor
``` -->
## Dataset Generation

### DALL-E
<!-- ![Teaser image](./assets/dalle_dataset_with_Q.jpg) -->

As detailed in our paper, the DALL-E dataset is built with a generative pipeline. We first randomly select textual descriptions from MS-COCO captions and use them as prompts to generate images with DALL-E. We then craft questions about the image contents using GPT-4. Finally, we generate the reference answers with LLaVA-1.5.

This pipeline also lets you specify your own image-question combinations to attack, as sketched below.
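For illustration, a rough sketch of this pipeline is given below; it assumes the OpenAI Python SDK (`openai>=1.0`) and hypothetical file names, and our released JSON files already contain the finished results:

```
import json
import random

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Hypothetical input file: a list of MS-COCO caption strings.
captions = json.load(open("coco_captions.json"))
prompt = random.choice(captions)

# Step 1: generate an image from the sampled caption with DALL-E.
image_url = client.images.generate(
    model="dall-e-3", prompt=prompt, size="1024x1024", n=1
).data[0].url

# Step 2: ask GPT-4 to craft a question about the image content.
question = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Write one question about an image described as: {prompt}"}],
).choices[0].message.content

# Step 3: the reference answer is then produced by LLaVA-1.5 (see the loading sketch above).
print(image_url, question)
```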
### SVIT

The SVIT dataset is curated by randomly selecting questions from its complex reasoning QA pairs. Images are sourced from Visual Genome. For reference answers, we use outputs generated by LLaVA-1.5.
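For illustration only, the curation step boils down to random sampling along the following lines (the file and field names are hypothetical, not the actual SVIT schema):

```
import json
import random

# Hypothetical file/field names; consult the SVIT release for the real schema.
qa_pairs = json.load(open("svit_complex_reasoning_qa.json"))

random.seed(0)
sampled = random.sample(qa_pairs, k=40)  # e.g. 40 samples, matching --attack_samples below

for item in sampled:
    # Images are looked up in Visual Genome by id; reference answers come from LLaVA-1.5.
    print(item["image_id"], item["question"])
```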
### VQAv2

We use the original image-question pairs directly from the VQAv2 dataset. Reference answers are produced by the LLaVA-1.5 model.
### Processed Files

Download our processed JSON files:
https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link
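If you prefer to script the download, one option (an assumption on our part, not a requirement of this repo) is the `gdown` package:

```
import gdown  # assumption: installed via `pip install gdown`

url = "https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link"
# "./data" is a hypothetical output directory; choose any local path.
gdown.download_folder(url, output="./data", quiet=False)
```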
## Test-Time Backdoor Attack

Overview of our AnyDoor.

### Quick Start

<b> Border Attack </b>
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode border \
    --patch_size 6 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
```
<b> Corner Attack </b>
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode four_corner \
    --patch_size 32 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
```
<b> Pixel Attack </b>
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --pixel_attack \
    --epsilon 32 \
    --alpha_weight 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3
```
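The three modes differ only in where the universal perturbation is allowed to live. The sketch below illustrates the general idea of such spatial constraints in PyTorch; it is not the exact implementation in `anydoor_llava.py`:

```
import torch

def border_mask(image_size: int, width: int) -> torch.Tensor:
    """1s on a frame of `width` pixels around the image, 0s elsewhere."""
    mask = torch.zeros(1, 1, image_size, image_size)
    mask[..., :width, :] = 1
    mask[..., -width:, :] = 1
    mask[..., :, :width] = 1
    mask[..., :, -width:] = 1
    return mask

def corner_mask(image_size: int, patch: int) -> torch.Tensor:
    """1s on four `patch` x `patch` squares in the corners, 0s elsewhere."""
    mask = torch.zeros(1, 1, image_size, image_size)
    for ys in (slice(0, patch), slice(-patch, None)):
        for xs in (slice(0, patch), slice(-patch, None)):
            mask[..., ys, xs] = 1
    return mask

# Patch attacks: only the masked pixels are perturbed.
# delta = delta * border_mask(336, 6)      # --patch_mode border --patch_size 6
# delta = delta * corner_mask(336, 32)     # --patch_mode four_corner --patch_size 32

# Pixel attack: perturb the whole image but clip to an L_inf ball
# (e.g. --epsilon 32, presumably on the 0-255 pixel scale).
# delta = delta.clamp(-32 / 255, 32 / 255)
```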
## Visualization

Under continuously changing scenes.

## Bibtex
If you find this project useful in your research, please consider citing our paper:
```
@article{lu2024testtime,
  title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
  author={Lu, Dong and Pang, Tianyu and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
  journal={arXiv preprint arXiv:2402.08577},
  year={2024},
}
```