VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

😎 This is the official repository of VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models by Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, and Fenglong Ma.

Release

Installation

To recreate the environment, run the following commands:

$ conda env create -f environment.yaml
$ conda activate VLAttack
$ pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
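
To verify the installation (an optional sanity check, not part of the original setup steps), confirm that PyTorch was installed with CUDA support:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"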

Evaluation

We provide separate code for attacking each model. For example, to test VLAttack on the BLIP model, first enter the corresponding directory:

cd BLIP_attack

and then run the commands described in the README.md in that directory.
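
For illustration only, a run might look like the sketch below; the script name and arguments are hypothetical placeholders, and the actual commands are listed in each subdirectory's README.md:

$ cd BLIP_attack
# Placeholder invocation: replace with the exact script and arguments given in BLIP_attack/README.md
$ python run_attack.py --task vqa --output_dir ./output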

Citation

@InProceedings{VLAttack,
  author    = {Ziyi Yin and Muchao Ye and Tianrong Zhang and Tianyu Du and Jinguo Zhu and Han Liu and Jinghui Chen and Ting Wang and Fenglong Ma},
  title     = {VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models},
  booktitle = {NeurIPS},
  year      = {2023}
}

License

VLAttack is released under the BSD 3-Clause License. Please see the LICENSE file for more information.

Acknowledgements

BLIP

CLIP

Cleverhans