Image Hijacks: Adversarial Images can Control Generative Models at Runtime

This repository contains the code for the paper Image Hijacks: Adversarial Images can Control Generative Models at Runtime.

Setup

The code runs in any environment with Python 3.9 or later.

We use Poetry for dependency management. To install it, follow the official Poetry installation instructions.

To build a virtual environment with the required packages, run:

poetry install

Training

The images used in our demo were trained using the config in experiments/exp_results_tables/config.py (specifically runs #1 llava1_att_leak.pat_full.eps_8.lr_3e-2 and #5 llava1_att_spec.pat_full.eps_8.lr_3e-2).

To train these images, first download the relevant LLaVA checkpoint:

poetry run python download.py models llava-v1.3-13b-336px

To get the list of jobs (with their job IDs) specified by this config file:

poetry run python experiments/exp_demo_imgs/config.py

To run job ID N without wandb logging:

poetry run python run.py train \
--config_path experiments/exp_demo_imgs/config.py \
--log_dir experiments/exp_demo_imgs/logs \
--job_id N \
--playground
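
To run several jobs back to back, the command above can be wrapped in a short script. The following is a minimal sketch, not part of the repository: the helper names `build_train_command` and `run_jobs` are hypothetical, and it assumes job IDs are the integers printed by the config script. It uses only the flags shown above.

```python
import subprocess

# Paths taken from the command above.
CONFIG_PATH = "experiments/exp_demo_imgs/config.py"
LOG_DIR = "experiments/exp_demo_imgs/logs"


def build_train_command(job_id: int) -> list[str]:
    """Build the training command for one job ID, without wandb logging."""
    return [
        "poetry", "run", "python", "run.py", "train",
        "--config_path", CONFIG_PATH,
        "--log_dir", LOG_DIR,
        "--job_id", str(job_id),
        "--playground",
    ]


def run_jobs(job_ids: list[int]) -> None:
    """Run each job in order; check=True stops at the first failing job."""
    for job_id in job_ids:
        subprocess.run(build_train_command(job_id), check=True)
```

Calling `run_jobs([1, 5])` from the repository root would then train the two jobs one after the other, assuming those IDs appear in the job list.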

To run job ID N with wandb logging to YOUR_WANDB_ENTITY/YOUR_WANDB_PROJECT:

poetry run python run.py train \
--config_path experiments/exp_results_tables/config.py \
--log_dir experiments/exp_results_tables/logs \
--job_id N \
--wandb_entity YOUR_WANDB_ENTITY \
--wandb_project YOUR_WANDB_PROJECT \
--no-playground

Tests

This codebase advocates for expect tests in machine learning, and as such uses @ezyang's expecttest library for unit and regression tests.

To run the tests, download the BLIP-2 checkpoint and then run pytest:

poetry run python download.py models blip2-flan-t5-xl
poetry run pytest .

Citation

To cite our work, you can use the following BibTeX entry:

@misc{bailey2023image,
  title={Image Hijacks: Adversarial Images can Control Generative Models at Runtime}, 
  author={Luke Bailey and Euan Ong and Stuart Russell and Scott Emmons},
  year={2023},
  eprint={2309.00236},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}