Awesome

Black Box Adversarial Prompting for Foundation Models

Introduction

This project is a replication and modification of experiments from the paper "Black Box Adversarial Prompting for Foundation Models". It focuses on exploring adversarial prompting in foundation models using Google Colab for its GPU capabilities, as the experiments require significant computational resources.

Installation

Prerequisites

Create a Huggingface account for access tokens here.
Obtain a W&B access token.

Environment Setup

Use Google Colab to leverage GPU support for running experiments. Check if GPU is available with:

import torch

# Check if GPU is available
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"GPU: {gpu_name}")
else:
    print("CPU is being used.")

Install required dependencies:

!pip install transformers torch nltk pandas wandb gpytorch botorch diffusers torchvision

Replication Steps

Text-to-Text Generation

Run the following commands for text-to-text generation with smaller models that are suitable for lower RAM systems:

!time python3 run_text_exp.py --loss_type perplexity --seed 0 --language_model facebook/opt-350m --embedding_model tinybert --seed_text "Explain list comprehension in Python."
!time python3 run_text_exp.py --loss_type perplexity --seed 0 --language_model facebook/opt-125m --embedding_model tinybert --seed_text "Explain list comprehension in Python."

To run the adversarial prompt: It is related on the output observed from the previous command

!time python3 run_text_exp.py --loss_type perplexity --seed 0 --language_model facebook/opt-350m --embedding_model tinybert --seed_text "usc consumer hen finals Explain list comprehension in Python."

Text-to-Image Generation

For text-to-image generation, adjust the query size based on PC requirements. The optimal class (e.g., 'bus') can be changed based on specific needs or as mentioned in the paper:

Unrestricted Prompts

!time python3 image_optimization.py --optimal_class bus --max_allowed_calls_without_progress 1000 --max_n_calls 5000 --seed 0

Restricted Prompts

!time python3 image_optimization.py --optimal_class bus --max_allowed_calls_without_progress 1000 --max_n_calls 5000 --seed 0 --exclude_high_similarity_tokens True

Restricted Prepending Prompts

!time python3 image_optimization.py --optimal_class bus --max_allowed_calls_without_progress 3000 --max_n_calls 10000 --seed 0 --exclude_high_similarity_tokens True --prepend_task True --prepend_task_version 1

To use the Square Attack optimization method, add --square_attack True to the command.

Troubleshooting

Fixing Bugs in Code

Remove PerplexityWithSeedLoss from run_text_exp.py. Replace .cuda() with .cpu() in various files for non-NVIDIA GPUs.

Avoiding W&B Login Prompt

! wandb disabled

Conclusion

Methodology alterations may lead to different results from the original paper. This README provides a guide for replicating experiments under specific technical constraints and computational resources.