Home

Awesome

Data Poisoning Attacks Against Multimodal Encoders

This is the official repository of the paper ''Data Poisoning Attacks Against Multimodal Encoders'' accepted by ICML 2023.

Environment Setting

pip install opencv-python
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install ruamel_yaml pytz python-dateutil matplotlib seaborn

Dataset

Flickr30k, PASCAL, and COCO

First download these datasets from the official website.

Then, you can preprocess the dataset by running python preprocess.py in data folder. To make it easier, you can also use the dataset split here for reproducing the paper: https://drive.google.com/drive/folders/1tUEv-Q0-nLs6WpRttC_f4vvPLO9zliiY?usp=drive_link

Constructing poisoned dataset

python poisoning/dirty_label_poison.py --dataset DATASET --target_txt_label TXT_LABEL --target_img_label IMG_LABEL

Example:

### Flickr-PASCAL dataset ###
python poisoning/dirty_label_poison.py --dataset pascal --target_txt_label sheep --target_img_label aeroplane

Train data on the poisoned dataset

python -m torch.distributed.launch --nproc_per_node=1 --master_port PORT --use_env retrieval_by_CLIP.py --distributed --config CONFIG --poisoned --overload_config --output_dir OUTPUT_DIR --poisoned_file POISONED_FILE --target_txt_cls TXT_LABEL --target_img_cls IMG_LABEL --poisoned_goal POISONED_GOAL

Example:

### Flickr-PASCAL dataset ###
python -m torch.distributed.launch --nproc_per_node=1 --master_port 61201 --use_env retrieval_by_CLIP.py --distributed --config configs/clip_poison_pascal.yaml --poisoned --overload_config --output_dir output/pascal_sheep2aeroplane_1.0/ --poisoned_file poisoned_data/pascal_train_sheep2aeroplane_1.0.json --target_txt_cls sheep --target_img_cls aeroplane --poisoned_goal sheep2aeroplane 

Evaluation

python poisoning/attack_eval_pipeline.py --config CONFIG --dataset DATASET --poisoned_goal POISONED_GOAL -f FEATURE_BASE_PREFIX --poisoned_path POISONED_PATH --output_path OUTPUT_PATH --target_txt_cls TXT_LABEL --target_img_cls IMG_LABEL --output_dir OUTPUT_DIR

Example:

### Flickr-PASCAL dataset ###
python poisoning/attack_eval_pipeline.py --config ./configs/clip_poison_pascal.yaml --dataset pascal --poisoned_goal sheep2aeroplane --poisoned_ratio 1.0 -f pascal_sheep2aeroplane_1.0 --poisoned_path poisoned_data/pascal_train_sheep2aeroplane_1.0.json --output_path ./results/poison_result.csv --target_txt_cls sheep --target_img_cls aeroplane --output_dir output/pascal_sheep2aeroplane_1.0