ToViLaG

Official code for the EMNLP 2023 paper: ToViLaG: Your Visual-Language Generative Model is Also An Evildoer.

Metrics

WInToRe Metric

Run the following command to compute the WInToRe metric.

python metrics/toxicity/wintore.py --input wintore_input.txt --output wintore_output.txt --start 0 --end 1 --M 20

Arguments: `--input` and `--output` are the input and output file paths; `--start`, `--end`, and `--M` control the range and the number of toxicity thresholds used by the metric.
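As a minimal illustration, the snippet below writes a toxicity-score file in the format the script is assumed to expect (one probability per line; check `metrics/toxicity/wintore.py` for the exact format) and then invokes the command above.

```python
import subprocess

# Toxicity probabilities of the generated samples, e.g. from the Perspective API
# or the ViT-based image toxicity classifiers (placeholder values).
toxicity_scores = [0.02, 0.87, 0.45, 0.10]

# Assumption: wintore_input.txt holds one probability per line.
with open("wintore_input.txt", "w") as f:
    f.writelines(f"{score}\n" for score in toxicity_scores)

# Same arguments as the example command above.
subprocess.run(
    ["python", "metrics/toxicity/wintore.py",
     "--input", "wintore_input.txt", "--output", "wintore_output.txt",
     "--start", "0", "--end", "1", "--M", "20"],
    check=True,
)
```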

Quality Metrics

Image-to-text metrics: BERTScore, ROUGE, and CLIPSIM.

Text-to-image metrics: IS, FID, and CLIPSIM.
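For reference, here is a minimal CLIPSIM-style sketch using the Hugging Face `transformers` CLIP model; the checkpoint and preprocessing are illustrative assumptions, not necessarily the exact configuration used in the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; the paper's exact CLIP variant may differ.
name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(name).eval()
processor = CLIPProcessor.from_pretrained(name)

image = Image.open("generated.png").convert("RGB")
text = "a caption describing the image"
inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# CLIPSIM: cosine similarity between the image and text embeddings.
clipsim = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
print(f"CLIPSIM = {clipsim:.4f}")
```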

Toxicity Classifier

Text toxicity classifier: Perspective API. A simple reference implementation is available here.
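Below is a minimal sketch of scoring a generated text with the Perspective API via `google-api-python-client`; the API key is a placeholder, and this is not necessarily the implementation linked above.

```python
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

request = {
    "comment": {"text": "a generated caption to score"},
    "requestedAttributes": {"TOXICITY": {}},
}
response = client.comments().analyze(body=request).execute()

# Toxicity probability in [0, 1].
toxicity = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"toxicity = {toxicity:.3f}")
```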

Image toxicity classifiers: We use a subset of the toxic images to fine-tune three ViT-Huge models, one for each of the three types of toxicity.
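A hedged inference sketch with `transformers` for one of the fine-tuned ViT classifiers follows; the checkpoint path and the toxic-label index are hypothetical and should be adapted to the released classifiers.

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Hypothetical local path to one of the three fine-tuned ViT-Huge classifiers.
ckpt = "checkpoints/vit-huge-toxicity-classifier"
processor = ViTImageProcessor.from_pretrained(ckpt)
model = ViTForImageClassification.from_pretrained(ckpt).eval()

image = Image.open("image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

# Assumption: index 1 is the "toxic" class.
print(f"toxic probability = {probs[0, 1].item():.3f}")
```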

ToViLaG Dataset

Statistics

| Category | Number of Images | Number of Texts |
| --- | --- | --- |
| Mono-toxic pairs <toxic image, non-toxic text> | 4,349 | 10,000 |
| Mono-toxic pairs <toxic text, non-toxic image> | 10,000 | 9,794 |
| Co-toxic pairs <toxic text, toxic image> | 5,142 | 9,869 |
| Provocative text prompts | - | 902 |
| Unpaired | 21,559 | 31,674 |

Unpaired data

Unpaired toxic images:

Unpaired toxic texts: We use a subset of them (21,805 texts) for toxicity benchmarking, which can be downloaded from here.

Mono-toxic pairs

<toxic image, non-toxic text>

<toxic text, non-toxic image>

Co-toxic pairs

Innocuous provocative text prompts

Constructed by a gradient-guided search method on Stable Diffusion.

Download the prompts from here.

Toxicity Analysis

Toxicity Benchmarking

Image-to-text generation

We use 21,559 toxic images to evaluate the I2T models.

In our paper, all models use top-k and top-p sampling to generate outputs (a sampling sketch follows the table below). The toxicity evaluation results for each model are as follows:

| Models | TP% ↑ | WInToRe% ↓ |
| --- | --- | --- |
| OFA | 3.41 | 90.16 |
| VinVL | 2.06 | 89.56 |
| CLIP-ViL$_{RN50}$ | 0.74 | 88.99 |
| GIT | 11.57 | 86.13 |
| GRIT | 12.79 | 84.70 |
| LLaVA | 29.25 | 80.89 |
| BLIP | 32.51 | 75.66 |
| BLIP2$_{OPT2.7B-COCO}$ | 37.61 | 66.55 |
| BLIP2$_{OPT2.7B}$ | 40.41 | 64.76 |
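For illustration, here is a caption-sampling sketch with a Hugging Face BLIP checkpoint using top-k and top-p sampling; the checkpoint and sampling hyperparameters are assumptions, not the paper's exact settings. The sampled captions would then be scored with the text toxicity classifier above.

```python
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

name = "Salesforce/blip-image-captioning-large"  # illustrative checkpoint
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name).eval()

image = Image.open("eval_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # Top-k / top-p (nucleus) sampling; values here are illustrative.
    out = model.generate(**inputs, do_sample=True, top_k=50, top_p=0.9,
                         max_new_tokens=30)

caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
```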

Text-to-image generation

We use 21,805 toxic prompts and 902 provocative prompts to evaluate the T2I models.

The toxicity evaluation results of each model are as follows:

| Models | Toxic prompts: TP% ↑ | Toxic prompts: WInToRe% ↓ | Provocative prompts: TP% ↑ | Provocative prompts: WInToRe% ↓ |
| --- | --- | --- | --- | --- |
| CogView2 | 8.10 | 81.37 | 44.68 | -8.59 |
| DALLE-Mage | 10.19 | 80.96 | 33.15 | -7.29 |
| OFA | 19.08 | 80.64 | 37.03 | -7.44 |
| Stable Diffusion | 23.32 | 80.12 | 100 | -19.02 |
| LAFITE | 21.48 | 79.33 | 27.38 | -6.51 |
| CLIP-GEN | 22.93 | 79.97 | 7.32 | 1.18 |
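For illustration, evaluation prompts can be fed to Stable Diffusion with `diffusers` as sketched below; the checkpoint and inference settings are assumptions, and the generated images would then be scored with the image toxicity classifiers above.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; the paper's Stable Diffusion version may differ.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a benchmark prompt"  # placeholder for a prompt from the evaluation set
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("generated.png")
```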

Toxicity Injection

We fine-tune each model separately on the mono-toxic pairs and on the co-toxic pairs.

Image-to-text generation models: GIT, GRIT, BLIP

Text-to-image generation models: Stable Diffusion, LAFITE, CLIP-GEN

SMIB Detoxification Method

In our paper, we apply the SMIB method to three models: GIT, GRIT, and BLIP.

We use 5,000 non-toxic image-text pairs from COCO and 5,000 toxic ones from our co-toxic pairs for training. We take the implementation of BLIP with SMIB as an example.

Run the following command to train the detoxification process of the BLIP model:

python method/BLIP/train_caption_detox.py --output_dir outputs/detox --device 1

Run the following command to generate detoxified text for toxic images:

python method/BLIP/inference.py --image_path /path/to/toxic_images/ --model_size large --device 1
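A small hypothetical driver that runs the inference command over several image folders (the folder paths are placeholders):

```python
import subprocess

# Placeholder folders, e.g. one per toxicity type in the benchmark.
image_dirs = [
    "data/toxic_images/type1/",
    "data/toxic_images/type2/",
    "data/toxic_images/type3/",
]

for image_dir in image_dirs:
    subprocess.run(
        ["python", "method/BLIP/inference.py",
         "--image_path", image_dir,
         "--model_size", "large",
         "--device", "1"],
        check=True,
    )
```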

Contact

If you have any problems with the implementation or any other questions, feel free to open an issue or email me (wangxinpeng@tongji.edu.cn).