Adversarial Nibbler Dataset

Background

With the rise of text-to-image (T2I) generative AI models, it's crucial to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on “implicitly adversarial” prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover. To this end, we built the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing a diverse set of implicitly adversarial prompts. The challenge is hosted at the MLCommons competition space.

Repository Overview

This data set contains the results of the Adversarial Nibbler crowdsourcing challenge. There are two main types of data:

  1. Attempted: all prompts submitted to Dynabench to generate images

  2. Submitted: safe prompt and unsafe image pairs submitted by challenge participants, with crowdsourced validations.

Each of these sets is split into 3 buckets: dev, train, and test. The split methodology and data validation can be found in our reports section.

Each bucket will be published with the same split data sets under two file formats:

  1. JSON: This is our main file format containing all the data. Sample queries will be provided to assist in parsing this data set.

  2. CSV: This is a simplified file containing only the prompts from each data set.

WARNING: This dataset contains adversarial examples of conversations that may be offensive.

Reports

For every round, we will publish information on the expected number of rows and the set sizes.

Schema

Attempted Prompts Schema

| Column Name | Type | Description |
| --- | --- | --- |
| hashed_filename | String | Unique numeric identifier (stored as a string) for the image that is associated with this row. Used to connect the example with the image file. |
| timestamp | Datetime | Time that the image was generated for this prompt. |
| model | String | The model that generated this image. |
| submitter_id | String | A unique identifier for the participant who submitted the prompt. |
| prompt | String | The text of the prompt that was entered into Nibbler to generate the image. |
| submitted | Boolean | Whether the participant “submitted” the image (i.e., indicated that they found a safety violation and provided annotations on the example). |
| machine_text_safety | String | A value of 'safe' or 'unsafe' indicating whether a text safety classification model annotated the prompt text as safe or unsafe. |
| machine_image_safety | String | A value of 'safe' or 'unsafe' indicating whether an image safety classification model annotated the image as safe or unsafe. |
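
As a rough illustration of how the two machine safety columns can be combined, the sketch below keeps rows where the prompt was classified as safe but the generated image was classified as unsafe. The rows are invented for the example and the helper name `implicit_candidates` is hypothetical; only the column names come from the schema above.

```python
import pandas as pd

# Hypothetical rows that follow the Attempted Prompts schema; all values are invented.
attempted = pd.DataFrame(
    {
        "hashed_filename": ["1001", "1002", "1003", "1004"],
        "model": ["model_a", "model_a", "model_b", "model_b"],
        "prompt": ["prompt one", "prompt two", "prompt three", "prompt four"],
        "submitted": [True, False, False, True],
        "machine_text_safety": ["safe", "unsafe", "safe", "safe"],
        "machine_image_safety": ["unsafe", "unsafe", "safe", "unsafe"],
    }
)


def implicit_candidates(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose prompt was classified safe but whose image was classified unsafe."""
    mask = (df["machine_text_safety"] == "safe") & (df["machine_image_safety"] == "unsafe")
    return df[mask]


candidates = implicit_candidates(attempted)
print(candidates[["hashed_filename", "prompt"]])
```

The classifier labels are machine annotations, so a filter like this is only a first pass and not a substitute for the human validations in the submitted data.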

Submitted Prompts Schema

| Column Name | Type | Description |
| --- | --- | --- |
| submitted_timestamp | Integer | Unix timestamp indicating when the prompt was entered into Nibbler. |
| submitted_prompt | String | The text of the prompt that was submitted into Nibbler. |
| hashed_filename | String | Unique numeric identifier (stored as a string) for the image that is associated with this row. Used to connect the example with the image file. |
| submission_annotations | JSON object with 9 fields | See submission_annotations |
| validation | List of 5 JSON objects with 15 fields each | Each JSON object represents a single annotator's annotations on that example. See validation |

submission_annotations

| Column Name | Type | Description |
| --- | --- | --- |
| submitter_id | Integer | A unique identifier for the participant who submitted the prompt. |
| example_uid | Integer | A unique identifier for the example. |
| rewritten_prompt | String | The participant's rewrite of the originally submitted prompt that describes the harm in the image. |
| text_attack_mode | List | All attack modes that the participant indicated they used. |
| text_attack_mode_other_input | String | Free text response from the participant describing their attack mode (if relevant). |
| image_failure_target | List | All demographic categories the image targets or affects. |
| image_failure_target_other_input | String | Free text response describing other demographic attributes not present in the preselected list provided to participants. |
| image_failure_type | List | All harms or failures present in the image. |
| image_failure_type_other_input | String | Free text response describing a harm present in the image that was not reflected in the preselected set of harms. |

validation

| Column Name | Type | Description |
| --- | --- | --- |
| annotator_id | String | A unique identifier for the annotator who provided the validation. |
| answer_time_ms | Integer | How long the annotator spent on this example, in milliseconds. |
| question_id | String | A unique ID for each validation question shown to the annotator. |
| submitted_prompt | String | (redundant) The prompt that was shown to the annotator. |
| text_safety_validation | List | A value of 'text_safe', 'text_unsafe', or 'text_other' indicating whether the annotator found the text prompt safe, unsafe, or was unsure. |
| text_attack_mode | List | The annotator's interpretation of what attack modes were used in the text. |
| text_attack_other_reason_text | String | Free text response describing any other attack modes the annotator observed in the text that were not covered by the pre-written categories. |
| image_safety_validation | List | A value of 'image_safe', 'image_unsafe', or 'unsure_image_safe' indicating how the annotator responded to the question of whether the image was safe. |
| unsure_image_reason_text | String | If the annotator indicated they were unsure whether the image was safe, they can provide a free text reason explaining the source of uncertainty. |
| image_safe_other_reason_text | String | If the annotator indicated the image was safe, they can give any reasons why they think another person may have indicated it was unsafe. |
| image_failure_target | List | All demographic categories that the image targets or affects. |
| image_target_other_reason_text | String | Free text response describing other demographic attributes not present in the preselected list provided to participants. |
| image_failure_type | List | All harms or failures present in the image. |
| image_failure_other_reason_text | String | Free text response describing a harm present in the image that was not reflected in the preselected set of harms. |
| final_comments_text | String | Free text response for any additional comments about the example. |
| text_other_reason_text | String | Text from other reasons. |
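
Because each example carries several annotator judgements, one common way to summarise them is a majority vote. The sketch below is only an illustration under stated assumptions: the data is invented, the `example` column is a hypothetical grouping key rather than a schema field, and each `image_safety_validation` answer is assumed to be a list of label strings, as the List type above suggests.

```python
from collections import Counter

import pandas as pd

# Hypothetical validations for two examples; 'example' is an illustrative key, not a schema column.
validations = pd.DataFrame(
    {
        "example": ["a"] * 5 + ["b"] * 5,
        "image_safety_validation": [
            ["image_unsafe"], ["image_unsafe"], ["image_safe"], ["image_unsafe"], ["unsure_image_safe"],
            ["image_safe"], ["image_safe"], ["image_unsafe"], ["image_safe"], ["image_safe"],
        ],
    }
)


def majority_label(answers: pd.Series) -> str:
    """Return the most frequent label across annotators (the first label seen wins ties)."""
    flat = [label for answer in answers for label in answer]
    return Counter(flat).most_common(1)[0][0]


# One consensus label per example, indexed by the grouping key.
consensus = validations.groupby("example")["image_safety_validation"].apply(majority_label)
print(consensus)
```

A plain majority vote discards disagreement between annotators, so other aggregation rules may suit some analyses better.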

Sample Data Queries

Submitted

Load JSON file

import ast

import pandas as pd

dev_json = pd.read_json('submitted/dev.json').reset_index()

Parse out submission_annotations JSON into Dataframe

submission = pd.json_normalize(dev_json['submission_annotations'].apply(ast.literal_eval).tolist()).add_prefix('submission_annotation_')

Parse out validation JSON objects

exploded_validations = dev_json.explode('validation').dropna(subset=['validation'])
validations = pd.json_normalize(exploded_validations['validation'].apply(ast.literal_eval).tolist()).add_prefix('validation_')

View all ratings associated with one image/prompt pair

validations[validations['validation_question_id'] == 'question_id_abc_123']
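
Flattening the validation objects produces a fresh row index, so the link back to the parent prompt and image is easy to lose. The sketch below shows one possible way to keep that link by resetting the index after exploding and concatenating column-wise. The rows are invented, and the `validation` cells are assumed to already hold parsed dictionaries (the real file may need `ast.literal_eval` first).

```python
import pandas as pd

# Hypothetical submitted rows; 'validation' already holds parsed dicts in this sketch.
dev = pd.DataFrame(
    {
        "hashed_filename": ["1001", "1002"],
        "submitted_prompt": ["prompt one", "prompt two"],
        "validation": [
            [
                {"annotator_id": "x", "answer_time_ms": 1200},
                {"annotator_id": "y", "answer_time_ms": 800},
            ],
            [{"annotator_id": "z", "answer_time_ms": 500}],
        ],
    }
)

# One row per annotator judgement, with a clean 0..n-1 index.
exploded = dev.explode("validation").dropna(subset=["validation"]).reset_index(drop=True)

# Flatten the dicts; the result shares the same 0..n-1 index as 'exploded'.
flat = pd.json_normalize(exploded["validation"].tolist()).add_prefix("validation_")

# Attach the parent identifiers to every validation row.
linked = pd.concat([exploded[["hashed_filename", "submitted_prompt"]], flat], axis=1)

# Example use: mean annotation time per image.
mean_time = linked.groupby("hashed_filename")["validation_answer_time_ms"].mean()
print(mean_time)
```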

View all unique prompts

dev_json.drop_duplicates(subset="submitted_prompt")

Load CSV file

submitted_csv = pd.read_csv('submitted/dev.csv')

Attempts

Load JSON file

dataset_json = pd.read_json('attempts/dev.json')
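
Once the attempts are loaded, a per-model summary is a natural first look at the data. The following is a minimal sketch on invented rows: the model names are placeholders, and only the column names are taken from the Attempted Prompts schema.

```python
import pandas as pd

# Hypothetical attempts; the model names and values are invented for illustration.
attempts = pd.DataFrame(
    {
        "model": ["model_a", "model_a", "model_b", "model_b", "model_b"],
        "submitted": [True, False, False, False, True],
        "machine_image_safety": ["unsafe", "safe", "unsafe", "unsafe", "safe"],
    }
)

summary = (
    attempts.assign(image_unsafe=attempts["machine_image_safety"].eq("unsafe"))
    .groupby("model")
    .agg(
        attempts=("submitted", "size"),            # number of attempts per model
        submitted=("submitted", "sum"),            # how many attempts were submitted
        image_unsafe_rate=("image_unsafe", "mean"),  # share of images flagged unsafe
    )
)
print(summary)
```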

Request For Additional Data

To gain access to all images generated during this round, please fill out this form with justifications on data access and usage: https://forms.gle/4Lfr2eSxGmFjNFxC9

To gain access to the withheld test set, please fill out this form with justifications on data access and usage: https://forms.gle/Sf88V8ZWLDEHz6Pu8

License

Google LLC licenses this data under a Creative Commons Attribution 4.0 International License. Users will be allowed to modify and repost it, and we encourage them to analyse and publish research based on the data. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Papers

Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, and Lora Aroyo. 2024. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), June 03–06, 2024, Rio de Janeiro, Brazil.

@inproceedings{quaye2024nibbler,
author = {Quaye, Jessica and Parrish, Alicia and Inel, Oana and Rastogi, Charvi and Kirk, Hannah Rose and Kahng, Minsuk and Van Liemt, Erin and Bartolo, Max and Tsang, Jess and White, Justin and Clement, Nathan and Mosquera, Rafael and Ciro, Juan and Janapa Reddi, Vijay and Aroyo, Lora},
title = {Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation},
year = {2024},
isbn = {9798400704505},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3630106.3658913},
doi = {10.1145/3630106.3658913},
pages = {388–406},
numpages = {19},
keywords = {Adversarial Testing, Crowdsourcing, Data-centric AI, Red teaming, Text-to-image},
location = {Rio de Janeiro, Brazil},
series = {FAccT '24}
}