RedditBias

This repository contains the code and data for bias evaluation with RedditBias (to appear at ACL21). The code for the debiasing approaches and the conversational downstream evaluation can be found here: https://github.com/umanlp/redditbias_debias_conv_ai.

Privacy & Ethics

RedditBias is created from real-world conversations. To protect the users whose comments are included in our data set, we have removed all identifying information (e.g., user names) and kept only the text needed for our analysis. However, if you find your text in our data set and feel misrepresented by its inclusion, please reach out to us with the following information: the comment to be removed and your Reddit username. Thank you!

How to Use this Code

For bias evaluation with RedditBias, please use Evaluation/measure_bias.py. The rest of the code in this repository documents the data set creation and offers other useful functions.

Data Preparation

The data preparation code is in the DataPreparation directory.

The following scripts should be run sequentially to generate the data required to debias (fine-tune) models and evaluate them.

The generated data is stored in the data/demographic and text_files/demographic directories, where 'demographic' is one of gender, orientation, race, religion1 or religion2. The txt files in text_files/ are used for training, validation and evaluation when fine-tuning the DialoGPT model with the debiasing methods.
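The directory layout described above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the repository: the function name `demographic_dirs` and the `repo_root` parameter are hypothetical, and the sketch assumes the data/ and text_files/ folders sit directly under the repository root.

```python
from pathlib import Path

# Demographic names as listed in this README.
DEMOGRAPHICS = ["gender", "orientation", "race", "religion1", "religion2"]


def demographic_dirs(repo_root="."):
    """Map each demographic to its data/ and text_files/ directories.

    Hypothetical helper: assumes both folders live directly under repo_root.
    """
    root = Path(repo_root)
    return {
        name: {
            "data": root / "data" / name,
            "text_files": root / "text_files" / name,
        }
        for name in DEMOGRAPHICS
    }


if __name__ == "__main__":
    # Report which of the expected directories exist in the current checkout.
    for name, dirs in demographic_dirs().items():
        for kind, path in dirs.items():
            status = "found" if path.is_dir() else "missing"
            print(f"{kind}/{name}: {status}")
```

Running the script from the repository root prints one line per expected directory, which can help confirm that the data preparation steps have produced output for every demographic.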

A brief description of files in data/religion1 is:

Note: The unprocessed Reddit comment files could not be uploaded to GitHub due to size constraints. They are available at https://drive.google.com/drive/folders/1FC79WZyuVJRGXf4OzGoX4z84wvwhBxgh?usp=sharing

Language Model Bias (Significance test Bias evaluation)

Generate responses from models