Awesome

Detoxifying Language Models Risks Marginalizing Minority Voices

This repository contains the official code for our paper appearing in NAACL 2021.

Read our paper for more information about the experimental setup.

Dependencies

The experiments depend on Pytorch and HuggingFace's Transformer repo. We use the official code of respective papers to replicate their results (e.g., GeDi and PPLM]).

Setup

Create a new Anaconda environment and run the following:

./setup.sh

This will clone the PPLM and GeDi submodules, and install their dependencies.

As PPLM and GeDi require different HuggingFace Transformers versions, this script will also install both version 2.8 and version 3.4 as different pip packages.

Then, add your Perspecitve API key to scripts/score_generations.py if you need to score data/generations.

Getting Started

Each of the controllable generation methods are placed in separate submodule/folders. Specifics of note:

FT contains all of the code for pretraining and DAPT finetuning.
transformers2 is a clone of Transformers 2.8 which is a GeDi dependency.

Examples of how to run training, generation, and evaluation for all the methods are available in the Makefile. Each of these commands references scripts in the scripts/ folder.

scripts/ is organized as follows:

scripts/data-processing contains the scripts used to generate and/or filter training/evaluation data.
scripts/generation conatins the scripts used to perform both prompted and unprompted generation with each of the controllable generation methods.
scripts/ppl contains the scripts used for automated evaluation of model toxicity (perplexity)
scripts/train contains the scripts used to train all of the controllable generation methods.

score_generations.py can be flexibly used on any .txt file with the Perspective API and automatically resumes scoring if an error occurs.

References

Please consider citing our work if you found this code or our paper beneficial to your research.

@inproceedings{Xu2021Detoxifying,
      Title = {Detoxifying Language Models Risks Marginalizing Minority Voices}, 
      Author = {Albert Xu and Eshaan Pathak and Eric Wallace and Suchin Gururangan and Maarten Sap and Dan Klein},
      Booktitle = {North American Chapter of the Association for Computational Linguistics}
      year={2021}
}

Contributions and Contact

This code was developed by Albert Xu, Eric Wallace, and Eshaan Pathak. Contact us at albertxu3@berkeley.edu, ericwallace@berkeley.edu and eshaanpathak@berkeley.edu, respectively.