<div align="center">

🙊 Detoxify

Toxic Comment Classification with ⚡ PyTorch Lightning and 🤗 Transformers

</div>

News & Updates

22-10-2021: New improved multilingual model & standardised class names

03-09-2021: New improved unbiased model

15-02-2021: Detoxify featured in Scientific American!

14-01-2021: Lightweight models

Description

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.

Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context.

Dependencies:

| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score % |
| --- | --- | --- | --- | --- | --- | --- |
| Toxic Comment Classification Challenge | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | original | 98.86 | 98.64 |
| Jigsaw Unintended Bias in Toxicity Classification | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | unbiased | 94.73 | 93.74 |
| Jigsaw Multilingual Toxic Comment Classification | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | multilingual | 95.36 | 92.11 |

It is also worth noting that the top leaderboard scores were achieved using model ensembles. The purpose of this library was to build something user-friendly and straightforward to use.

Multilingual model language breakdown

| Language Subgroup | Subgroup size | Subgroup AUC Score % |
| --- | --- | --- |
| 🇮🇹 it | 8494 | 89.18 |
| 🇫🇷 fr | 10920 | 89.61 |
| 🇷🇺 ru | 10948 | 89.81 |
| 🇵🇹 pt | 11012 | 91.00 |
| 🇪🇸 es | 8438 | 92.74 |
| 🇹🇷 tr | 14000 | 97.19 |

Limitations and ethical considerations

If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or intent of the author (e.g. humorous or self-deprecating). This could introduce bias against already vulnerable minority groups.

The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real-world demographics, and/or helping content moderators flag harmful content more quickly.

Some useful resources about the risk of different biases in toxicity or hate speech detection are:

Quick prediction

The multilingual model has been trained on 7 different languages, so it should only be tested on: English, French, Spanish, Italian, Portuguese, Turkish or Russian.

# install detoxify

pip install detoxify


from detoxify import Detoxify

# each model takes in either a string or a list of strings

results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

# to specify the device the model will be allocated on (defaults to cpu), accepts any torch.device input

model = Detoxify('original', device='cuda')

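# optional to display results nicely (will need to pip install pandas)

import pandas as pd

# keep the inputs in a variable so they can be reused as the row index
input_text = ['example text 1', 'example text 2']

results = Detoxify('unbiased').predict(input_text)

print(pd.DataFrame(results, index=input_text).round(5))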

For more details check the Prediction section.
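As a rough illustration of how the returned scores might be consumed, the sketch below assumes that `predict` called on a list of texts returns a dictionary mapping each label name to a list of scores, one per text, which is the shape the `pd.DataFrame(results, index=input_text)` call above relies on. The `flag_comments` helper, the 0.5 threshold, and the hard-coded sample scores and label names are hypothetical; they are only there so the snippet runs without downloading a model.

```python
from typing import Dict, List


def flag_comments(
    results: Dict[str, List[float]],
    texts: List[str],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Return, for each text, the labels whose score reaches the threshold."""
    flagged = {}
    for i, text in enumerate(texts):
        flagged[text] = [
            label for label, scores in results.items() if scores[i] >= threshold
        ]
    return flagged


# Hypothetical scores, shaped like the assumed output of predict() on two texts.
sample_texts = ["example text 1", "example text 2"]
sample_results = {
    "toxicity": [0.02, 0.91],
    "insult": [0.01, 0.74],
    "threat": [0.00, 0.03],
}

flagged = flag_comments(sample_results, sample_texts)
print(flagged)
```

A single global threshold is the simplest choice; in practice a per-label threshold tuned on held-out data may be more appropriate.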

Labels

All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:

More information about the labelling schema can be found here.

Toxic Comment Classification Challenge

This challenge includes the following labels:

Jigsaw Unintended Bias in Toxicity Classification

This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.

Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.

Identity labels used:

A complete list of all the identity labels available can be found here.
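The rule of keeping only identities with more than 500 examples could be sketched roughly as follows. This is not the repository's preprocessing code: the `select_identities` helper, the column names, and the 0.5 cut-off used to decide that a comment "mentions" an identity are all assumptions made for illustration.

```python
from typing import Dict, List


def select_identities(
    rows: List[Dict[str, float]],
    identity_columns: List[str],
    min_examples: int = 500,
    mention_threshold: float = 0.5,
) -> List[str]:
    """Keep identity columns that are mentioned in more than `min_examples` rows."""
    counts = {name: 0 for name in identity_columns}
    for row in rows:
        for name in identity_columns:
            # A missing value is treated as "identity not mentioned".
            if row.get(name, 0.0) >= mention_threshold:
                counts[name] += 1
    return [name for name in identity_columns if counts[name] > min_examples]


# Tiny synthetic example with a lowered cut-off so the effect is visible.
rows = [
    {"identity_a": 1.0, "identity_b": 0.0},
    {"identity_a": 0.8, "identity_b": 0.6},
    {"identity_a": 0.7, "identity_b": 0.1},
]
kept = select_identities(rows, ["identity_a", "identity_b"], min_examples=2)
print(kept)
```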

Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all labels from above, however the final evaluation is only on:

How to run

First, install dependencies

# clone project

git clone https://github.com/unitaryai/detoxify

# create virtual env

python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify

# or for training
pip install -e 'detoxify[dev]'

cd detoxify

Prediction

Trained models summary:

Model nameTransformer typeData from
originalbert-base-uncasedToxic Comment Classification Challenge
unbiasedroberta-baseUnintended Bias in Toxicity Classification
multilingualxlm-roberta-baseMultilingual Toxic Comment Classification

For a quick prediction, you can run the example script either on a single comment or on a .txt file containing a list of comments.


# load model via torch.hub

python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path

python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file

python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage

python run_prediction.py --help

Checkpoints can be downloaded from the latest release or via the PyTorch Hub API with the following names:

import torch

model = torch.hub.load('unitaryai/detoxify','toxic_bert')

Importing detoxify in python:


from detoxify import Detoxify

results = Detoxify('original').predict('some text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

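# to display results nicely

import pandas as pd

input_text = ['example text 1','example text 2']

results = Detoxify('unbiased').predict(input_text)

print(pd.DataFrame(results,index=input_text).round(5))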
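For long lists of comments it may help to call the model on smaller chunks and merge the outputs. The sketch below assumes, as above, that `predict` on a list returns a dictionary of label name to list of scores. `predict_in_batches` is a hypothetical helper, not part of the library, and `fake_predict` is a stand-in so the snippet runs without a model; in real use one would pass something like `Detoxify('original').predict` instead.

```python
from typing import Callable, Dict, List


def predict_in_batches(
    predict_fn: Callable[[List[str]], Dict[str, List[float]]],
    texts: List[str],
    batch_size: int = 32,
) -> Dict[str, List[float]]:
    """Call predict_fn on consecutive chunks of texts and concatenate the scores."""
    merged: Dict[str, List[float]] = {}
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for label, scores in predict_fn(batch).items():
            merged.setdefault(label, []).extend(scores)
    return merged


def fake_predict(batch: List[str]) -> Dict[str, List[float]]:
    """Stand-in for a real model: the 'score' is just the text length."""
    return {"toxicity": [float(len(text)) for text in batch]}


batched = predict_in_batches(fake_predict, ["a", "bb", "ccc"], batch_size=2)
print(batched)
```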

Training

If you do not already have a Kaggle account:


# create data directory

mkdir jigsaw_data
cd jigsaw_data

# download data

kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

Start Training

Toxic Comment Classification Challenge


# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json

Unintended Bias in Toxicity Challenge


python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json

Multilingual Toxic Comment Classification

The translated data (source 1, source 2) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).


# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

Monitor progress with tensorboard


tensorboard --logdir=./saved

Model Evaluation

Toxic Comment Classification Challenge

This challenge is evaluated on the mean AUC score of all the labels.


python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
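To make the "mean AUC score of all the labels" concrete, here is a minimal, dependency-free sketch. It is not what `evaluate.py` does internally; the `roc_auc` and `mean_auc` helpers, the label names, and the toy data are assumptions for illustration. The AUC is computed as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def mean_auc(
    targets: Dict[str, Sequence[int]],
    scores: Dict[str, Sequence[float]],
) -> float:
    """Average the per-label AUCs with equal weight."""
    per_label = [roc_auc(targets[label], scores[label]) for label in targets]
    return sum(per_label) / len(per_label)


# Toy data: two labels, four comments each.
targets = {"toxicity": [0, 0, 1, 1], "insult": [0, 1, 0, 1]}
scores = {"toxicity": [0.1, 0.4, 0.35, 0.8], "insult": [0.2, 0.9, 0.3, 0.7]}
print(mean_auc(targets, scores))  # 0.875 (0.75 for toxicity, 1.0 for insult)
```

The pairwise loop is quadratic and only suitable for small examples; on real test sets a library routine such as `sklearn.metrics.roc_auc_score` would be the usual choice.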

Unintended Bias in Toxicity Challenge

This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric can be found here.


python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
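As a sketch of how such a combined metric can be assembled, the snippet below assumes the definition published for the Kaggle competition: the overall AUC is averaged with three per-identity bias AUCs (subgroup, BPSN and BNSP), each first aggregated across identities with a generalized power mean using p = -5, and all four terms weighted 0.25. The function names and the sample AUC values are hypothetical, and this is not necessarily how `compute_bias_metric.py` is implemented.

```python
from typing import Sequence


def power_mean(values: Sequence[float], p: float = -5.0) -> float:
    """Generalized power mean; a negative p pulls the result towards the lowest values.

    All values must be strictly positive when p is negative.
    """
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def final_bias_metric(
    overall_auc: float,
    subgroup_aucs: Sequence[float],
    bpsn_aucs: Sequence[float],
    bnsp_aucs: Sequence[float],
    p: float = -5.0,
    weight: float = 0.25,
) -> float:
    """Equal-weight combination of the overall AUC and three power-mean bias terms."""
    bias_terms = [power_mean(aucs, p) for aucs in (subgroup_aucs, bpsn_aucs, bnsp_aucs)]
    return weight * overall_auc + weight * sum(bias_terms)


# Hypothetical per-identity AUCs for three identities.
score = final_bias_metric(
    overall_auc=0.95,
    subgroup_aucs=[0.90, 0.88, 0.93],
    bpsn_aucs=[0.91, 0.85, 0.92],
    bnsp_aucs=[0.94, 0.95, 0.93],
)
print(round(score, 4))
```

Because of the negative exponent, a single poorly served identity lowers the score more than an arithmetic mean would, which is the point of the metric.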

Multilingual Toxic Comment Classification

This challenge is evaluated on the AUC score of the main toxic label.


python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

Citation

@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}