Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech
This work generates knowledge-grounded counter narratives using two modules: a knowledge retrieval module and a counter narrative generation module.
Requirements:
Java 1.8+
Solr
Keyphrase digger
transformers
rouge_score
spaCy
Knowledge Retrieval Module
Under KN_CONAN_final_data, we provide the final CONAN dataset paired with the corresponding silver knowledge. If you wish to prepare your own knowledge repository, follow the steps below. Otherwise, skip this section.
- Download CONAN dataset and knowledge repository
- Prepare queries
- Retrieve relevant knowledge
- Select knowledge sentences
1. Download Data
1.1. Hate countering dataset
1.2. Knowledge Repository
We use the following datasets for creating relevant knowledge.
- Newsroom: Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies.
- WikiText-103: Pointer sentinel mixture models.
2. Prepare Queries
2.1. Query extraction
We use Keyphrase Digger to extract keyphrase queries for both the hate speech (HS) and the counter narratives (CN) in CONAN.
- i. Create a txt file for each HS and CN in CONAN: run create_text_file.py.
- ii. Make sure that the resulting files from i. are stored in the same directory as run_kd.sh and KD.jar from your Keyphrase Digger repository after compiling (e.g. KD/KD-Runner/data/CN/ if run_kd.sh and KD.jar are under KD/KD-Runner/).
- iii. Retrieve keyphrases for HS and CN using Keyphrase Digger: store and run run_kd.sh.
- iv. Extract the retrieved keyphrases from iii. and add them to the CONAN data using extract_keyphrase.py.
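As an illustration of the first step above, the following is a minimal sketch of writing one text file per entry. It is not the repository's create_text_file.py: the function name write_text_files and the column names cn_id and counterSpeech are assumptions made for the example, so adapt them to the layout of your copy of CONAN.

```python
import csv
from pathlib import Path


def write_text_files(csv_path, out_dir, id_col="cn_id", text_col="counterSpeech"):
    """Write one UTF-8 .txt file per CSV row, named after the row id.

    id_col and text_col are hypothetical column names; change them to
    match the actual header of the dataset file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            target = out / f"{row[id_col]}.txt"
            # One document per file, so each file can be processed independently.
            target.write_text(row[text_col].strip() + "\n", encoding="utf-8")
            written.append(target)
    return written
```

For example, write_text_files("CONAN.csv", "KD/KD-Runner/data/CN/") would place the counter narrative files in the directory used in the example path above, assuming the input file is named CONAN.csv.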
2.2. Query generation
We use a Transformer implementation to train a model that generates keyphrase queries.
3. Retrieve relevant knowledge
Retrieve relevant knowledge using Solr (run retrieve_kn_solr.py).
Solr is used to index the articles in the knowledge repository and to retrieve relevant knowledge given a query.
Some Solr commands:
- Launch Solr: run solr-8.8.1/bin/solr restart or ./bin/solr restart
- Index data (e.g., index all articles under datasets/wikitext/ to a knowledge repository called knowledgecollection): bin/post -c knowledgecollection -p 8989 datasets/wikitext/*
- An example of searching for information about islamic faith in the field content of the knowledge repository called knowledgecollection:
curl "http://localhost:8989/solr/knowledgecollection/select?q=(content:islamic faith)&rows=10&wt=json"
Check this tutorial for details on how to install Solr, index data, and use advanced search methods.
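The same search can be issued from Python. The sketch below is not the repository's retrieve_kn_solr.py; it only mirrors the curl example, with the host, port 8989, collection knowledgecollection and field content taken from that example. The helper names build_select_url and search are hypothetical. Building the URL with urlencode percent-encodes the space and parentheses in the query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, field="content", collection="knowledgecollection",
                     host="localhost", port=8989, rows=10):
    """Build a Solr /select URL equivalent to the curl example."""
    # urlencode escapes spaces, colons and parentheses in the query string.
    params = urlencode({"q": f"({field}:{query})", "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


def search(query, **kwargs):
    """Return the list of matching documents (requires a running Solr instance)."""
    with urlopen(build_select_url(query, **kwargs)) as resp:
        # Solr's JSON response stores the hits under response.docs.
        return json.load(resp)["response"]["docs"]
```

With Solr running and the collection indexed, search("islamic faith") would return up to 10 documents.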
4. Select knowledge sentences
- Apply the knowledge sentence selector to get the top-N knowledge sentences and save them in a single file, one entry per line: run kn_sentence_retriever.py.
- Create the train, valid, and test data: run create_modelling_data.py.
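To illustrate the idea of top-N sentence selection, here is a minimal sketch that ranks candidate sentences by a longest-common-subsequence F1 score against a query. This is an assumption made for illustration: the scoring used by kn_sentence_retriever.py may differ, and the function names lcs_length, overlap_f1 and select_top_n are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_f1(query, sentence):
    """LCS-based F1 between whitespace-tokenized, lowercased strings."""
    q, s = query.lower().split(), sentence.lower().split()
    lcs = lcs_length(q, s)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(s), lcs / len(q)
    return 2 * precision * recall / (precision + recall)


def select_top_n(query, sentences, n=5):
    """Return the n sentences with the highest overlap score, best first."""
    # sorted() is stable, so ties keep their original order.
    return sorted(sentences, key=lambda s: overlap_f1(query, s), reverse=True)[:n]
```

The selected sentences can then be written to a single file with "\n".join(...), giving one entry per line as described above.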
Counter Narrative Generation Module
- Transformer
- GPT2 (check CN_generation)
- XNLG
- Candela
Multi-domain Knowledge-grounded hate countering dataset
The Gold Knowledge Test Set can be downloaded here. It contains hate speech/counter narrative pairs coupled with relevant background knowledge, and consists of 195 pairs covering multiple hate targets (islamophobia, misogyny, antisemitism, racism, and homophobia).
Citation
For more details on the data partition procedure, please see our paper.
@inproceedings{chung-etal-2021-towards,
title = "Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech",
author = "Chung, Yi-Ling and
Tekiro{\u{g}}lu, Serra Sinem and
Guerini, Marco",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.79",
doi = "10.18653/v1/2021.findings-acl.79",
pages = "899--914",
}