The Moral Integrity Corpus (MIC)
This repository contains the source code for The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems by Caleb Ziems, Jane A. Yu, Yi-Chia Wang, Alon Y. Halevy, and Diyi Yang
[Paper] | [Data Use Agreement] | [Data]
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
What is MIC?
Open-domain or "chit-chat" conversational agents often reflect insensitive, hurtful, or contradictory viewpoints that erode a user’s trust in the integrity of the system. Moral integrity is one important pillar for building trust.
MIC is a dataset that can help us understand chatbot behaviors through their latent values and moral statements. MIC contains 114k annotations, with 99k distinct "Rules of Thumb" (RoTs) that capture the moral assumptions of 38k chatbot replies to open-ended prompts. These RoTs represent diverse moral viewpoints, with the following distribution of underlying moral foundations:
- Care: wanting someone or something to be safe, healthy, and happy. (58k RoTs)
- Fairness: wanting to see individuals or groups treated equally or equitably. (24k)
- Liberty: wanting people to be free to make their own decisions. (22k)
- Loyalty: wanting unity and seeing people keep promises or obligations to an in-group. (22k)
- Authority: wanting to respect social roles, duties, privacy, peace, and order. (20k)
- Sanctity: wanting people and things to be clean, pure, innocent, and holy. (13k)
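For a quick look at this distribution, MIC.csv can be inspected with pandas. The sketch below is illustrative only: the rot and moral-vector column names are taken from the classifier flags used later in this README, and the multi-hot encoding of moral-vector is an assumption, so check the released file before relying on it.

```python
# Minimal sketch for inspecting MIC.csv with pandas.
# Column names ("rot", "moral-vector") follow the flags used by the scripts
# in this README; the encoding of moral-vector is an assumption.
import pandas as pd

FOUNDATIONS = ["care", "fairness", "liberty", "loyalty", "authority", "sanctity"]

mic = pd.read_csv("data/mic/MIC.csv")
print(f"{len(mic)} annotations, {mic['rot'].nunique()} distinct RoTs")

# If moral-vector is a multi-hot string such as "1,0,0,1,0,0", this tallies
# how many annotations invoke each foundation; adjust to the real encoding.
vectors = mic["moral-vector"].astype(str).str.split(",", expand=True).astype(int)
print(dict(zip(FOUNDATIONS, vectors.sum().tolist())))
```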
We can train encoder-decoder models to automatically generate RoT explanations for chatbot behaviors. This could facilitate explainable downstream applications. For example, we could train RL systems that demote chatbot replies that fall into certain moral classes, or train safety classifiers that guide systems towards desired behaviors with sensitivity to ideological and political differences.
In the RoT generation task, we find our best models match the quality, fluency, and relevance of human annotations, but they still generate irrelevant RoTs nearly 28% of the time. This suggests that the proposed generation task is not yet solved and that MIC can continue to serve as a resource for ongoing work in developing morally-consistent conversational agents.
Where can I download the data?
If you have not already, please complete our short Data Use Agreement. Then follow this link to download MIC.csv.
Full Project Pipeline (annotation + experiments + analysis)
0. Project Environment
- CUDA, cudnn
- anaconda
- Prepare separate environments for running GPT-Neo (which requires an older version of PyTorch) and for generation (the summarize environment)
conda create --name neo python=3.7
conda activate neo
pip install -r requirements_neo.txt
conda deactivate
conda activate summarize
pip install -r requirements_summarize.txt
conda deactivate
- Create main project environment
conda create --name morals python=3.7
conda activate morals
pip install -r requirements_morals.txt
- Install PyTorch (be sure cudatoolkit matches your version of CUDA)
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
1. Build a Collection of Moral QA Pairs
Download data and resources
Download r/AskReddit Q/A data from the public Reddit-QA-Corpus as well as the Expanded Morality Lexicon by running
bash get_data.sh
Filter questions via lexicons
- Extract moral foundations keywords from original Q/A pairs
python find_morals.py --column "questions"
python find_morals.py --column "answers"
- Filter Q/A pairs so that both the question and the Reddit answer contain at least one word from the Expanded Morality Lexicon, and the question contains a question mark and does not contain the token 'reddit' (a sketch of this rule follows the command below)
python filter_questions.py
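The filtering rule above can be sketched as follows. This is a hypothetical restatement of the criterion, not the actual code in find_morals.py / filter_questions.py, and the lexicon file format is an assumption.

```python
# Hypothetical sketch of the lexicon-based filter described above.
import re

def load_lexicon(path):
    # One word per line is an assumption about the lexicon file format.
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def keep_pair(question, answer, lexicon):
    q_tokens = set(re.findall(r"[a-z']+", question.lower()))
    a_tokens = set(re.findall(r"[a-z']+", answer.lower()))
    return (
        bool(q_tokens & lexicon)      # question uses a moral keyword
        and bool(a_tokens & lexicon)  # Reddit answer uses a moral keyword
        and "?" in question           # the question actually asks something
        and "reddit" not in q_tokens  # drop meta questions about Reddit itself
    )
```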
Use chatbots to answer AskReddit prompts
Generate answers using BlenderBot. You can use the --start_idx and --end_idx flags to run the generations on only a subset of the prompts and parallelize.
python chatbot_generation/generate_answer.py --model_name blenderbot --indir "data/raw/Reddit-QA-Corpus/filtered/" --outdir "data/prepared/blenderbot_generations/" --use_gpu
Generate answers using DialoGPT. You can use the --start_idx and --end_idx flags to run the generations on only a subset of the prompts and parallelize.
python chatbot_generation/generate_answer.py --model_name dialoGPT --indir "data/raw/Reddit-QA-Corpus/filtered/" --outdir "data/prepared/dialoGPT_generations/" --use_gpu
Generate answers using GPT-Neo. You can use the --start_idx and --end_idx flags to run the generations on only a subset of the prompts and parallelize.
conda activate neo
python generate_answer.py --model_name "gpt-neo" --indir "data/raw/Reddit-QA-Corpus/filtered/" --outdir "data/prepared/gpt-neo_generations/" --use_gpu
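For reference, a single BlenderBot reply can be generated stand-alone with Hugging Face transformers roughly as below; the checkpoint name is an assumption and may differ from whatever chatbot_generation/generate_answer.py actually loads.

```python
# Minimal sketch of answering one prompt with BlenderBot via transformers;
# the checkpoint is an assumption, not necessarily the one used in the paper.
from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

checkpoint = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(checkpoint)
model = BlenderbotForConditionalGeneration.from_pretrained(checkpoint)

prompt = "What is the most important thing you have learned in life?"
inputs = tokenizer([prompt], return_tensors="pt", truncation=True)
reply_ids = model.generate(**inputs, max_length=60)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```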
Filter QA pairs via lexicons
- Find moral words in the chatbot responses
python find_morals.py --column "blenderbot_A0" --indir "data/prepared/blenderbot_generations"
python find_morals.py --column "dialoGPT_A0" --indir "data/prepared/dialoGPT_generations"
python find_morals.py --column "gpt-neo_A0" --indir "data/prepared/gpt-neo_generations"
- Filter via lexicons
python filter_qa.py --column "blenderbot_A0" --indir "data/prepared/blenderbot_generations"
python filter_qa.py --column "dialoGPT_A0" --indir "data/prepared/dialoGPT_generations"
python filter_qa.py --column "gpt-neo_A0" --indir "data/prepared/gpt-neo_generations"
Run the QAFiltering Task
Prepare input files for the HIT using data from data/prepared/{model}_agg/filtered.csv, here swapping {model} for "blenderbot", "dialoGPT", or "gpt-neo", and adjusting the start and end flags as needed.
python hit/Task_QAFiltering/prepare_hit.py --model {model} --start_idx 0 --end_idx 100
Train classifiers to filter QA pairs
- Create train / dev / test splits for each of the chatbot distributions by running
python hit/Task_QAFiltering/make_splits.py --model {model} --output "qa_filtering_clf/splits"
- Tune classifiers via tuning_script.sh
bash tuning_script.sh 0 "sufficient" "qa_filtering_clf/splits/blenderbot" &
bash tuning_script.sh 1 "moral" "qa_filtering_clf/splits/blenderbot" &
bash tuning_script.sh 2 "sufficient" "qa_filtering_clf/splits/dialoGPT" &
bash tuning_script.sh 3 "moral" "qa_filtering_clf/splits/dialoGPT" &
bash tuning_script.sh 4 "sufficient" "qa_filtering_clf/splits/gpt-neo" &
bash tuning_script.sh 5 "moral" "qa_filtering_clf/splits/gpt-neo" &
- Train the best-performing model using qa_filtering_clf/sentence_pairs_clf.py and run it using python run_sentence_pairs_clf.py. For example:
python qa_filtering_clf/sentence_pairs_clf.py --batchsize 16 --epochs 5 --lr "5e-5" --gpu 0 --label "dialoGPT_0" --input "qa_filtering_clf/splits/dialoGPT" --output "qa_filtering_clf/results_dialoGPT"
python qa_filtering_clf/run_sentence_pairs_clf.py --input "data/prepared/dialoGPT_generations_agg/filtered.csv" --output "qa_filtering_clf/dialoGPT_filtered_moral_predictions.txt" --sent2 "dialoGPT_A0" --path_to_model "qa_filtering_clf/results_dialoGPT/albert-base-v2-moral-16-5-5.pt"
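Inference with such a fine-tuned sentence-pair classifier looks roughly like the sketch below. The example pair and the assumption that the saved .pt file is a plain state dict are illustrative, not guaranteed to match sentence_pairs_clf.py.

```python
# Hedged sketch of sentence-pair inference with a fine-tuned ALBERT classifier.
# Loading the saved .pt file as a plain state dict is an assumption about how
# sentence_pairs_clf.py checkpoints its models.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)
state = torch.load("qa_filtering_clf/results_dialoGPT/albert-base-v2-moral-16-5-5.pt",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

question = "Should I tell my friend the truth even if it hurts?"
answer = "Yes, honesty is usually worth the short-term discomfort."
enc = tokenizer(question, answer, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**enc).logits.softmax(dim=-1)
print(probs)  # probability that the QA pair is kept vs. filtered out
```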
Use classifier outputs to filter QA pairs
python qa_filtering_clf/clf_filtering.py --model "blenderbot"
python qa_filtering_clf/clf_filtering.py --model "dialoGPT"
python qa_filtering_clf/clf_filtering.py --model "gpt-neo"
2. Run the Main RoT Annotation Task
Prepare HIT files (choosing the start_idx and end_idx to indicate which portion of the filtered QA pairs to annotate)
python ./hit/Task_QAFiltering/prepare_hit.py --model "blenderbot" --start_idx 0 --end_idx 100
Set up the Political Leaning Personality Test (after you modify the file with the proper API keys)
python ./hit/Qual_PoliticalLeaning/qual_request.py
Set up main Qualification Test (after you modify the file with the proper API keys)
python ./hit/Task_MoralEvaluation/qual_request.py
Run the MoralEvaluation.html task on MTurk, using input from ./hit/Task_QAFiltering/input, and save the output with the format ./hit/output/filtered_clf_sufficient_moral_blenderbot_0_100_FINISHED.csv
After running all annotation batches, prepare MIC
python ./hit/Task_MoralEvaluation/prepare_MIC.py --output "./data/mic/MIC.csv"
3. Train RoT Generation Models
Note: To start at this step, download the MIC data (MIC.csv) via the link in "Where can I download the data?" above and place it in a new ./data/mic/ directory.
Train the three generative models
CUDA_VISIBLE_DEVICES=0 python -m rot_generation.generate_rots --input "./data/mic/MIC.csv" --output "./rot_generation/output" --epochs 5 --model_checkpoint "gpt2" --gpu 0 --format_string "Q [answ] A [rot] ~ rot"
CUDA_VISIBLE_DEVICES=1 python -m rot_generation.generate_rots --input "./data/mic/MIC.csv" --output "./rot_generation/output" --epochs 5 --model_checkpoint "t5-small" --gpu 1 --format_string "Q [answ] A [rot] ~ rot"
CUDA_VISIBLE_DEVICES=2 python -m rot_generation.generate_rots --input "./data/mic/MIC.csv" --output "./rot_generation/output" --epochs 5 --model_checkpoint "facebook/bart-large" --gpu 2 --format_string "Q [answ] A [rot] ~ rot"
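The --format_string "Q [answ] A [rot] ~ rot" appears to say: serialize the question and chatbot answer (separated by special tokens) as the source, and use the RoT as the target. A hedged sketch of that construction, assuming column names Q, A, and rot:

```python
# Hedged sketch of how "Q [answ] A [rot] ~ rot" might serialize one example;
# generate_rots.py owns the real logic, and the column names are assumptions.
def build_example(row):
    source = f"{row['Q']} [answ] {row['A']} [rot]"
    target = row["rot"]
    return source, target

row = {
    "Q": "Should I lie to spare someone's feelings?",
    "A": "A small white lie can sometimes be the kinder option.",
    "rot": "It is good to be honest with the people you care about.",
}
print(build_example(row))
```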
Decode the generative models
bash rot_generation/decode_all.sh "rot_generation/output/*" "./data/mic/MIC.csv" "Q [answ] A [rot] ~ rot" 1 "QA"
Run baseline retrieval generations
python rot_generation/retrieve_rots.py --input "./data/mic/MIC.csv" --output "./rot_generation/output/retrieval_Q_A_TARGET_rot/"
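One plausible reading of the retrieval baseline is nearest-neighbor lookup: return the RoT attached to the most similar training QA pair. The TF-IDF similarity below is an assumption for illustration; retrieve_rots.py may use a different similarity measure.

```python
# One plausible retrieval baseline (an assumption): return the RoT attached
# to the most TF-IDF-similar training QA pair.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_qa = ["Should I lie to my friend? A white lie can be kind."]
train_rots = ["It is good to be honest with the people you care about."]
test_qa = ["Is it okay to hide the truth from a friend?"]

vectorizer = TfidfVectorizer().fit(train_qa + test_qa)
sims = cosine_similarity(vectorizer.transform(test_qa), vectorizer.transform(train_qa))
print(train_rots[sims.argmax()])  # retrieved RoT for the test example
```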
Run social-chem models
First, clone social-chemistry-101 into the root directory and follow its instructions to prepare the GPT2-XL RoT model.
Then run the following:
python -m rot_generation.generate_rots_decoder --model_directory "social-chemistry-101/output/gpt2-xl_rot_64_5epochs" --input "./data/mic/MIC.csv" --output "rot_generation/output_sc" --format_string "QA [rot] ~ rot" --gpu 3 --top_p 0.9
Compute all results
python -m rot_generation.metrics --input "rot_generation/output/*Q_A_TARGET_rot*" --output "all_results.csv"
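Which automatic metrics rot_generation/metrics.py reports is not spelled out here; as one example of the kind of metric involved, corpus BLEU over generated versus reference RoTs can be computed with sacrebleu:

```python
# Minimal sketch of one automatic generation metric (corpus BLEU via sacrebleu);
# which metrics rot_generation/metrics.py actually reports is not shown here.
import sacrebleu

hyps = ["It is good to be honest with your friends."]
refs = ["You should be honest with the people you care about."]
print(sacrebleu.corpus_bleu(hyps, [refs]).score)
```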
4. Run Human Evaluation for RoT Generation
Prepare generations for human evaluation
python ./hit/Task_GeneratedRoTEvaluation/prepare_hit.py --mic "./data/mic/MIC.csv" --input_glob "./rot_generation/output/*Q_A_TARGET_rot*/test_generation*.csv" --output "./hit/Task_GeneratedRoTEvaluation/input/human_eval_25.csv" --nsamples 25
Set up the Qualification Test (after you modify the file with the proper API keys)
python ./hit/Task_GeneratedRoTEvaluation/qual_request.py
Run the GeneratedRoTEvaluation.html
task on MTurk, using input from ./hit/Task_GeneratedRoTEvaluation/input
and save file to ./hit/Task_GeneratedRoTEvaluation/output/human_eval.csv
Print inter-annotator agreements and evaluation results
python ./hit/Task_GeneratedRoTEvaluation/print_eval_results.py --input "./hit/Task_GeneratedRoTEvaluation/output/human_eval.csv"
5. Train RoT Attribute Classification Models
Generate backtranslations
- Use different temperatures and different languages: German (de) and Russian (ru). A round-trip sketch is given at the end of this subsection.
Note: this will not work unless you first install fairseq according to these instructions
python attribute_clf/backtranslate_to_en.py --temperature 1.2 --k 5 --language "de" --target "rot" --input "data/mic/MIC.csv"
python attribute_clf/backtranslate_to_en.py --temperature 0.7 --k 5 --language "de" --target "rot" --input "data/mic/MIC.csv"
python attribute_clf/backtranslate_to_en.py --temperature 0.9 --k 5 --language "de" --target "rot" --input "data/mic/MIC.csv"
python attribute_clf/backtranslate_to_en.py --temperature 1.2 --k 5 --language "ru" --target "rot" --input "data/mic/MIC.csv"
python attribute_clf/backtranslate_to_en.py --temperature 0.7 --k 5 --language "ru" --target "rot" --input "data/mic/MIC.csv"
python attribute_clf/backtranslate_to_en.py --temperature 0.9 --k 5 --language "ru" --target "rot" --input "data/mic/MIC.csv"
- Join backtranslations
python attribute_clf/join_backtranslations.py
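Backtranslation here means translating each RoT into German or Russian and back into English to obtain paraphrases for augmentation. A minimal round-trip sketch with fairseq's pre-trained WMT19 torch.hub models follows; how backtranslate_to_en.py maps its --temperature and --k flags onto generation arguments is an assumption.

```python
# Hedged sketch of en->de->en round-trip backtranslation with fairseq's
# pre-trained WMT19 models from torch.hub (requires fairseq, fastBPE, sacremoses).
# The sampling/temperature arguments are assumptions about how the script's
# --temperature flag is applied.
import torch

en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
                       tokenizer="moses", bpe="fastbpe")
de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
                       tokenizer="moses", bpe="fastbpe")

rot = "It is good to be honest with the people you care about."
german = en2de.translate(rot, sampling=True, temperature=0.9)
paraphrase = de2en.translate(german, sampling=True, temperature=0.9)
print(paraphrase)
```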
Run Hyperparameter Tuning
- Run for all learning rates and epochs
CUDA_VISIBLE_DEVICES=0 bash attribute_clf/tuning_script_ordinal.sh 0 "violation-severity" "./data/mic/MIC.csv" "rot" "bert-base-uncased"
CUDA_VISIBLE_DEVICES=0 bash attribute_clf/tuning_script_ordinal.sh 0 "violation-severity" "./data/mic/MIC.csv" "rot" "albert-base-v2"
CUDA_VISIBLE_DEVICES=1 bash attribute_clf/tuning_script_ordinal.sh 1 "rot-agree-collapsed" "./data/mic/MIC.csv" "rot" "bert-base-uncased"
CUDA_VISIBLE_DEVICES=1 bash attribute_clf/tuning_script_ordinal.sh 1 "rot-agree-collapsed" "./data/mic/MIC.csv" "rot" "albert-base-v2"
CUDA_VISIBLE_DEVICES=2 bash attribute_clf/tuning_script_multilabel.sh 2 "moral-vector" "./data/mic/MIC.csv" "rot" "bert-base-uncased"
CUDA_VISIBLE_DEVICES=2 bash attribute_clf/tuning_script_multilabel.sh 2 "moral-vector" "./data/mic/MIC.csv" "rot" "albert-base-v2"
CUDA_VISIBLE_DEVICES=3 bash attribute_clf/tuning_script.sh 3 "A_agrees" "./data/mic/MIC.csv" "QA" "rot" "albert-base-v2"
CUDA_VISIBLE_DEVICES=3 bash attribute_clf/tuning_script.sh 3 "A_agrees" "./data/mic/MIC.csv" "QA" "rot" "bert-base-uncased"
- Find and run the best model for each task (the commands below are examples; substitute the hyperparameters of the best model reported by find_best_model.py)
Violation Severity
python attribute_clf/find_best_model.py --dir "attribute_clf/cross_validation_violation-severity/" --metric "MSE" --negate --model "bert-base"
python -m attribute_clf.clf_sentence_pairs_ordinal --batchsize 16 --epochs 2 --lr "5e-5" --gpu 0 --label "violation-severity" --input "./data/mic/MIC.csv" --output "attribute_clf/results_violation-severity" --sent1 "rot" --bert_model "bert-base-uncased" --test
Moral Vector
python attribute_clf/find_best_model.py --dir "attribute_clf/cross_validation_moral-vector/" --metric "f1_macro" --model "bert-base"
python -m attribute_clf.clf_sentence_pairs_multilabel --batchsize 16 --epochs 8 --lr "5e-5" --gpu 2 --label "moral-vector" --input "./data/mic/MIC.csv" --output "attribute_clf/results_moral-vector" --sent1 "rot" --label_classes 0 1 2 3 4 5 --bert_model "bert-base-uncased" --test
Answer Alignment
python attribute_clf/find_best_model.py --dir "attribute_clf/cross_validation_A_agrees/" --metric "f1_2" --model "bert-base"
python -m attribute_clf.clf_sentence_pairs --batchsize 8 --epochs 3 --lr "5e-5" --gpu 2 --label "A_agrees" --input "./data/mic/MIC.csv" --output "attribute_clf/results_A_agrees" --sent1 "QA" --sent2 "rot" --bert_model "albert-base-v2" --test
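For the moral-vector task above, the classifier is multi-label over the six moral foundations. A hedged sketch of that setup with a BCE-style multi-label head follows; the exact architecture and label encoding in attribute_clf are assumptions.

```python
# Hedged sketch of the multi-label setup implied by clf_sentence_pairs_multilabel:
# six sigmoid outputs (one per moral foundation) trained with BCE loss.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6, problem_type="multi_label_classification"
)

rot = "It is good to be honest with the people you care about."
labels = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]])  # care + loyalty (toy example)
enc = tokenizer(rot, truncation=True, return_tensors="pt")
out = model(**enc, labels=labels)      # BCEWithLogitsLoss under this problem_type
print(out.loss, out.logits.sigmoid())  # per-foundation probabilities
```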
6. Run Human Benchmark for RoT Classification
Prepare labels for human classification
python ./hit/Task_SecondaryMoralEvaluation/prepare_human_benchmark.py --input "./data/mic/MIC.csv" --nsamples 300
7. Compute Inter-annotator agreement for Moral Evaluation
Prepare generations for human evaluation [here {input} is the name of a file from the primary MoralEvaluation task (e.g. filtered_clf_sufficient_moral_blenderbot_0_100_FINISHED.csv)]
python ./hit/Task_SecondaryMoralEvaluation/prepare_hit.py --input "{input}"
Set up the Qualification Test (after you modify the file with the proper API keys)
python ./hit/Task_SecondaryMoralEvaluation/qual_request.py
Run the SecondaryMoralEvaluation.html task on MTurk, using input from ./hit/Task_SecondaryMoralEvaluation/input, and save files to ./hit/Task_SecondaryMoralEvaluation/output
Finally, create the cleaned output file
python ./hit/Task_SecondaryMoralEvaluation/prepare_outfile.py --output "./hit/Task_SecondaryMoralEvaluation/secondary_moral_eval_oct23.csv"
8. Generate plots for the paper
Retrieve the political leanings of all workers, here replacing {qual_id} with the appropriate MTurk QualificationTypeId for the Qual_PoliticalLeaning task
python ./hit/Qual_PoliticalLeaning/lookup_worker_leanings.py --qual_id "{qual_id}"
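The lookup presumably lists every worker holding the political-leaning qualification and reads off their qualification score; a hedged sketch with boto3 is below (pagination details and the meaning of the score are assumptions about lookup_worker_leanings.py).

```python
# Hedged sketch of listing workers who hold an MTurk qualification via boto3;
# how lookup_worker_leanings.py interprets the qualification score is an assumption.
import boto3

client = boto3.client("mturk", region_name="us-east-1")

qualifications, token = [], None
while True:
    kwargs = {"QualificationTypeId": "YOUR_QUAL_ID", "MaxResults": 100}
    if token:
        kwargs["NextToken"] = token
    page = client.list_workers_with_qualification_type(**kwargs)
    qualifications.extend(page["Qualifications"])
    token = page.get("NextToken")
    if not token:
        break

for q in qualifications:
    print(q["WorkerId"], q.get("IntegerValue"))  # score encodes the survey response
```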
Run everything in the Jupyter notebook dataset-statistics.ipynb.