
Social Chemistry 101

Project

For an overview of the Social Chemistry project, a live demo of the model, and an interactive dataset browser, check out our project webpage.

Paper

This repository is for code accompanying the paper:

Social Chemistry 101: Learning to Reason about Social and Moral Norms <br/> Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi <br/> EMNLP 2020

Data

Download the Social-Chem-101 dataset here.

The dataset schema is given in detail below. See the README in the dataset, as well as the appendix of the paper, for substantially more information about the dataset and its collection.

The dataset is licensed under the CC BY-SA 4.0 license.

Pretrained Models

We provide two pretrained Neural Norm Transformers using the GPT-2 architecture: one for rules-of-thumb (RoTs), and one for actions.

| Architecture | RoT | Action |
|---|---|---|
| GPT2-XL | model | model |

Here are some example commands to download and extract the RoT model.

# Start from repo root. Model checkpoints are conventionally saved in "output/", though
# the downloaded models also contain their own "output/" directory, so we'll pull the
# files out of it.
mkdir output/
cd output/
wget https://storage.googleapis.com/ai2-mosaic-public/projects/social-chemistry/models/gpt2-xl_rot_64_5epochs.tar.gz
tar -xzf gpt2-xl_rot_64_5epochs.tar.gz
cd output/
mv gpt2-xl_rot_64_5epochs/ ..
cd ..
rmdir output/
rm gpt2-xl_rot_64_5epochs.tar.gz
cd ..
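If your tar supports `--strip-components` (GNU tar and modern BSD tar both do), the nested-`output/` shuffle above can be avoided in a single extract step. A sketch, demonstrated on a scratch tarball that mimics the layout described in the comments (the `pytorch_model.bin` filename here is a placeholder, not necessarily what the real checkpoint contains):

```shell
# Build a scratch tarball mimicking the real one: a top-level "output/"
# directory wrapping the checkpoint directory.
mkdir -p /tmp/sc-demo/output/gpt2-xl_rot_64_5epochs
echo "weights" > /tmp/sc-demo/output/gpt2-xl_rot_64_5epochs/pytorch_model.bin
tar -czf /tmp/sc-demo/model.tar.gz -C /tmp/sc-demo output

# Extract, dropping the leading "output/" path component, so the checkpoint
# lands directly in the destination directory.
mkdir -p /tmp/sc-demo/dest
tar -xzf /tmp/sc-demo/model.tar.gz -C /tmp/sc-demo/dest --strip-components=1
ls /tmp/sc-demo/dest/gpt2-xl_rot_64_5epochs
```

For the real download, the equivalent would be extracting `gpt2-xl_rot_64_5epochs.tar.gz` with `-C output/ --strip-components=1` from the repo root.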

See below for examples of generating with the models.

Code

Installation

# 1. Setup + activate a fresh python3.7+ virtual environment with your method of choice.
#    (We used pyenv and 3.8.3.) Your particular steps will vary.

# 2. Install pytorch, with CUDA if possible. (We tested with pytorch 1.5.1 and 1.7.0
#    using CUDA 10.1.) Follow https://pytorch.org/get-started/locally/

# 3. Install python dependencies
pip install -r requirements.txt

# 4. Download and extract the dataset .tsv file into `data/dataset/`. Here are some
#    example commands for Linux.
mkdir -p data/dataset
cd data/dataset
wget https://storage.googleapis.com/ai2-mosaic-public/projects/social-chemistry/data/social-chem-101.zip
unzip social-chem-101.zip
mv social-chem-101/* .
rmdir social-chem-101
rm social-chem-101.zip
rm -rf __MACOSX  # extra cruft folder
cd ../..

# NOTE: There will also now be a dataset readme .md file in that folder. It contains
#       detailed information about the dataset schema, collection, splitting, and more.
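Once extracted, the `.tsv` loads with any standard TSV reader — `pandas.read_csv(..., sep="\t")` or Python's built-in `csv` module. A minimal sketch using the latter; the two-row sample string here is fabricated for illustration and uses only a handful of the real columns (the full schema is documented below):

```python
import csv
import io

# Fabricated sample in the dataset's TSV layout. The real file is
# data/dataset/social-chem-101.v1.0.tsv and has many more columns.
sample = (
    "area\tsplit\trot\taction\taction-moral-judgment\n"
    "amitheasshole\ttrain\tIt's bad to take candy from a baby.\t"
    "taking candy from a baby\t-2\n"
)

# DictReader keys each field by the header row, which matches the
# column names in the schema table below.
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(rows[0]["rot"])                    # It's bad to take candy from a baby.
print(rows[0]["action-moral-judgment"])  # -2 (as a string; cast as needed)
```

For the real file, replace `io.StringIO(sample)` with `open("data/dataset/social-chem-101.v1.0.tsv")`.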

Training

The Python code is in the sc/ directory. An example script is given in scripts/train_generative_models.sh, which illustrates how to train models.

By default, the following output locations are used:

... and as such, all are ignored by version control (.gitignore).

Example: Generation

Before you generate, you need pretrained models. You can train the models yourself (previous section), or you can download models we have trained (see above).

We've provided some example scripts for generating. Let's assume you're interested in generating RoTs and have downloaded the pretrained GPT2-XL RoT model, so it lives at output/gpt2-xl_rot_64_5epochs/. Then you can run:

# See all options for specifying GPU and sampling procedure.
python -m sc.examples.generate --help

# Provide model and use default settings (GPU 0, top-p = 0.9).
python -m sc.examples.generate --model output/gpt2-xl_rot_64_5epochs

This dynamically loads RoT examples from sc/examples/examples.py. You can edit that file to change the generation prompts. It loads from a file (rather than, say, having you type in a situation or attributes directly) to make it easy to experiment with varying an input attribute and seeing how the generations change.
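The default top-p = 0.9 setting refers to nucleus sampling: keep the smallest set of highest-probability tokens whose cumulative probability reaches p, renormalize, and sample only from that set. A minimal sketch of the idea on a toy distribution — not the repo's (or huggingface's) implementation:

```python
import random

def top_p_filter(probs, p=0.9):
    """Return the nucleus: the smallest set of (token, prob) pairs, taken in
    descending probability order, whose cumulative probability reaches p.
    Probabilities are renormalized over the kept set."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    return [(tok, prob / total) for tok, prob in nucleus]

def sample_top_p(probs, p=0.9, rng=random):
    """Sample one token from the renormalized nucleus."""
    tokens, weights = zip(*top_p_filter(probs, p))
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy next-token distribution: at p = 0.9 the low-probability tail ("d")
# is cut off, and sampling happens over {a, b, c} only.
dist = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_p_filter(dist, p=0.9))
```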

If you'd like to use actions and action attributes (instead of RoTs and RoT attributes), simply specify an action model instead. The example code checks whether "rot" or "action" is in the model's path and loads examples accordingly.

Full Generation and Classification

We've released the code and scripts for training attribute classifiers and running them on model generations. The process is a bit messy. Here are the rough steps:

  1. Train generative models (scripts/train_generative_models.sh, as above)
  2. Generate (scripts/generate_all.sh)
  3. Turn generations (txt) into a classifier-friendly format (table) (sc/scripts/output_to_table.py)
  4. Move each classifier input into its own directory (sc/scripts/move_classifier_inputs.py)
  5. Train attribute classifiers (scripts/classifier_runs.sh)
  6. Run trained classifiers on model generations (sc/scripts/run_classifier.py)

Note: sc/model/cleaning.py exists. If memory serves, its primary purpose was for generative outputs so malformed they couldn't be subsequently parsed, but it's also possible it was used more broadly to clean up occasional decoding glitches in all outputs.

Note: This pipeline was run for the paper's experiments, but unfortunately the libraries we used have since broken: the huggingface transformers v2.11 release pins a version of their tokenizers library (v0.7) whose rust code no longer builds with a modern rust compiler. As such, we haven't tested the above process with the latest repository structure, and it's unlikely the whole pipeline "just works" without some tweaks. Please feel free to file a GitHub issue.

Citation

@conference{forbes2020social,
    title = {Social Chemistry 101: Learning to Reason about Social and Moral Norms},
    author = {Maxwell Forbes and Jena D. Hwang and Vered Shwartz and Maarten Sap and Yejin Choi},
    year = {2020},
    date = {2020-11-16},
    booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
}

Dataset format

The dataset (social-chem-101.v1.0.tsv) is tab-separated with the following columns:

| column | type | description |
|---|---|---|
| area | str | {confessions, dearabby, rocstories, amitheasshole} |
| m | int | {1, 3, 5, 50} How many workers did the RoT breakdown for this RoT. Roughly corresponds to the split, but not exactly. Usually you'll want to use split instead. |
| split | str | {train, dev, test, dev-extra, test-extra, analysis, none} Which split this RoT belongs to. Much more information on splits is given below. |
| rot-agree | int\|null | {0, 1, 2, 3, 4, ""} Worker answer to the question "What portion of people probably agree that ${rot}?" "" means null; question not answered. The buckets in order are {< 1%, 5% -- 25%, 50%, 75% -- 90%, > 99%}. See the Mturk UI for descriptions of these buckets. |
| rot-categorization | str | Worker-labeled "\|"-separated list of 0 -- 4 RoT categorizations. Choices: {morality-ethics, social-norms, advice, description}. For example, "social-norms\|description". See the Mturk UI for full descriptions of these values. |
| rot-moral-foundations | str | Worker-labeled "\|"-separated list of 0 -- 5 moral foundation axes. Choices: {care-harm, fairness-cheating, loyalty-betrayal, authority-subversion, sanctity-degradation}. For example: "care-harm\|fairness-cheating". |
| rot-char-targeting | str\|null | {char-none, char-N, ""} where N is in 0 -- 5 (inclusive). Worker answer to the question, "Who is the RoT most likely targeting in the following situation?" "" means null; question not answered. char-none means the worker picked "no one listed"; char-N means the worker picked character N, a 0-index into the characters column (below). |
| rot-bad | int | {0, 1} Whether the worker labeled the RoT as "confusing, extremely vague, very low quality, or can't be split into action and judgment." |
| rot-judgment | str\|null | Worker-written string representing the judgment portion of the RoT. We intended to throw this away; it was used for priming. "" means null; question not answered. For example, "it's bad". |
| action | str\|null | The action (a conjugated / tweaked substring of the RoT), written by the worker. "" means null; question not answered. For example, "taking candy from a baby". |
| action-agency | str\|null | {agency, experience, ""} Worker answer to the question "Is the action ${action} something you do or control, or is it something you experience?" where ${action} is the action (previous column) written by the worker. "" means null; question not answered. |
| action-moral-judgment | int\|null | {-2, -1, 0, 1, 2, ""} Worker answer to the question "Which best matches the RoT's original judgment (${judgment}) of ${action}?" where both ${judgment} and ${action} were written by the worker (previous columns). "" means null; question not answered. The buckets in order are {very bad, bad, expected/OK, good, very good}. See the Mturk UI for descriptions of these buckets. |
| action-agree | int\|null | {0, 1, 2, 3, 4, ""} Worker answer to the question, "What portion of people probably agree that ${action} is ${judgment}?", where both ${action} and ${judgment} were written by workers (previous columns). "" means null; question not answered. The buckets in order are {< 1%, 5% -- 25%, 50%, 75% -- 90%, > 99%}. See the Mturk UI for descriptions of these buckets. |
| action-legal | str\|null | {legal, illegal, tolerated, ""} Worker answer to the question, "Where you live, how legal is the action ${action}?" where ${action} was written by the worker (previous column). "" means null; question not answered. See the Mturk UI for descriptions of these buckets. |
| action-pressure | int\|null | {-2, -1, 0, 1, 2, ""} Worker answer to the question "How much cultural pressure do you (or those you know) feel about ${action}?" where ${action} was written by the worker (previous column). "" means null; question not answered. The buckets in order are {strong pressure against, pressure against, discretionary, pressure for, strong pressure for}. See the Mturk UI for descriptions of these buckets. |
| action-char-involved | str\|null | {char-none, char-N, ""} where N is in 0 -- 5 (inclusive). Worker answer to the question, "In this situation, who is most likely to do the action ${action} or its opposite?" where ${action} was written by the worker (previous column). "" means null; question not answered. char-none means the worker picked "no one listed"; char-N means the worker picked character N, a 0-index into the characters column (below). |
| action-hypothetical | str\|null | {explicit-no, probable-no, hypothetical, probable, explicit, ""} Worker answer to the question "Is that character explicitly doing the action ${action}? Or is it that the action might happen (maybe the RoT was advice)?" "" means null; question not answered. Null is also given if the worker picked char-none for the previous question (action-char-involved), because this question is then skipped. See the Mturk UI for descriptions of these buckets. |
| situation | str | Text of the situation. |
| situation-short-id | str | Unique ID for the situation, shorter and more convenient. |
| rot | str | The rule of thumb written by the worker. |
| rot-id | str | ID of the rule of thumb. Includes the worker ID of the RoT author and which RoT it was (1 -- 5). |
| rot-worker-id | str | The worker who wrote this rule of thumb. (No relation to the worker who did this RoT breakdown, though it could be the same by coincidence.) |
| breakdown-worker-id | str | The worker who did this RoT breakdown. (No relation to the worker who wrote this RoT, though it could be the same by coincidence.) |
| n-characters | int | 1 -- 10 (10 is the max observed; there is no upper limit). How many characters were identified in the story during the NER mturk task. The minimum is 1: the "narrator," who we assume said/wrote the situation. At most 6 characters are displayed during this HIT and available for selection (including "narrator"). |
| characters | str | "\|"-separated list of characters that appeared. 1 -- 6 characters will be shown. For example, "narrator\|a family member". |
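Several columns (rot-categorization, rot-moral-foundations, characters) pack multiple labels into one "|"-separated string, and "" encodes null / no answer throughout. A small helper for unpacking such cells, sketched here for illustration (not from the repo's code):

```python
def split_multilabel(value):
    """Split a "|"-separated multi-label cell into a list of labels.

    The dataset encodes null / no answer as the empty string "",
    which should map to an empty list rather than [""].
    """
    return value.split("|") if value else []

print(split_multilabel("social-norms|description"))  # ['social-norms', 'description']
print(split_multilabel("care-harm|fairness-cheating"))  # ['care-harm', 'fairness-cheating']
print(split_multilabel(""))  # []
```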

More information about the `split` and `m` columns:

TODO

Docs, code, and data to port over: