Awesome

NaturalProofs: Mathematical Theorem Proving in Natural Language

NaturalProofs: Mathematical Theorem Proving in Natural Language
Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho

This repo contains:

The NaturalProofs Dataset
Tokenized task data for mathematical reference retrieval and generation.
Preprocessing NaturalProofs and the task data.
Training and evaluation for mathematical reference retrieval and generation.
Pretrained models for mathematical reference retrieval and generation.

Please cite our work if you found the resources in this repository useful:

@inproceedings{welleck2021naturalproofs,
  title={NaturalProofs: Mathematical Theorem Proving in Natural Language},
  author={Sean Welleck and Jiacheng Liu and Ronan Le Bras and Hannaneh Hajishirzi and Yejin Choi and Kyunghyun Cho},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021},
  url={https://openreview.net/forum?id=Jvxa8adr3iY}
}

Quick download

To download and unpack NaturalProofs, use:

pip install gdown
python download.py --naturalproofs --savedir /path/to/savedir

To download and unpack all files that we describe below, use:

python download.py --naturalproofs --tokenized --checkpoint --other --savedir /path/to/savedir

This creates the following file structure:

{savedir}/data   # contains NaturalProofs base data (.json files) and tokenized task data (.pkl files)
{savedir}/ckpt   # contains pretrained model checkpoints
{savedir}/other  # contains precomputed files for evaluation (ref encodings, etc.)

NaturalProofs Dataset

We provide the NaturalProofs Dataset (JSON per domain):

NaturalProofs Dataset [zenodo]	Domain
naturalproofs_proofwiki.json	ProofWiki
naturalproofs_stacks.json	Stacks
naturalproofs_trench.json	Real Analysis textbook
naturalproofs_stein.json (script)	Number Theory textbook

To download NaturalProofs, use:

python download.py --naturalproofs --savedir /path/to/savedir

Combined ProofWiki+Stacks

The download includes an extra combined ProofWiki+Stacks file made with notebooks/merge.ipynb.

Preprocessing

To see the steps used to create each domain of NaturalProofs from raw data, see the following notebooks.
This preprocessing is not needed if you are using a preprocessed dataset provided above.

Domain
ProofWiki	notebook
Stacks	notebook
Real Analysis textbook	notebook
Number Theory textbook	notebook

Mathematical Reference Retrieval and Generation

To use NaturalProofs for the reference retrieval and generation tasks described in the paper, the first step is tokenization.

Tokenized dataset

We tokenize the raw NaturalProofs Dataset into two different formats:

Pairwise: (x, r)
- x theorem (sequence of tokens)
- r reference (sequence of tokens)
- This version is used to train and evaluate the pairwise model.
Sequence: (x, [rid_1, ..., rid_Tx])
- x theorem (sequence of tokens)
- rid_i reference id
- This version is used to train and evaluate the autoregressive and joint models.

We provide the following versions used in the paper (bert-based-cased tokenizer):

Type	Domain	Splits
Pairwise, `bert-base-cased`	Proofwiki	train,valid,test
	Stacks	train,valid,test
	Real Analysis (textbook))	test
	Number Theory (textbook)	test
Sequence, `bert-base-cased`	Proofwiki	train,valid,test
	Stacks	train,valid,test
	Real Analysis (textbook)	test
	Number Theory (textbook)	test

To download and unpack them, use:

python download.py --tokenized --savedir /path/to/savedir

Or use google drive link.

Pretrained Models

We provide the following models used in the paper:

Type		Domain
Pairwise	`bert-base-cased`	Proofwiki
Pairwise	`bert-base-cased`	Stacks
Pairwise	`bert-base-cased`	Proofwiki+Stacks
Joint	`bert-base-cased`	Proofwiki
Joint	`bert-base-cased`	Stacks
Joint	`bert-base-cased`	Proofwiki+Stacks
Autoregressive	`bert-base-cased`	Proofwiki
Autoregressive	`bert-base-cased`	Stacks

To download and unpack them, use:

python download.py --checkpoint --savedir /path/to/savedir

Or use google drive link.

Creating your own tokenized dataset

This step is not needed if you are using a tokenized dataset provided above.
First, setup the code:

python setup.py develop

To create your own tokenized versions:

Pairwise: python naturalproofs/tokenize_pairwise.py
Sequence: python naturalproofs/encoder_decoder/utils.py

Evaluation

We will show you how to run evaluation on the pretrained model checkpoints & associated files.

Setup

We will assume the file structure given by using the download script.

python download.py --naturalproofs --tokenized --checkpoint --other --savedir <SAVE-DIR>

We provide a script which assembles an evaluation command for (model type, domain, task) combinations.
We show example commands below.

python run_analysis.py \
--train-ds-names {proofwiki stacks}+ \              # one or more training domains to choose a model
--eval-ds-names {proofwiki stacks stein trench}+ \  # one or more evaluation domains
--model {pairwise, joint, autoregressive} \ 
--generation \                                      # for generation task (autoregressive or joint models only)
--split {valid, test} \
--gpu <integer GPU id> \
--codedir /path/to/naturalproofs_code \
--datadir <SAVE-DIR>/data \
--ckptdir <SAVE-DIR>/ckpt \
--outdir <SAVE-DIR>/output

To make sure your filepaths line up, please look inside run_analysis.py to see how the --{}dir arguments are used.

Example: pairwise retrieval

python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki stein trench \
--model pairwise \
--gpu 1 \
--split test

Example: joint retrieval

python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki \
--model joint \
--gpu 1 \
--split test

Example: joint retrieval OOD

For OOD evaluation on stein and trench textbooks, provide reference embeddings from the pairwise model.
These are the __encs.pt files from running the pairwise retrieval evaluation (we provide an example in other/).

python run_analysis.py --model joint \
--train-ds-names proofwiki \
--eval-ds-names stein trench \
--stein-rencs <SAVE-DIR>/other/pairwise__train_proofwiki__eval_stein__test__encs.pt \
--trench-rencs <SAVE-DIR>/other/pairwise__train_proofwiki__eval_trench__test__encs.pt \
--gpu 1 \
--split test

Example: joint retrieval proofwiki+stacks model

To align the model's combined output space with the individual dataset used for evaluation, give a tok2tok.pkl map (we provide an example in other/):

python run_analysis.py --model joint \
--train-ds-names both \
--eval-ds-names proofwiki stacks \
--modeltok2datatok <SAVE-DIR>/other/tok2tok.pkl \
--gpu 1 \
--split test

Note that OOD evaluation (stein or trench) is not implemented for the combined model.

Example: autoregressive retrieval

Without the --generation flag, adjusts settings for retrieval evaluation:

python run_analysis.py --model autoregressive \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid

Note that OOD evaluation (stein or trench) is not implemented for the autoregressive model.

Example: autoregressive generation

python run_analysis.py --model autoregressive --generation \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid

Training

The provided code supports:

Training a pairwise model
Training an autoregressive or joint model, initialized with pairwise model components (parameters, reference embeddings)

Training a pairwise model

python naturalproofs/model.py --expr-name pairwise \
--datapath /path/to/<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output

Training a joint model

The joint (and autoregressive) model uses a pairwise checkpoint, and reference encodings for initialization.

The pairwise checkpoint is saved during pairwise training.
The reference encodings are saved in a encs.pt file during pairwise Evaluation.

python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence_<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output
--pretrained-retrieval-checkpoint /path/to/pairwise__<DOMAIN>.ckpt \
--encs-file /path/to/train_<DOMAIN>__eval_<DOMAIN>__valid__encs.pt \  # obtained from running evaluation on trained pairwise model 
--parallel 1 \
--set-mode 1   # discard duplicates

Our implementation uses the same encoder-decoder architecture for the autoregressive and joint models, considering the joint model as a one-step special case (with KL-div loss). See the Appendix for a discussion on this design decision and technical details.

Training an autoregressive model

python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence_<DOMAIN>_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output
--pretrained-retrieval-checkpoint /path/to/pairwise__<DOMAIN>.ckpt \
--encs-file /path/to/train_<DOMAIN>__eval_<DOMAIN>__valid__encs.pt \  # obtained from running evaluation on trained pairwise model 
--parallel 0 \
--set-mode 0   # keep duplicates

Non-neural baselines

TF-IDF example:

python naturalproofs/baselines.py \
--method tfidf \
--datapath /path/to/<DOMAIN>_tokenized__bert-base-cased_200.pkl \
--datapath-base /path/to/naturalproofs_<DOMAIN>.json \
--savedir /path/to/out/

==> /path/to/out/tfidf__eval.pkl

Then use analyze.py to compute metrics:

python naturalproofs/analyze.py \
--method tfidf \
--eval-path /path/to/out/tfidf__eval.pkl \
--datapath-base /path/to/naturalproofs_<DOMAIN>.json 

==> /path/to/out/tfidf__analysis.pkl