Code for: Calibrated Interpretation: Confidence Estimation in Semantic Parsing

Author: Elias Stengel-Eskin

Personal Email: elias.stengel@gmail.com

About the repo

This repo is a fork of another repo, which is itself a fork of a fork of MISO, a semantic parsing codebase released with Joint Universal Syntactic and Semantic Parsing.

MISO was built over the course of the following publications:

It is a flexible sequence-to-graph parsing framework built on top of AllenNLP.

BenchClamp experiments

Most models in Calibrated Interpretation: Confidence Estimation in Semantic Parsing were run via BenchClamp. This repo contains analysis scripts and MISO, which was one of the models considered. All other scripts and model code are in this fork of BenchClamp: https://github.com/esteng/semantic_parsing_with_constrained_lm/tree/nl2sql.

Easy and Hard splits

The directory data_subsets contains the easy and hard splits of TreeDST and SMCalFlow described in the paper.

MISO Documentation

Installation

All dependencies can be installed with ./install_requirements.sh

Downloading Data

The first step in replicating the experiments is to download the data and GloVe embeddings.

From the project home directory:

mkdir -p data 
cd data
# This may take some time 
wget https://veliass.blob.core.windows.net/ifl-data/data_clean.tar.gz
tar -xzvf data_clean.tar.gz 
mv data_clean/* .
rm -r data_clean 

File Organization

Important directories:

The main difference between .jsonnet files is the data path at the top, which points the model to the correct data split. For example, data/smcalflow_samples_curated/FindManager/5000_100/ points the model to the 5000-example training subset with 100 FindManager examples. Each experiment is assumed to have its own .jsonnet file: the experiment that trains a transformer model with seed 12 on the 5000_100 FindManager split corresponds to miso/training_configs/calflow_transformer/FindManager/12_seed/5000_100.jsonnet. In the released configs, the data dir argument is an environment variable.
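The naming convention above can be sketched in shell as follows. This is only an illustration: the variable names SEED, SPLIT, and DATA_SPLIT_DIR are hypothetical and are not read by any script in the repo; only the two resulting paths come from the text above.

```shell
# Hypothetical variable names, used here only to show how the paths are composed.
FXN="FindManager"   # function of interest
SEED=12             # random seed of the experiment
SPLIT="5000_100"    # 5000 training examples, 100 of them FindManager

# Config path pattern: <model dir>/<function>/<seed>_seed/<split>.jsonnet
TRAINING_CONFIG="miso/training_configs/calflow_transformer/${FXN}/${SEED}_seed/${SPLIT}.jsonnet"

# Data path that the top of that config is expected to point at.
DATA_SPLIT_DIR="data/smcalflow_samples_curated/${FXN}/${SPLIT}/"

echo "${TRAINING_CONFIG}"
echo "${DATA_SPLIT_DIR}"
```

Changing SEED or SPLIT should select a different released config, assuming one exists for that combination.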

Important Scripts

Training Models

Models can be trained locally using experiments/calflow.sh, which expects the following environment variables to be set: CHECKPOINT_DIR, TRAINING_CONFIG, and DATA_ROOT. CHECKPOINT_DIR points to a directory where the model will store checkpoints. TRAINING_CONFIG is a .jsonnet config that will be read by AllenNLP. DATA_ROOT is the location where you downloaded the data. Optionally, the FXN variable can also be set for function-specific evaluation.

Model checkpoints and logs will be written to CHECKPOINT_DIR/ckpt. Decoded outputs will be written to CHECKPOINT_DIR/translate_output/<split>.tgt.
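A minimal sketch of preparing the environment before calling experiments/calflow.sh might look like the following. The checkpoint path is a hypothetical placeholder, DATA_ROOT assumes the data was downloaded to ./data as in the download step, and the sanity check is an illustrative addition rather than something the script itself is known to do. The training invocation is deliberately left out because its arguments are not described here; see miso_docs/TRAINING.md.

```shell
# Hypothetical values; substitute your own paths.
export CHECKPOINT_DIR="${HOME}/miso_checkpoints/FindManager_5000_100_seed12"
export TRAINING_CONFIG="miso/training_configs/calflow_transformer/FindManager/12_seed/5000_100.jsonnet"
export DATA_ROOT="${PWD}/data"   # assumes the data was downloaded to ./data
export FXN="FindManager"         # optional, for function-specific evaluation

# Collect the names of any required variables that are unset or empty.
MISSING=""
for var in CHECKPOINT_DIR TRAINING_CONFIG DATA_ROOT; do
  eval "value=\${${var}:-}"
  if [ -z "${value}" ]; then
    MISSING="${MISSING} ${var}"
  fi
done

if [ -n "${MISSING}" ]; then
  echo "unset variables:${MISSING}" >&2
else
  echo "checkpoints and logs: ${CHECKPOINT_DIR}/ckpt"
  echo "decoded outputs:      ${CHECKPOINT_DIR}/translate_output/"
fi
```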

For additional details, see miso_docs/TRAINING.md

Testing models

The following environment variables need to be set:

  1. CHECKPOINT_DIR: the directory containing a subdirectory ckpt, which contains an archive model.tar.gz. If training is interrupted or canceled, the archive may be missing. It can be created manually by the following commands:
cp best.th weights.th 
tar -czvf model.tar.gz weights.th config.json vocabulary
  2. TEST_DATA: the path to the test data without the extension. Example: TEST_DATA=data/smcalflow.agent.data/dev_valid
  3. FXN: the function of interest. Example: FXN=FindManager

The model can then be tested using ./experiments/calflow.sh -a eval_fxn
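The manual archive step from item 1 can be wrapped in a small guard so that it only runs when model.tar.gz is missing. The function name ensure_archive is hypothetical and is not part of the repo; it simply repeats the cp and tar commands shown above inside the ckpt directory.

```shell
# Hypothetical helper (not part of the repo): rebuild model.tar.gz when it is missing.
# Argument: the ckpt directory holding best.th, config.json, and vocabulary.
ensure_archive() {
  ckpt_dir="$1"
  if [ -f "${ckpt_dir}/model.tar.gz" ]; then
    echo "archive already present: ${ckpt_dir}/model.tar.gz"
    return 0
  fi
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "${ckpt_dir}" || exit 1
    cp best.th weights.th &&
      tar -czf model.tar.gz weights.th config.json vocabulary
  )
}
```

With CHECKPOINT_DIR set, calling ensure_archive "${CHECKPOINT_DIR}/ckpt" before ./experiments/calflow.sh -a eval_fxn should leave an archive in place, assuming best.th, config.json, and vocabulary were written before training stopped.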

The output at the end will have the following rows:

Exact Match: The overall exact match accuracy of produced and reference programs. 
FXN Coarse: The percentage of programs for which, if FXN is in the reference, it is also in the predicted program. It doesn't matter if the programs match or not. 
FindManager Fine: The percentage of programs with FXN in the reference where the predicted program is an exact match. 
FindManager Precision: The percentage of predicted programs that have FXN in them and also have FXN in the reference program. 
FindManager Recall: Same as Coarse 
FindManager F1: Harmonic mean of precision and recall 
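To make the relationship between these rows concrete, the arithmetic can be sketched with awk. The counts below are invented toy numbers, not results from the paper or the repo, and the variable names are hypothetical. The sketch prints fractions, whereas the descriptions above speak of percentages.

```shell
# Hypothetical toy counts, invented purely to illustrate the arithmetic.
REF_WITH_FXN=10     # reference programs that contain FXN
BOTH_WITH_FXN=8     # of those, predictions that also contain FXN
PRED_WITH_FXN=16    # predicted programs that contain FXN
EXACT_WITH_FXN=6    # FXN-reference programs whose prediction is an exact match

METRICS="$(awk -v ref="$REF_WITH_FXN" -v both="$BOTH_WITH_FXN" \
               -v pred="$PRED_WITH_FXN" -v exact="$EXACT_WITH_FXN" 'BEGIN {
  coarse    = both / ref          # FXN in the prediction, given FXN in the reference
  fine      = exact / ref         # exact match, given FXN in the reference
  precision = both / pred         # FXN in the reference, given FXN in the prediction
  recall    = coarse              # recall is the same quantity as coarse
  f1        = 2 * precision * recall / (precision + recall)
  printf "coarse=%.4f fine=%.4f precision=%.4f recall=%.4f f1=%.4f", coarse, fine, precision, recall, f1
}')"

echo "${METRICS}"
# prints: coarse=0.8000 fine=0.6000 precision=0.5000 recall=0.8000 f1=0.6154
```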

Getting logits

To get the predicted token logits under a forced decode, see the log_losses function in experiments/calflow.sh. To get token-level predicted probabilities without a forced decode, use eval_calibrate.