Data and Code Release for "On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries"

What is included

Licenses

Our code is released under the MIT license. The evaluator contains modified code from mistic-sql-parser by Damien "Mistic" Sorel and Andrew Kent.

The Squall dataset is released under the CC BY-SA 4.0 license and is built upon WikiTableQuestions by Panupong Pasupat and Percy Liang.

Requirements

Setting Up

After cd scripts, run python make_splits.py to generate the train-dev splits used in our experiments. Then run ./download_corenlp.sh to download and unzip the corresponding CoreNLP version.

To set up the evaluator, cd eval, and then run npm install file:sql-parser and npm install express.

To set up the python dependencies, run pip install -r requirements.txt.
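The setup commands above rely on python, pip, node, and npm being available on the PATH. As a rough pre-flight sketch (not part of the released code; the helper name missing_tools is hypothetical), the following reports which of those executables cannot be found before you start:

```python
import shutil


def missing_tools(tools=("python", "pip", "node", "npm"), which=shutil.which):
    """Return the names in `tools` that cannot be located on the PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment; by default it is shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All setup tools were found on PATH.")
```

On some systems the interpreter is installed as python3 rather than python, so a "missing" report here is a hint to check your environment, not proof that setup will fail.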

Model Training and Testing

Make sure the evaluator service is running before performing any model training or testing. To start it, cd eval and run node evaluator.js. This spawns a local service (default port 3000) that the Python model code communicates with to convert the (slightly) underspecified SQL queries into SQL queries that are fully executable on our pre-processed databases.
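Because training and testing depend on the evaluator being reachable, it can help to confirm that something is listening on the port before launching a long run. The sketch below is an illustration only: the function name evaluator_is_up is hypothetical, and it checks just that a TCP connection to the default port 3000 succeeds, not that the service behind it is actually the evaluator.

```python
import socket


def evaluator_is_up(host="localhost", port=3000, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests that the port accepts connections; it makes no
    request to the service and knows nothing about its endpoints.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if evaluator_is_up():
        print("Port 3000 is accepting connections.")
    else:
        print("Nothing is listening on port 3000; "
              "start the evaluator with: cd eval && node evaluator.js")
```

If you run the evaluator on a non-default port, pass that port to the function instead.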

Next, cd model and then run python main.py to train a baseline model with an LSTM encoder. Additional options enable our model variations:

Once the model is trained, run python main.py --test to make predictions on the WTQ test set.

See model/main.py for command-line arguments to specify training file, dev file, test file, model saving location, etc.

Squall Dataset Format

The dataset is located at data/squall.json as a single JSON file. The file is a list of dictionaries, each corresponding to one annotated data instance with the following fields:
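As a quick way to explore the file, the following sketch loads it and counts how often each field name appears across instances. It assumes only what is stated above (a single JSON file holding a list of dictionaries); the helper names load_instances and field_counts are hypothetical, and no particular field names are assumed.

```python
import json
from collections import Counter


def load_instances(path="data/squall.json"):
    """Load the dataset file and return its list of instance dictionaries."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of dictionaries")
    return data


def field_counts(instances):
    """Count how many instances contain each top-level field name."""
    counts = Counter()
    for instance in instances:
        counts.update(instance.keys())
    return counts
```

Run from the repository root, field_counts(load_instances()) gives a per-field tally, which makes it easy to spot fields that are present in only some instances.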

Release History

Reference

If you make use of our code or data for research purposes, we would appreciate it if you cite the following:

@inproceedings{Shi:Zhao:Boyd-Graber:Daume-III:Lee-2020,
	Title = {On the Potential of Lexico-logical Alignments for Semantic Parsing to {SQL} Queries},
	Author = {Tianze Shi and Chen Zhao and Jordan Boyd-Graber and Hal {Daum\'{e} III} and Lillian Lee},
	Booktitle = {Findings of EMNLP},
	Year = {2020},
}