🟣 PURPLE

Code for the paper PURPLE: Making a Large Language Model a Better SQL Writer.

Dataset Download

Spider: ./datasets/spider
Spider-DK: ./datasets/spider_dk
Spider-SYN: ./datasets/spider_syn
Spider-Realistic: ./datasets/spider_realistic

Unzip each dataset and organize it in the following layout (shown here for Spider):

spider
├── database
├── dev.json
├── train_spider_pruned.json
└── tables.json
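
As a quick sanity check before running anything, the layout above can be verified with a small shell function. This is only a sketch: `check_spider_layout` is a hypothetical helper that is not part of the repository, and it checks nothing beyond the four entries listed in the tree.

```shell
# Hypothetical helper (not shipped with this repository): verify that a
# dataset directory contains the entries shown in the tree above.
check_spider_layout() {
    layout_dir="$1"
    layout_missing=0

    # "database" is expected to be a directory.
    if [ ! -d "$layout_dir/database" ]; then
        echo "missing directory: $layout_dir/database"
        layout_missing=1
    fi

    # The three JSON files are expected to be regular files.
    for layout_file in dev.json train_spider_pruned.json tables.json; do
        if [ ! -f "$layout_dir/$layout_file" ]; then
            echo "missing file: $layout_dir/$layout_file"
            layout_missing=1
        fi
    done

    # Returns 0 when everything is present, 1 otherwise.
    return "$layout_missing"
}
```

For example, `check_spider_layout ./datasets/spider && echo "layout OK"` prints `layout OK` only when all four entries exist. The other datasets may ship a different set of files, so the file list would need adjusting for them.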

Environment Build

We publish a Docker image to make the experiments easier to reproduce. You can pull and start it with:

docker pull thren20/purple:v2
docker run -itd --rm --name YOUR_CONTAINER_NAME --mount type=bind,source=PATH_TO_YOUR_CODE,target=/workspace/ thren20/purple:v2

NOTE: The trained models are also included in the Docker image.

You can also build the environment without Docker. The required packages are listed in requirements.txt, and we provide env.sh as an environment setup script:

chmod 744 env.sh
bash env.sh

Pipeline

To reproduce the experiments in the paper, run the provided script:

chmod 744 script/infer_pipeline.sh
bash script/infer_pipeline.sh
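
Since a full run may take a while, it can be convenient to keep its output in a log file. The sketch below wraps any script, stores stdout and stderr in a log, and passes the exit status through. `run_logged` and the log file name are assumptions made for illustration; they are not part of the repository.

```shell
# Hypothetical wrapper (not part of this repository): run a script with bash,
# save its combined stdout/stderr to a log file, and return its exit status.
run_logged() {
    logged_script="$1"
    logged_log="$2"

    # Refuse to start if the script does not exist.
    if [ ! -f "$logged_script" ]; then
        echo "script not found: $logged_script" >&2
        return 1
    fi

    # Capture the exit status without aborting under `set -e`.
    logged_rc=0
    bash "$logged_script" > "$logged_log" 2>&1 || logged_rc=$?

    echo "finished with exit status $logged_rc (log: $logged_log)"
    return "$logged_rc"
}
```

A possible invocation is `run_logged script/infer_pipeline.sh infer_pipeline.log`. While it runs, `tail -f infer_pipeline.log` in another terminal shows the progress.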