🟣 PURPLE
Code for the paper PURPLE: Making a Large Language Model a Better SQL Writer.
Dataset Download
- Spider: ./datasets/spider
- Spider-DK: ./datasets/spider_dk
- Spider-SYN: ./datasets/spider_syn
- Spider-Realistic: ./datasets/spider_realistic
Unzip the data and organize it into the following layout:
spider
├── database
├── dev.json
├── train_spider_pruned.json
└── tables.json
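Before running anything, it can help to confirm that the unzipped data matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it assumes the Spider data sits at `datasets/spider` relative to the current directory. Only the Spider layout is documented here, so adjust the expected entries for the other datasets as needed.

```python
from pathlib import Path

# Entries taken from the layout shown above; adjust if your copy differs.
EXPECTED_ENTRIES = ("database", "dev.json", "train_spider_pruned.json", "tables.json")


def missing_entries(dataset_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent from dataset_dir, in order."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    missing = missing_entries("datasets/spider")
    if missing:
        print("Missing from datasets/spider:", ", ".join(missing))
    else:
        print("datasets/spider matches the expected layout.")
```

An empty result means every listed file and directory is present; the check does not validate the contents of the files.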
Environment Build
We publish a Docker image to make the experiments easier to reproduce. You can pull and start it with:
docker pull thren20/purple:v2
docker run -itd --rm --name YOUR_CONTAINER_NAME --mount type=bind,source=PATH_TO_YOUR_CODE,target=/workspace/ thren20/purple:v2
NOTE: The trained models are also included in the Docker image.
You can also build the environment without Docker: the required packages are listed in requirements.txt, and env.sh is a script that sets up the environment for you:
chmod 744 env.sh
bash env.sh
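If you build the environment without Docker, a quick check that the packages from requirements.txt are actually installed can save a failed run later. The sketch below is an illustration, not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parser only handles plain lines such as `name==version` (it skips pip options and does not understand URL or VCS requirements). It does not compare installed versions against the pinned ones.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract distribution names from simple requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find as installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the script is run from the directory containing requirements.txt.
    req = Path("requirements.txt")
    if req.exists():
        missing = missing_packages(requirement_names(req.read_text()))
        print("Missing packages:", ", ".join(missing) or "none")
```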
Pipeline
To reproduce the experiments in the paper, run the provided script:
chmod 744 script/infer_pipeline.sh
bash script/infer_pipeline.sh