paper-PTSKC

This repository contains the code and access to the dataset used for our paper titled Prompt-Time Symbolic Knowledge Capture with Large Language Models. This document is intended for researchers, developers, and anyone who would like to build, run, and experiment with paper-PTSKC.

Prerequisites and Dependencies

Installation

mlx-lm is available on PyPI. Please refer to the official MLX documentation and MLX examples for more details on the MLX platform.
To install the Python API, run:

pip install mlx-lm
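
To verify the installation, you can run a minimal generation with the Python API. This is only a quick sketch: the model path below is a placeholder (see the Model file section), and keyword arguments may differ slightly between mlx-lm versions.

from mlx_lm import load, generate

# Placeholder model path; see the "Model file" section for the model used in the paper.
model, tokenizer = load("Mistral-7B-Instruct-v0.2-4bit-mlx")
response = generate(model, tokenizer, prompt="Hello", max_tokens=32, verbose=True)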

How To Use

Generating test, train, and validation files

To generate the data/test.jsonl, data/train.jsonl, and data/valid.jsonl files, run the following command:

python scripts/generateTestTrainValid.py

The generateTestTrainValid.py script builds these three splits, which are used by the fine-tuning and benchmark steps below.
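
As a quick sanity check (not part of the repository's scripts), the record counts of the generated splits can be inspected with a short snippet like the following:

import json
from pathlib import Path

# Count the records in each generated split.
for split in ("train", "valid", "test"):
    path = Path("data") / f"{split}.jsonl"
    with path.open() as f:
        records = [json.loads(line) for line in f if line.strip()]
    print(f"{path}: {len(records)} records")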

Generating ground-truth file

To generate the results/test_ground_truth.jsonl file, run the following command:

python scripts/generateGroundTruth.py 

The generateGroundTruth.py script processes the data/test.jsonl file and writes the expected prompt response for each user input. The generated ground-truth file is used in the performance evaluations.

Model file

In our work, we use the 4-bit quantized, mlx-converted version of the Mistral-7B-Instruct-v0.2 model. All model files must be placed under the Mistral-7B-Instruct-v0.2-4bit-mlx folder in the main directory of the repository. To replicate our test results accurately, please download the mlx-community/Mistral-7B-Instruct-v0.2-4bit-mlx model from the mlx-community organization on Hugging Face and place it at the specified path.
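
One way to fetch the model into the expected folder (assuming the huggingface_hub package is installed) is:

from huggingface_hub import snapshot_download

# Download the 4-bit, mlx-converted model into the folder expected by the scripts.
snapshot_download(
    repo_id="mlx-community/Mistral-7B-Instruct-v0.2-4bit-mlx",
    local_dir="Mistral-7B-Instruct-v0.2-4bit-mlx",
)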

Fine-tuning

In our paper, we ran QLoRA fine-tuning with the following parameters and generated the adapter file adapters_b4_l16_1000.npz. Please keep the same name for the adapter file so that the following scripts run without any changes.

python -m mlx_lm.lora --train --model Mistral-7B-Instruct-v0.2-4bit-mlx --iters 1000 --data ./data --batch-size 4 --lora-layers 16 --adapter-file adapters_b4_l16_1000.npz
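
After training, the test-set loss of the adapter can optionally be checked with mlx-lm's built-in evaluation mode; note that flag names (e.g., --adapter-file vs. --adapter-path) may vary across mlx-lm versions.

python -m mlx_lm.lora --test --model Mistral-7B-Instruct-v0.2-4bit-mlx --data ./data --adapter-file adapters_b4_l16_1000.npz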

Running the benchmarks

The proposed zero-shot prompting, few-shot prompting, and fine-tuning methods are implemented in zeroShot.py, fewShot.py, and fineTunedShot.py, respectively.
The runBenchmarks.py script calls these methods, reading its input from data/test.jsonl and writing the results to the results directory.

python scripts/runBenchmarks.py

Evaluation

The calculateF1Score.py script compares each method's result file with the ground-truth file and calculates precision, recall, and F1-score. All results are written to the evaluation_results.txt file under the results directory.

python scripts/calculateF1Score.py
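
For reference, precision, recall, and F1 between a predicted response and a ground-truth response can be computed at the token level roughly as sketched below; the repository's script may use a different matching scheme.

from collections import Counter

def token_f1(predicted: str, reference: str) -> tuple[float, float, float]:
    # Token-level overlap: shared tokens count as true positives.
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    tp = sum(common.values())
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / len(pred_tokens)
    recall = tp / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(token_f1("Alice likes hiking", "Alice likes hiking and cycling"))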

Troubleshooting

If the scripts fail with an error about a missing jinja2 module (jinja2 is used when applying the tokenizer's chat template), install it with:

pip install jinja2

Cite

@misc{coplu2024prompttime,
      title={Prompt-Time Symbolic Knowledge Capture with Large Language Models}, 
      author={Tolga Çöplü and Arto Bendiken and Andrii Skomorokhov and Eduard Bateiko and Stephen Cobb and Joshua J. Bouw},
      year={2024},
      eprint={2402.00414},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}