<h1 style="text-align: center"> PECC - Problem Extraction and Coding Challenges </h1>

Complementary repository for the paper "PECC: A Dataset for Problem Extraction and Coding Challenges" by Patrick Haller, Jonas Golde and Alan Akbik.
<p align="center" style="font-style: italic"> Our paper got accepted at LREC-COLING 2024! 🥳 </p>

<p align="center">
  <a href="https://huggingface.co/datasets/PatrickHaller/pecc">🤗 Dataset</a> <br>
  <a href="https://huggingface.co/spaces/PatrickHaller/pecc-leaderboard">🏅 Leaderboard</a> <br>
  <a href="https://hallerpatrick.github.io/pecc/">📝 Blog Post</a> <br>
  <a href="https://arxiv.org/abs/2404.18766">📄 Paper</a>
</p>
## Setup
Create a virtual environment and install the requirements.
```bash
python3 -m venv venv
source venv/bin/activate
python -m pip install -r requirements.txt
```
Depending on the model in use, you will need to provide the respective API key, e.g., for OpenAI models:
```bash
export OPENAI_API_KEY="your-api-key"
```
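Other providers read their keys from environment variables as well. The variable names below follow common LiteLLM conventions and are assumptions; check the LiteLLM documentation for the exact variables your model needs:

```bash
# Assumed variable names (LiteLLM conventions); verify against the LiteLLM docs.
export ANTHROPIC_API_KEY="your-api-key"                           # claude-3-* models
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"  # vertex_ai/* models
```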
## Usage
The evaluation script `main.py` provides several arguments to configure the evaluation:
```
usage: main.py [-h] [--model MODEL] [--subset SUBSET] [--story] [--venv-path VENV_PATH] [--output-file OUTPUT_FILE] [--instruct] [--kpass KPASS]

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Model to use ['gpt-3.5-turbo-16k', 'gpt-3.5-turbo-turbo', 'vertex_ai/chat-bison', 'vertex_ai/codechat-bison', 'vertex_ai/gemini-pro', 'vertex_ai/gemini-1.5-pro', 'WizardCoder-34B', 'mixtral', 'claude-3-opus', 'claude-3-sonnet', 'claude-3-haiku']
  --subset SUBSET       Subset of the dataset to use (euler, aoc)
  --story               Use the story subset of the dataset
  --venv-path VENV_PATH
                        Path to the venv
  --output-file OUTPUT_FILE
                        Path to output file
  --instruct            Only run the instruction
  --kpass KPASS         Number of passes to use
```
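For example, a typical invocation might look like the following (the flag values here are illustrative, not prescriptive):

```bash
python main.py --model gpt-3.5-turbo-16k --subset euler --kpass 3 --output-file results.json
```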
> [!NOTE]
> The generated code will be executed in the provided Python environment. While we did not experience any issues, we cannot guarantee the safety of the generated code.
## Download original AoC subset
Due to licensing restrictions, we cannot provide the original data from the AoC dataset. However, you can download the original AoC dataset with a bash script we provide.
The following requirements apply:

- First, you need a registered account on the AoC website.
- You need to have completed the AoC challenges from 2015 to 2022 to download the respective challenges.
- You need to install the `aoc` CLI tool from here and have the `jq` and `sponge` tools installed (a quick check is shown below).
- Following the `aoc` documentation, a session token is needed, which can be obtained from the AoC website after login.
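A quick way to verify that all three tools are on your `PATH` before running the script (a convenience check, not part of the original instructions):

```bash
# Prints the resolved path for each tool; a missing tool yields a non-zero exit code.
command -v aoc jq sponge
```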
```bash
export ADVENT_OF_CODE_SESSION="your-session-token"
bash tools/download_puzzles.sh
```
By default, the script will download the AoC challenges from 2015 to 2022 and merge them into the `dataset/aoc_lite` directory. Refer to the script for more details.
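After the script finishes, you can sanity-check the result by listing the merged directory (the exact file layout depends on the script):

```bash
ls dataset/aoc_lite
```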
## Self-Hosting for Evaluation
The pipeline uses LiteLLM, LangChain, and the OpenAI Completions API. To use a custom hosted model, update the model map in `src/llm.py`. For self-hosting, we used vLLM.
- Run the model with vLLM:

  ```bash
  python -m vllm.entrypoints.openai.api_server --model google/gemma-7b-it
  # Running on http://0.0.0.0:8000
  ```
- Set up the client in `src/llm.py`:

  ```python
  # e.g. from langchain_community.llms import VLLMOpenAI
  # (the exact import path depends on your LangChain version)
  return VLLMOpenAI(
      openai_api_key="EMPTY",  # vLLM's OpenAI-compatible server does not check the key
      openai_api_base="http://0.0.0.0:8000/v1",
      model_name=model,
      max_tokens=2048,
  )
  ```
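Before launching the evaluation against a self-hosted model, you can confirm the server is reachable via vLLM's OpenAI-compatible models endpoint:

```bash
# Lists the models served by the local vLLM instance.
curl http://0.0.0.0:8000/v1/models
```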
## Reported Results
All results reported in the paper can be found in the `paper_results` folder, which contains the raw output of the evaluation script for all models.
To produce a LaTeX table, run the following command:

```bash
python src/eval.py --results-folder paper_results
```