
Usage

Environment

Build the image

cd ./docker
docker build -t cruxeval_x .

Run the container

cd ./docker
bash run_docker.bash
docker exec -it cruxeval_x_env /bin/bash

Benchmark Construction

Before running the benchmark construction, download the deepseekcoder-33b-instruct model to ./model, and replace the placeholders "your api key", "your base url" and "your model name" with your own values.
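A leftover placeholder is easy to miss, so a quick scan before launching the pipeline may help. The following is a minimal Python sketch, not part of the repository: the function name `find_placeholders`, the scanned directory, and the list of file suffixes are all assumptions made for illustration. Only the three placeholder strings come from this README.

```python
from pathlib import Path

# Placeholder strings named in this README.
PLACEHOLDERS = ("your api key", "your base url", "your model name")


def find_placeholders(root, suffixes=(".sh", ".bash", ".py")):
    """Return (path, line_number, placeholder) for each leftover placeholder under root.

    The suffix list is an assumption; extend it if the placeholders live elsewhere.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for placeholder in PLACEHOLDERS:
                if placeholder in lowered:
                    hits.append((str(path), number, placeholder))
    return hits


if __name__ == "__main__":
    # Assumption: the files to edit sit somewhere under ./cruxeval-x.
    for path, number, placeholder in find_placeholders("./cruxeval-x"):
        print(f"{path}:{number}: still contains '{placeholder}'")
```

An empty output suggests that no file with the scanned suffixes still contains a placeholder; it does not confirm that the values you entered are valid.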

To run the full pipeline:

cd ./cruxeval-x
bash ./script/benchmark_construction.sh

To run a single step, find the script for that step in ./script and run it.

Dataset

All datasets are in ./data. Directories whose names start with "example" contain the examples used for few-shot inference. The final data is in ./data/cruxeval_preprocessed, which is also available for download on Hugging Face.

The data is in JSON Lines format: each line is a JSON object with the following fields:

{
    "id": "The ID of the problem, consistent with the CRUXEval benchmark. Entries in different languages with the same ID represent the same problem.",
    "code": "The code whose execution the model needs to understand",
    "input_reasoning": "The check function with the input replaced by '????'",
    "output_reasoning": "The check function with the output replaced by '????'"
}
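Because each line is a separate JSON object, the files can be read line by line. The sketch below shows one way to load a file and check that every record carries the four documented fields. The function name `load_problems` is hypothetical, and the check is an illustration only, not the repository's own loader.

```python
import json

# Field names as documented in this README.
REQUIRED_FIELDS = ("id", "code", "input_reasoning", "output_reasoning")


def load_problems(path):
    """Read a JSON Lines file and return its records as a list of dicts."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [field for field in REQUIRED_FIELDS if field not in record]
            if missing:
                raise ValueError(f"{path}:{number}: missing fields {missing}")
            problems.append(record)
    return problems
```

Pass it the path of any file under ./data/cruxeval_preprocessed. The individual file names are not listed here, so check the directory for the language you need.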

Inference

The inference scripts are in ./script.

For open-source models, first download the model to ./model, then run the script:

cd ./cruxeval-x
bash ./script/inference_vllm.bash

For closed-source models, provide the model name, API key and base URL, then run the script:

cd ./cruxeval-x
bash ./script/inference_openai.bash

Submission

You now have your model's results on the dataset:

./cruxeval-x/infer_results/${model_name}/: the results of your LLM.

The next step is to submit a pull request to the project:

Fork the repository into your own GitHub account.
Clone your fork to your local machine.
Check out a new branch from main.
Add the results directory above (i.e. ./cruxeval-x/infer_results/${model_name}/).
Submit the pull request.
The maintainers will review your pull request soon.

./cruxeval-x/infer_results/phi-1 is an example you can use as a reference.
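Before opening the pull request, it can be useful to compare your results directory with the phi-1 example. The sketch below lists files that appear in one directory but not the other. It assumes that a submission should mirror the file layout of the example, which this README does not state explicitly. The helper names `relative_files` and `compare_layout` are hypothetical.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_layout(reference_dir, candidate_dir):
    """Return (missing, extra): files only in the reference, files only in the candidate."""
    reference = relative_files(reference_dir)
    candidate = relative_files(candidate_dir)
    return sorted(reference - candidate), sorted(candidate - reference)
```

For example, `compare_layout("./cruxeval-x/infer_results/phi-1", "./cruxeval-x/infer_results/my_model")` returns two lists, where `my_model` stands in for your own ${model_name}. Treat differences as hints to inspect, since file names may legitimately vary between models.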

Once your pull request is accepted, we will update the Leaderboard with your results.