Usage
Environment
Build the image:
cd ./docker
docker build -t cruxeval_x .
Run the container:
cd ./docker
bash run_docker.bash
docker exec -it cruxeval_x_env /bin/bash
Benchmark Construction
Before running the benchmark construction, download the deepseekcoder-33b-instruct model to ./model, and replace "your api key", "your base url" and "your model name" with your own values.
To run the full pipeline:
cd ./cruxeval-x
bash ./script/benchmark_construction.sh
To run a single step, find the script for that step in ./script and run it.
Dataset
All datasets are in ./data. Directories whose names start with "example" contain the examples used for few-shot inference. The final data is in ./data/cruxeval_preprocessed, which is also available for download on Hugging Face.
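The naming convention above can be used to separate few-shot example directories from the rest. The following is a minimal sketch, not part of the repository: the helper name `split_data_dirs` is hypothetical, and it assumes only that the convention is a plain "example" prefix on the directory name.

```python
from pathlib import Path


def split_data_dirs(data_root):
    """Return (few_shot_dirs, other_dirs) as sorted lists of directory names.

    Hypothetical helper: relies only on the "example" name prefix
    described in the README, not on any repository code.
    """
    root = Path(data_root)
    # Only sub-directories count; loose files in the data root are ignored.
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    few_shot = [n for n in names if n.startswith("example")]
    others = [n for n in names if not n.startswith("example")]
    return few_shot, others


if __name__ == "__main__":
    root = Path("./data")
    if root.is_dir():
        few_shot, others = split_data_dirs(root)
        print("few-shot example dirs:", few_shot)
        print("other data dirs:", others)
```

Run from the repository root, this should list ./data/cruxeval_preprocessed among the non-example directories.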
The data is stored as JSON Lines: each line is one JSON object with the following fields:
{
"id": "The id of each problem, consistent with the cruxeval benchmark. Entries in different languages with the same id are the same problem.",
"code": "The code whose execution the model needs to understand",
"input_reasoning": "The check function with the input replaced by '????'",
"output_reasoning": "The check function with the output replaced by '????'"
}
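As an illustration of this format, the sketch below reads such a file and checks that each record has the four fields listed above. The function names `load_problems` and `fill_placeholder` are hypothetical, and substituting a prediction for '????' by plain string replacement is an assumption about how the placeholder is meant to be used, not the repository's evaluation code.

```python
import json

# Field names taken from the format description above.
REQUIRED_FIELDS = ("id", "code", "input_reasoning", "output_reasoning")
PLACEHOLDER = "????"


def load_problems(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            problems.append(record)
    return problems


def fill_placeholder(check, prediction):
    """Substitute a predicted value for the first '????' (assumed usage)."""
    if PLACEHOLDER not in check:
        raise ValueError("check function has no '????' placeholder")
    return check.replace(PLACEHOLDER, prediction, 1)
```

Pass `load_problems` the path of one of the files under ./data/cruxeval_preprocessed; the file names there are not specified in this document, so check the directory for the actual names.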
Inference
The inference scripts are in ./script.
For open-source models, first download the model to ./model, then run the script:
cd ./cruxeval-x
bash ./script/inference_vllm.bash
For closed-source models, provide the model name, API key and base URL, then run the script:
cd ./cruxeval-x
bash ./script/inference_openai.bash
Submission
You now have your model's results on the dataset:
./cruxeval-x/infer_results/${model_name}/: the results of your LLM.
The next step is to submit a pull request for the project:
- Fork the repository into your own GitHub account.
- Clone the repository to your local machine.
- Check out a new branch from main.
- Add the results directory above (i.e. ./cruxeval-x/infer_results/${model_name}/).
- Submit the pull request.
- The maintainers will review your pull request soon.
./cruxeval-x/infer_results/phi-1 is an example you can use as a reference.
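Before opening the pull request, it may help to confirm that the results directory exists and is not empty. This is a minimal sketch under stated assumptions: `results_dir_ready` is a hypothetical helper, and it checks only for the presence of at least one file, since the exact contents of the results directory are not described here.

```python
from pathlib import Path


def results_dir_ready(model_name, root="./cruxeval-x/infer_results"):
    """Return True if <root>/<model_name> exists and holds at least one file.

    Hypothetical pre-submission check; it does not validate file contents.
    """
    target = Path(root) / model_name
    if not target.is_dir():
        return False
    # Search recursively so results stored in nested folders also count.
    return any(p.is_file() for p in target.rglob("*"))


if __name__ == "__main__":
    # phi-1 is the reference example named in this README.
    print("phi-1 results present:", results_dir_ready("phi-1"))
```

Comparing your own directory layout against ./cruxeval-x/infer_results/phi-1 remains the more reliable check.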
Once your pull request is accepted, we will update the Leaderboard with your results.