DebugBench

<img src="figs/icon.png" style="width: 20vw; height: auto;" alt="icon">

Overview

Implementation for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models", including the datasets, prompts, and model outputs.

Benchmark

If you want to use the benchmark, please refer to the Hugging Face Dataset for the data source and the evaluation script.

DebugBench is a Large Language Model (LLM) debugging benchmark introduced in the paper "DebugBench: Evaluating Debugging Capability of Large Language Models" [url]. We collect code snippets from the LeetCode community and implant bugs into the source code with GPT-4.
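To make the construction concrete, the sketch below shows what a DebugBench-style instance could look like and how a candidate repair might be scored against test cases. The field names (`slug`, `category`, `buggy_code`, `oracle_code`, `test_cases`) and the helper `passes_tests` are illustrative assumptions for this sketch, not the dataset's actual schema or the repo's evaluation script.

```python
# A toy, self-contained instance in the spirit of DebugBench: a LeetCode-style
# snippet with an implanted bug plus the original (oracle) solution.
# All field names here are assumptions for illustration.
instance = {
    "slug": "add-two-numbers",
    "category": "operator misuse",  # the kind of bug implanted
    "buggy_code": "def add(a, b):\n    return a - b",   # implanted bug
    "oracle_code": "def add(a, b):\n    return a + b",  # original solution
    "test_cases": [((1, 2), 3), ((0, 0), 0)],
}

def passes_tests(src: str, cases) -> bool:
    """Execute candidate code and check it against the instance's test cases."""
    namespace = {}
    exec(src, namespace)          # load the candidate repair
    fn = namespace["add"]
    return all(fn(*args) == expected for args, expected in cases)

# The buggy version fails the tests; the oracle (or a correct model repair) passes.
assert not passes_tests(instance["buggy_code"], instance["test_cases"])
assert passes_tests(instance["oracle_code"], instance["test_cases"])
```

A real evaluation would additionally sandbox execution and run the full LeetCode test suite; this toy check only illustrates the pass/fail criterion.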

Repo Content

This repository contains the implementation for benchmark construction and evaluation.

More elements will be added to the repository soon.

Citations

If you use DebugBench and find it helpful, please cite the paper and star this repo.

Feel free to contact trc20@mails.tsinghua.edu.cn or open an issue if you have any questions.

@misc{tian2024debugbench,
      title={DebugBench: Evaluating Debugging Capability of Large Language Models}, 
      author={Runchu Tian and Yining Ye and Yujia Qin and Xin Cong and Yankai Lin and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2401.04621},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}