# OOP Benchmark

Object-Oriented Programming Evaluation Benchmark for LLMs.

OOP is a code generation benchmark that <b>quantifies the object-oriented programming ability</b> of Large Language Models (LLMs); details are given in our paper "OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models" | [HuggingFace Link]. We collect code snippets from LeetCode, open-source repositories on GitHub, Stack Overflow, and Codewars, and all test samples have undergone carefully designed post-processing.
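
To make the evaluation setup concrete, the sketch below shows one way a functional-correctness benchmark like OOP could be consumed via the Hugging Face `datasets` library. This is a minimal sketch, assuming each sample pairs a prompt with executable unit tests; the dataset id `alphadl/OOP` and the field names `prompt` and `test` are illustrative assumptions, not the benchmark's confirmed schema.

```python
# Minimal evaluation sketch. ASSUMPTIONS: the dataset id "alphadl/OOP" and the
# field names "prompt"/"test" are illustrative guesses, not a confirmed schema.
from datasets import load_dataset

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a fixed class definition."""
    return (
        "class Stack:\n"
        "    def __init__(self):\n"
        "        self.items = []\n"
        "    def push(self, x):\n"
        "        self.items.append(x)\n"
        "    def pop(self):\n"
        "        return self.items.pop()\n"
    )

dataset = load_dataset("alphadl/OOP", split="test")  # hypothetical dataset id

passed = 0
for sample in dataset:
    completion = generate(sample["prompt"])
    namespace = {}
    try:
        exec(completion, namespace)      # define the generated class
        exec(sample["test"], namespace)  # run the sample's unit tests
        passed += 1
    except Exception:
        pass  # any error or failed assertion counts as a failure

print(f"pass rate (greedy, single sample): {passed / len(dataset):.2%}")
```

In practice, model-generated code should run in a sandboxed subprocess with a timeout rather than via `exec` in the host process; the `exec` calls here only keep the sketch short.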

We show 🔎:

- Basic statistics of the benchmark
- Performance of widely-used LLMs, shown below:

<div align="center"> <img width="80%" alt="image" src="https://github.com/alphadl/OOP-eval/blob/main/img/results.jpg"> </div>

## Citations

Please cite our paper and star this repo if you use OOP and find it helpful. Feel free to contact wangshuai123@whu.edu.cn or open an issue if you have any questions.

```bibtex
@article{wang2024oop,
  title={OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models},
  author={Wang, Shuai and Ding, Liang and Shen, Li and Luo, Yong and Du, Bo and Tao, Dacheng},
  journal={arXiv preprint arXiv:2401.06628},
  year={2024}
}
```