<div align="center">
  <h1>Awesome Code LLM</h1>
  <a href="https://awesome.re"><img src="https://awesome.re/badge.svg" alt="Awesome"></a>
  <a href="https://img.shields.io/badge/PRs-Welcome-red"><img src="https://img.shields.io/badge/PRs-Welcome-red" alt="PRs Welcome"></a>
  <a href="https://img.shields.io/github/last-commit/huybery/Awesome-Code-LLM?color=green"><img src="https://img.shields.io/github/last-commit/huybery/Awesome-Code-LLM?color=green" alt="Last Commit"></a>
</div>
## How to Contribute
Contributions are welcome! If you have any resources, tools, papers, or insights related to Code LLMs, feel free to submit a pull request. Let's work together to make this project better!
## News

- [2024-11-12] The Qwen2.5-Coder series is released, offering six model sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B), with Qwen2.5-Coder-32B-Instruct now the most powerful open-source code model.
- [2024-11-08] OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models is released.
## Table of Contents

- Table of Contents
- Top Code LLMs
- Evaluation Toolkit
- Awesome Code LLMs Leaderboard
- Awesome Code LLMs Papers
- Contributors
- Cite as
- Acknowledgement
- Star History
## Top Code LLMs
Models are ranked by HumanEval Pass@1.
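Pass@1 belongs to the pass@k family of metrics introduced with HumanEval: sample n completions per problem, count how many pass the unit tests, and estimate the probability that at least one of k samples is correct. Below is a minimal sketch of the standard unbiased estimator, assuming only NumPy:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated per problem,
    c = samples that pass the unit tests, k = sampling budget being scored."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per problem, 37 of which pass: estimate pass@1 and pass@10
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 10))
```

The per-problem estimates are then averaged over the benchmark to give the leaderboard number.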
## Evaluation Toolkit
- bigcode-evaluation-harness: A framework for the evaluation of autoregressive code generation language models.
- code-eval: A framework for the evaluation of autoregressive code generation language models on HumanEval.
- SandboxFusion: A secure sandbox for running and judging code generated by LLMs.
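These toolkits share the same core loop: execute each generated completion against the benchmark's unit tests in an isolated process and record whether it exits cleanly. The sketch below is a toy illustration of that loop only, not the API of any toolkit above; the problem, completion, and test strings are hypothetical.

```python
import os, subprocess, sys, tempfile

def passes_tests(completion: str, test_code: str, timeout: float = 10.0) -> bool:
    """Write the candidate completion plus its unit tests to a temporary file
    and run it in a separate interpreter; exit code 0 means every test passed.
    Real harnesses add sandboxing, resource limits, and per-sample batching."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(completion + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

# Hypothetical single problem: one model completion and its checks.
completion = "def add(a, b):\n    return a + b"
tests = "assert add(1, 2) == 3\nassert add(-1, 1) == 0"
print(passes_tests(completion, tests))  # True
```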
## Awesome Code LLMs Leaderboard
| Leaderboard | Description |
| --- | --- |
| Evalperf Leaderboard | Evaluates LLMs for efficient code generation. |
| Aider Code Editing Leaderboard | Measures an LLM's coding ability and whether it can write new code that integrates into existing code. |
| BigCodeBench Leaderboard | Evaluates LLMs with practical and challenging programming tasks. |
| LiveCodeBench Leaderboard | Holistic and contamination-free evaluation of large language models for code. |
| Big Code Models Leaderboard | Compares base multilingual code generation models on the HumanEval and MultiPL-E benchmarks. |
| BIRD Leaderboard | BIRD contains over 12,751 unique question-SQL pairs and 95 large databases totaling 33.4 GB, covering more than 37 professional domains such as blockchain, hockey, healthcare, and education. |
| CanAiCode Leaderboard | Leaderboard for the "Can AI Code?" interview-style coding benchmark. |
| Coding LLMs Leaderboard | Coding LLMs Leaderboard |
| CRUXEval Leaderboard | CRUXEval complements HumanEval and MBPP by measuring code reasoning, understanding, and execution capabilities. |
| EvalPlus Leaderboard | Evaluates AI coders with rigorous tests. |
| InfiBench Leaderboard | A comprehensive benchmark for code LLMs that evaluates their ability to answer free-form, real-world questions in the code domain. |
| InterCode Leaderboard | A benchmark for evaluating language models on interactive coding tasks: given a natural language request, an agent interacts with a software system (e.g., a database or terminal) through code to resolve the issue. |
| Program Synthesis Models Leaderboard | Helps researchers identify the best open-source model via an intuitive leadership-quadrant graph, ranking open-source code models by their capabilities and market adoption. |
| Spider Leaderboard | Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The challenge aims to develop natural language interfaces to cross-domain databases. |
## Awesome Code LLMs Papers

### Awesome Code Pre-Training Papers
### Awesome Code Instruction-Tuning Papers
| Title | Venue | Date | Code | Resources |
| --- | --- | --- | --- | --- |
| Magicoder: Source Code Is All You Need | ICML'24 | 2023.12 | GitHub | HF |
| OctoPack: Instruction Tuning Code Large Language Models | ICLR'24 | 2023.08 | GitHub | HF |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Preprint | 2023.07 | GitHub | HF |
| Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions | Preprint | 2023.xx | GitHub | HF |
### Awesome Code Alignment Papers
| Title | Venue | Date | Code | Resources |
| --- | --- | --- | --- | --- |
| ProSec: Fortifying Code LLMs with Proactive Security Alignment | Preprint | 2024.11 | - | - |
| PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models | Preprint | 2024.06 | - | - |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Preprint | 2023.07 | - | - |
| RLTF: Reinforcement Learning from Unit Test Feedback | Preprint | 2023.07 | GitHub | - |
| Execution-based Code Generation using Deep Reinforcement Learning | TMLR'23 | 2023.01 | GitHub | - |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | NeurIPS'22 | 2022.07 | GitHub | - |
### Awesome Code Prompting Papers
| Title | Venue | Date | Code | Resources |
| --- | --- | --- | --- | --- |
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Preprint | 2024.10 | GitHub | - |
| Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | AAAI'25 | 2024.06 | GitHub | - |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | ACL'24 | 2024.02 | GitHub | - |
| SelfEvolve: A Code Evolution Framework via Large Language Models | Preprint | 2023.06 | - | - |
| Demystifying GPT Self-Repair for Code Generation | ICLR'24 | 2023.06 | GitHub | - |
| Teaching Large Language Models to Self-Debug | ICLR'24 | 2023.06 | - | - |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | ICML'23 | 2023.02 | GitHub | - |
| Coder Reviewer Reranking for Code Generation | ICML'23 | 2022.11 | GitHub | - |
| CodeT: Code Generation with Generated Tests | ICLR'23 | 2022.07 | GitHub | - |
### Awesome Code Benchmark & Evaluation Papers
## Contributors
<a href="https://github.com/huybery"><img src="https://avatars.githubusercontent.com/u/13436140?v=4" width="50" /></a> <a href="https://github.com/Yangjiaxi"><img src="https://avatars.githubusercontent.com/u/6203054?v=4" width="50" /></a> <a href="https://github.com/GanjinZero"><img src="https://avatars.githubusercontent.com/u/19466330?v=4" width="50" /></a> <a href="https://github.com/TyDunn"><img src="https://avatars.githubusercontent.com/u/13314504?v=4" width="50" /></a> <a href="https://github.com/Hambaobao"><img src="https://avatars.githubusercontent.com/u/48345096?v=4" width="50" /></a>
This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at huybery@gmail.com.
## Cite as
@software{awesome-code-llm,
  author = {Binyuan Hui and Lei Zhang},
  title = {An awesome and curated list of best code-LLM for research},
  howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
  year = {2023},
}
## Acknowledgement
This project is inspired by Awesome-LLM.