
<div align="center"> <h1>👨‍💻 Awesome Code LLM</h1> <a href="https://awesome.re"> <img src="https://awesome.re/badge.svg" alt="Awesome"> </a> <a href="https://img.shields.io/badge/PRs-Welcome-red"> <img src="https://img.shields.io/badge/PRs-Welcome-red" alt="PRs Welcome"> </a> <a href="https://img.shields.io/github/last-commit/huybery/Awesome-Code-LLM?color=green"> <img src="https://img.shields.io/github/last-commit/huybery/Awesome-Code-LLM?color=green" alt="Last Commit"> </a> </div>

&nbsp;

🔆 How to Contribute

Contributions are welcome! If you have any resources, tools, papers, or insights related to Code LLMs, feel free to submit a pull request. Let's work together to make this project better!

&nbsp;

News

&nbsp;

🧵 Table of Contents

&nbsp;

🚀 Top Code LLMs

Sorted by HumanEval Pass@1 (a sketch of the pass@k metric follows the table).
| Rank | Model | Params | HumanEval | MBPP | Source |
|------|-------|--------|-----------|------|--------|
| 1 | o1-mini-2024-09-12 | - | 97.6 | 93.9 | paper |
| 2 | o1-preview-2024-09-12 | - | 95.1 | 93.4 | paper |
| 3 | Qwen2.5-Coder-32B-Instruct | 32B | 92.7 | 90.2 | github |
| 4 | Claude-3.5-Sonnet-20241022 | - | 92.1 | 91.0 | paper |
| 5 | GPT-4o-2024-08-06 | - | 92.1 | 86.8 | paper |
| 6 | Qwen2.5-Coder-14B-Instruct | 14B | 89.6 | 86.2 | github |
| 7 | Claude-3.5-Sonnet-20240620 | - | 89.0 | 87.6 | paper |
| 8 | GPT-4o-mini-2024-07-18 | - | 87.8 | 86.0 | paper |
| 9 | Qwen2.5-Coder-7B-Instruct | 7B | 88.4 | 83.5 | github |
| 10 | DS-Coder-V2-Instruct | 21/236B | 85.4 | 89.4 | github |
| 11 | Qwen2.5-Coder-3B-Instruct | 3B | 84.1 | 73.6 | github |
| 12 | DS-Coder-V2-Lite-Instruct | 2.4/16B | 81.1 | 82.8 | github |
| 13 | CodeQwen1.5-7B-Chat | 7B | 83.5 | 70.6 | github |
| 14 | DeepSeek-Coder-33B-Instruct | 33B | 79.3 | 70.0 | github |
| 15 | DeepSeek-Coder-6.7B-Instruct | 6.7B | 78.6 | 65.4 | github |
| 16 | GPT-3.5-Turbo | - | 76.2 | 70.8 | github |
| 17 | CodeLlama-70B-Instruct | 70B | 72.0 | 77.8 | paper |
| 18 | Qwen2.5-Coder-1.5B-Instruct | 1.5B | 70.7 | 69.2 | github |
| 19 | StarCoder2-15B-Instruct-v0.1 | 15B | 67.7 | 78.0 | paper |
| 20 | Qwen2.5-Coder-0.5B-Instruct | 0.5B | 61.6 | 52.4 | github |
| 21 | Pangu-Coder2 | 15B | 61.6 | - | paper |
| 22 | WizardCoder-15B | 15B | 57.3 | 51.8 | paper |
| 23 | CodeQwen1.5-7B | 7B | 51.8 | 61.8 | github |
| 24 | CodeLlama-34B-Instruct | 34B | 48.2 | 61.1 | paper |
| 25 | Code-Davinci-002 | - | 47.0 | - | paper |
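
The scores above are Pass@1: the probability that a single sampled completion passes all unit tests for a problem. For reference, here is a minimal sketch of the unbiased pass@k estimator introduced in "Evaluating Large Language Models Trained on Code" (listed in the pre-training papers below); it assumes you have sampled `n` completions per problem and counted the `c` that pass. Averaging this quantity over all problems gives the benchmark score.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper (Chen et al., 2021).

    n: completions sampled for the problem
    c: completions that passed all unit tests
    k: attempt budget being scored
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples, 37 passing -> pass@1 reduces to c/n.
print(pass_at_k(200, 37, 1))   # 0.185
print(pass_at_k(200, 37, 10))  # chance at least one of 10 draws passes
```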

&nbsp;

💡 Evaluation Toolkit:

&nbsp;

🚀 Awesome Code LLMs Leaderboard

| Leaderboard | Description |
|---|---|
| Evalperf Leaderboard | Evaluates LLMs for efficient code generation. |
| Aider Code Editing Leaderboard | Measures an LLM's coding ability and whether it can write new code that integrates into existing code. |
| BigCodeBench Leaderboard | Evaluates LLMs on practical and challenging programming tasks. |
| LiveCodeBench Leaderboard | Holistic and contamination-free evaluation of large language models for code. |
| Big Code Models Leaderboard | Compares base multilingual code generation models on the HumanEval benchmark and MultiPL-E. |
| BIRD Leaderboard | BIRD contains 12,751 unique question-SQL pairs over 95 big databases with a total size of 33.4 GB, covering more than 37 professional domains such as blockchain, hockey, healthcare, and education. |
| CanAiCode Leaderboard | Leaderboard for the CanAiCode benchmark. |
| Coding LLMs Leaderboard | Leaderboard ranking coding LLMs. |
| CRUXEval Leaderboard | CRUXEval is a benchmark complementary to HumanEval and MBPP that measures code reasoning, understanding, and execution capabilities. |
| EvalPlus Leaderboard | EvalPlus evaluates AI coders with rigorous tests. |
| InfiBench Leaderboard | InfiBench is a comprehensive benchmark for code LLMs that evaluates their ability to answer free-form, real-world questions in the code domain. |
| InterCode Leaderboard | InterCode is a benchmark for evaluating language models on interactive coding tasks: given a natural language request, an agent interacts with a software system (e.g., a database or terminal) through code to resolve the issue. |
| Program Synthesis Models Leaderboard | Ranks open-source code models by capability and market adoption, with a leadership-quadrant graph to help researchers identify the best model. |
| Spider Leaderboard | Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students; the goal of the Spider challenge is to develop natural language interfaces to cross-domain databases (an illustrative question-SQL pair follows this table). |
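
To make the text-to-SQL rows (BIRD, Spider) concrete, here is a hypothetical question-SQL pair in the shape both datasets use; the schema and content are invented for illustration, not taken from either benchmark.

```python
# Hypothetical text-to-SQL pair (schema and content invented for illustration).
example = {
    "db_id": "music_venues",  # which database schema the query targets
    "question": "How many singers are from France?",
    "query": "SELECT COUNT(*) FROM singer WHERE country = 'France';",
}
```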

&nbsp;

📚 Awesome Code LLMs Papers

🌊 Awesome Code Pre-Training Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | Preprint | 2024.11 | Github | HF |
| Qwen2.5-Coder Technical Report | Preprint | 2024.09 | Github | HF |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Preprint | 2024.06 | Github | HF |
| StarCoder 2 and The Stack v2: The Next Generation | Preprint | 2024.02 | Github | HF |
| DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Preprint | 2024.01 | Github | HF |
| Code Llama: Open Foundation Models for Code | Preprint | 2023.08 | Github | HF |
| Textbooks Are All You Need | Preprint | 2023.06 | - | HF |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | Preprint | 2023.05 | Github | HF |
| StarCoder: may the source be with you! | Preprint | 2023.05 | Github | HF |
| CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | ICLR'23 | 2023.05 | Github | HF |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X | Preprint | 2023.03 | Github | HF |
| SantaCoder: don't reach for the stars! | Preprint | 2023.01 | - | HF |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | ICLR'23 | 2022.03 | Github | HF |
| Evaluating Large Language Models Trained on Code | Preprint | 2021.07 | Github | - |
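
Beyond next-token prediction, several of the models above (SantaCoder, StarCoder, and DeepSeek-Coder among them) also pre-train with fill-in-the-middle (FIM): a document is split into prefix, middle, and suffix, then reordered around sentinel tokens so the model learns to infill. A minimal sketch of the PSM (prefix-suffix-middle) transform follows; the sentinel token names are StarCoder's, and other tokenizers define their own.

```python
import random

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """Reorder one training document for fill-in-the-middle (PSM variant).

    With probability fim_rate, split the text at two random points and emit
    prefix/suffix/middle around sentinel tokens, so the model learns to
    generate the middle conditioned on both sides. Sentinel names follow
    StarCoder's tokenizer; other models define their own.
    """
    if random.random() >= fim_rate or len(document) < 2:
        return document  # keep as ordinary left-to-right text
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"
```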

&nbsp;

๐Ÿณ Awesome Code Instruction-Tuning Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| Magicoder: Source Code Is All You Need | ICML'24 | 2023.12 | Github | HF |
| OctoPack: Instruction Tuning Code Large Language Models | ICLR'24 | 2023.08 | Github | HF |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Preprint | 2023.07 | Github | HF |
| Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions | Preprint | 2023.xx | Github | HF |
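
Most of the datasets behind these papers boil down to the same record shape: a natural-language instruction, an optional input, and a target completion (Code Alpaca follows the original Alpaca schema). A hypothetical record, with field names from that schema and content invented for illustration:

```python
# Hypothetical Alpaca-style instruction-tuning record (fields follow the
# Alpaca schema; content invented for illustration).
record = {
    "instruction": "Write a Python function that checks whether a string is a palindrome.",
    "input": "",  # optional extra context; empty for self-contained tasks
    "output": (
        "def is_palindrome(s: str) -> bool:\n"
        "    s = s.lower()\n"
        "    return s == s[::-1]"
    ),
}
```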

&nbsp;

๐Ÿฌ Awesome Code Alignment Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| ProSec: Fortifying Code LLMs with Proactive Security Alignment | Preprint | 2024.11 | - | - |
| PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models | Preprint | 2024.06 | - | - |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Preprint | 2023.07 | - | - |
| RLTF: Reinforcement Learning from Unit Test Feedback | Preprint | 2023.07 | Github | - |
| Execution-based Code Generation using Deep Reinforcement Learning | TMLR'23 | 2023.01 | Github | - |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | NeurIPS'22 | 2022.07 | Github | - |
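
The common thread in these papers is converting program execution into a learning signal. Below is a minimal sketch of a unit-test reward function in the spirit of CodeRL and RLTF; the tier values are illustrative choices, not the exact constants from either paper.

```python
import subprocess
import sys

def execution_reward(candidate: str, test_code: str, timeout: float = 5.0) -> float:
    """Map unit-test execution onto a scalar reward, CodeRL-style.

    Tier values are illustrative; the papers differ in the exact scale.
    """
    program = candidate + "\n" + test_code
    try:
        compile(program, "<candidate>", "exec")  # syntax check first
    except SyntaxError:
        return -1.0                              # does not even compile
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return -0.6                              # ran but never finished
    if result.returncode != 0:
        return -0.3                              # runtime error or failed assert
    return 1.0                                   # all tests passed
```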

&nbsp;

๐Ÿ‹ Awesome Code Prompting Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Preprint | 2024.10 | Github | - |
| Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | AAAI'25 | 2024.06 | Github | - |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | ACL'24 | 2024.02 | Github | - |
| SelfEvolve: A Code Evolution Framework via Large Language Models | Preprint | 2023.06 | - | - |
| Demystifying GPT Self-Repair for Code Generation | ICLR'24 | 2023.06 | Github | - |
| Teaching Large Language Models to Self-Debug | ICLR'24 | 2023.06 | - | - |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | ICML'23 | 2023.02 | Github | - |
| Coder Reviewer Reranking for Code Generation | ICML'23 | 2022.11 | Github | - |
| CodeT: Code Generation with Generated Tests | ICLR'23 | 2022.07 | Github | - |
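
A recurring pattern across these prompting papers is execution-guided selection: sample several candidate programs, run them against tests (model-generated in CodeT, provided in LEVER), and keep the candidate that executes best. The sketch below reduces the idea to "count passed tests"; CodeT's full method additionally clusters candidates that agree on test outcomes.

```python
from typing import Callable, List

def rerank_by_execution(
    candidates: List[str],
    tests: List[Callable[[dict], bool]],
) -> str:
    """Pick the candidate whose namespace passes the most tests.

    Each candidate is exec'd in a fresh namespace; each test receives that
    namespace and returns True/False. Simplified from CodeT, which also
    clusters candidates by agreement on test outcomes.
    """
    def score(code: str) -> int:
        ns: dict = {}
        try:
            exec(code, ns)  # never run untrusted code outside a sandbox
        except Exception:
            return -1
        passed = 0
        for test in tests:
            try:
                passed += bool(test(ns))
            except Exception:
                pass
        return passed
    return max(candidates, key=score)

# Usage: tests probe the namespace produced by each candidate.
candidates = [
    "def add(a, b): return a - b",  # buggy
    "def add(a, b): return a + b",  # correct
]
tests = [lambda ns: ns["add"](2, 3) == 5, lambda ns: ns["add"](0, 0) == 0]
print(rerank_by_execution(candidates, tests))  # prints the correct candidate
```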

&nbsp;

๐Ÿ™ Awesome Code Benchmark & Evaluation Papers

| Dataset | Title | Venue | Date | Code | Resources |
|---|---|---|---|---|---|
| CodeArena | Evaluating and Aligning CodeLLMs on Human Preference | Preprint | 2024.12 | Github | HF |
| FullStack Bench | FullStack Bench: Evaluating LLMs as Full Stack Coders | Preprint | 2024.12 | Github | HF, Github |
| GitChameleon | GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | Preprint | 2024.11 | Github | - |
| Evalperf | Evaluating Language Models for Efficient Code Generation | COLM'24 | 2024.08 | Github | HF |
| LiveCodeBench | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Preprint | 2024.03 | Github | HF |
| DevBench | DevBench: A Comprehensive Benchmark for Software Development | Preprint | 2024.03 | Github | - |
| SWE-bench | SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | ICLR'24 | 2024.03 | Github | HF |
| CrossCodeEval | CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | NeurIPS'23 | 2023.11 | Github | - |
| RepoCoder | Repository-Level Code Completion Through Iterative Retrieval and Generation | EMNLP'23 | 2023.10 | Github | - |
| LongCoder | LongCoder: A Long-Range Pre-trained Language Model for Code Completion | ICML'23 | 2023.10 | Github | - |
| - | Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation | Preprint | 2023.08 | - | - |
| BioCoder | BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models | ISMB'24 | 2023.08 | Github | - |
| RepoBench | RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | ICLR'24 | 2023.06 | Github | HF |
| Evalplus | Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | NeurIPS'23 | 2023.05 | Github | HF |
| Coeditor | Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing | ICLR'24 | 2023.05 | Github | - |
| DS-1000 | DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | ICML'23 | 2022.11 | Github | HF |
| MultiPL-E | MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Preprint | 2022.08 | Github | HF |
| MBPP | Program Synthesis with Large Language Models | Preprint | 2021.08 | Github | HF |
| APPS | Measuring Coding Challenge Competence With APPS | NeurIPS'21 | 2021.05 | Github | HF |
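
For the function-level benchmarks above (HumanEval, MBPP, EvalPlus, MultiPL-E), a problem is essentially a prompt, an entry point, and a test harness executed against the model's completion. Below is a hypothetical record in roughly the HumanEval JSONL layout; the field names match the released dataset, but the content is invented for illustration.

```python
# Hypothetical problem in roughly the HumanEval JSONL layout
# (field names follow the released dataset; content invented).
problem = {
    "task_id": "Example/0",
    "prompt": (
        "def double(x: int) -> int:\n"
        '    """Return twice the input."""\n'
    ),
    "entry_point": "double",
    "canonical_solution": "    return 2 * x\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(0) == 0\n"
        "    assert candidate(3) == 6\n"
    ),
}

# Evaluation concatenates prompt + model completion, then runs the checker.
ns: dict = {}
exec(problem["prompt"] + problem["canonical_solution"], ns)
exec(problem["test"], ns)
ns["check"](ns[problem["entry_point"]])  # raises AssertionError on failure
```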

&nbsp;

🙌 Contributors

<a href="https://github.com/huybery"><img src="https://avatars.githubusercontent.com/u/13436140?v=4" width="50" /></a> <a href="https://github.com/Yangjiaxi"><img src="https://avatars.githubusercontent.com/u/6203054?v=4" width="50" /></a> <a href="https://github.com/GanjinZero"><img src="https://avatars.githubusercontent.com/u/19466330?v=4" width="50" /></a> <a href="https://github.com/TyDunn"><img src="https://avatars.githubusercontent.com/u/13314504?v=4" width="50" /></a> <a href="https://github.com/Hambaobao"><img src="https://avatars.githubusercontent.com/u/48345096?v=4" width="50" /></a>

This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at huybery@gmail.com.

&nbsp;

Cite as

```bibtex
@software{awesome-code-llm,
  author = {Binyuan Hui and Lei Zhang},
  title = {An awesome and curated list of the best code LLMs for research},
  howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
  year = {2023},
}
```

&nbsp;

Acknowledgement

This project is inspired by Awesome-LLM.

&nbsp;

Star History

Star History Chart

⬆ Back to ToC