Awesome
ReliableLM4Code
This repository builds on our recent work, "Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey" and "Large language models for software engineering: A systematic literature review". It contains supporting material for that research and a curated collection of LM4Code papers and other resources (datasets, tutorials, etc.). The focus is primarily on papers that use pre-trained models, especially large language models, to improve the reliability of language models in Software Engineering research.
For more details, please access this site
Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to pitfalls that hinder realistic performance and, in turn, limit their reliability and applicability in real-world deployment. These challenges call for a comprehensive understanding: not just identifying the issues but also examining their possible implications and existing solutions, in order to build more reliable language models tailored to code intelligence. Following a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code and identified 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. The resulting classification scheme dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and use of LM4Code in reliable and trustworthy ways.
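As a rough illustration of the four-aspect classification scheme, the taxonomy can be written down as a small lookup table. The category and pitfall names below are transcribed from the section headings under "Papers" in this README; the dictionary layout and the helper name `aspect_of` are assumptions made for this sketch and are not part of the survey itself.

```python
# Taxonomy of LM4Code pitfalls, transcribed from this README's section headings.
# The data layout and the helper `aspect_of` are illustrative assumptions.
PITFALL_TAXONOMY = {
    "Data Collection and Labeling": [
        "Unbalanced Distribution",
        "Label Errors",
        "Data Noise",
    ],
    "System Design and Learning": [
        "Data Snooping",
        "Spurious Correlations",
        "Inappropriate Model Design",
    ],
    "Performance Evaluation": [
        "Inappropriate Baseline",
        "Inappropriate Evaluation Dataset",
        "Low Reproducibility",
        "Inappropriate Performance Measures",
    ],
    "Deployment and Maintenance": [
        "Real-World Constraints",
        "Attack Threats",
        "Security Concerns in Generated Code",
    ],
}


def aspect_of(pitfall: str) -> str:
    """Return the aspect (top-level category) that a pitfall is listed under."""
    for aspect, pitfalls in PITFALL_TAXONOMY.items():
        if pitfall in pitfalls:
            return aspect
    raise KeyError(f"unknown pitfall: {pitfall}")


# Example lookup: find the aspect a given pitfall belongs to.
print(aspect_of("Data Snooping"))
```

A table like this could be used, for instance, to tag entries when contributing new papers, so that each paper lands under the right heading.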
Please feel free to send a pull request to add papers and relevant content that are not listed here. We have uploaded our complete paper list, with detailed review information, to Google Drive.
Content
- About our survey
- What is LM4Code?
- Relevant Surveys and Tutorials
- Explainable LM4Code
- Top Researchers in LM4Code
- Relevant Venues
- LLMs in Security
Papers
Data Collection and Labeling
Unbalanced Distribution
- Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), arxiv, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
- Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! (2023), ICSE, X Yang, et al. [pdf]
- On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
- Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
- An empirical study of deep learning models for vulnerability detection (2023), arxiv, B Steenhoek, et al. [pdf]
Label Errors
- Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
- XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, et al. [pdf]
- Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper) (2023), ISSTA, X Nie, et al. [pdf]
Data Noise
- Slice-Based Code Change Representation Learning (2023), SANER, F Zhang, et al. [pdf]
- Are we building on the rock? on the importance of data preprocessing for code summarization (2022), FSE, L Shi, et al. [pdf]
- Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? (2018), ASE, Z Liu, et al. [pdf]
System Design and Learning
Data Snooping
- AutoTransform: automated code transformation to support modern code review process (2022), ICSE, Thongtanunam, Patanamon, Chanathip Pornprasit, and Chakkrit Tantithamthavorn. [pdf]
- Can Neural Clone Detection Generalize to Unseen Functionalities? (2021), ASE, C Liu, et al. [pdf]
- CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation (2020), TDSC, S Liu, et al. [pdf]
- Deep just-in-time defect prediction: how far are we? (2021), ISSTA, Z Zeng, et al. [pdf]
- Patching as translation: the data and the metaphor (2020), ASE, Y Ding, et al. [pdf]
- An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
- Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2023), ICSE, S Gao, et al. [pdf]
- Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
- Syntax and Domain Aware Model for Unsupervised Program Translation (2023), ICSE, F Liu, J Li, L Zhang. [pdf]
- How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
- Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
- On the Evaluation of Neural Code Summarization (2022), ICSE, E Shi, Y Wang, L Du, et al. [pdf]
Spurious Correlations
- Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
- Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
- Explaining mispredictions of machine learning models using rule induction (2021), FSE, J Cito, I Dillig, S Kim, et al. [pdf]
- Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching (2021), TOSEM, D Zou, Y Zhu, S Xu, et al. [pdf]
- Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE, M Paltenghi, M Pradel. [pdf]
- Vulnerability detection with fine-grained interpretations (2021), FSE, Y Li, S Wang, TN Nguyen. [pdf]
- What do they capture? a structural analysis of pre-trained language models for source code (2022), ICSE, Y Wan, W Zhao, H Zhang, et al. [pdf]
- An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
- Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond (2023), ISSTA, E Shi, Y Wang, H Zhang, et al. [pdf]
Inappropriate Model Design
- Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
- Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking (2022), TSE, H Wang, P Ma, Y Yuan, et al. [pdf]
- Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
- Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
- Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
- XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, G Li, J Zhang, et al. [pdf]
- RepresentThemAll: A Universal Learning Representation of Bug Reports (2023), ICSE, S Fang, T Zhang, Y Tan, et al. [pdf]
- Template-based Neural Program Repair (2023), ICSE, X Meng, X Wang, H Zhang, et al. [pdf]
Performance Evaluation
Inappropriate Baseline
- Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), arxiv, Z Liu, K Liu, X Xia, et al. [pdf]
Inappropriate Evaluation Dataset
- Deep Learning Based Program Generation From Requirements Text: Are We There Yet? (2020), TSE, H Liu, M Shen, J Zhu, et al. [pdf]
- Generating realistic vulnerabilities via neural code editing: an empirical study (2022), FSE, Y Nong, Y Ou, M Pradel, et al. [pdf]
Low Reproducibility
- An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
Inappropriate Performance Measures
- Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
- Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
- Multi-task learning based pre-trained language model for code completion (2020), ASE, F Liu, G Li, Y Zhao, et al. [pdf]
- On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
- Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
- Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
- SynShine: Improved Fixing of Syntax Errors (2022), TSE, Ahmed T, Ledesma N R, Devanbu P. [pdf]
- An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
- Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
- Tare: Type-Aware Neural Program Repair (2023), ICSE, Q Zhu, Z Sun, W Zhang, et al. [pdf]
- How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
- Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
- GitHub Copilot AI pair programmer: Asset or Liability? (2023), JSS, AM Dakhel, V Majdinasab, A Nikanjam, et al. [pdf]
Deployment and Maintenance
Real-World Constraints
- Examining Zero-Shot Vulnerability Repair with Large Language Models (2023), S&P, H Pearce, B Tan, B Ahmad, et al. [pdf]
- A Performance-Sensitive Malware Detection System Using Deep Learning on Mobile Devices (2020), TIFS, R Feng, S Chen, X Xie, et al. [pdf]
- Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
- When Code Completion Fails: A Case Study on Real-World Completions (2019), ICSE, VJ Hellendoorn, S Proksch, HC Gall, et al. [pdf]
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants (2023), arxiv, G Sandoval, H Pearce, T Nys, et al. [pdf]
- Grounded Copilot: How Programmers Interact with Code-Generating Models (2023), OOPSLA1, S Barke, MB James, N Polikarpova. [pdf]
- LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (2023), arxiv, J Lu, L Yu, X Li, et al. [pdf]
- Compressing Pre-trained Models of Code into 3 MB (2022), ASE, J Shi, Z Yang, B Xu, et al. [pdf]
Attack Threats
- You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (2021), USENIX Security, R Schuster, C Song, E Tromer, et al. [pdf]
- Adversarial Robustness of Deep Code Comment Generation (2022), TOSEM, Y Zhou, X Zhang, J Shen, et al. [pdf]
- An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
- Generating Adversarial Examples for Holding Robustness of Source Code Processing Models (2020), AAAI, H Zhang, Z Li, G Li, et al. [pdf]
- Semantic Robustness of Models of Source Code (2020), SANER, G Ramakrishnan, J Henkel, Z Wang, et al. [pdf]
- You see what I want you to see: poisoning vulnerabilities in neural code search (2022), FSE, Y Wan, S Zhang, H Zhang, et al. [pdf]
- ContraBERT: Enhancing code pre-trained models via contrastive learning (2023), ICSE, S Liu, B Wu, X Xie, et al. [pdf]
- On the robustness of code generation techniques: An empirical study on github copilot (2023), ICSE, A Mastropaolo, L Pascarella, E Guglielmi, et al. [pdf]
- Two sides of the same coin: Exploiting the impact of identifiers in neural code comprehension (2023), ICSE, S Gao, C Gao, C Wang, et al. [pdf]
- Multi-target Backdoor Attacks for Code Pre-trained Models (2023), ACL, Y Li, S Liu, K Chen, et al. [pdf]
- Backdooring Neural Code Search (2023), ACL, W Sun, Y Chen, G Tao, et al. [pdf]
- ReCode: Robustness Evaluation of Code Generation Models (2022), ACL, S Wang, Z Li, H Qian, et al. [pdf]
- Natural Attack for Pre-trained Models of Code (2022), ICSE, Z Yang, J Shi, J He, et al. [pdf]
- CoProtector: Protect open-source code against unauthorized training usage with data poisoning (2022), WWW, Z Sun, X Du, F Song, et al. [pdf]
- On the Security Vulnerabilities of Text-to-SQL Models (2023), ISSRE, X Peng, Y Zhang, J Yang, et al. [pdf]
Security Concerns in Generated Code
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions (2022), S&P, H Pearce, B Ahmad, B Tan, et al. [pdf]
- Automated repair of programs from large language models (2023), ICSE, Z Fan, X Gao, M Mirchev, et al. [pdf]
- CCTest: Testing and repairing code completion systems (2023), ICSE, Z Li, C Wang, Z Liu, et al. [pdf]
- Analyzing Leakage of Personally Identifiable Information in Language Models (2023), S&P, N Lukas, A Salem, R Sim, et al. [pdf]
- CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot (2023), USENIX Security, L Niu, S Mirza, Z Maradni, et al. [pdf]
Language Models for Code Intelligence
Decoder-only Models
GPT-1
- Release Date: 2018-06
- Institute: OpenAI
- Paper: Improving Language Understanding by Generative Pre-Training
GPT-2
- Release Date: 2019-02
- Institute: OpenAI
- Paper: Language Models are Unsupervised Multitask Learners
GPT-3
- Release Date: 2020-05
- Institute: OpenAI
- Paper: Language models are few-shot learners
Codex
- Release Date: 2021-08
- Institute: OpenAI
- Paper: Evaluating Large Language Models Trained on Code
GPT-NeoX
- Release Date: 2022-04
- Access: ckpt
- Paper: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
GPT-Neo
- Release Date: 2021-03
- Source: Github
CodeGen
- Release Date: 2022-03
- Paper: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
InstructGPT
- Release Date: 2022-01
- Paper: Training language models to follow instructions with human feedback
CodeGeeX
- Title: CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
- Year: 2023
- Paper: Link
GPT-J
- Release Date: 2021-06
- Access: GPT-J-6B, GPT4All-J
- Paper: GPT-J-6B: 6B JAX-Based Transformer
LLaMA
- Release Date: 2023-02
- Institute: Meta
- Paper: LLaMA: Open and Efficient Foundation Language Models
ChatGPT
StableLM-Alpha
- Release Date: 2023-04
- Access: StableLM-Alpha
- Paper: Stability AI Launches the First of its StableLM Suite of Language Models
InCoder
- Title: InCoder: A Generative Model for Code Infilling and Synthesis
- Authors: Daniel Fried et al.
- Release Date: 2023
- Paper: Link
GPT-4
- Release Date: 2023-03
- Institute: OpenAI
- Paper: GPT-4 Technical Report
WizardCoder
- Access: [WizardCoder](https://github.com/nlpxucan/WizardLM)
- Release Date: 2023
- Paper: WizardCoder: Empowering Code Large Language Models with Evol-Instruct
PanGu-Coder
- Part of: PanGu-α
- Release Date: 2021
- Paper: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation"
OPT
- Release Date: 2022-05
- Access: api, ckpt
- Paper: OPT: Open Pre-trained Transformer Language Models
StarCoder
- Release Date: 2023-05
- Access: starcoder
- Papers: StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you!
SantaCoder
- Release Date: 2023-01
- Access: santacoder
- Paper: SantaCoder: don't reach for the stars!
PaLM
- Release Date: 2022-04
- Institute: Google
- Paper: PaLM: Scaling Language Modeling with Pathways
Vicuna
- Release Date: 2023-03
- Blog: Link
Flan-UL2
- Release Date: 2023-03
- Institute: Google
- Blog: Flan-UL2 Blog
CPM-Bee
- Release Date: 2022-10
- Institute: Tsinghua University (OpenBMB)
- Paper: CPM: A Large-scale Generative Chinese Pre-trained Language Model
MT-NLG
- Release Date: 2022-01
- Institute: Microsoft, NVIDIA
- Paper: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
GLM
- Release Date: 2022-10
- Institute: Tsinghua University
- Paper: GLM-130B: AN OPEN BILINGUAL PRE-TRAINED MODEL
YaLM
- Release Date: 2022-06
- Institute: Yandex
- Blog: YaLM Blog
Alpaca
- Release Date: 2023-03
- Institute: Stanford University
- Access: Alpaca GitHub
RWKV-4
- Release Date: 2022-09
- Institute: Independent (BlinkDL)
- Access: RWKV-4 GitHub
Sparrow
- Release Date: 2022-09
- Institute: DeepMind
- Paper: Improving alignment of dialogue agents via targeted human judgements
Falcon
- Release Date: 2023-05
- Institute: Technology Innovation Institute (TII)
- Access: Falcon Homepage
Code Llama
- Release Date: 2023
- Institute: Meta (Facebook)
- Paper: Code Llama: Open Foundation Models for Code
RedPajama-INCITE
- Release Date: Not specified
- Blog: RedPajama-INCITE Blog
DeciCoder-1B
- Release Date: 2023-08
- Institute: Deci AI
- Blog: DeciCoder Blog
OpenLLaMA
- Release Date: 2023-05
- Institute: Not specified
- Access: OpenLLaMA Access
CodeGPT
- Release Date: 2021
- Paper: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Encoder-only Models
BERT
- Release Date: 2018-10
- Institute: Google
- Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ALBERT
- Release Date: 2019
- Paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
RoBERTa
- Release Date: 2019
- Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach
CodeBERT
- Release Date: 2020-04
- Institute: Microsoft
- Paper: CodeBERT: A Pre-Trained Model for Programming and Natural Languages
GraphCodeBERT
- Release Date: 2022-03
- Access: GraphCodeBERT
- Paper: GraphCodeBERT: Pre-training Code Representations with Data Flow
Encoder-decoder Models
AlphaCode
- Release Date: 2022-02
- Access: AlphaCode
- Institute: DeepMind
T5
- Release Date: 2019
- Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Checkpoint: Link
CodeT5
- Release Date: 2021
- Access: CodeT5
- Paper: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
CodeT5+
- Release Date: 2023-05
- Access: CodeT5+
- Paper: CodeT5+: Open Code Large Language Models for Code Understanding and Generation
UniXcoder
- Release Date: 2022
- Access: UniXcoder on Hugging Face
- Paper: UniXcoder: Unified Cross-Modal Pre-training for Code Representation
PLBART
- Release Date: 2021
- Paper: Unified Pre-training for Program Understanding and Generation
CodeReviewer
- Release Date: 2022
- Access: CodeReviewer
- Paper: Automating Code Review Activities by Large-Scale Pre-training
Relevant Surveys on LM4Code
- Large Language Models for Software Engineering: Survey and Open Problems, 2023, paper
- Large Language Models for Software Engineering: A Systematic Literature Review, 2023, paper
- A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends, 2023, paper
- Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, 2023, paper
- Software testing with large language model: Survey, landscape, and vision, 2023, paper
- Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey, 2023, paper
- Generative Artificial Intelligence for Software Engineering--A Research Agenda, 2023, paper
- A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly, 2023, paper
- Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps, 2023, paper
- Large language models meet NL2Code: A survey, 2023, paper
- A Survey on Pretrained Language Models for Neural Code Intelligence, 2022, paper
General Surveys on AI4SE
- A systematic literature review on the use of deep learning in software engineering research, TOSEM 2022, paper
- A survey on deep learning for software engineering, CSUR 2022, paper
- Software engineering for AI-based systems: a survey, TOSEM 2021, paper
- Machine/deep learning for software engineering: A systematic literature review, TSE 2022, paper
- Machine Learning Applied to Software Testing: A Systematic Mapping Study, 2019, paper
- A survey of machine learning for big code and naturalness, CSUR 2018, paper
General Surveys on LLM
- Large Language Models: A Comprehensive Survey of Applications, Challenges, Limitations, and Future Prospects, 2023, paper
- A survey of large language models, 2023, paper
- A Survey on Evaluation of Large Language Models, 2023, paper
- Recent advances in natural language processing via large pre-trained language models: A survey, CSUR 2023, paper
- A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4, 2023, paper
- Challenges and Applications of Large Language Models: A Survey, 2023, paper
- Harnessing the power of llms in practice: A survey on chatgpt and beyond, 2023, paper
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT, 2023, paper
Repositories and Resources for LM4Code
- LLM4SE: Large Language Models for Software Engineering
- Repository
- This repository is associated with prominent software engineering conferences like ICSE, FSE, and ASE.
- Awesome-Code-LLM
- Repository
- The companion repository for a survey that comprehensively reviews LLM research for code. It is a curated list of language-modeling research for code and related datasets; works in each category are ordered chronologically.
- awesome-ai4code-papers
- Repository
- A collection of recent papers, benchmarks and datasets of AI4Code domain.
- ml4code
- Repository
- Research on machine learning for source code.
- awesome-machine-learning-on-source-code
- Repository
- Cool links & research papers related to Machine Learning applied to source code (MLonCode)
- saltudelft/ml4se
- Repository
- A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
- CUHK-ARISE/ml4code-dataset
- Repository
- A collection of datasets for machine learning for big code
Repositories and Resources for LLM
- Awesome-LLM4Tool: A Curated List of Resources for LLM Tools
- Repository
- Offers a curated list of papers, repositories, tutorials, and resources related to large language models for tools.
- LLMsPracticalGuide: A Curated List of Practical Resources
- Repository
- It includes an evolutionary tree of modern Large Language Models to trace the development over the years
- Hannibal046/Awesome-LLM
- Repository
- Awesome-LLM: a curated list of Large Language Model
- awesome-decentralized-llm
- Repository
- Collection of LLM resources that can be used to build products you can "own" or to perform reproducible research.
- RUCAIBox/LLMSurvey
- Repository
- The official GitHub page for the survey paper "A Survey of Large Language Models".
- tensorchord/Awesome-LLMOps
- Repository
- An awesome & curated list of best LLMOps tools for developers
- luban-agi/Awesome-Domain-LLM
- Repository
- A curated list of domain-specific large language models in Chinese
- underlines/awesome-ml
- Repository
- Curated list of useful LLM / Analytics / Datascience resources
Benchmarks
Bug Repair
Defects4J
- Release year: 2014
- Paper: "Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies for Java Programs"
ManyBugs/IntroClass
- Release year: 2015
- Paper: "The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs"
BugAID
- Release year: 2016
- Paper: "Discovering Bug Patterns in JavaScript"
CoCoNut
- Release year: 2020
- Paper: "CoCoNuT: combining context-aware neural translation models using ensemble for program repair"
QuixBugs
- Release year: 2017
- Paper: "QuixBugs: a multi-lingual program repair benchmark set based on the quixey challenge"
Bugs.jar
- Release year: 2018
- Paper: "Bugs.jar: a large-scale, diverse dataset of real-world Java bugs"
BugsInPy
- Release year: 2020
- Paper: "BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies"
DeepFix
- Release year: 2017
- Paper: "DeepFix: Fixing Common C Language Errors by Deep Learning"
Code Generation/Synthesis
CONCODE
- Release year: 2018
- Paper: "Mapping Language to Code in Programmatic Context"
HumanEval
- Release year: 2021
- Paper: "Evaluating Large Language Models Trained on Code"
MBPP/MathQA-Python
- Release year: 2021
- Paper: "Program Synthesis with Large Language Models"
Code Summarization
CODE-NN
- Release year: 2016
- Paper: "Summarizing Source Code using a Neural Attention Model"
TL-CodeSum
- Release year: 2018
- Paper: "Summarizing Source Code with Transferred API Knowledge"
CodeSearchNet
- Release year: 2019
- Paper: "CodeSearchNet Challenge: Evaluating the State of Semantic Code Search"
Citation
If you find this repository useful, please cite our survey paper:
@article{she2023pitfalls,
title={Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey},
author={She, Xinyu and Liu, Yue and Zhao, Yanjie and He, Yiling and Li, Li and Tantithamthavorn, Chakkrit and Qin, Zhan and Wang, Haoyu},
journal={arXiv preprint arXiv:2310.17903},
year={2023}
}
@article{hou2023large,
title={Large language models for software engineering: A systematic literature review},
author={Hou, Xinyi and Zhao, Yanjie and Liu, Yue and Yang, Zhou and Wang, Kailong and Li, Li and Luo, Xiapu and Lo, David and Grundy, John and Wang, Haoyu},
journal={arXiv preprint arXiv:2308.10620},
year={2023}
}