Leveraging Large Language Models in Code Question Answering: Baselines and Issues

This repository contains the source code for the training, inference, and evaluation described in the article "Leveraging Large Language Models in Code Question Answering: Baselines and Issues".

The repository has two main folders: Training and Testing. The Training folder contains the code to fine-tune the StarCoder and DeepSeek-Coder models, while the Testing folder contains the code to generate model predictions and evaluate them.

Links to models:

StarCoder with Grammar Correction
DeepSeek-Coder with Grammar Correction
CodeT5+ for Summaries Generation

Links to datasets:

Unified Dataset
Unified Dataset with Grammatical Corrections
Unified Dataset with Generated Summaries
Testing Dataset Based on ClassEval Dataset
High Quality Subset of Unified Dataset

If you have any questions about the code, send an email to georgyandryuschenko@gmail.com. You are also welcome to create a GitHub issue.