Leveraging Large Language Models in Code Question Answering: Baselines and Issues

This repository contains the source code for the training, inference, and evaluation described in the article "Leveraging Large Language Models in Code Question Answering: Baselines and Issues".

The repository has two main folders: Training and Testing. The Training folder contains the code to fine-tune the StarCoder and DeepSeek-Coder models, while the Testing folder contains the code to generate model predictions and evaluate them.

Links to models:

  1. StarCoder with Grammar Correction

  2. DeepSeek-Coder with Grammar Correction

  3. CodeT5+ for Summaries Generation

Links to datasets:

  1. Unified Dataset

  2. Unified Dataset with Grammatical Corrections

  3. Unified Dataset with Generated Summaries

  4. Testing Dataset Based on ClassEval Dataset

  5. High Quality Subset of Unified Dataset

If you have any questions about the code, email them to georgyandryuschenko@gmail.com. You are also welcome to create a GitHub issue.