<div align="center">

# MGDebugger: Multi-Granularity LLM Debugger


</div>

## Table of Contents

- [Introduction](#introduction)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Performance](#performance)
- [Contributing](#contributing)

## Introduction

MGDebugger is a hierarchical LLM code-debugging method designed to isolate, identify, and resolve errors at multiple levels of granularity. Its bottom-up debugging strategy systematically progresses from individual subfunctions to the overall system, enabling precise error detection and correction.

With MGDebugger, developers can efficiently debug complex code by performing granular analysis, reducing debugging time and improving the success rate of resolving complex issues.
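
As a rough illustration of the idea (not the actual implementation), bottom-up debugging over a tree of subfunctions can be sketched as follows, where `llm_fix` and `run_tests` are hypothetical stand-ins for the LLM repair step and the test harness:

```python
# Conceptual sketch of hierarchical bottom-up debugging. `node` is a tree
# of subfunctions; `llm_fix` and `run_tests` are placeholder callables.
def debug_bottom_up(node, llm_fix, run_tests):
    # Repair the innermost subfunctions before their callers, so each
    # level is debugged on top of already-verified children.
    for child in node.children:
        debug_bottom_up(child, llm_fix, run_tests)
    failures = run_tests(node)
    if failures:
        node.code = llm_fix(node.code, failures)
```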

<div align="center">
  <img src="figures/overview_v1_page.jpg" alt="MGDebugger Overview" width="800"/>
  <p>MGDebugger System Architecture Overview</p>
</div>

<div align="center">
  <img src="figures/subfunction_debug_page.jpg" alt="Subfunction Debugging" width="800"/>
  <p>Subfunction Debugging Module</p>
</div>

## Getting Started

### Prerequisites

Before running MGDebugger, ensure your environment meets at least the following requirements:

- Python 3.8 or later
- [vLLM](https://github.com/vllm-project/vllm) installed (`pip install vllm`)
- A GPU with enough memory to serve DeepSeek-Coder-V2-Lite-Instruct

### Configuring the vLLM Server

To launch the vLLM server with the DeepSeek-Coder-V2-Lite-Instruct model, execute the following command:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct \
    --trust-remote-code \
    --dtype auto \
    --api-key token-abc123s \
    --port 18889
```

This initializes the model and starts an OpenAI-compatible API server on port 18889.
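
Once the server is up, you can verify it is reachable with a minimal request. The sketch below assumes the official `openai` Python client (`pip install openai`); the API key and port must match the values passed to the server above.

```python
from openai import OpenAI

# Point the client at the local vLLM server started above; the api_key
# must match the --api-key flag and the port must match --port.
client = OpenAI(base_url="http://localhost:18889/v1", api_key="token-abc123s")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```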

## Usage

### Running the Demo

We've prepared a demo code snippet to showcase MGDebugger's debugging capabilities. You can run the demo by executing the following command after starting the vLLM server:

```bash
python demo.py
```

### Running Experiments

Once the vLLM server is up and running, start MGDebugger by executing:

```bash
python main.py
```

> **Tip:** You can modify the `MODEL` and `input_seeds` parameters in the `config.py` file to test different models and input configurations.
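
For illustration, a `config.py` along these lines is what the tip refers to; the parameter names come from the tip above, but the values shown here are placeholders and depend on the repository:

```python
# Hypothetical excerpt of config.py -- values are placeholders only.
MODEL = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # model served by vLLM
input_seeds = "input_data/seeds.jsonl"                 # input configuration to debug
```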

### Log Management

MGDebugger automatically stores all debugging and error logs in the `output_data` directory. You can review these logs to gain deeper insights into debugging details and performance analysis.
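
For example, to list the most recently written files under `output_data` (a minimal sketch; the directory layout depends on the runs you have executed):

```python
from pathlib import Path

# Show the five most recently modified log files, newest first.
logs = [p for p in Path("output_data").rglob("*") if p.is_file()]
for path in sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)[:5]:
    print(path)
```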

## Performance

The table below highlights the performance of different methods compared to the baseline (No-Debugging) on the HumanEval and MBPP datasets using the DeepSeek-Coder-V2-Lite model.

| Method | HumanEval Acc. (%) | Δ Acc. (%) | HumanEval RSR (%) | MBPP Acc. (%) | Δ Acc. (%) | MBPP RSR (%) |
|---|---|---|---|---|---|---|
| No-Debugging | 76.8 | -- | -- | 67.2 | -- | -- |
| Simple Feedback | 82.3 | +5.5 | 23.7 | 69.4 | +2.2 | 6.7 |
| Self-Edit | 82.9 | +6.1 | 26.3 | 71.2 | +4.0 | 12.2 |
| LDB (Block) | 84.1 | +7.3 | 31.6 | 74.0 | +6.8 | 20.7 |
| Self-Debugging (Expl.) | 87.2 | +10.4 | 44.7 | 73.4 | +6.2 | 18.9 |
| Self-Debugging (Trace) | 86.0 | +9.2 | 39.5 | 72.6 | +5.3 | 16.5 |
| Reflexion | 90.9 | +14.1 | 60.5 | 76.6 | +9.4 | 28.7 |
| Our Approach | 94.5 | +17.7 | 76.3 | 80.0 | +12.8 | 39.0 |

Our approach achieves the highest accuracy on both HumanEval and MBPP, improving over the baseline by +17.7% and +12.8% in accuracy, respectively. Its Repair Success Rate (RSR) is also substantially higher than that of the other methods, demonstrating the effectiveness of our debugging strategy in fixing diverse code issues.
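
Up to rounding, the RSR column can be read off the accuracy columns: it is the share of problems the baseline fails that debugging subsequently repairs. For our approach on HumanEval, for instance:

```math
\mathrm{RSR} \approx \frac{\Delta\,\mathrm{Acc}}{100 - \mathrm{Acc}_{\text{baseline}}} \times 100\% = \frac{17.7}{100 - 76.8} \times 100\% \approx 76.3\%
```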

## Contributing

We warmly welcome contributions to MGDebugger! We appreciate your feedback and look forward to building MGDebugger together with the community!