

<div align=center>

AI Copilot with LLaMA.cpp

"A VS Code AI coding assistant powered by a self-hosted llama.cpp endpoint."


Get started


<img src="examples/chat_demo.gif" alt="chat with llama.cpp server"/>

Code completion

<img src="examples/code_completion.gif" alt="code completion"/>

Code generation

<img src="examples/code_generate_demo1.gif" alt="code generate"/>

Code explanation

<img src="examples/explain_code_demo1.gif" alt="explain code"/>

Quick start: run your model service


  1. Download the llama.cpp binary release archive.

  2. Unzip llama-bxxx-bin-win-cublas-cuxx.x.x-x64.zip to a folder.

  3. Download a GGUF model file, for example: wizardcoder-python-13b-v1.0.Q4_K_M.gguf

  4. Start server.exe with one of the commands below.

# CPU only (-t: number of threads, -c: context size in tokens)
D:\path_to_unzip_files\server.exe -m D:\path_to_model\wizardcoder-python-13b-v1.0.Q4_K_M.gguf -t 8 -c 1024
# With GPU (-ngl: number of model layers to offload to the GPU)
D:\path_to_unzip_files\server.exe -m D:\path_to_model\wizardcoder-python-13b-v1.0.Q4_K_M.gguf -t 8 -ngl 81 -c 1024
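
Once the server is running, a quick request can confirm that it answers before you point the extension at it. The following is a minimal sketch, not part of this repository: it assumes the server is listening on its default address (127.0.0.1:8080, i.e. no --host or --port option was passed) and uses the llama.cpp server's /completion endpoint with the prompt and n_predict fields. The helper names build_completion_request and complete are hypothetical.

```python
import json
import urllib.request

# Assumption: the llama.cpp server was started without --host/--port,
# so it listens on its default address.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_completion_request(prompt, n_predict=64, url=SERVER_URL):
    """Build a POST request for the server's /completion endpoint."""
    # n_predict caps the number of tokens the server generates.
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, n_predict=64, url=SERVER_URL, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_completion_request(prompt, n_predict, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The generated text is returned in the "content" field of the JSON reply.
    return body["content"]
```

With the server up, calling complete("def fibonacci(n):") should return a short continuation of the prompt. A connection error usually means the server is not running or is bound to a different host or port.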

Linux or macOS

Build llama.cpp from source yourself, then start the resulting server binary with the same options as in the commands above.


All code in this repository is open source (Apache 2).

Quickstart: run pnpm install && cd vscode && pnpm run dev to start a local build of the Cody VS Code extension.