Andrej Karpathy's llama2.c in one file of pure C#.
llama2.c is a very simple implementation for running inference on models with a Llama 2-like transformer architecture.
This project is a pure C# port of llama2.c. It is optimized for speed and simple to understand and modify.
Usage
Requires .NET 7 or higher.
- First, put the stories15M.bin file in the same directory as the executable. You can download it from here.
- Get the tokenizer from here and put it in the same directory as the executable.
dotnet build -c Release
Generate a random story
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin
Generate a random story with a given prompt
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin -i "A long time ago a"
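If the model fails to load, it can help to confirm that the checkpoint downloaded correctly. The sketch below is not part of llama2.cs; `CheckpointHeader` and `Program` are hypothetical names used only for illustration. It assumes the legacy llama2.c export layout, in which the file starts with seven little-endian 32-bit integers (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len), and simply prints them.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of llama2.cs. Assumes the legacy llama2.c
// checkpoint layout: seven little-endian int32 values at the start of the file.
public readonly record struct CheckpointHeader(
    int Dim, int HiddenDim, int NLayers, int NHeads,
    int NKvHeads, int VocabSize, int SeqLen)
{
    public static CheckpointHeader Read(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));

        // Read the seven header fields in file order.
        // ReadInt32 throws EndOfStreamException if the file is too short.
        var fields = new int[7];
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = reader.ReadInt32();
        }

        // llama2.c stores a negative vocab_size to flag unshared classifier
        // weights, so the raw value is kept here and the sign left visible.
        return new CheckpointHeader(
            fields[0], fields[1], fields[2], fields[3],
            fields[4], fields[5], fields[6]);
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Default to the file name used in the usage steps above.
        string path = args.Length > 0 ? args[0] : "stories15M.bin";
        Console.WriteLine(CheckpointHeader.Read(path));
    }
}
```

Implausible values (for example a zero or very large layer count) usually point to a truncated download or a file in a different format.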
TODO
- Inference with Llama2 checkpoints
- Use high-performance C# types from .NET 8?
- Add training functionality